Amazon Launches Data Warehouse Service Redshift
November 29th, 2012 By: John Rath
Amazon (AMZN) has announced a limited preview of Amazon Redshift, a fully managed, petabyte-scale data warehouse service in the cloud. It offers fast query performance on data sets of virtually any size, using the same SQL-based tools and business intelligence applications that customers already use today.
Designed for Big Data
Amazon CTO Werner Vogels announced Redshift at Amazon’s first re:Invent conference this week. The service will take on established offerings from Oracle, IBM and Teradata. AWS built Redshift on technology licensed from ParAccel, in which Amazon is an investor. As a service on the AWS cloud, Redshift lets a business launch a cluster with a few hundred gigabytes of data and scale to a petabyte or more, for under $1,000 per terabyte per year. At re:Invent, AWS made the case for pay-as-you-go pricing by calculating that building and running a good-sized data warehouse on premises would cost between $19,000 and $25,000 per terabyte per year at list prices.
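To make the pricing comparison concrete, the per-terabyte figures quoted above can be scaled to a given data volume. The following is a rough sketch only: it treats "under $1,000" as an upper bound, and the function and constant names are illustrative, not part of any AWS tool.

```python
# Figures quoted in the article, in US dollars per terabyte per year.
REDSHIFT_MAX_PER_TB = 1_000            # "under $1,000", treated as an upper bound
ON_PREMISES_PER_TB = (19_000, 25_000)  # AWS's list-price estimate range


def annual_cost_range(terabytes):
    """Return (redshift_upper_bound, on_prem_low, on_prem_high) in dollars per year."""
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    low, high = ON_PREMISES_PER_TB
    return (terabytes * REDSHIFT_MAX_PER_TB, terabytes * low, terabytes * high)


if __name__ == "__main__":
    redshift, low, high = annual_cost_range(100)
    print(f"100 TB: Redshift under ${redshift:,}/yr; "
          f"on premises ${low:,} to ${high:,}/yr")
```

The sketch ignores everything the quoted figures leave out, such as data transfer, staffing, or discounts from list prices.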
Redshift has been certified by business intelligence vendors Jaspersoft and MicroStrategy, with support for additional tools coming soon. Customers can connect a SQL client or business intelligence tool to an Amazon Redshift data warehouse cluster using standard PostgreSQL JDBC or ODBC drivers. As with other AWS services, users can create and manage a Redshift cluster through a set of web service APIs. AWS customers Netflix, JPL, and Flipboard have been testing the new service as part of a private beta.
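Because the cluster is reached through standard PostgreSQL drivers, a client would typically be pointed at it with an ordinary PostgreSQL JDBC URL of the form jdbc:postgresql://host:port/database. The sketch below only assembles such a URL; the host name, database name and port 5439 are assumptions for illustration and do not come from the announcement.

```python
def postgres_jdbc_url(host, port, database):
    """Build a URL in the PostgreSQL JDBC driver's host:port/database form."""
    if not host or not database:
        raise ValueError("host and database are required")
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Hypothetical values; substitute the endpoint details of a real cluster.
EXAMPLE_HOST = "examplecluster.example.com"
EXAMPLE_PORT = 5439  # assumed port, not stated in the article
EXAMPLE_DB = "dev"

if __name__ == "__main__":
    print(postgres_jdbc_url(EXAMPLE_HOST, EXAMPLE_PORT, EXAMPLE_DB))
```

The resulting string is what a SQL client or business intelligence tool would take in its connection settings, alongside a user name and password.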
To achieve the high query performance that big data sets demand, Redshift uses a dynamically scalable massively parallel processing (MPP) architecture, columnar storage and data compression, and optimized hardware with locally attached storage and 10 GigE network connections between nodes. When building a cluster, customers can choose between two node types: an extra large (XL) node with 2TB of compressed storage, or an eight extra large (8XL) node with 16TB of compressed storage. XL clusters can contain 1 to 32 nodes, while 8XL clusters can contain 2 to 100 nodes.
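The node limits above imply a capacity range for each cluster type. As a minimal sketch, assuming total compressed storage is simply per-node storage multiplied by node count (the article does not say whether any capacity is reserved), the arithmetic looks like this; the names are illustrative.

```python
# Node specifications as quoted in the article (compressed storage per node).
NODE_TYPES = {
    "XL": {"tb_per_node": 2, "min_nodes": 1, "max_nodes": 32},
    "8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def cluster_capacity_tb(node_type, node_count):
    """Compressed storage in TB for a cluster, validating the node count."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= node_count <= spec["max_nodes"]:
        raise ValueError(
            f"{node_type} clusters take {spec['min_nodes']} "
            f"to {spec['max_nodes']} nodes"
        )
    return spec["tb_per_node"] * node_count


def capacity_range_tb(node_type):
    """Smallest and largest cluster capacity in TB for a node type."""
    spec = NODE_TYPES[node_type]
    return (
        cluster_capacity_tb(node_type, spec["min_nodes"]),
        cluster_capacity_tb(node_type, spec["max_nodes"]),
    )


if __name__ == "__main__":
    for name in NODE_TYPES:
        low, high = capacity_range_tb(name)
        print(f"{name}: {low} TB to {high} TB")
```

Under that assumption, XL clusters span 2TB to 64TB and 8XL clusters span 32TB to 1,600TB, which is where the petabyte-scale claim comes from.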