
Tools


AWS | Amazon Redshift – Cloud Data Warehouse Solution. It’s never been easier to get file data into Amazon Redshift, using AWS Lambda. You simply push files into a variety of locations on Amazon S3 and have them automatically loaded into your Amazon Redshift clusters. Read more in A Zero-Administration Amazon Redshift Database Loader (April 2015). Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and parallelizing queries across multiple nodes. Amazon Redshift has custom JDBC and ODBC drivers that you can download from the Connect Client tab of our Console, allowing you to use a wide range of familiar SQL clients. You can also use standard PostgreSQL JDBC and ODBC drivers.
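To make the columnar-storage claim above concrete, here is a minimal sketch in plain Python. It is not Redshift's actual storage format: the table, the `ROW_FORMAT` layout, and every function name are hypothetical and chosen only for illustration. It shows why an aggregate over a single column touches fewer bytes when each column is stored contiguously.

```python
import struct

# Hypothetical table: (user_id: int64, amount: float64, country: 2-byte code).
rows = [(i, i * 1.5, b"US") for i in range(1000)]

# Assumed on-disk layout of one packed row: 8 + 8 + 2 = 18 bytes, no padding.
ROW_FORMAT = "<qd2s"


def row_store(rows):
    """Serialize row by row: all fields of a record sit next to each other."""
    return b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)


def column_store(rows):
    """Serialize column by column: all values of a field sit next to each other."""
    ids, amounts, countries = zip(*rows)
    return {
        "user_id": struct.pack(f"<{len(ids)}q", *ids),
        "amount": struct.pack(f"<{len(amounts)}d", *amounts),
        "country": b"".join(countries),
    }


def sum_amount_row_store(blob):
    """SUM(amount) over the row layout: every byte of every row is scanned."""
    total = 0.0
    for _user_id, amount, _country in struct.iter_unpack(ROW_FORMAT, blob):
        total += amount
    return total, len(blob)


def sum_amount_column_store(columns):
    """SUM(amount) over the column layout: only the 'amount' block is read."""
    blob = columns["amount"]
    values = struct.unpack(f"<{len(blob) // 8}d", blob)
    return sum(values), len(blob)


if __name__ == "__main__":
    print("row layout    (sum, bytes scanned):", sum_amount_row_store(row_store(rows)))
    print("column layout (sum, bytes scanned):", sum_amount_column_store(column_store(rows)))
```

Both functions return the same sum, but the column layout scans only the bytes of the one column the query needs. A real columnar engine adds compression and block-level metadata on top of this idea, which the sketch does not model.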

Amazon Redshift’s data warehouse architecture allows you to automate most of the common administrative tasks associated with provisioning, configuring and monitoring a cloud data warehouse. Security is built-in. You pay only for the resources you provision.

Hypertable.

Drill. Drill Overview: Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets.

Drill is the open source version of Google's Dremel system, which is available as an IaaS offering called Google BigQuery. One explicitly stated design goal is that Drill can scale to 10,000 servers or more and process petabytes of data and trillions of records in seconds. Currently, Drill is incubating at Apache.

High Level Concept: There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data (e.g., JSON, Avro, Protocol Buffers). In recent years, open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). It is worth noting that, as explained by Google in the original paper, Dremel complements MapReduce-based computing.

The Apache Drill team uses Chronon for testing.

Google BigQuery.

Apache Drill.

Speed is Key: Leveraging an efficient columnar storage format, an optimistic execution engine and a cache-conscious memory layout, Apache Drill is blazing fast. Coordination, query planning, optimization, scheduling, and execution are all distributed throughout the nodes in a system to maximize parallelization.

Liberate Nested Data: Perform interactive analysis on all of your data, including nested and schema-less data. Drill supports querying against many different schema-less data sources, including HBase, Cassandra and MongoDB. Naturally, flat records are included as a special case of nested data.

Flexibility: Strongly defined tiers and APIs for straightforward integration with a wide array of technologies.
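The remark that flat records are a special case of nested data can be illustrated with a small Python sketch. This is not Drill's API or its internal data model: the `flatten` helper and the sample records are hypothetical. It only shows that a traversal written for nested records handles flat ones unchanged.

```python
def flatten(record, prefix=""):
    """Map a nested record (dicts and lists) to a flat dict keyed by path.

    Assumption for this sketch: lists contain scalars or dicts. A list nested
    directly inside another list is stored as-is rather than expanded.
    """
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: recurse, extending the dotted path.
            flat.update(flatten(value, path))
        elif isinstance(value, list):
            # Repeated field: address each element by its index.
            for i, item in enumerate(value):
                item_path = f"{path}[{i}]"
                if isinstance(item, dict):
                    flat.update(flatten(item, item_path))
                else:
                    flat[item_path] = item
        else:
            # Scalar leaf: a flat record consists only of these.
            flat[path] = value
    return flat


if __name__ == "__main__":
    nested = {
        "id": 1,
        "user": {"name": "ann", "tags": ["a", "b"]},
        "orders": [{"sku": "x", "qty": 2}],
    }
    already_flat = {"id": 2, "name": "bob"}
    print(flatten(nested))
    print(flatten(already_flat))
```

A flat record passes through `flatten` unchanged because it has only scalar leaves, which is the sense in which flat data is a degenerate case of nested data.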

Disclaimer: Apache Drill is an effort undergoing incubation at The Apache Software Foundation, sponsored by the Apache Incubator PMC.