Hypertable: An Open Source, High Performance, Scalable Database

SimpleDB Amazon SimpleDB is a non-relational data store that combines flexibility with high availability and relieves the customer of database administration tasks. Developers simply store and retrieve their data items through web service requests, and Amazon SimpleDB does the rest. Freed from the strict requirements of relational databases, Amazon SimpleDB is optimized for high availability and flexibility, with little or no administrative work. Behind the scenes, Amazon SimpleDB automatically creates and manages multiple geographically distributed replicas of your data to provide high availability and data durability. The service charges you only for the resources actually consumed in storing your data and serving your requests. You can change your data model at any time, and data is automatically indexed for you.

Drill Drill Overview: Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open source version of Google's Dremel system, which is available as a hosted service called Google BigQuery. High-Level Concept: There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data (e.g., JSON, Avro, Protocol Buffers). In recent years, open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). It is worth noting that, as explained by Google in the original paper, Dremel complements MapReduce-based computing. Like Dremel, Drill supports a nested data model with data encoded in a number of formats such as JSON, Avro, or Protocol Buffers. The Apache Drill team uses Chronon for testing. The Apache Drill team also uses the YourKit profiling tools in development.

Hadoop – The Power of the Elephant — eBay Tech Blog In a previous post, Junling discussed data mining and our need to process petabytes of data to gain insights from information. We use several tools and systems to help us with this task; the one I’ll discuss here is Apache Hadoop. Created in 2006 by Doug Cutting, who named it after his son’s stuffed yellow elephant, and based on Google’s 2004 MapReduce paper, Hadoop is an open source framework for fault-tolerant, scalable, distributed computing on commodity hardware. MapReduce is a flexible programming model for processing large data sets: Map takes key/value pairs as input and generates an intermediate output of another type of key/value pairs, while Reduce takes each key produced in the Map step, along with the list of values associated with that key, and produces the final output of key/value pairs.
Map (key1, value1) -> list (key2, value2)
Reduce (key2, list (value2)) -> list (key3, value3)
Ecosystem: Athena, our first large cluster, was put into use earlier this year.
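The Map and Reduce signatures above can be illustrated with a small in-memory word count. This is only a sketch of the programming model in plain Python, not Hadoop's API: the names map_words, reduce_counts, and run_mapreduce are hypothetical, and the grouping step stands in for the shuffle that a real cluster performs across machines.

```python
from collections import defaultdict


def map_words(key, value):
    """Map (key1, value1) -> list (key2, value2): emit (word, 1) per word."""
    # key is a document id (unused here); value is the document text.
    return [(word.lower(), 1) for word in value.split()]


def reduce_counts(key, values):
    """Reduce (key2, list (value2)) -> list (key3, value3): sum the counts."""
    return [(key, sum(values))]


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key1, value1 in records:
        for key2, value2 in mapper(key1, value1):
            groups[key2].append(value2)  # stands in for the shuffle phase
    output = []
    for key2 in sorted(groups):  # sorted only to make the output deterministic
        output.extend(reducer(key2, groups[key2]))
    return output


records = [("doc1", "the elephant"), ("doc2", "the yellow elephant")]
result = run_mapreduce(records, map_words, reduce_counts)
print(result)
```

In a real Hadoop job, the map and reduce functions run in parallel on different nodes and the framework handles grouping, sorting, and fault tolerance; the driver here only mimics that flow sequentially.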

Apache Drill Speed is Key Leveraging an efficient columnar storage format, an optimistic execution engine and a cache-conscious memory layout, Apache Drill is blazing fast. Coordination, query planning, optimization, scheduling, and execution are all distributed throughout nodes in a system to maximize parallelization. Liberate Nested Data Perform interactive analysis on all of your data, including nested and schema-less. Flexibility Strongly defined tiers and APIs for straightforward integration with a wide array of technologies. Disclaimer Apache Drill is an effort undergoing incubation at The Apache Software Foundation sponsored by the Apache Incubator PMC.

HDFS AWS | Amazon Redshift – Cloud Data Warehouse Solution It’s never been easier to get file data into Amazon Redshift, using AWS Lambda. You simply push files into a variety of locations on Amazon S3 and have them automatically loaded into your Amazon Redshift clusters. Read more in A Zero-Administration Amazon Redshift Database Loader (April 2015). Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and parallelizing queries across multiple nodes. Amazon Redshift’s data warehouse architecture allows you to automate most of the common administrative tasks associated with provisioning, configuring and monitoring a cloud data warehouse. Security is built-in. Amazon Redshift uses a variety of innovations to obtain very high query performance on datasets ranging in size from a hundred gigabytes to a petabyte or more. You pay only for the resources you provision. Amazon Redshift has multiple features that enhance the reliability of your data warehouse cluster.

Big Data Analytics, MapReduce for High Performance In-Database Analytics, Deep Data Mining – Aster Data
