Big data

TripleMap
TripleMap is a knowledge discovery and collaboration software framework. Using proprietary "SDC" technology, it enables organizations to integrate, index, and search massive interconnected networks of external and internal data. TripleMap users search, analyze, visualize, and collaborate by creating knowledge maps from the interconnected SDC information network.
data resources

Hadoop doesn't have to be so hard; just ask Etsy, Airbnb and the Climate Corporation. All three, it turns out, are using the Cascading framework atop Amazon Web Services' Elastic MapReduce service to make creating and running big data jobs simpler than is possible using Hadoop alone. Cascading is an open-source Java framework that acts as an intermediary between users and Hadoop. Meet the combo powering Hadoop at Etsy, Airbnb and Climate Corp. — Data | GigaOM
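Cascading's appeal is that jobs are expressed as chained pipe operations which the framework then plans into MapReduce steps. A toy Python sketch of that pipeline style follows; this is not Cascading's actual API (which is Java), and the class and method names are invented purely for illustration:

```python
# Toy illustration of the pipeline abstraction a framework like Cascading
# provides: chain operations on records instead of hand-writing mapper and
# reducer classes. Names here are invented, not Cascading's API.

class Pipe:
    def __init__(self, records):
        self.records = list(records)

    def each(self, fn):
        # apply fn to every record, yielding a new pipe
        return Pipe(fn(r) for r in self.records)

    def group_by(self, key_fn):
        # the "shuffle" step: bucket records by key
        groups = {}
        for r in self.records:
            groups.setdefault(key_fn(r), []).append(r)
        return groups

lines = ["sold 3", "sold 5", "returned 1"]
pipe = Pipe(lines).each(str.split)            # -> [["sold", "3"], ...]
totals = {k: sum(int(v[1]) for v in vs)
          for k, vs in pipe.group_by(lambda r: r[0]).items()}
print(totals)                                 # {'sold': 8, 'returned': 1}
```

The point of the abstraction is that the same chained description could be executed locally (as here) or planned onto a Hadoop cluster by the framework.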
SQLstream
Data Science Toolkit
Steal this server! Grab this entire site as a free, self-contained, ready-to-run VM.
Independence: never worry about the provider going offline, or charging once you're hooked.
Security: run on your intranet, so customer information stays within the firewall.
Scalability: no API limits; run a cluster of as many instances as you need.

Advanced Reporting & Analysis for Big Data :: Jaspersoft Business Intelligence Software
Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison :: KKovacs (Yes, it's a long title, since people kept asking me to write about this and that too :) I do when it has a point.) While SQL databases are insanely useful tools, their monopoly of the last few decades is coming to an end. And it's about time: I can't even count the things that were forced into relational databases but never really fitted them. (That being said, relational databases will always be the best for the stuff that has relations.) But the differences between NoSQL databases are much bigger than they ever were between one SQL database and another.

Welcome to Apache™ Hadoop™!

What Is Apache Hadoop? The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. The project includes these modules:
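The "simple programming models" mentioned above are chiefly MapReduce: a job is just a map function and a reduce function, while the framework handles distribution, shuffling, and failure recovery. A single-process Python sketch of the model (Hadoop runs the same two functions across thousands of machines):

```python
from collections import defaultdict

def map_fn(line):
    # map: emit a (word, 1) pair for every word in an input line
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # reduce: combine all values emitted for one key
    return word, sum(counts)

def run_job(lines):
    # the framework's role: shuffle map output by key, then reduce each group
    shuffled = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            shuffled[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in shuffled.items())

print(run_job(["to be or not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Because map and reduce are pure functions over independent records and groups, the framework is free to rerun them on another node when a machine fails, which is how the application-layer failure handling described above works.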
Home - Apache Hive
The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop™, it provides:
Tools to enable easy data extract/transform/load (ETL)
A mechanism to impose structure on a variety of data formats
Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
Query execution via MapReduce
Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to be able to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language.
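Hive streams rows to such custom scripts via its TRANSFORM clause, one tab-separated row per line over stdin/stdout. A minimal mapper of that shape in Python; the schema and column names here are hypothetical:

```python
import sys

def transform(line):
    # Hive streams rows as tab-separated fields and expects output in the
    # same format. Hypothetical schema: user_id \t url -> user_id \t domain.
    user_id, url = line.rstrip("\n").split("\t")
    domain = url.split("/")[2] if "//" in url else url
    return f"{user_id}\t{domain}"

if __name__ == "__main__":
    for row in sys.stdin:
        print(transform(row))
```

In HiveQL such a script could be invoked with something like SELECT TRANSFORM(user_id, url) USING 'mapper.py' AS (user_id, domain) FROM logs; (the table and column names are illustrative).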
maui-indexer - Maui - Multi-purpose automatic topic indexing
Summary: Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles.
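As a baseline for what "identifying main topics" means in practice, here is a frequency-based keyword extractor in Python. This is a toy heuristic for illustration only, not Maui's machine-learned algorithm:

```python
import re
from collections import Counter

# a tiny stopword list; real systems use much larger ones
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for"}

def top_topics(text, n=3):
    # tokenize, drop stopwords, return the n most frequent terms
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(n)]

doc = ("Hadoop clusters store big data. Hadoop distributes data "
       "processing across the cluster, and data grows every year.")
print(top_topics(doc))  # ['data', 'hadoop', ...]
```

Tools like Maui improve on this baseline by training a model over features such as phrase position, length, and matches against controlled vocabularies or Wikipedia titles.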
Using Revolution R Enterprise With Apache Hadoop for 'Big Analytics'
Parallel Performance Without Parallel Complexity
Big Data drives optimum value when it yields fast insights. Adopting MPP data warehouses or Hadoop clusters alone to store Big Data isn't enough: as data grows, so do the complexity and the computational workload of analyzing it.
Big Data Analytics Cripples Legacy Tools
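The "parallel performance without parallel complexity" pitch rests on the split-apply-combine pattern: each node computes a partial result over its chunk of the data, and the partials are then merged. A sequential Python illustration of the pattern (the vendors' actual machinery differs; this only shows the shape of the computation):

```python
def partial_stats(chunk):
    # per-chunk (per-node) work: a partial sum and a row count
    return sum(chunk), len(chunk)

def combined_mean(chunks):
    # combine step: merge the partials into the global mean
    total, count = 0, 0
    for chunk in chunks:
        s, n = partial_stats(chunk)
        total += s
        count += n
    return total / count

chunks = [[1, 2, 3], [4, 5], [6]]   # the data, split across "nodes"
print(combined_mean(chunks))        # 3.5
```

Because the per-chunk step never looks at other chunks, a framework can run it anywhere data lives and only ship the small partial results back, which is what keeps the analyst's code free of parallel complexity.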
Chorus: Productivity engine for Data Science Teams | Greenplum