
Processing


Hanborq Improved Hadoop MapReduce

Major Features:

1. Worker Pool: Does not spawn new JVM processes for each job/task. Instead, it starts these slot/worker processes during the initialization phase and keeps them running constantly.

The small world of big data

February 02, 2012, 1:34 PM. When we talk about big data and data warehousing, it is almost inevitable that Hadoop will be mentioned.

But Hadoop didn't come from a vacuum. Like most big data technologies, it bears a close relationship with other technologies in this sector. Hadoop uses map/reduce technologies to form a framework on which data is stored and on which applications that get at that data can run, and it can trace its origins back to another kind of data warehouse technology: enterprise search.

Enterprise search, also known as realtime search, is a method of data storage that takes the concept of searching and applies it to collections, at times very large, of unstructured or partially structured data, such as documents. The best document storage system will use some sort of XML or SGML-based tagging to keep those documents' content nice and organized. In reality, though, documents will fall quite a bit short of that ideal mark.

Pentaho open sources 'big data' integration tools under Apache 2.0

By Chris Kanaracus, IDG News Service. January 30, 2012 09:05 AM ET.

Business intelligence vendor Pentaho is releasing as open source a number of tools related to "big data" in the 4.3 release of its Kettle data-integration platform and has moved the project overall to the Apache 2.0 license, the company announced Monday.

While Kettle had always been available in a community edition at no charge, the tools being open sourced were previously available only in the company's commercial edition.

NoSQL

OLAP. OLTP.