FAQ


Meet the Big Data Equivalent of the LAMP Stack

Many Fortune 500 and mid-size enterprises are funding Hadoop test/dev projects for Big Data analytics, but they are unsure how to integrate Hadoop into their standard enterprise architecture.

For example, Joe Cunningham, head of technology strategy and innovation at credit card giant Visa, told the audience at last year’s Hadoop World that he would like to see Hadoop evolve from an alpha/beta environment into mainstream use for transaction analysis, but that he has concerns about integration and operations management. What has been missing for Big Data analytics is a LAMP (Linux, Apache HTTP Server, MySQL and PHP) equivalent.

Fortunately, there’s an emerging LAMP-like stack for Big Data aggregation, processing and analytics that includes:

While that’s still a lot of moving parts for an enterprise to install and manage, we’re almost at the point where there’s an end-to-end “hello world” for analytical data management.

Brett Sheppard is an executive director at Zettaforce.

Hadoop FAQ

1.1. What is Hadoop?

Hadoop is a distributed computing platform written in Java. It incorporates features similar to those of the Google File System and of MapReduce.

1.2. What platform does Hadoop run on?

Java 1.6.x or higher, preferably from Sun (see HadoopJavaVersions). Linux and Windows are the supported operating systems, but BSD, Mac OS X, and OpenSolaris are known to work.

1.3. How well does Hadoop scale?

Hadoop has been demonstrated on clusters of up to 4000 nodes. Sort performance has been improved using these non-default configuration values:

dfs.block.size = 134217728
dfs.namenode.handler.count = 40
mapred.reduce.parallel.copies = 20
mapred.child.java.opts = -Xmx512m
fs.inmemory.size.mb = 200
io.sort.factor = 100
io.sort.mb = 200
io.file.buffer.size = 131072

Sort performance on 1400-node and 2000-node clusters is also good: sorting 14 TB of data on a 1400-node cluster takes 2.2 hours, and sorting 20 TB on a 2000-node cluster takes 2.5 hours.
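The raw numbers in the scaling answer are easier to read once converted to familiar units. The sketch below parses two of the byte-valued tuning settings and turns the quoted sort figures into a per-node rate. It is a minimal plain-Java illustration under stated assumptions, not Hadoop's own Configuration API: the class and method names (FaqFigures, parseSettings, bytesToMiB, gbPerNodeHour) are hypothetical, and the throughput arithmetic assumes decimal terabytes (1 TB = 1000 GB).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helpers for reading the FAQ's tuning values and sort figures.
 * Plain Java only: this is NOT Hadoop's Configuration class; it just parses
 * "key = value" text copied from the FAQ.
 */
class FaqFigures {

    // Parse lines of the form "key = value" into a map that keeps input order.
    static Map<String, String> parseSettings(String text) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // no key/value separator on this line
            }
            settings.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return settings;
    }

    // Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double bytesToMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    // GB sorted per node per hour; assumes decimal units (1 TB = 1000 GB).
    static double gbPerNodeHour(double terabytes, int nodes, double hours) {
        return terabytes * 1000.0 / nodes / hours;
    }

    public static void main(String[] args) {
        Map<String, String> s = parseSettings(String.join("\n",
                "dfs.block.size = 134217728",
                "io.file.buffer.size = 131072"));

        long block = Long.parseLong(s.get("dfs.block.size"));
        long buffer = Long.parseLong(s.get("io.file.buffer.size"));
        System.out.printf("dfs.block.size = %.0f MiB%n", bytesToMiB(block));      // 128 MiB
        System.out.printf("io.file.buffer.size = %.0f KiB%n", buffer / 1024.0);  // 128 KiB

        // Figures quoted in the FAQ: 14 TB on 1400 nodes in 2.2 h; 20 TB on 2000 nodes in 2.5 h.
        System.out.printf("1400 nodes: %.2f GB per node per hour%n", gbPerNodeHour(14, 1400, 2.2)); // 4.55
        System.out.printf("2000 nodes: %.2f GB per node per hour%n", gbPerNodeHour(20, 2000, 2.5)); // 4.00
    }
}
```

Both benchmark runs work out to 10 GB of input per node, so the per-node rate stays in the same range as the cluster grows; this is only arithmetic on the quoted figures, not an additional measurement.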

Considerations for Hadoop and BI (part 2 of 2), from Cloudera: Apache Hadoop for the Enterprise

Just today we heard another question about integrating Apache Hadoop with Business Intelligence tools.

This is one of the most common questions we receive from enterprises adopting or evaluating Hadoop. In the early stages of their projects, customers are generally unsure how to connect their BI tools to Hadoop, or when it makes sense to do so. As I wrote in BI Considerations and Hadoop Part 1, Cloudera encourages you to use your existing infrastructure wherever possible, and this includes your investments in Business Intelligence. BI tools were traditionally designed for small volumes of structured data, whereas Hadoop generally stores data at scale in complex formats and processes it on read using MapReduce.
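The contrast between BI tools built for structured tables and Hadoop's habit of storing raw data and interpreting it on read can be sketched in a few lines. This is a single-process toy in plain Java, not the Hadoop MapReduce API: the record layout (date, region, amount in cents), the sample lines, and the names SchemaOnReadSketch, Sale, map and totalsByRegion are all hypothetical. The schema is applied only in the map step, so malformed lines are skipped at read time instead of being rejected when the data is loaded.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical single-process illustration of "schema on read" with a map/reduce flow.
 * Plain Java only; this is NOT the Hadoop MapReduce API. Requires Java 16+ (records).
 */
class SchemaOnReadSketch {

    // The structured view imposed on a raw line only when it is read.
    record Sale(String region, long cents) {}

    // "Map" step: apply the schema at read time; malformed lines emit nothing.
    static List<Sale> map(String rawLine) {
        String[] f = rawLine.split(",");
        if (f.length != 3) {
            return List.of(); // wrong number of fields: skip
        }
        try {
            return List.of(new Sale(f[1].trim(), Long.parseLong(f[2].trim())));
        } catch (NumberFormatException e) {
            return List.of(); // amount is not a number: skip
        }
    }

    // Group by region and sum; collapses shuffle and reduce into one in-memory pass.
    static Map<String, Long> totalsByRegion(List<String> rawLines) {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys, like reducer input
        for (String line : rawLines) {
            for (Sale s : map(line)) {
                totals.merge(s.region(), s.cents(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Raw records kept as-is; no table schema was declared when they were written.
        List<String> raw = List.of(
                "2011-03-01,EMEA,1250",
                "2011-03-01,APAC,990",
                "garbage line",
                "2011-03-02,EMEA,300",
                "2011-03-02,APAC,not-a-number");
        System.out.println(totalsByRegion(raw)); // {APAC=990, EMEA=1550}
    }
}
```

In a real cluster the map step would run in parallel over many blocks of input and the framework would shuffle the emitted pairs to reducers by key; here a single TreeMap stands in for both stages.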

We give our customers recommendations for when and how to integrate Hadoop with their existing Business Intelligence environment, as well as when organizations should look to new tools to solve a new class of problem.