Big Data


MapReduce and MPP: Two sides of the Big Data coin?

When the Big Data moniker is applied to a discussion, it’s often assumed that Hadoop is, or should be, involved.

But perhaps that’s just doctrinaire. Hadoop, at its core, consists of HDFS (the Hadoop Distributed File System) and MapReduce. The latter is a computational approach that involves breaking large volumes of data down into smaller batches, and processing them separately. A cluster of computing nodes, each one built on commodity hardware, will scan the batches and aggregate their data.
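As a rough illustration of that batch-and-aggregate idea, here is a minimal single-process Python sketch. It does not use Hadoop or any Hadoop API; the function names (`split_into_batches`, `map_batch`, `merge_counts`, `word_count`) and the word-counting task are assumptions chosen for the example, and on a real cluster each batch would be handled by a separate node.

```python
from collections import Counter
from functools import reduce


def split_into_batches(records, batch_size):
    """Break the full dataset into smaller batches."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def map_batch(batch):
    """Work one node would do: scan its batch and aggregate it locally."""
    counts = Counter()
    for line in batch:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Merge two partial results into one."""
    return left + right


def word_count(records, batch_size=2):
    """Hypothetical driver: split, aggregate each batch, then merge the outputs."""
    batches = split_into_batches(records, batch_size)
    # In this sketch the batches are processed one after another;
    # a cluster would process them in parallel on different nodes.
    partials = [map_batch(batch) for batch in batches]
    return reduce(merge_counts, partials, Counter())


sample = ["big data", "big cluster", "data data"]
print(word_count(sample))
```

Each call to `map_batch` touches only its own batch, which is what lets the work be spread across many commodity machines.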

Then the multiple nodes’ output gets merged to generate the final result data. But Big Data is not all about MapReduce. For a variety of reasons, MPP (massively parallel processing) and MapReduce are used in rather different scenarios.

Fujitsu Technology Puts Big Data to Use in Minutes.

10 ways big data is remaking energy - Cleantech News and Analysis.

Market for Big Data Getting, Well, Big.

Posted March 12, 2012, by Pedro Hernandez

IDC predicts the market for Big Data solutions will hit $16.9 billion by 2015.

Big Data market growth is outpacing general IT market growth, but staffing challenges loom. The market for Big Data solutions will grow from $3.2 billion in 2010 to $16.9 billion in 2015, according to a new forecast from IDC. In terms of momentum, Big Data will outpace the general information and communications technology (ICT) market, with a compound annual growth rate (CAGR) of 40 percent, about 7 times that of ICT. Taken individually, certain IT segments will grow faster than others. Big Data has had a profound effect on the IT landscape, according to IDC's Dan Vesset, program vice president of the research firm's Business Analytics Solutions unit.
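The quoted growth rate can be checked with the standard compound annual growth rate formula. The short Python sketch below assumes five compounding years (2010 to 2015) and uses only the two market sizes given in the forecast; the function name `cagr` is an illustrative choice.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in billions of dollars, as quoted from the IDC forecast.
# Assumption: 2010 to 2015 is treated as 5 compounding periods.
rate = cagr(3.2, 16.9, 5)
print(f"Implied CAGR: {rate:.1%}")
```

The result comes out just under 40 percent, which is consistent with the rounded figure in the forecast.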

Big Data has sparked a competitive business climate that encompasses both established IT vendors and startups.

Competition Brews as Titans Clash.

Making Big Data into Small Data.

It isn’t every day you get to write about Immanuel Kant and Big Data in the same blog post, but last week Gartner analyst Will Cappelli did just that. As you’ll see in his post, “AI and IAM: Will Two-Tier Analytics Become the Norm for IAM?”, context is the key. Cappelli addresses Kant’s conclusion that human reasoning is a two-tier process that first involves what is (the contextual lens through which we view our existence) and how the pieces all relate to each other.

From this standpoint, we reason and make decisions. In technology, our view of and approach to something like Big Data are shaped by the context in which we approach it. Cappelli writes, “I am inclined to think that Kant and the cognitive scientists have hit on something which is not just true of the processes that govern human cognition but rather reflects the deep structure of any process that seeks to turn volumes of raw, noisy data into information capable of grounding action taken by human beings or machines.”

Drawn to Scale raises money to make SQL big-data-ready - Cloud Computing News.

Conquering Big Data with stream computing.

There is big data, and then there is mind-bogglingly enormous data; the latter is the scale at which Mahmoud Mahmoud has focused his research for the last three years.

And he says his work will be a "paradigm shift" in the way businesses use big data in the future. The AUT University computer scientist has been teaching on and off for the better part of a decade, and is currently working on finishing his doctorate. He originally came to New Zealand in 1994 from Kuwait, where he was raised and educated. Mahmoud started his career as a graphic designer, but followed a childhood passion for computers to his current position.

"I have always been a computer geek, even as a little child I remember while the other kids were doing their reports on ancient Egypt using colouring pencils and paper, I did mine using a word processor on my computer," recalls Mahmoud.

"With stream-computing, rather than storing the data we store the queries we want to apply to it."

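The idea of storing queries rather than data can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mahmoud's system or any particular stream-computing product: the class `StreamProcessor`, its methods, and the temperature events are all hypothetical.

```python
class StreamProcessor:
    """Keeps standing queries and running results; events are never stored."""

    def __init__(self):
        self._queries = {}  # query name -> predicate function
        self._matches = {}  # query name -> running count of matching events

    def register(self, name, predicate):
        """Store a query that will be applied to every future event."""
        self._queries[name] = predicate
        self._matches[name] = 0

    def push(self, event):
        """Apply each stored query to the event, then let the event go."""
        for name, predicate in self._queries.items():
            if predicate(event):
                self._matches[name] += 1

    def result(self, name):
        """Current answer for a stored query."""
        return self._matches[name]


processor = StreamProcessor()
processor.register("hot", lambda event: event["temp"] > 30)
for reading in [{"temp": 25}, {"temp": 35}, {"temp": 40}]:
    processor.push(reading)
print(processor.result("hot"))
```

Memory use here depends on the number of registered queries, not on the volume of data that has flowed through.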
Designing for big data: the new architectural stack.

The right architectural stack might look something like this: Ubuntu, Hadoop Distributed File System (HDFS), MapReduce, Cassandra or HBase, Hive, Flume, JDBC and ODBC drivers, Hue, Pig, Oozie, Avro, and ZooKeeper, as well as some Chef configuration management tools.

The list goes on and on. How does an IT shop modify and integrate these different software components and hardware into a single big data solution? Overall, IT pros need to understand what problem they are trying to solve, be ready to pick the best components that fit together, and look for vendors that are pulling pieces of the stack together as a solution, taking much of the pain and time out of the integration work. For big data to succeed in mainstream enterprises, it must be as easy to install and use as a Microsoft Excel spreadsheet.