
Big Data


NOSQL Databases.

Home - Apache Hive. The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop™, it provides: tools to enable easy data extract/transform/load (ETL); a mechanism to impose structure on a variety of data formats; access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™; and query execution via MapReduce.

Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, this language allows programmers who are familiar with the MapReduce framework to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language. QL can also be extended with custom scalar functions (UDFs), aggregations (UDAFs), and table functions (UDTFs).
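As a rough illustration of the "custom mappers" idea, Hive can stream rows through an external script using the TRANSFORM clause of its query language: each row arrives on the script's standard input as one line of tab-separated fields, and each line the script prints becomes an output row. The sketch below is a minimal example under stated assumptions, not code from the Hive project: the two-column input (`user_id`, `page_url`), the function name `map_rows`, and the domain-extraction logic are all hypothetical.

```python
import sys
from typing import Iterable, Iterator


def map_rows(lines: Iterable[str]) -> Iterator[str]:
    """Turn tab-separated (user_id, page_url) rows into (user_id, domain) rows.

    The column layout is an assumption made for this example.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip rows that do not have exactly two fields
        user_id, page_url = fields
        # Crude domain extraction: drop an optional scheme, then any path.
        domain = page_url.split("//", 1)[-1].split("/", 1)[0].lower()
        yield f"{user_id}\t{domain}"


if __name__ == "__main__":
    # When run as a streaming script, read rows from stdin and print results.
    for out in map_rows(sys.stdin):
        print(out)
```

On the query side, a script like this would typically be registered with ADD FILE and then invoked with something along the lines of SELECT TRANSFORM(user_id, page_url) USING 'python extract_domain.py' AS (user_id, domain) FROM page_views, where the script and table names are again hypothetical.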

Components of Hive include HCatalog and WebHCat.

The Definition of Enterprise Big Data. With David Vellante.

With the inaugural O'Reilly Media Strata conference, the topic of big data is coming into sharper focus. When O'Reilly initiates coverage of a topic through an event like Strata, you can be sure the content will be well thought out, rich, relevant and visionary in nature.

A key theme that emerged from the event was that Big Data is not just about cool technologies and Web 2.0 companies experimenting with gigantic data sets. Rather, it is about defining new value streams based on leveraging information.

Big-data Background

Big Data is emerging from the realm of science projects at Web companies to help companies like telecommunication giants understand exactly which customers are unhappy with service and what processes caused the dissatisfaction, and predict which customers are going to change carriers. The IT techniques and tools to execute big data processing are new, very important and exciting.

Enterprise Big Data

Big-data Definition

Big data has the following characteristics:

Big data - ReadWriteCloud.

Welcome to Apache™ Hadoop™!

Blog.

Company - Report - Big data: The next frontier for innovation, competition, and productivity - May 2011.

The amount of data in our world has been exploding, and analyzing large data sets (so-called big data) will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus, according to research by MGI and McKinsey's Business Technology Office. Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.

MGI studied big data in five domains: healthcare in the United States, the public sector in Europe, retail in the United States, and manufacturing and personal-location data globally. Big data can generate value in each.