background preloader

Hadoop

Facebook Twitter

BYOH – Hadoop’s a Platform. Get Used To It. When is a technology offering a platform?

BYOH – Hadoop’s a Platform. Get Used To It.

Arguably, when people build products assuming it will be there. Or extend their existing products to support it, or add versions designed to run on it. Hadoop is there. The age of Bring Your Own Hadoop (BYOH) is clearly upon us. Specific support for components such as Pig and Hive vary, as do capabilities and levels of partnership in development, integration and co-marketing. Analytic Platforms: Kognitio - an analytics-focused in-memory database with SQL 2011 support – offers significant Hadoop support. Application Performance Management – longtime stalwart Compuware is bringing its portfolio to both Hadoop and leading NoSQL offerings. BI Tools: specialists like Alpine Data Labs, Datameer, Karmasphere and Platfora position themselves as targeted for Hadoop environments. Database: some vendors, like IBM and Teradata offer their own distributions, and even appliances.

Search: numerous approaches and players here. Stack Up Hadoop to Find Its Place in Your Architecture. 2013 promises to be a banner year for Apache Hadoop, platform providers, related technologies – and analysts who try to sort it out.

Stack Up Hadoop to Find Its Place in Your Architecture

I’ve been wrestling with ways to make sense of it for Gartner clients bewildered by a new set of choices, and for them and myself, I’ve built a stack diagram that describes possible functional layers of a Hadoop-based model. The model is not exhaustive, and it continually evolves. Introduction of HBase Architecture. Big data, fast: Avoiding Hadoop performance bottlenecks. Hadoop shows a lot of promise as a relatively inexpensive landing place for the streams of big data coursing through organizations.

Big data, fast: Avoiding Hadoop performance bottlenecks

The open source technology provides a distributed framework, built around highly scalable clusters of commodity servers, for processing, storing and managing data that fuels advanced analytics applications. But there's no such thing as a free lunch: In production use, achieving high levels of Hadoop performance can be a challenge. Introducing Apache Hadoop: The Modern Data Operating System. Handling the hoopla: When to use Hadoop, and when not to.

In the past few years, Hadoop has earned a lofty reputation as the go-to big data analytics engine.

Handling the hoopla: When to use Hadoop, and when not to

To many, it's synonymous with big data technology. But the open source distributed processing framework isn't the right answer to every big data problem, and companies looking to deploy it need to carefully evaluate when to use Hadoop -- and when to turn to something else. There's so much hype around [Hadoop] now that people think it does pretty much anything.Kelly Stirman, director of product marketing, 10gen Inc. For example, Hadoop has ample power for processing large amounts of unstructured or semi-structured data. But it isn't known for its speed in dealing with smaller data sets.

Metamarkets CEO Michael Driscoll said the company uses Hadoop for large, distributed data processing tasks where time isn't a constraint. But when it comes to running the real-time analytics processes that are at the heart of what Metamarkets offers to its clients, Hadoop isn't involved. Big data architecture adds integration options. Big data technologies open up new options for storing and managing data -- potentially in concert with data warehouse systems, not as an alternative to them.

Big data architecture adds integration options

That in turn creates new data integration opportunities, which might require additional tools to effectively support a big data architecture. Big data, fast: Avoiding Hadoop performance bottlenecks. Hadoop shows a lot of promise as a relatively inexpensive landing place for the streams of big data coursing through organizations.

Big data, fast: Avoiding Hadoop performance bottlenecks

The open source technology provides a distributed framework, built around highly scalable clusters of commodity servers, for processing, storing and managing data that fuels advanced analytics applications. But there's no such thing as a free lunch: In production use, achieving high levels of Hadoop performance can be a challenge. Despite all the attention it's getting, Hadoop is still a relatively young technology -- it only reached Version 1.0 status in December 2011. As a result, much of the work being done with Hadoop by users remains somewhat experimental in nature, especially outside of the large Internet companies that helped to create it and that are replete with Java programmers and systems administrators versed in deploying the technology. GraphChi: How a Mac Mini Outperformed a 1,636 Node Hadoop Cluster - Christian Prokopp. Lucene - Apache Solr.

Impala and Shark Benchmark. We have created Impala and Shark cluster on Amazon EC2 m3.2xlarge machine with 30GB RAM.

Impala and Shark Benchmark

We loaded a 50 GB dataset into the system and run queries on top of it to benchmark the performance between the two systems: Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can execute Hive QL queries up to 100 times faster than Hive without any modification to the existing data or queries. Shark supports Hive’s query language, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones. Cloudera Impala is an open source Massively Parallel Processing (MPP) query engine that runs natively on Apache Hadoop.

The video contains details of the performance statistics of both systems. Impala Benchmark. 5 Reasons Hadoop Is Kicking Can and Taking Names. Splunk Seeks Non-Geeks with Platform Upgrade, Gives Away Storm to Developers. In full swing this week, Splunk’s fourth annual .conf gathering in Las Vegas kicked off this morning with two noteworthy announcements from the data management platform.

Splunk Seeks Non-Geeks with Platform Upgrade, Gives Away Storm to Developers

The first is a refresh on its Enterprise suite, making version 6 of its platform available today. The second is a complete cloud version of its Enterprise suite, delivering a fully virtualized offering to appease the SaaS crowd. Splunk Chairman and CEO Godfrey Sullivan will provide the first public demonstration of Splunk Enterprise 6 during his keynote session at .conf2013. Democratizing Data It seems Splunk is seizing an opportunity to democratize data, approaching the Business Intelligence sector with enterprise-grade, cloud-ready analytics.

Your complete guide to Splunk .conf2013 “Too many organizations are still struggling with a data divide between IT and the business,” said Sullivan. So what’s new in version 6? Watch LIVE coverage of Splunk .conf on SiliconANGLE.tv Bigger than IT. BMC Launches Big Data Management Solution for Hadoop. BMC launches a big data management solution for Hadoop, Dataguise raises $13 million for expansion, and IBM acquires big data analytics company The Now Factory.

BMC Launches Big Data Management Solution for Hadoop

BMC launches big data management solution for Hadoop BMC announced the availability of BMC Control-M for Hadoop, a new big data management solution that dramatically reduces the batch processing time for extremely large collections of data sets, simplifying and automating Hadoop batch processing and connected enterprise workflows. The new solution is a purpose-built version of the company’s Control-M workload automation offering.

BMC Software specifically designed Control-M for Hadoop to improve service delivery by detecting slowdowns and failures with predictive analytics and intelligent monitoring of Hadoop application workflows. “BMC Control-M provides MetaScale with a control point that allows us to integrate all those big spokes to our hub which is big data. Dataguise closes $13 million Series B funding.