Bigdata

Impact Analytix: Business Intelligence, Predictive Analytics & Excel.

"Decision making and the techniques and technologies to support and automate it will be the next competitive battleground for organizations. Those who are using business rules, data mining, analytics and optimization today are the shock troops of this next wave of business innovation." - Tom Davenport, Competing on Analytics

Additional articles available on Tech Target's BeyeNETWORK and SQL Server Pro Magazine.

January 2014

Tableau with R Part 2: Clustering

As promised, this is the next article in Getting Started with Tableau 8.1 & R.

If you have not read the first article, or have not already installed and configured R and Rserve, I suggest that you read it before continuing. If you are new to R, there are many free sources to get you up and running with the basics. In the first article I covered the programming classic, Hello World, and introduced parameters and R arguments for passing values. R ships with a few clustering functions by default, such as kmeans.

42 Big Data Startups – Vote for the Top 10.

Update: The roundup of the 10 finalists is now available on CIO.com. The Big Data space is heating up, and unlike some over-hyped trends (cloud, I’m looking at you), it’s pretty easy to pinpoint the ROI with these tools.

When I put out calls for nominees through my Story Source Newsletter, HARO, Twitter, etc., for my upcoming CIO.com story, “10 Big Data Startups to Watch,” I received more than 100 recommendations. Usually, when I get that many recommendations, a good chunk of them can be dismissed out of hand. Some are clearly science projects; others have zero funding, no management pedigree and a dubious value proposition; and a few are clearly the products of fevered malarial hallucinations. Not so this time. Very few of the startups left off this list of 42 nominees were whacky long shots. Most were decent ideas, but were left out because they were too old, too new, or, in a few cases, just not convincing. I’m after those Goldilocks startups. Now comes the hard work.

Mesos: Dynamic Resource Sharing for Clusters.

Progress on Apache Drill « Big Data Craft.

By Camuel Gilyadov, on September 4th, 2012

We are continuing our efforts to contribute our OpenDremel code to the Apache Drill project and look forward to being active in it right after that.

Right now the effort is going into our ANTLR-based parser, which we want to make work with the new grammar of the BigQuery language. That should be done within a few days, and the parser will be committed to the new Drill repository as the first phase of the OpenDremel-Drill merge. Next, we plan to refactor and contribute the Semantic Analyzer, which processes the output of the parser into an intermediate form, resolving references and rewriting (flattening) the query into a single full table scan operation. That is expected within a week or two, depending on when the Drill architecture doc is published. We still don’t know what the schema language/format will be. The final phase of the OpenDremel-Drill merge will be the contribution of the code generator based on Apache Velocity templates.

NGDATA - Lily - Smart Data, at Scale, made Easy.

Lily is Smart Data, at Scale, made Easy. Lily is a data management platform combining planet-sized data storage, indexing and search with on-line, real-time usage tracking, audience analytics and content recommendations. It's a one-stop-platform for any organization confronted with Big Data challenges that seeks rapid implementation, rock-solid performance at scale, and efficiency at management. Lily unifies Apache HBase, Hadoop and Solr into a comprehensively integrated, interactive data platform with easy-to-use access APIs, a high-level data model and schema language, flexible, real-time indexing and the expressive search power of Apache Solr.

Best of all, Lily is open source - allowing anyone to explore and learn what Lily can do.

Features

Lily adds the missing bits any Big Data engineer will encounter when trying to combine Apache HBase and Solr into an interactive data management environment:

Spark | Lightning-Fast Cluster Computing.