
Lightning-Fast Cluster Computing

NGDATA - Lily - Smart Data, at Scale, made Easy Lily is a data management platform combining planet-sized data storage, indexing, and search with online, real-time usage tracking, audience analytics, and content recommendations. It is a one-stop platform for any organization confronted with Big Data challenges that seeks rapid implementation, rock-solid performance at scale, and efficient management. Lily unifies Apache HBase, Hadoop, and Solr into a comprehensively integrated, interactive data platform with easy-to-use access APIs, a high-level data model and schema language, flexible real-time indexing, and the expressive search power of Apache Solr. Lily adds the missing pieces any Big Data engineer will encounter when trying to combine Apache HBase and Solr into an interactive data management environment. Under the hood, Lily operates a high-performance yet robust queuing mechanism, the Lily SEP, which allows additional integration of Apache HBase with external processes.
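The excerpt doesn't show the SEP itself, but the pattern it describes (external processes consuming a queue of HBase mutation events) can be sketched in a few lines of Java. All names below are hypothetical, for illustration only; they are not Lily's actual API.

    import java.util.List;

    // Hypothetical event and listener types; Lily's real SEP API may differ.
    interface MutationEvent {
        byte[] table();
        byte[] rowKey();
    }

    interface EventListener {
        // Invoked by the queue consumer with a batch of HBase mutations.
        void processEvents(List<MutationEvent> events);
    }

    // An external process (e.g. a Solr indexer) subscribes by implementing the listener.
    class IndexUpdater implements EventListener {
        @Override
        public void processEvents(List<MutationEvent> events) {
            for (MutationEvent e : events) {
                // Re-index the affected row; keeping this idempotent makes replays safe.
                System.out.printf("reindex table=%s row=%s%n",
                        new String(e.table()), new String(e.rowKey()));
            }
        }
    }

Decoupling indexing from the write path this way is what lets an external index stay in sync with HBase without slowing down writes.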

Progress on Apache Drill « Big Data Craft By Camuel Gilyadov, on September 4th, 2012 We are continuing our efforts to contribute our OpenDremel code to the Apache Drill project and look forward to being active in it right after that. Right now the effort is going into our ANTLR-based parser; we want to make it work with the grammar of the BigQuery language. That should be done within a few days, and the parser will be committed to the new Drill repository as the first phase of the OpenDremel-Drill merge. Next, we plan to refactor and contribute the Semantic Analyzer, which processes the output of the parser into an intermediate form, resolving references and rewriting (flattening) the query into a single full-table-scan operation. The final phase of the OpenDremel-Drill merge will be the contribution of the code generator, which is based on Apache Velocity templates. Everyone who wishes to help is welcome. We also continue work on our generic execution backend, built on top of OpenStack Swift and integrated with ZeroVM.
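The post names Apache Velocity as the engine behind the code generator. As a rough sketch of what template-driven generation looks like with the Velocity API (the template, table, and column names here are invented for illustration; they are not OpenDremel's actual templates):

    import java.io.StringWriter;
    import java.util.Arrays;
    import org.apache.velocity.VelocityContext;
    import org.apache.velocity.app.Velocity;

    public class ScanCodeGen {
        public static void main(String[] args) {
            Velocity.init();

            // The semantic analyzer's output would populate this context;
            // here we fill it by hand with a flattened single-scan query.
            VelocityContext ctx = new VelocityContext();
            ctx.put("table", "events");
            ctx.put("columns", Arrays.asList("user_id", "ts"));

            // A toy template that expands into a full-table-scan loop.
            String template =
                  "for (Row row : scan(\"$table\")) {\n"
                + "#foreach($c in $columns)\n"
                + "    emit(row.get(\"$c\"));\n"
                + "#end\n"
                + "}\n";

            StringWriter out = new StringWriter();
            Velocity.evaluate(ctx, out, "scan-gen", template);
            System.out.println(out);
        }
    }

Generating a scan loop per query is what connects the analyzer's flattened intermediate form to executable code.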

42 Big Data Startups – Vote for the Top 10 Update: The roundup of the 10 finalists is now available on CIO.com. The Big Data space is heating up, and unlike some over-hyped trends (cloud, I’m looking at you), it’s pretty easy to pinpoint the ROI with these tools. When I put out calls for nominees through my Story Source Newsletter, HARO, Twitter, etc., for my upcoming CIO.com story, “10 Big Data Startups to Watch,” I received more than 100 recommendations. Usually, when I get that many recommendations, a good chunk of them can be dismissed out of hand: some are clearly science projects; others have zero funding, no management pedigree, and a dubious value proposition; and a few are clearly the products of fevered malarial hallucinations. Not so this time. I’m after those Goldilocks startups. Now comes the hard work. There is one big wrinkle to the voting this time around: this list of 42 isn’t locked. I’m going to leave the voting open for a week or so. Voting closes Monday, June 3 at 5 PM PT.

Impact Analytix: Business Intelligence, Predictive Analytics & Excel Decision making and the techniques and technologies to support and automate it will be the next competitive battleground for organizations. Those who are using business rules, data mining, analytics and optimization today are the shock troops of this next wave of business innovation. - Tom Davenport, Competing on Analytics Additional articles are available on TechTarget's BeyeNETWORK and SQL Server Pro Magazine. January 2014 Tableau with R Part 2: Clustering As promised, this is the next article in Getting Started with Tableau 8.1 & R. In Tableau 8.1, a connection to RServe was added under Help > Settings and Performance > Manage R Connection, along with several new Calculated Field functions called SCRIPT_STR, SCRIPT_REAL, SCRIPT_BOOL, and SCRIPT_INT to ease integration and R function calls in Tableau. In the first article I covered the programming classic, Hello World, and introduced parameters and R arguments for passing values.

> help(kmeans)                               # browse the k-means documentation
> BikeBuyers <- read.csv("~/BikeBuyers.csv") # load the sample data set
> # illustrative next step: cluster the numeric columns into three groups
> fit <- kmeans(scale(BikeBuyers[sapply(BikeBuyers, is.numeric)]), centers = 3)
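Tableau's SCRIPT_* functions hand their arguments to RServe and read a vector back into the view. Outside Tableau, the same round trip can be sketched with the Rserve Java client (this assumes an Rserve instance listening on the default localhost port; the R expression is a toy example, not the BikeBuyers analysis):

    import org.rosuda.REngine.REXP;
    import org.rosuda.REngine.Rserve.RConnection;

    public class RserveRoundTrip {
        public static void main(String[] args) throws Exception {
            // Connect to a local Rserve instance (default host/port assumed).
            RConnection c = new RConnection();
            try {
                // Run k-means on a toy matrix and pull the cluster labels back,
                // mirroring what a SCRIPT_INT calculated field does in Tableau.
                REXP result = c.eval(
                    "kmeans(matrix(rnorm(40), ncol = 2), centers = 3)$cluster");
                for (int label : result.asIntegers()) {
                    System.out.print(label + " ");
                }
            } finally {
                c.close();
            }
        }
    }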
