Big Data Tools

Driving Sales on Ebay, Amazon with Actionable Data. Ht. Dave Gray » Visual thinking. Hadoop Tutorial for Beginners -1. A Statistical Analysis of the Work of Bob Ross. Big Data and Customer Experience. Creating Business Intelligence through Machine Learning: An Effective Business Decision Making Tool. What executives should know about open data. Man Versus Machine: When It Comes to Scale, It's Advantage Computers. Master List 2 (A Wiki of Social Media Marketing Examples). Master List (A Wiki of Social Media Marketing Examples).

Predictive Analytics Innovation Summit, Chicago. Bytes Sized. @WalmartLabs Blog. Factual. JavaScript InfoVis Toolkit. Gephi, an open source graph visualization and manipulation software.

Big Data won't solve your company's problems

The reams of data available to companies are only as useful as the people working with them.

By Ethan Rouen, contributor

FORTUNE -- "Oh, people can come up with statistics to prove anything. Fourteen percent of people know that." – Homer Simpson

The era of big data is here, the nerds proclaim. Computers are powerful enough to gather and synthesize terabytes of information to answer questions ranging from how best to compensate employees to how risky is that mortgage-backed security. But while the numbers don't lie, how people use them is extremely subjective. "Making the decision at the end of the day can be aided by data, but the thought that computers will make all the important decisions is just not true," says Shvetank Shah, executive director of the Corporate Executive Board (CEB), which recently published a study titled "Overcoming the Insight Deficit: Big Judgment in an Era of Big Data."

What Does Big Data Mean to Infrastructure Professionals?

Big data means the amount of data you’re working with today will look trivial within five years. Huge amounts of data will be kept longer and have way more value than today’s archived data. Business people will covet a new breed of alpha geeks.

You will need new skills around data science, new types of programming, more math and statistics skills, and data hackers... lots of data hackers. You are going to have to develop new techniques to access, secure, move, analyze, process, visualize and enhance data in near real time. You will be minimizing data movement wherever possible by moving function to the data instead of data to function. You will be leveraging or inventing specialized capabilities to do certain types of processing (e.g. early recognition of images or content types) so you can do some processing close to the head. The cloud will become the compute and storage platform for big data, which will be populated by mobile devices and social networks.

What is Big Data? Transform your business with data. Visualizing Big-Data Trends - Brett Sheppard.

Big Data Platform. D3.js - Data-Driven Documents. Golden Orb. Objectivity. Welcome To Apache Incubator Giraph. The Julia Language. Elasticsearch - Open Source, Distributed, RESTful, Search Engine. Welcome to Hive! Welcome to Apache Pig!

Research Publication: Sawzall

Interpreting the Data: Parallel Analysis with Sawzall. Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan.

Abstract: Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on. We present a system for automating such analyses.
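To make the abstract's "filtering, aggregation" pattern concrete, here is a minimal sketch in Python (not Sawzall itself, and not the paper's implementation). The call records, the `SHARDS` layout, and the helper names `process_shard` and `merge` are all hypothetical, chosen only to show how each machine can filter and aggregate its own slice before the partial results are combined.

```python
from collections import Counter

# Hypothetical call records, already split across three "machines" (shards).
SHARDS = [
    [{"caller": "alice", "seconds": 120}, {"caller": "bob", "seconds": 5}],
    [{"caller": "alice", "seconds": 30}, {"caller": "carol", "seconds": 400}],
    [{"caller": "bob", "seconds": 90}],
]


def process_shard(records, min_seconds=10):
    """Filter one shard and return a partial aggregate: total seconds per caller."""
    partial = Counter()
    for rec in records:
        if rec["seconds"] >= min_seconds:              # filtering step
            partial[rec["caller"]] += rec["seconds"]   # aggregation step
    return partial


def merge(partials):
    """Combine per-shard partial aggregates into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


# Each shard is processed independently, so this loop could run in parallel.
totals = merge(process_shard(shard) for shard in SHARDS)
print(dict(totals))
```

Because each shard produces a small summary rather than shipping raw records, only the partial aggregates need to travel between machines.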

Published in: Scientific Programming Journal, Special Issue on Grids and Worldwide Computing Programming Models and Infrastructure, 13:4, pp. 227-298.

For fast, interactive Hadoop queries, Drill may be the answer - Cloud Computing News.

Kafka. Prior releases: 0.7.x, 0.8.0.

1. Getting Started

1.1 Introduction

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design. What does all that mean? First let's review some basic messaging terminology: Kafka maintains feeds of messages in categories called topics. Communication between the clients and the servers is done with a simple, high-performance, language-agnostic TCP protocol.

Topics and Logs

Let's first dive into the high-level abstraction Kafka provides: the topic. A topic is a category or feed name to which messages are published.

Each partition is an ordered, immutable sequence of messages that is continually appended to: a commit log. The Kafka cluster retains all published messages, whether or not they have been consumed, for a configurable period of time. In fact the only metadata retained on a per-consumer basis is the position of the consumer in the log, called the "offset".

Distribution.

Storm, distributed and fault-tolerant realtime computation. BigQuery - Google BigQuery.
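Returning to the Kafka excerpt: the partition, retention, and offset ideas can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not Kafka's client API or its storage format. The `Partition` class, its `append`, `read`, and `expire` methods, and the `offsets` dictionary are hypothetical names invented for illustration.

```python
import time


class Partition:
    """Toy stand-in for one partition: an append-only, ordered message log."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._entries = []      # list of (timestamp, message); never edited in place
        self._base_offset = 0   # offset of the oldest message still retained

    def append(self, message, now=None):
        """Append a message and return the offset assigned to it."""
        ts = time.time() if now is None else now
        self._entries.append((ts, message))
        return self._base_offset + len(self._entries) - 1

    def read(self, offset, max_messages=10):
        """Return up to max_messages (offset, message) pairs starting at offset."""
        start = max(offset, self._base_offset) - self._base_offset
        chunk = self._entries[start:start + max_messages]
        return [(self._base_offset + start + i, msg)
                for i, (_, msg) in enumerate(chunk)]

    def expire(self, now=None):
        """Drop messages older than the retention period, consumed or not."""
        now = time.time() if now is None else now
        while self._entries and now - self._entries[0][0] > self.retention_seconds:
            self._entries.pop(0)
            self._base_offset += 1


log = Partition(retention_seconds=60)
for i, text in enumerate(["a", "b", "c"]):
    log.append(text, now=i)  # explicit timestamps 0, 1, 2 keep the demo deterministic

# The only per-consumer state is a position in the log.
offsets = {"fast": 0, "slow": 0}

batch = log.read(offsets["fast"])
offsets["fast"] = batch[-1][0] + 1   # fast consumer has read everything

slow_batch = log.read(offsets["slow"], max_messages=1)
offsets["slow"] = slow_batch[-1][0] + 1  # slow consumer has read one message

log.expire(now=61.5)                 # messages stamped 0 and 1 are now too old
remaining = log.read(offsets["slow"])
print(offsets, remaining)
```

In this sketch reading never removes anything. Messages disappear only when `expire` applies the retention window, which is why the slow consumer skips ahead to the oldest message still retained.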