
BigData


Real-time data analysis using Spark

Big Data is a hot topic these days, and one aspect of that problem space is processing streams of data in near-real time.

One of the applications that can help you do this is Spark, which is produced at UC Berkeley’s AMP (Algorithms, Machines and People) Lab. The first thing you need when you’re looking at data stream analysis techniques is a stream of data to analyse. I’m using a JSON/WebSocket representation of SIX Financial Information’s real-time market data feed. The next thing I need is a problem to solve using my data stream. Now, I’m no great financial wizard, but I’m going to suggest that there might be something useful to be gained from knowing which sectors are currently “trending”, where trending means showing an overall movement in a positive direction across lots of price changes.
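The “trending” definition above can be made concrete in a few lines. This is a minimal sketch in plain Python rather than Spark code, so the logic is visible without a cluster. It assumes each feed message has already been reduced to a timestamp, a sector and a signed price change (the `Tick` fields are hypothetical; the SIX feed’s actual JSON schema isn’t shown here), and it assumes a window length and a minimum number of price changes, both of which the text leaves open.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    """One price update from the feed (hypothetical field names)."""
    timestamp: float      # seconds since some epoch
    sector: str           # e.g. "Banks", "Energy"
    price_change: float   # signed change versus the previous price


class TrendingSectors:
    """Keeps a sliding time window of ticks and reports 'trending' sectors.

    Assumed definition: within the window, a sector is trending if its price
    changes sum to a positive number AND it saw at least `min_changes` updates.
    """

    def __init__(self, window_seconds: float, min_changes: int) -> None:
        self.window_seconds = window_seconds
        self.min_changes = min_changes
        self._ticks: deque[Tick] = deque()
        self._net: defaultdict[str, float] = defaultdict(float)  # sector -> summed change
        self._count: defaultdict[str, int] = defaultdict(int)    # sector -> number of updates

    def add(self, tick: Tick) -> None:
        """Record a tick, then drop ticks at least window_seconds old.

        Assumes ticks arrive in timestamp order.
        """
        self._ticks.append(tick)
        self._net[tick.sector] += tick.price_change
        self._count[tick.sector] += 1
        self._evict(now=tick.timestamp)

    def _evict(self, now: float) -> None:
        cutoff = now - self.window_seconds
        while self._ticks and self._ticks[0].timestamp <= cutoff:
            old = self._ticks.popleft()
            self._net[old.sector] -= old.price_change
            self._count[old.sector] -= 1
            if self._count[old.sector] == 0:  # forget sectors with no ticks left
                del self._net[old.sector]
                del self._count[old.sector]

    def trending(self) -> list[str]:
        """Trending sectors, largest net positive change first."""
        hits = [s for s, n in self._count.items()
                if n >= self.min_changes and self._net[s] > 0]
        return sorted(hits, key=lambda s: self._net[s], reverse=True)
```

Keeping running per-sector totals and subtracting ticks as they leave the window avoids re-scanning the whole window on every update. In Spark Streaming the same idea would more naturally be written as a windowed aggregation over micro-batches (for example with `reduceByKeyAndWindow`) instead of a hand-maintained deque.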

Spark: a quick introduction

For those of you who haven’t heard of Spark before, it’s a project written by the folks over at Berkeley, and is a key component of their Berkeley Data Analytics Stack.

What is Lumify?

It seems fitting that the first post to the Lumify blog spend some additional time explaining what Lumify is.

Summarized in one sentence, Lumify is an open source big data analysis and visualization platform. That high-level, nebulous sentence doesn't do the project justice or make clear just what you can do with Lumify. So let's start from the top.

What problem does Lumify address?

It's no secret the world is producing a lot of information. There are countless vendors in today's marketplace offering a variety of solutions for your big data woes.

What are the key Lumify concepts?

Understanding a couple of key concepts will greatly help in making sense of what Lumify does and how it does it.

Ontology: An ontology is the structure for organizing the information you care to analyze in Lumify.
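To make the ontology idea concrete, here is a hypothetical sketch in Python. It is not Lumify’s actual ontology format or API, which this excerpt doesn’t describe. The ontology declares which concept types exist, which properties each may carry, and which relationship types may connect which concepts; incoming graph data is then checked against it. The concept and relationship names (`person`, `organization`, `worksFor`) are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A type of entity (e.g. 'person') and the properties it may carry."""
    name: str
    properties: set[str] = field(default_factory=set)


@dataclass
class Relationship:
    """A named edge type allowed only between the given source and target concepts."""
    name: str
    source: str
    target: str


@dataclass
class Ontology:
    """Hypothetical ontology: the schema that graph vertices and edges must fit."""
    concepts: dict[str, Concept] = field(default_factory=dict)
    relationships: dict[str, Relationship] = field(default_factory=dict)

    def add_concept(self, name: str, *properties: str) -> None:
        self.concepts[name] = Concept(name, set(properties))

    def add_relationship(self, name: str, source: str, target: str) -> None:
        # Relationships may only reference concepts that are already declared.
        if source not in self.concepts or target not in self.concepts:
            raise ValueError(f"unknown concept in relationship {name!r}")
        self.relationships[name] = Relationship(name, source, target)

    def check_vertex(self, concept: str, props: dict[str, object]) -> bool:
        """True if the concept exists and every given property is declared for it."""
        c = self.concepts.get(concept)
        return c is not None and set(props) <= c.properties

    def check_edge(self, name: str, source_concept: str, target_concept: str) -> bool:
        """True if the relationship exists and connects the expected concept types."""
        r = self.relationships.get(name)
        return r is not None and (r.source, r.target) == (source_concept, target_concept)
```

For example, after declaring `person` with a `name` property, `organization`, and `worksFor` from `person` to `organization`, a `person` vertex with a `name` passes, while a `worksFor` edge pointing from an organization to a person is rejected.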

What can you do with Lumify?

Search

Lumify provides a full-text search over everything in your graph.

Graph Visualization

The primary feature of Lumify is the graph visualization.

Link Analysis

Big Data – Technologies, Platforms & Products

Mupd8 – The @WalmartLabs Real-time Platform

In recent years, the world has seen explosive growth in the volume of real-time data streams.

Once the preserve of stock markets and day traders, real-time data is now ubiquitously available to consumers through popular services such as Twitter and Facebook. With the availability and growth of real-time data comes the inevitable problem of real-time data overload, and the need for systems that can separate the signal from the noise. As we began working with firehoses from various social media sites, we recognized the need for a general-purpose real-time stream processing platform that could address the issues of scale and performance, and let our stream processing applications focus on the quality of their generated content.

“Mupd8” came into existence to fulfill that need.

Big Data Slide decks

A Formula for Data Gravity « Data Gravity

Background

Before creating DataGravity.org, I first blogged about Data Gravity on my personal blog in December 2010, and several times since then.

I have watched the concept of Data Gravity grow beyond anything I ever expected. I have also watched as a startup company decided to name itself DataGravity. As I began to speak about Data Gravity to others and answer questions, I realized that maybe it was something more than simply a novel concept describing an effect. This began my quest for a formula that allows Data Gravity to be calculated.

The Search

I started out by doing what everyone does: I Googled "Gravity Formula" and "Data Gravity Formula", and something caught my eye. The first hit for "Data Gravity Formula" was the Gravity model of trade on Wikipedia, which I found fascinating. The first thing I learned was that in order to have Gravity, you must calculate Mass.
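The excerpt stops before the author’s own Data Gravity formula, so that formula is not reconstructed here. For reference only, the gravity model of trade he mentions is commonly written in its basic form as below, next to Newton’s law of gravitation for comparison. Here $F_{ij}$ is the trade flow between economies $i$ and $j$, $M_i$ and $M_j$ are their economic “masses” (often GDP), $D_{ij}$ is the distance between them, and $G$ is a constant:

```latex
F_{ij} = G \, \frac{M_i \, M_j}{D_{ij}}
\qquad \text{compared with} \qquad
F = G \, \frac{m_1 \, m_2}{r^2}
```

Both forms need a notion of mass before anything else can be computed, which matches the first lesson drawn above. Note that the basic trade model divides by distance rather than distance squared; empirical versions usually fit an exponent to each term.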

SNA & ONA Projects, Cases & Research by Orgnet, LLC

We have participated in 500+ diverse consulting projects applying social network analysis (SNA) and organizational network analysis (ONA).

We have worked with large, medium, and small businesses, governments, universities, not-for-profits and their funders, and many consulting firms.

Organizations, Projects, & Teams

Human Capital + Social Capital = [PDF]
Managing the 21st Century Organization [PDF]
Networks of Adaptive/Agile Organizations [PDF]
Human Relationships & Organizational Performance [PDF]
Best Practice: Organizational Network Mapping [PDF]
A More Accurate Way to Measure Diversity [PDF]
Discovering Communities of Practice

3+ Alternatives to Apache Hadoop

Next week the SiliconAngle team is heading to the HadoopWorld event in New York City.

We’ll be broadcasting theCube live and covering all the latest developments in the Apache Hadoop ecosystem. But it’s important to remember that Hadoop isn’t the only game in town. As we ramp up our coverage of Hadoop in advance of the event, here are some other big data projects to keep in mind.

Update: I just wrote about another alternative: Spark.

HPCC Systems

The most obvious and direct competitor to Hadoop is HPCC Systems, an open source spin-off from LexisNexis Risk Solutions. What it doesn’t have yet is a developer ecosystem on par with Hadoop’s.

Spark Cluster Computing Framework

Data: Where can I get large datasets open to the public?

Visualization

Graph processing

Hadoop