
Big Data


Probability Theory — A Primer

It is a wonder that we have yet to officially write about probability theory on this blog. Probability theory underlies a huge portion of artificial intelligence, machine learning, and statistics, and a number of our future posts will rely on the ideas and terminology we lay out in this post. Our first formal theory of machine learning will be deeply ingrained in probability theory, we will derive and analyze probabilistic learning algorithms, and our entire treatment of mathematical finance will be framed in terms of random variables. And so it's about time we got to the bottom of probability theory. In this post, we will begin with a naive version of probability theory. That is, everything will be finite and framed in terms of naive set theory, without the aid of measure theory.
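In this finite setting, a probability space is nothing more than a finite set of outcomes with a weight in [0, 1] attached to each outcome, the weights summing to 1. As a minimal sketch (not the primer's own code), assuming outcomes are stored in a Python dict that maps each outcome to an exact `Fraction` probability, the hypothetical helpers below check those conditions, compute the probability of an event as the sum of its outcomes' weights, and treat a random variable as an ordinary function on outcomes whose expected value is a weighted sum. The fair-die example is illustrative only.

```python
from fractions import Fraction

# Illustrative finite sample space: one roll of a fair six-sided die.
# Each outcome maps to its probability, stored exactly as a Fraction.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def is_probability_space(space):
    """Hypothetical helper: every weight lies in [0, 1] and the weights sum to exactly 1."""
    return all(0 <= p <= 1 for p in space.values()) and sum(space.values()) == 1


def prob(space, event):
    """Hypothetical helper: probability of an event (a set of outcomes) is the sum of its weights."""
    return sum(space[outcome] for outcome in event)


def expected_value(space, random_variable):
    """Hypothetical helper: a random variable is a function on outcomes;
    its expected value is the sum of X(outcome) * probability(outcome)."""
    return sum(random_variable(outcome) * p for outcome, p in space.items())


if __name__ == "__main__":
    print(is_probability_space(die))          # True
    print(prob(die, {2, 4, 6}))               # 1/2 (probability of an even roll)
    print(expected_value(die, lambda w: w))   # 7/2 (average face value)
```

Using `Fraction` rather than floats keeps the sum-to-one check exact; with floats, adding ten copies of 0.1 does not compare equal to 1.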

Restricting ourselves to finite sets has the benefit of keeping the analysis and the definitions simple. So let us begin with probability spaces and random variables.

Finite Probability Spaces

Definition: A finite set $\Omega$ equipped with a function $f: \Omega \to [0,1]$ such that $\sum_{\omega \in \Omega} f(\omega) = 1$ is called a finite probability space.

First job in Hadoop using Syncsort DMExpress (3/3)
Install Cloudera CDH4 on Linux VM cluster (2/3)
Install Linux for a small Hadoop cluster in VMWare - Debian version (1/3)
Install Hadoop (CDH4) on 5 nodes with VMWare, CDH4, Cloudera Manager 4

These Smart, Social Apps Bring Big Data Down to Size - Social Media News

Cloudera Debuts Real-Time Hadoop Query - Software - Information

Cloudera says Project Impala real-time engine overcomes Hadoop batch delays, opens platform to relational databases and business intelligence tools.

Adding a new component for real-time querying to its Hadoop software distribution, Cloudera introduced Cloudera Impala on Wednesday at the Strata Conference in New York.

Developed in stealth mode and now in public beta, the software takes on one of Hadoop's biggest flaws: batch-oriented processing delays and poor access to data. Impala is an interactive-speed SQL query engine that runs on existing Hadoop infrastructure. It makes all the data in the Hadoop Distributed File System (HDFS) and Apache HBase database tables accessible for real-time querying.

Impala is a two-part product. Cloudera has a number of customers beta testing Impala, two of which are going public.

Big Data Meets BI: Beyond The Hype - Software - Business Intelligence

With Hadoop quickly gaining adoption, Cloudera, Platfora, SiSense, and others are introducing new options for gaining business intelligence from this big data platform.

Big Data was the big news in New York last week at the sold-out Strata Conference. I was lured to Strata both by the traditional vendors I cover, such as SAP, SAS, and Tableau, which were exhibiting there, and by big data analytics startups such as Datameer and Karmasphere. There's a high degree of hype around big data, but there's also a high degree of innovation, tangible benefits, and venture capital backing. Coming from the perspective of the established business intelligence world, here's the skinny on where big data meets BI. First, big data is more than Hadoop, the open source framework whose distributed file system can scale to handle petabytes of data.

That brings me to the first big announcement at the conference: Cloudera Impala, a new real-time query engine for Hadoop.

Want more of Cindi Howson's expert BI analysis?

Emerging Technologies - jStart - Portfolio - USC Annenberg School of Journalism
Emerging Technologies - jStart - News - Using Big Data to aid medical providers
Emerging Technologies - jStart - Portfolio - Hertz Corporation
10 ways big data changes everything

Case study: ING Direct taps big data to understand customers

ING Direct wanted to get into the heads of its customers, so the bank started a data-collection initiative to gain a deeper understanding of how it was interacting with them.

Now, years later, ING Direct faces the problem of having too much data, and is trying to make sense of all of that information in a useful and cost-effective way. So far, ING Direct has spent roughly AU$4 million to AU$5 million on data analytics alone. "We have collected quite granular and detailed information on exactly how customers interact with the bank," ING Direct head of business intelligence (BI) Greg Nichelsen told ZDNet.

This has prompted ING Direct to dabble in big-data solutions to expedite the process of using all of the collected data to help make business decisions. ING Direct's BI team is not contained within the IT department. The BI team has a stake in marketing, the brand, customer experience, and the banking products themselves.