Big Data

Uber uses data science to predict where its riders want to go | VentureBeat | Big Data | by Jordan Novet. Uber is at it again, improving its alternative-cab service app after analyzing data on usage from its customers. This time, data scientists have come up with ways to figure out where exactly riders are headed, even in a densely packed city. The researchers determined the accuracy of their model by comparing its predictions with anonymized information on more than 3,000 Uber passengers’ rides in San Francisco this year, according to a blog post today from Uber’s Ren Lu. The system makes certain assumptions, taking into consideration a rider’s previous destinations, places Uber riders have gone to, and other factors, Lu wrote.

Uber has previously used data science, and even artificial intelligence, to optimize its operations. Recently, for instance, the company concluded that drivers wishing to optimize their earnings would do better to stay put in one place than to drive around. That observation came from a simulation from the company’s science team.

Immersion: a people-centric view of your email life. Mining Twitter for Airline Consumer Sentiment. Airlines, Consumers, and Twitter. Anyone who travels regularly recognizes that airlines struggle to deliver a consistent, positive customer experience. Through extensive interview and survey work, the American Customer Satisfaction Index quantifies this impression. As a group, airlines fall at the bottom of their industry rankings, below the Post Office and insurance companies. Meanwhile, the immediacy and accessibility of Twitter provide a real-time glimpse into consumers' frustration. This tutorial demonstrates how to use R to collect tweets and apply a (very) naive algorithm to estimate their emotional sentiment. This tutorial was originally presented as a first-time introduction to R for the savvy audience of the Boston Predictive Analytics Meetup Group.
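A minimal sketch of the kind of naive scoring the tutorial snippet alludes to, written in Python rather than R purely for illustration. It assumes that sentiment is estimated by counting matches against lists of positive and negative words; the tiny word lists, the sample tweets, and the function name `score_sentiment` are hypothetical placeholders, not the tutorial's own code or data.

```python
import re

# Hypothetical, deliberately tiny word lists (assumption: a real analysis
# would load a much larger opinion lexicon from a file).
POSITIVE_WORDS = {"great", "good", "love", "thanks", "awesome"}
NEGATIVE_WORDS = {"delayed", "lost", "bad", "worst", "rude"}


def score_sentiment(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return (positive matches) - (negative matches) for one piece of text."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    pos_count = sum(1 for word in words if word in positive)
    neg_count = sum(1 for word in words if word in negative)
    return pos_count - neg_count


# Made-up example tweets, for illustration only.
sample_tweets = [
    "Great flight, love the crew",
    "Worst airline ever, lost my bag and the staff was rude",
]

for tweet in sample_tweets:
    print(score_sentiment(tweet), tweet)
```

A score above zero suggests a positive tweet and a score below zero a negative one. The approach ignores negation, sarcasm, and context, which is why it only qualifies as "(very) naive".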

This work is also featured in Elsevier's forthcoming book Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications by Gary Miner et al. Loading Data into R The twitteR package M.A. Campaigns Mine Personal Lives to Get Out Vote. In the weeks before Election Day, millions of voters will hear from callers with surprisingly detailed knowledge of their lives. These callers — friends of friends or long-lost work colleagues — will identify themselves as volunteers for the campaigns or independent political groups.

The callers will be guided by scripts and call lists compiled by people — or computers — with access to details like whether voters may have visited pornography Web sites, have homes in foreclosure, are more prone to drink Michelob Ultra than Corona or have gay friends or enjoy expensive vacations. The callers are likely to ask detailed questions about how the voters plan to spend Election Day, according to professionals with both presidential campaigns. What time will they vote? What route will they drive to the polls? Simply asking such questions, experiments show, is likely to increase turnout. In statements, both campaigns emphasized their dedication to voters’ privacy. Open Data Handbook version 1.0. How will you use the Open Data Handbook? To define open data: 8.8% (10 votes). To understand why open data is useful: 25.4% (29 votes). To make open data: 7.9% (9 votes). To share with policy makers: 57.9% (66 votes). The Handbook discusses the ‘why, what and how’ of open data – why to go open, what open is, how to make data open and how to do useful things with it.

Read on to find out more about what’s in the Handbook, who it’s for, and how you can get involved – for example by adding to and improving the Handbook, or by translating it into more languages. The Open Knowledge Foundation are proud to announce the launch of version 1.0 of the Open Data Handbook (formerly the Open Data Manual): Read the Open Data Handbook now! What is the Open Data Handbook? The Open Data Handbook is a valuable resource for everyone interested in open data. The Open Data Handbook is targeted towards a broad audience. Finally, the Handbook is intended to be an organic project. Where did it come from? [New Data] The Science of Christmas. If you like social media data and science like this, buy my latest book: “Zarrella’s Hierarchy of Contagiousness.” Science makes everything better. Seriously, it’s a proven fact. So of course I did some analysis about Christmas and found some surprising insights.

Don’t get fooled by the unicorns-and-rainbows myths about the holidays anymore. Common Crawl Corpus : Public Data Sets. Data.gov. White House to open source Data.gov as open government data platform. As 2011 comes to an end, there are 28 international open data platforms in the open government community. By the end of 2012, code from new “Data.gov-in-a-box” may help many more countries to stand up their own platforms.

A partnership between the United States and India on open government has borne fruit: progress on making the open data platform Data.gov open source. In a post this morning at the WhiteHouse.gov blog, federal CIO Steven VanRoekel (@StevenVDC) and federal CTO Aneesh Chopra (@AneeshChopra) explained more about how Data.gov is going global: As part of a joint effort by the United States and India to build an open government platform, the U.S. team has deposited open source code — an important benchmark in developing the Open Government Platform that will enable governments around the world to stand up their own open government data sites. The U.S. What’s next for open government data in the United States has yet to be written. Drupal as an open government platform? Related: AppsforItaly | Microsoft drops Dryad; puts its big-data bets on Hadoop. Just a month after insisting there was still a place for its own Hadoop competitor, Microsoft officials have decided to discontinue work on LINQ to HPC, codenamed "Dryad."

In a November 11 post on the Windows HPC Team Blog, officials said that Microsoft had provided a minor update to the latest test build of the Dryad code as part of Windows High Performance Computing (HPC) Pack 2008 R2 Service Pack (SP) 3. But they also noted that "this will be the final (Dryad) preview and we do not plan to move forward with a production release." Dryad was supposed to provide a way for running big-data jobs across clusters of Windows servers. It was designed to provide a platform for developers to build applications that can process large amounts of unstructured data.

Just a month ago, Microsoft updated its near-final test build of Dryad. But it now appears Microsoft is putting all its big-data eggs in the Hadoop framework basket. From the November 11 HPC Team blog post: Oracle does NoSQL « Max Schireson's blog. Recently there have been rumors of Oracle introducing a NoSQL database at Open World next week. As a 9-year Oracle veteran as well as an 8-year veteran of the alternative database world (most recently as President of 10gen, the sponsor of MongoDB), this is an exciting development for me. I don’t think NoSQL means the end of SQL or relational – it’s one of the reasons I don’t love the name “NoSQL”.

I do, however, think that relational is not the answer to everything. When I left Oracle eight years ago to work on alternatives to relational databases, I felt that there was space for an alternative that was more agile for developers and more scalable for deployment on a large number of commodity servers. How do I feel about Oracle introducing a “NoSQL” database?

In my opinion this is a good thing for alternative database vendors. In my time at Oracle, I found Larry Ellison to have a great sense of what markets were important and to be a fierce competitor. Forecasting Ace : Aggregating the Predictions of Experts. Influence Report.

NoSQL. How Yahoo Spawned Hadoop, the Future of Big Data | Wired Enterprise. Eric Baldeschwieler, aka Eric14, CEO of Hortonworks. The email went to Eric14. His real name is Eric Baldeschwieler, but no one calls him that. At fourteen letters, Baldeschwieler is a mouthful, and he works in a world where a name takes a backseat to an online handle. The sender was Rob Bearden, a serial entrepreneur from Atlanta, Georgia, famous for actually making money from open source software.

The two met for dinner at a Vietnamese restaurant in Palo Alto, California, just down the road from Yahoo’s Sunnyvale headquarters. Dubbed Hortonworks, the new venture is by no means guaranteed success, but it certainly has its hands on the right technology. “There’s a change happening, driven by unprecedented volumes and velocities of unstructured data.

Today, Hadoop underpins not only Yahoo, but Facebook, Twitter, eBay, and dozens of other high-profile web outfits. Last year, eBay erected a Hadoop cluster spanning 530 servers. Open Source Déjà Vu Rob Bearden.