Big Data

InfluxDB. SciDB. Connecting R with Amazon Redshift - AWS Big Data Blog. Scalable Machine Learning. 7 command-line tools for data science. Update (05-02-2017) My new company Data Science Workshops provides in-company training and coaching on this exciting topic.

7 command-line tools for data science

Update (7-17-2014) You may be interested in my book Data Science at the Command Line, which contains over 70 command-line tools for doing data science.

Data science is OSEMN (pronounced as awesome). That is, it involves Obtaining, Scrubbing, Exploring, Modeling, and iNterpreting data. As a data scientist, I spend quite a bit of time on the command line, especially when there’s data to be obtained, scrubbed, or explored. And I’m not alone in this. I would like to continue this discussion by sharing seven command-line tools that I have found useful in my day-to-day work.

1. jq - sed for JSON

JSON is becoming an increasingly common data format, especially as APIs are appearing everywhere.
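As a rough illustration of what "sed for JSON" means, the following Python sketch pretty-prints a JSON document and pulls one field out of each record, which is the kind of job jq does on the command line. The sample payload and its field names (`results`, `name`, `votes`) are made up for this example and are not the format of any real API.

```python
import json

# Hypothetical payload; the field names are assumptions for illustration only.
raw = '{"results": [{"name": "Candidate A", "votes": 120}, {"name": "Candidate B", "votes": 95}]}'

def pretty(text):
    """Return the JSON text re-serialized with indentation and sorted keys."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True)

def pluck(text, list_key, field):
    """Return the value of `field` from every object in the list under `list_key`."""
    return [item[field] for item in json.loads(text)[list_key]]

if __name__ == "__main__":
    print(pretty(raw))
    print(pluck(raw, "results", "name"))
```

The two helpers mirror the two most common uses of a JSON filter: making a dense response readable, and reducing it to the one column you care about.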

Imagine we’re interested in the candidate totals of the 2008 presidential election. curl -s ' > nyt.json where -s puts curl in silent mode. Into nicely indented and colored output: Indonesia is mapping Jakarta floods in real time using Twitter. MongoDB University. How to build your own Facebook Sentiment Analysis Tool. In this article we will discuss how you can easily build a simple Facebook Sentiment Analysis tool capable of classifying public posts (both from users and from pages) as positive, negative, or neutral.

How to build your own Facebook Sentiment Analysis Tool

We are going to use Facebook’s Graph API Search and the Datumbox API 1.0v. The complete PHP code of the tool can be found on GitHub.

How does Facebook Sentiment Analysis work?

As we discussed in previous articles, performing Sentiment Analysis requires using advanced Machine Learning and Natural Language Processing techniques. Performing Sentiment Analysis on Facebook does not differ significantly from what we discussed in the past. The above process is significantly simplified by using Datumbox’s Machine Learning API.

Building the Facebook Sentiment Analysis tool

Creating your own Facebook Application

Unfortunately Facebook made it mandatory to authenticate before accessing their Graph Search API.

How to Build A Blog Recommender. Today is the fourth day of my challenge to learn 30 technologies in 30 days.

How to Build A Blog Recommender

So far I am enjoying it and getting a good response from fellow developers. I am more than motivated to do it for the full 30 days. In this blog, I will cover how we can very easily build a blog recommendation engine using PredictionIO. I did not find much documentation around using PredictionIO with Java. So, this blog might help people looking for an end-to-end PredictionIO Java tutorial.

What is PredictionIO?

PredictionIO is an open source machine learning server application written in Scala. As a user, we do not have to worry about all these details.

Why should I care?

I decided to learn PredictionIO because I wanted to use a library which can help me add machine learning capabilities.
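To give a feel for what a recommendation engine computes behind the scenes, here is a minimal Python sketch of item-based recommendation using co-occurrence counts. It does not use PredictionIO or its API; the function names and the sample reading history are invented for illustration, and a real engine would use more refined similarity measures.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(views):
    """Count how often each pair of blogs was read by the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for blogs in views.values():
        for a, b in combinations(sorted(set(blogs)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(views, user, top_n=3):
    """Rank unread blogs by how often they co-occur with the user's reads."""
    counts = cooccurrence(views)
    seen = set(views.get(user, []))
    scores = defaultdict(int)
    for blog in seen:
        for other, n in counts[blog].items():
            if other not in seen:
                scores[other] += n
    # Sort by score (descending), then by name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [blog for blog, _ in ranked[:top_n]]

# Hypothetical reading history: user -> blogs viewed.
views = {
    "alice": ["scala-intro", "spark-basics", "hadoop-tips"],
    "bob": ["scala-intro", "spark-basics"],
    "carol": ["spark-basics", "hadoop-tips", "r-plots"],
    "dave": ["scala-intro"],
}

if __name__ == "__main__":
    print(recommend(views, "dave"))
```

Co-occurrence counting was chosen here because it needs no training step and fits in a few lines; it is a stand-in for whatever algorithm a machine learning server would run, not a description of it.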

Installing PredictionIO. Implementing Big Data Analysis. Mortardata/mortar-recsys. MIT Professional Education MOOC on Big Data. CSAIL Researchers are lined up to teach a MOOC with a difference on the edX platform.

MIT Professional Education MOOC on Big Data

The difference is that Online X courses, offered through MIT Professional Education, will charge a fee to all participants. Until now all edX courses have been available free of charge. For courses such as CS50, which offers a certificate to paying students, edX has provided the alternative of "auditing" the course, an option that gives you complete access to all the course materials, and of obtaining an Honor Code Certificate by successfully completing it.

Now MIT Professional Education has announced a new line of professional programs called Online X Programs with courses that will provide companies and organizations the ability to offer training and education to their employees on a topic that confronts most industries today. Payment from individuals will be collected on enrollment, while groups of 15 or more students can be invoiced. A full course outline can be found on its web page. Pdf/1404.4140v1.pdf. GraphBuilder. GraphBuilder (Kushal Datta). About GraphBuilder: GraphBuilder is a Java library for constructing graphs out of large datasets for data analytics and structured machine learning applications that exploit relationships in data. The library offloads many of the complexities of graph construction, such as graph formation, tabulation, compression, transformation, partitioning, output formatting, and serialization. It scales using the MapReduce parallel programming model.

GraphBuilder implements a pipeline that is analogous to the extract, transform, and load sequence commonly used to populate relational databases.
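The extract, transform, and load analogy can be sketched in a few lines of Python. This is not GraphBuilder's API (GraphBuilder is a Java library that scales with MapReduce); the record format, function names, and hash partitioning below are assumptions chosen only to illustrate the three stages on a single machine.

```python
from collections import Counter
import zlib

def extract(records):
    """Extract: pull (source, target) vertex pairs out of raw records."""
    for line in records:
        parts = line.split()
        if len(parts) == 2:  # skip malformed records
            yield parts[0], parts[1]

def transform(edges):
    """Transform: drop self-loops and merge duplicate edges into weights."""
    return Counter((s, t) for s, t in edges if s != t)

def load(weighted_edges, num_partitions):
    """Load: assign each edge to a partition by hashing its source vertex."""
    partitions = [[] for _ in range(num_partitions)]
    for (s, t), w in sorted(weighted_edges.items()):
        # crc32 is used because it is deterministic across runs.
        idx = zlib.crc32(s.encode("utf-8")) % num_partitions
        partitions[idx].append((s, t, w))
    return partitions

# Hypothetical input: one "source target" pair per line.
records = ["a b", "a b", "b c", "c c", "broken-line", "c a"]

if __name__ == "__main__":
    graph = transform(extract(records))
    for i, part in enumerate(load(graph, 2)):
        print(i, part)
```

Hashing on the source vertex keeps all outgoing edges of a vertex in the same partition, which is one simple way to think about the partitioning step the library description mentions.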

Graphlab

Hadoop.