
Data Science


R project. The Open Source Data Science Masters by datasciencemasters. Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? Data Science.

A Gentle Introduction to Random Forests, Ensembles, and Performance Metrics in a Commercial System. This is the first in a series of posts that illustrate what our data team is up to, experimenting with, and building ‘under the hood’ at CitizenNet.

A Gentle Introduction to Random Forests, Ensembles, and Performance Metrics in a Commercial System

Dr. Arshavir Blackwell is CitizenNet’s resident Data Scientist. He has been involved in web-scale machine learning and information retrieval for over 10 years.

One of the first posts we published spoke at a high level of the technical problem CitizenNet is trying to solve. On the CitizenNet platform, a user would create a project that would define (broadly) the target audience, the pieces of Facebook content they are looking to promote, and other campaign and financial information.

Behind the scenes, a robust prediction system builds the targets for the project.

Neural Network

Neural networks have long been used in problems such as this, with a lot of data, many variables, and the possibility of noise in the data. Each input point is a high-dimensional vector. Strengths and weaknesses. Note that:

A List of Data Science and Machine Learning Resources. Every now and then I get asked for some help or for some pointers on a machine learning/data science topic. I tend to respond with links to resources by folks that I consider to be experts in the topic area. Over time my list has gotten a little larger, so I decided to put it all together in a blog post.

Since it is based mostly on the questions I have received, it is by no means complete, or even close to a complete list, but hopefully it will be of some use. Perhaps I will keep it updated, or even better yet, feel free to comment with anything you think might be of help. Also, when I think of data science, I tend to focus on Machine Learning rather than the hardware or coding aspects.

Neo Makes Cheese

Before you do anything else, start boning up on your linear (matrix) algebra. Painful! Kinda nuts, and it really didn’t make total sense, but his point was, you have got to have the basics down before you can actually make anything useful.

Where to start? General Machine Learning. Machine Learning Surveys.

Hacking Tools / Downloads / Scripts & Codes.

The Modern Data Nerd Isn't as Nerdy as You Think. Data science is turning math nerds into rock stars, and it turns out they aren’t as nerdy as you might think.

The Modern Data Nerd Isn't as Nerdy as You Think

Photo: Enzo Varriale

Data scientists are fast becoming the rock stars of the 21st century. Thanks in part to Nate Silver’s eerily accurate election predictions and Paul DePodesta’s baseball-revolutionizing Moneyball techniques, math nerds have become celebrities. It’s debatable how much their work differs from what statisticians have done for years, but it’s a growing field, and many companies are desperate to hire their own data scientists. The irony is that many of these math nerds aren’t as math nerdy as you might expect: some of the best minds in the field lack heavy math or science training.

Data scientist John Candido agrees. Candido has a master’s degree in psychology, but not a PhD in math or physics. “If you have a PhD, you’ll come to a problem with more background, but you’ll still need to get your hands dirty to solve it,” Candido says.

One Page R: A Survival Guide to Data Science with R. A collection of useful one-page resources for a data miner, data scientist, and/or a decision scientist.

One Page R: A Survival Guide to Data Science with R

The modules include code, lectures, and one-page recipes for getting things done. Graham Williams, the founder of Togaware, the developer of Rattle (free and open-source data mining software based on R), and the author of the book Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R), has summarized useful R resources in One Page R: A Survival Guide to Data Science with R. They include tools for the data miner, the data scientist, and/or the decision scientist.

Machine learning. Datamining.