
Data Science


How to choose the best charts for your infographic - Venngage. One of the most important steps in creating infographics is choosing the right charts to tell your story.

How to choose the best charts for your infographic - Venngage

How do you pick the best charts to represent your data in a unique and eye-catching way to successfully deliver your message? What are the techniques you can use to visualize your information so that your data speaks for itself? Here are some tried and true tips from the frontlines: 1. 100 open source Big Data architecture papers for data professionals. Amazon. Amazon. Top 10 data mining algorithms in plain English. Today, I’m going to explain in plain English the top 10 most influential data mining algorithms as voted on by 3 separate panels in this survey paper.

Top 10 data mining algorithms in plain English

Once you know what they are, how they work, what they do and where you can find them, my hope is you’ll have this blog post as a springboard to learn even more about data mining. What are we waiting for? Let’s get started! Update 16-May-2015: Thanks to Yuval Merhav and Oliver Keyes for their suggestions which I’ve incorporated into the post. Learn From the Industry's Best - Big Data University. Fogs, logs and cogs: The newer, bigger shape of big data in the Internet of Things. Big data is becoming the next best thing to true magic.

Fogs, logs and cogs: The newer, bigger shape of big data in the Internet of Things

It is everywhere and, increasingly, nowhere specific. Every node in the known computing universe is becoming a component in a vast, distributed, pervasive big data cloud. As we transition to a world where clouds penetrate every facet of our lives, we need to wrap our heads around the thought that every edge node, no matter how resource-constrained, can be interconnected, intelligent and integral to the performance of the whole.

What I’m sketching out is the vision of a world in which the Internet of Things (IoT) increasingly drives the evolution of cloud computing architectures. In an IoT-centric world, nobody needs to know that your cloud’s processing, storage and other functions have been virtualized to endpoints of every size, configuration and capability. As the IoT cloud evolves in this direction, so will big data. This is the vision of "fog computing." 6 dataset lists curated by data scientists. November 21, 2013, Scott Haylon. Since we do a lot of experimenting with data, we’re always excited to find new datasets to use with Mortar.

6 dataset lists curated by data scientists

We’re saving bookmarks and sharing datasets with our team on a nearly-daily basis. There are tons of resources throughout the web, but given our love for the data scientist community, we thought we’d pick out a few of the best dataset lists curated by data scientists. D3.js - Data-Driven Documents. Plotly. 66 job interview questions for data scientists. We are now at 91 questions.

66 job interview questions for data scientists

We've also added 50 new ones here, and started to provide answers to these questions here. These are mostly open-ended questions, meant to assess the breadth of technical knowledge of a senior candidate for a rather high-level position, e.g. director. What is the biggest data set that you processed, how did you process it, and what were the results? Tell me about two success stories from your analytic or computer science projects. Free Infographic Maker - Venngage.

Vizualize.me: Visualize your resume in one click. In-depth introduction to machine learning in 15 hours of expert videos. In January 2014, Stanford University professors Trevor Hastie and Rob Tibshirani (authors of the legendary Elements of Statistical Learning textbook) taught an online course based on their newest textbook, An Introduction to Statistical Learning with Applications in R (ISLR).

In-depth introduction to machine learning in 15 hours of expert videos

I found it to be an excellent course in statistical learning (also known as "machine learning"), largely due to the high quality of both the textbook and the video lectures. And as an R user, it was extremely helpful that they included R code to demonstrate most of the techniques described in the book. (Update: The course will be offered again in January 2016!) If you are new to machine learning (and even if you are not an R user), I highly recommend reading ISLR from cover-to-cover to gain both a theoretical and practical understanding of many important methods for regression and classification. It is available as a free PDF download from the authors' website. P.S. Chapter 1: Introduction (slides, playlist) Top 10 data mining algorithms in plain English.

How to determine the quality and correctness of classification models? Part 2 - Quantitative quality indicators. Basic quantitative quality indicators. In the last part of the tutorial we introduced the basic qualitative model quality indicators.

How to determine the quality and correctness of classification models? Part 2 - Quantitative quality indicators

Let us recall them now:

Derived quality indicators

We will now discuss derived variants of these indicators.

TPR (True Positive Rate) – reflects the classifier’s ability to detect members of the positive class (pathological state).
TNR (True Negative Rate) – reflects the classifier’s ability to detect members of the negative class (normal state).
FPR (False Positive Rate) – reflects the frequency with which the classifier makes a mistake by classifying normal state as pathological.
FNR (False Negative Rate) – reflects the frequency with which the classifier makes a mistake by classifying pathological state as normal.
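The four rates above can be illustrated with a short Python sketch. The excerpt does not spell out the formulas, so this assumes the standard definitions in terms of confusion-matrix counts (TPR = TP / (TP + FN), TNR = TN / (TN + FP), FPR = FP / (FP + TN), FNR = FN / (FN + TP)). The names ConfusionCounts and derived_rates, and the example counts, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the four basic confusion-matrix counts."""
    tp: int  # pathological cases classified as pathological
    tn: int  # normal cases classified as normal
    fp: int  # normal cases classified as pathological
    fn: int  # pathological cases classified as normal


def derived_rates(c: ConfusionCounts) -> dict:
    """Compute TPR, TNR, FPR and FNR using the standard definitions (assumed)."""
    positives = c.tp + c.fn  # all truly pathological cases
    negatives = c.tn + c.fp  # all truly normal cases
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present to compute the rates")
    return {
        "TPR": c.tp / positives,
        "TNR": c.tn / negatives,
        "FPR": c.fp / negatives,
        "FNR": c.fn / positives,
    }


# Made-up counts: 50 pathological and 50 normal cases.
counts = ConfusionCounts(tp=40, tn=45, fp=5, fn=10)
rates = derived_rates(counts)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Under these definitions TPR and FNR sum to 1, as do TNR and FPR, so in practice reporting one rate from each pair is enough.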

60 new resources and articles about data science, IoT, machine learning, R, Python, big data. Probability Cheatsheet. A collection of links for streaming algorithms and data structures. Cinemas NOS. The Open Source Data Science Masters. Information Is Beautiful. CS109 Data Science. Learning from data in order to gain useful predictions and insights.

CS109 Data Science

This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries. We will be using Python for all programming assignments and projects. All lectures will be posted here and should be available 24 hours after meeting time. The course is also listed as AC209, STAT121, and E-109.