Data science

Swirl: Learn R, in R. Statistical Learning. About This Course: This is an introductory-level course in supervised learning, with a focus on regression and classification methods.

Statistical Learning

The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines. Some unsupervised learning methods are discussed: principal components and clustering (k-means and hierarchical). This is not a math-heavy class, so we try to describe the methods without heavy reliance on formulas and complex mathematics.

We focus on what we consider to be the important elements of modern data analysis. The lectures cover all the material in An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013). Course Staff. Learn R.

Learn R

Learn R: playlist by mybringback, 23 videos, 17,852 views, last updated on Jun 26, 2014. Quick-R: Home Page. Why R is Hard to Learn. By Bob Muenchen. R has a reputation of being hard to learn.

Why R is Hard to Learn

Some of that is due to the fact that it is radically different from other analytics software. Some is an unavoidable byproduct of its extreme power and flexibility. And, as with any software, some is due to design decisions that, in hindsight, could have been better. If you have experience with other analytics tools, you may at first find R very alien. Below is a list of complaints about R that I commonly hear from people taking my R workshops.

Unhelpful Help. Code School - Try R. DataVisualization. Online R tutorials and Data Science Courses - DataCamp. An R Introduction to Statistics. The Data Science Toolkit - taking your first steps towards becoming a Data Scientist. When I stumbled upon the phrase "Data Scientist" 3 years ago, I immediately recognized it as my best prospect for a productive career.

The Data Science Toolkit - taking your first steps towards becoming a Data Scientist

How to start? What are the tools of the trade? This is the blog post I wish I could have read back then. Many of the things I list here didn't exist or were unstable until recently. I discovered the "predictive analytics" rabbit hole and started to read and watch whatever I could find on the subject. PostgreSQL, MongoDB, HBase, Cassandra, ElasticSearch, Redis, Tokyo Tyrant, Chef/Puppet/Ansible. Every aspiring "Big Data" worker should watch his interview. Which of these databases will I need for my DS career? A Flip Kromer quote driving my current project is "We need a Mechanical Turk that slides up the talent scale." Visualizing data is the most glamorous of the DS skills, and most of us are dazzled with d3.js and feel love at first sight. Learning NodeJS (closures, promises, ...) dumps even more tools on the list and my learning curve is turning into a wall.

Wait! Our Data Science Apprenticeship is Now Live. Updated on 12/22/2014: Click here for updated application process.

Our Data Science Apprenticeship is Now Live

Updated on 5/18/2014: Click here to check the most recent list of projects offered to candidates. Our textbook is now published, new data sets and new tutorials added, and the data science cheat sheet will soon be available in its final format. Our program is for practitioners interested in being mentored by Dr Granville. Participants work on real-life data science projects, to gain professional experience, knowledge and visibility in the data science community. So what does it mean for you and how to get started? First, we remind you that this is still a program for self-learners, presenting original, core, modern, applied, useful pioneering data science material not found in traditional programs. How to get started?

Read the steps required to complete the program (see below), and if you are still interested, proceed to step #1. The program in 7 steps: Email us at vincentg@datasciencecentral to show your interest. My Data Science Book - Table of Contents. The book is now published!

My Data Science Book - Table of Contents

The book is also part of our apprenticeship. Part of the content as well as new content is in a separate document called Addendum. Click here to download the addendum. The book is available on the Wiley website. Also, read our article on strong correlations to see how various sections of our book apply to modern data science. About the Author Dr. Data Science Apprenticeship.

Our Data Science Apprenticeship is Now Live. 20 short tutorials all data scientists should read (and practice) Data Science Cheat Sheet. I will update this article regularly.

Data Science Cheat Sheet

An old version can be found here and has many interesting links. All the material presented here is not in the old version. This article is divided into 11 sections. 1. Hardware: A laptop is the ideal device. Even if you work heavily on the cloud (AWS, or in my case, access to a few remote servers mostly to store data, receive data from clients and backups), your laptop is your core device to connect to all external services (via the Internet). 2. Once you have installed Cygwin, you can type commands or execute programs in the Cygwin console. Figure 1: Cygwin (Linux) console on Windows laptop. You can open multiple Cygwin windows on your screen(s).

To connect to an external server for file transfers, I use the Windows FileZilla freeware rather than the command-line ftp offered by Cygwin. You can run commands in the background using the & operator, for example: $ notepad VR3.txt & A few more things about files. Other extensions include File management 3. Examples. Data Science Apprenticeship.
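The & operator mentioned above can be sketched as follows. This is only a minimal illustration, assuming a POSIX-style shell such as the one the Cygwin console provides. The `sleep 1` command is a stand-in for any long-running program (it is not from the original article), and the variable names `bg_pid` and `status` are hypothetical.

```shell
#!/bin/sh
# Start a command in the background with the & operator.
# "sleep 1" is a placeholder (an assumption) for a long-running program.
sleep 1 &

# $! holds the process ID of the most recently started background job.
bg_pid=$!
echo "started background job"

# The console is not blocked, so other commands can run in the meantime.
echo "doing other work in the foreground"

# wait blocks until the background job ends and returns its exit status.
wait "$bg_pid"
status=$?
echo "background job finished with status $status"
```

Without the trailing &, the console would stay blocked until the command ends, which is why the article launches the editor in the background.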

Awesome Data Science Repository. One Page R: A Survival Guide to Data Science with R.