
Data Analysis


10 Books for Data Enthusiasts. Over the last few years, I've invested a lot of time exploring various areas of data analysis and software development.

10 Books for Data Enthusiasts

Going down the proverbial coding rabbit hole, I've quietly accumulated a lot of books on various subjects. This is a post about 10 data books that I've gotten a lot of mileage out of and that really have legs. Programming Collective Intelligence by Toby Segaran. Synopsis: An overview of machine learning and the key algorithms in use today. Each chapter outlines a problem, defines an approach to solving it using a particular algorithm, and then gives you all the sample code you need to solve it.

Becoming a Data Scientist - Curriculum via Metromap ← Pragmatic Perspectives. Data Science, Machine Learning, Big Data Analytics, Cognitive Computing ... well, all of us have been avalanched with articles, skills-demand infographics, and points of view on these topics (yawn!).

Becoming a Data Scientist - Curriculum via Metromap ← Pragmatic Perspectives

One thing is for sure: you cannot become a data scientist overnight. It's a journey, and for sure a challenging one. But how do you go about becoming one? Where to start? When do you start seeing light at the end of the tunnel? T-files. Getting Started With Python For Data Science. Who is this for and what will I learn?

Getting Started With Python For Data Science

This tutorial assumes some knowledge of Python and programming, but no knowledge whatsoever of data science, machine learning, or predictive modeling (or, heck, even statistics). To the extent there is a target audience, it's probably hacker types who learn best by doing. All the code from this tutorial is available on GitHub.

You might encounter terms you're not familiar with, but that shouldn't stop you from completing the tutorial. Data Analysis: How do I get started with basics of data analysis and visualization? FUN with FACEBOOK in Neo4j. Ever since Facebook promoted its “graph search” methodology, lots of people in our industry have been waking up to the fact that graphs are über-cool.

FUN with FACEBOOK in Neo4j

Thanks to the powerful query possibilities, people like Facebook, Twitter, LinkedIn, and let us not forget, Google have been providing us with some of the most amazing technologies. Specifically, the power of the “social network” is tempting many people to get their feet wet, and to start using graph technology. And they should: graph databases are fantastic at storing, querying and exploiting social structures. So how would that really work? I am a curious, “want to know” but “not very technical” kind of guy, and I decided to get my hands dirty (again), and try some of this out by storing my own little part of Facebook in Neo4j. The first step to take was to get access to my own Facebook data.
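To make the "social structures as a graph" idea concrete, here is a minimal sketch in plain Python. It does not use Neo4j or any Facebook data: the friendship pairs and the helper names `build_graph` and `friends_of_friends` are made up for illustration. It only shows the kind of question (friends of friends) that a graph makes easy to ask.

```python
from collections import defaultdict

# Hypothetical friendship pairs, invented for this sketch.
FRIENDSHIPS = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("dave", "erin"),
]


def build_graph(pairs):
    """Build an undirected adjacency map: person -> set of friends."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def friends_of_friends(graph, person):
    """Return people two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    candidates = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    # Exclude the person and anyone they already know directly.
    return candidates - direct - {person}


graph = build_graph(FRIENDSHIPS)
print(sorted(friends_of_friends(graph, "alice")))
```

A graph database answers the same kind of question with a query over stored nodes and relationships instead of an in-memory dictionary, which is what makes it attractive once the data no longer fits in a small script.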

Deep Learning - The Biggest Data Science Breakthrough of the Decade. Duration: Approximately 60 minutes.

Deep Learning - The Biggest Data Science Breakthrough of the Decade

Cost: Free. Machine learning and AI have appeared on the front page of the New York Times three times in recent memory: 1) When a computer beat the world's #1 chess player 2) When Watson beat the world's best Jeopardy players 3) When deep learning algorithms won a chemo-informatics Kaggle competition. We all know about the first two... but what's that deep learning thing about? This happened in November of last year, and it represents a critical breakthrough in data science that every executive will need to know about and react to in the coming years. The NY Times said that these advances "hold implications not just for drug development, but for an array of applications, including marketing and law enforcement".

About Jeremy Howard: Jeremy Howard is the President and Chief Scientist at Kaggle. Why becoming a data scientist is NOT actually easier than you think - josephmisiti's posterous. Getting Started with Python for Data Scientists. With the R Users DC Meetup broadening its topic base to include other statistical programming tools, it seemed only reasonable to write a meta post highlighting some of the best Python tutorials and resources available for data science and statistics.

Getting Started with Python for Data Scientists

What you don’t know is often the hardest part of picking up a new skill, so hopefully these resources will help make learning Python a little easier. Prepare yourself for code indentation heaven. Python is such an incredible language because it can do practically anything, from high-performance scientific computing to web frameworks such as Django or Flask. Python is heavily used at Google so the language must be doing something right. And, similar to R, Python has a fantastic community around it and, luckily for you, this community can write. Distributions: Python is available for free from and there are two popular versions, 2.7 or 3.x.

Python Developer Tools: Sublime Text 2 - If you have never used it, you should try this editor. NumPy SciPy. How to Interview a Data Scientist. 66 job interview questions for data scientists. We are now at 91 questions.

66 job interview questions for data scientists

We've also added 50 new ones here, and started to provide answers to these questions here. These are mostly open-ended questions, to assess the technical horizontal knowledge of a senior candidate for a rather high-level position, e.g. director. What is the biggest data set that you processed, how did you process it, and what were the results? Tell me two success stories about your analytic or computer science projects. How was lift (or success) measured? Related articles: Data Science Recipes - Current best solutions to common data problems. What are some good introductory resources for exploratory data analysis? Big Data: What are some of the books/reference materials to familiarize myself with Big Data and Web Analytics? OpenIntro.