
Data Analysis


10 Books for Data Enthusiasts. Over the last few years, I've invested a lot of time exploring various areas of data analysis and software development. Going down the proverbial coding rabbit hole, I've quietly accumulated a lot of books on various subjects. This is a post about 10 data books that I've gotten a lot of mileage out of and that really have legs.

Programming Collective Intelligence by Toby Segaran. Synopsis: An overview of machine learning and the key algorithms in use today. Each chapter outlines a problem, defines an approach to solving it using a particular algorithm, and then gives you all the sample code you need to solve it.

Why you should read it: One of my favorite books (non-technical and technical). Synopsis: This book provides a gentle overview of statistics and a nice tutorial on using Python as well.

Becoming a Data Scientist - Curriculum via Metromap ← Pragmatic Perspectives. Data Science, Machine Learning, Big Data Analytics, Cognitive Computing... well, all of us have been avalanched with articles, skills-demand infographics and points of view on these topics (yawn!).

One thing is for sure: you cannot become a data scientist overnight. It's a journey, and for sure a challenging one. But how do you go about becoming one? Where to start? When do you start seeing light at the end of the tunnel? What is the learning roadmap? Given how critical visualization is for data science, ironically I was not able to find (except for a few) a pragmatic and yet visual representation of what it takes to become a data scientist. The domains are: Fundamentals, Statistics, Programming, Machine Learning, Text Mining / Natural Language Processing, Data Visualization, Big Data, Data Ingestion, Data Munging, Toolbox. Each area / domain is represented as a “metro line”, with the stations depicting the topics you must learn / master / understand in a progressive fashion. T-files.

Getting Started With Python For Data Science. Who is this for and what will I learn? This tutorial assumes some knowledge of Python and programming, but no knowledge whatsoever of data science, machine learning, or predictive modeling (or, heck, even statistics).

To the extent there is a target audience, it's probably hacker types who learn best by doing. All the code from this tutorial is available on GitHub. You might encounter terms you're not familiar with, but that shouldn't stop you from completing the tutorial. By the end, you won't know a heck of a lot more about data science per se, but you'll have a nice environment set up where you can easily play with lots of different data science tools and even make credible entries to Kaggle competitions.

Most importantly, you'll be in a great position to experiment and learn more data science. Here's what you'll learn: Excited? 1. First thing, we'll need a Python environment suitable for scientific and statistical computing. NumPy (pronounced num-pie): powerful numerical arrays. 2. Data Analysis: How do I get started with basics of data analysis and visualization?

FUN with FACEBOOK in Neo4j. Ever since Facebook promoted its “graph search” methodology, lots of people in our industry have been waking up to the fact that graphs are über-cool. Thanks to the powerful query possibilities, companies like Facebook, Twitter, LinkedIn, and let us not forget, Google have been providing us with some of the most amazing technologies.

Specifically, the power of the “social network” is tempting many people to get their feet wet and to start using graph technology. And they should: graph databases are fantastic at storing, querying and exploiting social structures. So how would that really work? The first step to take was to get access to my own Facebook data.

After a few seconds, the app presents me with a CSV document that I could copy and paste into a text file: and in the spreadsheet I then used some spreadsheet wizardry to generate the Cypher statements that would allow us to import the data. I ended up with a Cypher statement that looks like And that’s about it.

Deep Learning - The Biggest Data Science Breakthrough of the Decade. Duration: Approximately 60 minutes. Cost: Free. Machine learning and AI have appeared on the front page of the New York Times three times in recent memory: 1) when a computer beat the world's #1 chess player, 2) when Watson beat the world's best Jeopardy players, and 3) when deep learning algorithms won a chemo-informatics Kaggle competition. We all know about the first two... but what's that deep learning thing about?

This happened in November of last year, and it represents a critical breakthrough in data science that every executive will need to know about and react to in the coming years. The NY Times said that these advances "hold implications not just for drug development, but for an array of applications, including marketing and law enforcement".

About Jeremy Howard: Jeremy Howard is the President and Chief Scientist at Kaggle. Jeremy's passion is applying algorithms to data.

Why becoming a data scientist is NOT actually easier than you think - josephmisiti's posterous.

Getting Started with Python for Data Scientists. With the R Users DC Meetup broadening its topic base to include other statistical programming tools, it seemed only reasonable to write a meta post highlighting some of the best Python tutorials and resources available for data science and statistics.

What you don’t know is often the hardest part of picking up a new skill, so hopefully these resources will help make learning Python a little easier. Prepare yourself for code indentation heaven. Python is such an incredible language because it can do practically anything, from high-performance scientific computing to web development with frameworks such as Django or Flask. Python is heavily used at Google, so the language must be doing something right. Distributions: Python is available for free from and there are two popular versions, 2.7 and 3.x. Commercial distributions that bundle and test various useful packages, such as the Enthought Python Distribution, are also available. Python Developer Tools. Learning Python. NumPy. SciPy.

How to Interview a Data Scientist.

66 job interview questions for data scientists. We are now at 91 questions. We've also added 50 new ones here, and started to provide answers to these questions here.

These are mostly open-ended questions, meant to assess the broad technical knowledge of a senior candidate for a rather high-level position, e.g. director. What is the biggest data set that you processed, how did you process it, and what were the results? Tell me two success stories about your analytic or computer science projects.

Data Science Recipes - Current best solutions to common data problems.

(143) What are some good introductory resources for exploratory data analysis?

(143) Big Data: What are some of the books/reference materials to familiarize myself with Big Data and Web Analytics?

OpenIntro.