Becoming a Data Scientist - Curriculum via Metromap ← Pragmatic Perspectives

Data Science, Machine Learning, Big Data Analytics, Cognitive Computing... well, all of us have been avalanched with articles, skills-demand infographics and points of view on these topics (yawn!). One thing is for sure: you cannot become a data scientist overnight. It's a journey, and a challenging one. But how do you go about becoming one? Where do you start? When do you start seeing light at the end of the tunnel? Given how critical visualization is for data science, it is ironic that I was not able to find (except for a few) a pragmatic and yet visual representation of what it takes to become a data scientist.

Fundamentals, Statistics, Programming, Machine Learning, Text Mining / Natural Language Processing, Data Visualization, Big Data, Data Ingestion, Data Munging, Toolbox

Each area / domain is represented as a “metro line”, with the stations depicting the topics you must learn / master / understand in a progressive fashion. PS: I did not want to impose the use of any commercial tools in this plan.

Related: Machine Learning, Data Science, Data Analysis, Data scientist, Technical Tips

Practical Machine Learning Problems - Machine Learning Mastery What is Machine Learning? We can read authoritative definitions of machine learning, but really, machine learning is defined by the problem being solved. Therefore the best way to understand machine learning is to look at some example problems. In this post we will first look at some well-known and well-understood examples of machine learning problems in the real world.

10 Books for Data Enthusiasts Over the last few years, I've invested a lot of time exploring various areas of data analysis and software development. Going down the proverbial coding rabbit hole, I've quietly accumulated a lot of books on various subjects. This is a post about 10 data books that I've gotten a lot of mileage out of and that really have legs.

Programming Collective Intelligence by Toby Segaran Synopsis An overview of machine learning and the key algorithms in use today. Each chapter outlines a problem, defines an approach to solving it using a particular algorithm, and then gives you all the sample code you need to solve it.

Mean Shift Clustering Overview Mean shift clustering is one of my favorite algorithms. It’s a simple and flexible clustering technique that has several nice advantages over other approaches. In this post I’ll provide an overview of mean shift and discuss some of its strengths and weaknesses. All of the code used in this blog post can be found on GitHub.

Stanford Large Network Dataset Collection: Social networks, Networks with ground-truth communities, Communication networks, Citation networks, Collaboration networks, Web graphs

66 job interview questions for data scientists We are now at 86 questions. These are mostly open-ended questions, to assess the technical horizontal knowledge of a senior candidate for a rather high-level position, e.g. director.
What is the biggest data set that you processed, and how did you process it, what were the results?
Tell me two success stories about your analytic or computer science projects. How was lift (or success) measured?
What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?

Getting Started With Python For Data Science Who is this for and what will I learn? This tutorial assumes some knowledge of Python and programming, but no knowledge whatsoever of data science, machine learning, or predictive modeling (or, heck, even statistics). To the extent there is a target audience, it's probably hacker types who learn best by doing.

Place Autocomplete - Google Places API Looking to use this service in a JavaScript application? Check out the Places Library of the Google Maps API v3. The Place Autocomplete service is a web service that returns place predictions in response to an HTTP request.

Metacademy - Level-Up Your Machine Learning Since launching Metacademy, I've had a number of people ask, "What should I do if I want to get 'better' at machine learning, but I don't know what I want to learn?" Excellent question! My answer: consistently work your way through textbooks. I then watch as they grimace in the same way an out-of-shape person grimaces when a healthy friend responds with, "Oh, I watch what I eat and consistently exercise."

The Data Science Equation I present here the results of a data science study about data science. Based on LinkedIn data (top people listed when you do a people search for data science, from a LinkedIn account with 8,000+ data science connections), we identified the fields most frequently associated with data science, as well as top data scientists on LinkedIn. The statistical validity of the data science related fields is strong, while validity is weak for the top data scientists. The reason is that you need at least 10 endorsements for data science in the skills section of your LinkedIn profile to be listed as a top data scientist in the following list.

FUN with FACEBOOK in Neo4j Ever since Facebook promoted its “graph search” methodology, lots of people in our industry have been waking up to the fact that graphs are über-cool. Thanks to the powerful query possibilities, companies like Facebook, Twitter, LinkedIn, and let us not forget, Google have been providing us with some of the most amazing technologies. Specifically, the power of the “social network” is tempting many people to get their feet wet, and to start using graph technology. And they should: graph databases are fantastic at storing, querying and exploiting social structures.

K'th Smallest/Largest Element in Unsorted Array Given an array and a number k where k is smaller than the size of the array, we need to find the k’th smallest element in the given array. It is given that all array elements are distinct. Examples: We have discussed a similar problem to print k largest elements. Method 1 (Simple Solution) A simple solution is to sort the given array using an O(n log n) sorting algorithm like Merge Sort, Heap Sort, etc. and return the element at index k-1 in the sorted array.
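
The sort-based method described above can be sketched in a few lines of Python. This is a minimal illustration, not the original article's code: the function names `kth_smallest` and `kth_largest` and the sample array are assumptions made here, and Python's built-in `sorted` (Timsort, also O(n log n)) stands in for Merge Sort or Heap Sort.

```python
def kth_smallest(values, k):
    """Return the k'th smallest element (k is 1-indexed) by sorting a copy."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # sorted() returns a new ascending list; the k'th smallest sits at index k-1.
    return sorted(values)[k - 1]


def kth_largest(values, k):
    """Return the k'th largest element (k is 1-indexed) by sorting a copy."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # Sorting in descending order puts the k'th largest at index k-1.
    return sorted(values, reverse=True)[k - 1]


# Hypothetical sample data with distinct elements, as the problem assumes.
sample = [7, 10, 4, 3, 20, 15]
print(kth_smallest(sample, 3))
print(kth_largest(sample, 2))
```

Sorting a copy leaves the caller's array untouched, at the cost of O(n) extra memory. The whole operation is dominated by the O(n log n) sort.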

Learning from the best Guest contributor David Kofoed Wind is a PhD student in Cognitive Systems at The Technical University of Denmark (DTU): As a part of my master's thesis on competitive machine learning, I talked to a series of Kaggle Masters to try to understand how they were consistently performing well in competitions. What I learned was a mixture of rather well-known tactics, and less obvious tricks-of-the-trade. In this blog post, I have picked some of their answers to my questions in an attempt to outline some of the strategies which are useful for performing well on Kaggle. As the name of this blog suggests, there is no free hunch, and reading this blog post will not make you a Kaggle Master overnight.