A Taxonomy of Data Science
Posted: September 25th, 2010 | Author: Hilary Mason | Filed under: Philosophy of Data | Tags: data, data science, osemn, taxonomy

Both within the academy and within tech startups, we’ve been hearing some similar questions lately: Where can I find a good data scientist? What do I need to learn to become a data scientist? Or more succinctly: What is data science?

We’ve variously heard it said that data science requires some command-line fu for data procurement and preprocessing, or that one needs to know some machine learning or stats, or that one should know how to ‘look at data’. All of these are partially true, so we thought it would be useful to propose one possible taxonomy (we call it the Snice* taxonomy) of what a data scientist does, in roughly chronological order: Obtain, Scrub, Explore, Model, and iNterpret (or, if you like, OSEMN, which rhymes with possum).

We describe each of these steps briefly below:

Obtain: pointing and clicking does not scale.