Simply Statistics

Intro to pandas data structures
A while back I claimed I was going to write a couple of posts on translating pandas to SQL. I never followed up. However, the other week a couple of coworkers expressed their interest in learning a bit more about it, which seemed like a good reason to revisit the topic. What follows is a fairly thorough introduction to the library.

Intro to Data Science for Managers [Mindmap]
Data science has become an integral part of many modern projects and businesses, with an increasing number of decisions now based on data analysis. The data science industry is experiencing an acute shortage of talent, not only of data scientists but also of managers who have some understanding of analytics and data science. As a manager, you can ultimately become the company's expert in data usage, creating opportunities for the evolution of your organization.

DatasFrame
This is part one in a multipart series on writing idiomatic pandas code. This post is available as a Jupyter notebook. There are many great resources for learning pandas. For beginners, I typically recommend Greg Reda's 3-part introduction, especially if you're familiar with SQL. Of course, there's the pandas documentation itself. I gave a talk at PyData Seattle targeted as an introduction if you prefer video form.

The Third Wave: Democratization of Data Science/Algorithms
Everybody wants to talk about AI today. AI is the new black. AI is the new platform. AI is going to change everything, blah, blah, blah… Hardly a day goes by without hearing somebody talking about AI. But AI is actually not new.

Life Is Study: Python for Data Analysis Part 1: Setup
The end of the world has long been the domain of priests and poets, but if modern media has taught us anything, it's that doomsday could be just around the corner. Whether you fear rogue meteors, climate change or beasts from the center of the earth, it's no small miracle that we've made it this far. If tool making is what separates us from the animals, making machines capable of deflecting comets, flying to Mars and perhaps even battling toe to toe with Kaiju is what will separate us from a species that goes extinct in the blink of the cosmic eye.

Data Science: Challenges and Directions
By Longbing Cao. Communications of the ACM, Vol. 60 No. 8, Pages 59-68. DOI: 10.1145/3015456
While data science has emerged as an ambitious new scientific field, related debates and discussions have sought to address why science in general needs data science and what even makes data science a science. However, few such discussions concern the intrinsic complexities and intelligence in data science problems and the gaps in and opportunities for data science research. Following a comprehensive literature review [5, 6, 10, 11, 12, 15, 18], I offer a number of observations concerning big data and the data science debate. For example, discussion has covered not only data-related disciplines and domains like statistics, computing, and informatics but also traditionally less data-related fields and areas like social science and business management.

Aonghus' Blog
I recently came across this little data challenge, which was posted by Zalando (one of the top fashion retailers in Europe) as a teaser for data scientists/analysts. The challenge is quite straightforward and is a good opportunity to show how to deal with this kind of analysis using the standard tools of Python and the interactive notebook. For data analysis, the community is in two minds between Python and R, but for spatial data it looks like the ecosystem has taken a bet on Python. There are useful Python libraries for all stages of a geoprocessing pipeline, from data handling (shapely, GDAL/ogr, pyproj, ...) to analysis (shapely, (geo)pandas, PySal, numpy/scipy, sklearn, etc.) to plotting and visualisation (matplotlib, descartes, cartopy, pyQGIS). I will use Shapely for dealing with the geographic data, pyproj for projections and scipy for optimisation routines. On to the challenge.

The Data Science Venn Diagram - Drew Conway
On Monday I, humbly, joined a group of NYC's most sophisticated thinkers on all things data for a half-day unconference to help O'Reilly organize their upcoming Strata conference. The breakout sessions were fantastic, and the number of people in each allowed for outstanding, expert-driven discussions. One of the best sessions I attended focused on issues related to teaching data science, which inevitably led to a discussion on the skills needed to be a fully competent data scientist. As I have said before, I think the term "data science" is a bit of a misnomer, but I was very hopeful after this discussion, mostly because of the utter lack of agreement on what a curriculum on this subject would look like. The difficulty in defining these skills is that the split between substance and methodology is ambiguous, and as such it is unclear how to distinguish among hackers, statisticians, subject matter experts, their overlaps and where data science fits.