
Simply Statistics

Related:  Data science blogs

Intro to pandas data structures A while back I claimed I was going to write a couple of posts on translating pandas to SQL. I never followed up. However, the other week a couple of coworkers expressed their interest in learning a bit more about it, which seemed like a good reason to revisit the topic. What follows is a fairly thorough introduction to the library.

Violeta Migallón Venn diagrams offer a first step toward understanding how to calculate the probabilities of different events in a sample space. The following GeoGebra applet was built for that purpose. In it we work in percentages; to obtain probabilities, simply divide the results by one hundred. To practice these concepts, an exercise like the following can be set. Three technology and video game magazines, A, B and C, are published in a city. A survey estimates that 30% of people read magazine A, 20% read B, 15% read C, 10% read A and B, 6% read A and C, 5% read B and C, and 3% read all three.
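The excerpt ends before the exercise states its questions, so the following is only a sketch of the kind of calculation the survey figures allow, assuming the usual questions (share reading at least one magazine, exactly one, or none). It applies the inclusion-exclusion principle to the seven percentages given; the function names are illustrative and not part of the original exercise.

```python
# Survey percentages from the magazine exercise (A, B, C and their overlaps).
A, B, C = 30, 20, 15
AB, AC, BC = 10, 6, 5
ABC = 3


def at_least_one(a, b, c, ab, ac, bc, abc):
    """Percentage reading at least one magazine (inclusion-exclusion)."""
    return a + b + c - ab - ac - bc + abc


def exactly_one(a, b, c, ab, ac, bc, abc):
    """Percentage reading exactly one of the three magazines."""
    only_a = a - ab - ac + abc  # readers of A who read neither B nor C
    only_b = b - ab - bc + abc
    only_c = c - ac - bc + abc
    return only_a + only_b + only_c


union = at_least_one(A, B, C, AB, AC, BC, ABC)
single = exactly_one(A, B, C, AB, AC, BC, ABC)
none = 100 - union

print(f"At least one magazine: {union}%")   # 47%
print(f"Exactly one magazine:  {single}%")  # 32%
print(f"No magazine:           {none}%")    # 53%

# As the text notes, dividing a percentage by 100 gives the probability.
print(f"P(at least one) = {union / 100}")
```

Each region of the Venn diagram can be recovered the same way, by subtracting the overlaps from a set's total and adding back the triple intersection that was subtracted twice.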

DatasFrame This is part one in a multipart series on writing idiomatic pandas code. This post is available as a Jupyter notebook. There are many great resources for learning pandas. For beginners, I typically recommend Greg Reda's 3-part introduction, especially if you're familiar with SQL. Of course, there's the pandas documentation itself. I gave a talk at PyData Seattle targeted as an introduction if you prefer video form.

Life Is Study: Python for Data Analysis Part 1: Setup The end of the world has long been the domain of priests and poets, but if modern media has taught us anything, it’s that doomsday could be just around the corner. Whether you fear rogue meteors, climate change or beasts from the center of the earth, it’s no small miracle that we’ve made it this far. If tool making is what separates us from the animals, making machines capable of deflecting comets, flying to Mars and perhaps even battling toe to toe with Kaiju is what will separate us from a species that goes extinct in the blink of the cosmic eye.

Aonghus' Blog I recently came across this little data challenge, which was posted by Zalando (one of the top fashion retailers in Europe) as a teaser for data scientists/analysts. The challenge is quite straightforward and is a good opportunity to show how to deal with this kind of analysis using the standard tools of Python and the interactive notebook. For data analysis, the community is in two minds between Python and R, but for spatial data it looks like the ecosystem has taken a bet on Python. There are useful Python libraries for all stages of a geoprocessing pipeline, from data handling (shapely, GDAL/ogr, pyproj, ...) to analysis (shapely, (geo)pandas, PySal, numpy/scipy, sklearn, etc) to plotting and visualisation (matplotlib, descartes, cartopy, pyQGIS). I will use Shapely for dealing with the geographic data, pyproj for projections and scipy for optimisation routines. On to the challenge.

Linguistics and Data Science Home Will Stanton's Data Science Blog