
Data science blogs


Brain Inspired - Podcast Addict. Cloudera Fast Forward Blog. Simply Statistics. Dr. Juan Camilo Orduz.


Linguistics and Data Science. Transmediale 2019 - Unboxing Social Data Algorithms - Blog - GameAnalytics. Home. Chris Albon - Data Science, Machine Learning, and Artificial Intelligence. Documenting my path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist".

Intro to pandas data structures. A while back I claimed I was going to write a couple of posts on translating pandas to SQL.

Intro to pandas data structures

I never followed up. However, the other week a couple of coworkers expressed interest in learning a bit more about it, which seemed like a good reason to revisit the topic. What follows is a fairly thorough introduction to the library. I chose to break it into three parts because I felt it was too long and daunting as one.

Part 1: Intro to pandas data structures, covers the basics of the library's two main data structures, Series and DataFrames.

Part 2: Working with DataFrames, dives a bit deeper into the functionality of DataFrames.

If you'd like to follow along, you can find the necessary CSV files here and the MovieLens dataset here. My goal for this tutorial is to teach the basics of pandas by comparing and contrasting its syntax with SQL.

Will Stanton's Data Science Blog – Data analysis tutorial - DatasFrame. This is part one in a multipart series on writing idiomatic pandas code.
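Greg Reda's Part 1, described above, teaches pandas' two core structures by contrasting them with SQL. As a minimal sketch of that idea (the table, column names and values below are hypothetical, not taken from his tutorial): a Series is a labeled one-dimensional array, a DataFrame is a table of columns sharing one index, and two common SQL queries map onto boolean filtering with `.loc` and onto `groupby`.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels (hypothetical data).
ages = pd.Series([34, 28, 45], index=["alice", "bob", "carol"], name="age")

# A DataFrame: a two-dimensional table whose columns share a single index.
people = pd.DataFrame(
    {
        "name": ["alice", "bob", "carol", "dave"],
        "city": ["Boston", "Denver", "Boston", "Austin"],
        "age": [34, 28, 45, 51],
    }
)

# SQL: SELECT name, age FROM people WHERE city = 'Boston';
boston = people.loc[people["city"] == "Boston", ["name", "age"]]

# SQL: SELECT city, AVG(age) AS avg_age FROM people GROUP BY city;
avg_age = (
    people.groupby("city", as_index=False)["age"]
    .mean()
    .rename(columns={"age": "avg_age"})
)

print(ages["bob"])
print(boston)
print(avg_age)
```

Passing `as_index=False` keeps `city` as an ordinary column, which is closer to the shape of a SQL result set; without it, `city` becomes the index of the result.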


This post is available as a Jupyter notebook. There are many great resources for learning pandas. For beginners, I typically recommend Greg Reda's 3-part introduction, especially if you're familiar with SQL. Of course, there's the pandas documentation itself. If you prefer video, I gave an introductory talk at PyData Seattle.

With all those resources (and many more that I've slighted through omission), why write another? We'll be working with flight delay data from the BTS (R users can install Hadley's NYCFlights13 dataset for similar data). You can download the full notebook backing this post here.

```python
import zipfile

import requests
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
```

That download returned a ZIP file.

Indexing

Or, explicit is better than implicit. By indexing, we mean the selection of subsets of a DataFrame or Series.

SettingWithCopy

Aonghus' Blog. I recently came across this little data challenge, which was posted by Zalando (one of the top fashion retailers in Europe) as a teaser for data scientists/analysts.
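Returning to the Modern Pandas excerpt above, which defines indexing as selecting subsets of a DataFrame or Series under the motto "explicit is better than implicit": here is a minimal sketch of what that usually means in current pandas, assuming a small hypothetical flights table rather than the BTS data. It uses `.loc` for label-based and `.iloc` for position-based selection, and it avoids chained assignment (the usual trigger of the `SettingWithCopyWarning` that the SettingWithCopy heading presumably refers to) by assigning through a single `.loc` call on an explicit `.copy()`.

```python
import pandas as pd

# Hypothetical flight-style data; the index holds string labels, not 0..n-1.
flights = pd.DataFrame(
    {
        "carrier": ["AA", "UA", "AA", "DL"],
        "dep_delay": [5.0, -3.0, 42.0, 12.0],
    },
    index=["f1", "f2", "f3", "f4"],
)

# Label-based selection: rows by index label, column by name.
by_label = flights.loc[["f1", "f3"], "dep_delay"]

# Position-based selection: first two rows, second column.
by_position = flights.iloc[:2, 1]

# Boolean mask and column chosen in one .loc call (no chained indexing).
aa_delays = flights.loc[flights["carrier"] == "AA", "dep_delay"]

# Chained assignment such as flights[mask]["dep_delay"] = 0 writes into an
# intermediate object, so `flights` may be left unchanged; pandas versions
# without copy-on-write emit SettingWithCopyWarning for it.
# Instead, take an explicit copy and assign through a single .loc call.
cleaned = flights.copy()
cleaned.loc[cleaned["dep_delay"] < 0, "dep_delay"] = 0.0

# An independent subset that we intend to modify is also copied explicitly.
late = cleaned.loc[cleaned["dep_delay"] > 10].copy()
late["severe"] = late["dep_delay"] > 30

print(by_label)
print(by_position)
print(late)
```

The design choice is that every selection states whether it works by label or by position, so the code does not silently change meaning if the index happens to contain integers.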

Aonghus' Blog

The challenge is quite straightforward and is a good opportunity to show how to deal with this kind of analysis using Python's standard tools and the interactive notebook. For data analysis, the community is in two minds between Python and R, but for spatial data it looks like the ecosystem has taken a bet on Python. There are useful Python libraries for all stages of a geoprocessing pipeline, from data handling (shapely, GDAL/ogr, pyproj, ...) to analysis (shapely, (geo)pandas, PySal, numpy/scipy, sklearn, etc.) to plotting and visualisation (matplotlib, descartes, cartopy, pyQGIS). I will use Shapely for dealing with the geographic data, pyproj for projections and scipy for optimisation routines. On to the challenge.

Problem Statement

Life Is Study: Python for Data Analysis Part 1: Setup. The end of the world has long been the domain of priests and poets, but if modern media has taught us anything, it’s that doomsday could be just around the corner.

Life Is Study: Python for Data Analysis Part 1: Setup

Whether you fear rogue meteors, climate change or beasts from the center of the earth, it’s no small miracle that we’ve made it this far. If tool making is what separates us from the animals, making machines capable of deflecting comets, flying to Mars and perhaps even battling toe to toe with Kaiju is what will separate us from a species that goes extinct in the blink of the cosmic eye.

Then again, what if our trusty tools are the root of our demise? Artificial intelligence has been among the most common threats to earth’s existence on the silver screen since Arnold Schwarzenegger first appeared as living flesh over a metal endoskeleton. Arguably the two most influential sci-fi films of the past 30 years, Terminator 2: Judgment Day and The Matrix, both feature man’s struggle for survival against intelligent machines.