
Python Analytics & Visualization


Petl - Extract, Transform and Load (Tables of Data) — petl 0.17-SNAPSHOT documentation. Petl is a Python package for extracting, transforming and loading tables of data.

petl - Extract, Transform and Load (Tables of Data) — petl 0.17-SNAPSHOT documentation

Introduction

Installation

This module is available from the Python Package Index. On Linux distributions you should be able to do easy_install petl or pip install petl. On other platforms you can download manually, extract and run install.

Dependencies and extensions

This package has been written with no dependencies other than the Python core modules, for ease of installation and maintenance.

Conventions - row containers and row iterators

This package defines the following convention for objects acting as containers of tabular data and supporting row-oriented iteration over the data.

A row container (also referred to here informally as a table) is any object which satisfies the following:

implements the __iter__ method
__iter__ returns a row iterator (see below)
all row iterators returned by __iter__ are independent, i.e., consuming items from one iterator will not affect any other iterators.
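The row container convention above can be sketched in a few lines of plain Python. This is only an illustration of the three conditions, not code from petl itself; the class name ListTable and the sample rows are hypothetical.

```python
class ListTable:
    """Hypothetical row container: stores rows and hands out independent iterators."""

    def __init__(self, rows):
        # Keep an immutable copy of each row so iterators cannot alter the data.
        self._rows = [tuple(row) for row in rows]

    def __iter__(self):
        # A fresh iterator is created on every call, so no position is shared.
        return iter(self._rows)


table = ListTable([("foo", "bar"), ("a", 1), ("b", 2)])

first = iter(table)
second = iter(table)

first_row = next(first)            # consume one row from the first iterator
next(first)                        # advance the first iterator again
second_first_row = next(second)    # the second iterator still starts at the top
```

Because __iter__ builds a new iterator each time, advancing first has no effect on second, which is the independence condition stated in the convention.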

Case Study 1 - Comparing Tables — petl 0.17-SNAPSHOT documentation. This case study illustrates some of the petl functions available for doing some simple profiling and comparison of data from two tables.

Case Study 1 - Comparing Tables — petl 0.17-SNAPSHOT documentation

Introduction

The files used in this case study can be downloaded from the following link:

Download and unzip the files:

$ wget
$ unzip

The first file (snpdata.csv) contains a list of locations in the genome of the malaria parasite P. falciparum, along with some basic data about genetic variations found at those locations. The second file (popdata.csv) is supposed to contain the same list of genome locations, along with some additional data such as allele frequencies in different populations.

The main point for this case study is that the first file (snpdata.csv) contains the canonical list of genome locations, and the second file (popdata.csv) contains some additional data about the same genome locations and therefore should be consistent with the first file.
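The kind of consistency check described above can be sketched with the Python standard library alone. This is not the petl approach used in the case study: the column names, the sample rows and the helper key_set are all invented for illustration, and the real files may be laid out differently.

```python
import csv
import io

# Hypothetical stand-ins for snpdata.csv and popdata.csv; the column names
# and values are made up for this sketch.
SNP_CSV = "Chr,Pos,Ref\nMAL1,100,A\nMAL1,200,C\nMAL2,50,G\n"
POP_CSV = "Chromosome,Coordinates,AF\nMAL1,100,0.1\nMAL2,50,0.4\nMAL2,75,0.9\n"


def key_set(text, chrom_field, pos_field):
    """Return the set of (chromosome, position) keys found in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row[chrom_field], int(row[pos_field])) for row in reader}


snp_keys = key_set(SNP_CSV, "Chr", "Pos")
pop_keys = key_set(POP_CSV, "Chromosome", "Coordinates")

# Locations in the canonical list that the second table lacks, and vice versa.
missing_from_pop = sorted(snp_keys - pop_keys)
extra_in_pop = sorted(pop_keys - snp_keys)
```

If the two tables were consistent, both missing_from_pop and extra_in_pop would be empty; any entries point at locations worth inspecting by hand.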

Preparing the data

Inspect the data:

Statistical Data Analysis in Python, SciPy2013 Tutorial, Part 1 of 4.

Scikit-learn. "We use scikit-learn to support leading-edge basic research [...]" "I think it's the most well-designed ML package I've seen so far." "scikit-learn's ease-of-use, performance and overall variety of algorithms implemented has proved invaluable [...]." "For these tasks, we relied on the excellent scikit-learn package for Python." "The great benefit of scikit-learn is its fast learning curve [...]" "It allows us to do AWesome stuff we would not otherwise accomplish" "scikit-learn makes doing advanced analysis in Python accessible to anyone."

Python data tools just keep getting better. Here are a few observations inspired by conversations I had during the just-concluded PyData conference.

Python data tools just keep getting better

The Python data community is well-organized: Besides conferences (PyData, SciPy, EuroSciPy), there is a new non-profit (NumFOCUS) dedicated to supporting scientific computing and data analytics projects. The supported projects are currently all Python-based, but in principle NumFOCUS is an entity that can be used to support related efforts from other communities.

It’s getting easier to use the Python data stack: There are tools that facilitate the dissemination and sharing of code and programming environments. IPython notebooks allow Python code and markup in the same document.

Machine Vision made Easy - SimpleCV.

Blaze - Continuum Analytics.

Pyplot tutorial. matplotlib.pyplot is a collection of command style functions that make matplotlib work like MATLAB.

Pyplot tutorial

Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc. In matplotlib.pyplot various states are preserved across function calls, so that it keeps track of things like the current figure and plotting area, and the plotting functions are directed to the current axes (please note that “axes” here and in most places in the documentation refers to the axes part of a figure and not the strict mathematical term for more than one axis).

    import matplotlib.pyplot as plt

    plt.plot([1, 2, 3, 4])       # y values; x defaults to 0, 1, 2, 3
    plt.ylabel('some numbers')   # label the y axis of the current axes
    plt.show()                   # display the figure

Neuroimaging in Python — NIPY Documentation.
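Returning to the pyplot tutorial above, the idea of a "current figure" and "current axes" can be made concrete with a small sketch. It assumes matplotlib is installed and selects the non-interactive Agg backend so that it runs without a display; the second label, "other numbers", and the variable names are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, so no window is opened
import matplotlib.pyplot as plt

fig = plt.figure()              # becomes the current figure
top = plt.subplot(2, 1, 1)      # becomes the current axes
plt.plot([1, 2, 3, 4])          # drawn on `top`
plt.ylabel("some numbers")      # labels `top`

bottom = plt.subplot(2, 1, 2)   # the current axes switches to the second area
plt.plot([4, 3, 2, 1])          # drawn on `bottom`
plt.ylabel("other numbers")     # labels `bottom`, not `top`

# pyplot exposes the state it is tracking through gcf() and gca().
current_figure_is_fig = plt.gcf() is fig
current_axes_is_bottom = plt.gca() is bottom
top_label = top.get_ylabel()
bottom_label = bottom.get_ylabel()
```

Each plt.ylabel call lands on whichever axes was current at the time, which is why the two plotting areas end up with different labels even though no axes object is passed explicitly.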