
Welcome to Bokeh — Bokeh 0.12.2 documentation

Related: Dataviz - Python, bigdata, Jupyter

StatsModels: Statistics in Python — statsmodels 0.8.0 documentation statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open-source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org. Since version 0.5.0 of statsmodels, you can use R-style formulas together with pandas data frames to fit your models. You can also use numpy arrays instead of formulas; have a look at dir(results) to see the available results.
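As a rough illustration of the two entry points mentioned above, the sketch below fits an ordinary least squares model first with an R-style formula on a pandas DataFrame and then with plain numpy arrays; the toy data and column names are invented for the example, not taken from the documentation.

```python
# Minimal OLS sketch, assuming statsmodels >= 0.5 and pandas are installed.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy data, invented for illustration
df = pd.DataFrame({"x": np.arange(20.0)})
df["y"] = 2.0 * df["x"] + np.random.normal(size=20)

# R-style formula with a pandas DataFrame (available since statsmodels 0.5.0)
results = smf.ols("y ~ x", data=df).fit()
print(results.summary())

# The same fit using numpy arrays instead of a formula
X = sm.add_constant(df["x"].values)
results2 = sm.OLS(df["y"].values, X).fit()

# Browse the result statistics attached to the fit
print(dir(results2))
```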

Interactive Plotting in IPython Notebook (Part 1/2): Bokeh | PyScience Summary In this post I will talk about interactive plotting packages that support the IPython Notebook and allow you to zoom, pan, resize, or even hover and get values off your plots directly from an IPython Notebook. This post will focus on Bokeh, while the next post will be about Plotly. I will also provide some very rudimentary examples that should allow you to get started straight away. Interactive Plots: +1 for convenience Anyone who has delved into 'exploratory' data analysis requiring a depiction of their results will inevitably have reached the point of fiddling with plotting settings just to make the result legible (with much more work required to make it attractive). Well, with interactive plotting the days of the static plot are dwindling. Bokeh Bokeh is a package by Continuum Analytics, authors of the Anaconda distribution, which I covered in a previous post. Bokeh is a Python interactive visualization library that targets modern web browsers for presentation.
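To give a flavour of what the post describes, here is a small Bokeh figure rendered inline in a notebook cell; the data points and the tool list are arbitrary, and the calls follow the 0.12-era bokeh.plotting API rather than the post's own example.

```python
# Minimal inline Bokeh plot for a Jupyter/IPython notebook (Bokeh 0.12-era API).
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

output_notebook()  # send output to the notebook instead of a standalone HTML file

# Arbitrary example data
p = figure(title="toy example", tools="pan,box_zoom,wheel_zoom,hover,reset")
p.circle([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=10)
show(p)  # zoom, pan and hover work directly in the notebook cell
```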

List of Physical Visualizations In his experiments with visualizing data from Tripit to look back at his own (and other people's) travel, Cemre Güngör came up with a new system to create data sculptures from one's travels over time. This work is of particular interest in that it shows an excellent example of how a physical visualization design process unfolds, with many questions unique to physicalization adding to the challenge of designing an effective visual representation, e.g., contrasting materials and treatments […] BitTorrent visualization in processing.js You can use the 's' and 'p' keys to add new seeds and peers to the swarm. Hitting 'r' will remove one of the seeds or peers at random, but it takes a few seconds before it activates, just like the original. This visualization originally started off at aphid.org, but that site doesn't seem to be up anymore. Jeff Atwood updated it to work with more recent versions of Processing (notably the 1.0 release, which is theoretically API-stable and compatible with future versions). It works in pretty much any modern browser except IE (because IE doesn't support the <canvas> tag). Changes, if you're curious: processing.js was missing ArrayList.contains(), so I implemented it; processing.js didn't implement screenX() or screenY(), so I had to calculate things manually with cos() and sin().

Getting Started with Plotly for Python Plotly for Python can be configured to render locally inside Jupyter (IPython) notebooks, locally inside your web browser, or remotely in your online Plotly account. Remote hosting on Plotly is free for public use; for private use, see the paid plans. Offline Use Standalone HTML Offline mode will save an HTML file locally and open it inside your web browser. Learn more by calling help: import plotly help(plotly.offline.plot) Inside Jupyter/IPython Notebooks Learn more about offline mode Hosting on Plotly Plotly provides a web service for hosting graphs. In the terminal, copy and paste the following to install the Plotly library and set your user credentials: $ pip install plotly $ python -c "import plotly; plotly.tools.set_credentials_file(username='DemoAccount', api_key='lr1c37zw81')" You'll need to replace 'DemoAccount' and 'lr1c37zw81' with your Plotly username and API key. To host your graph in your Plotly account: in your terminal, enter:
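The excerpt breaks off above, so the following is only a rough sketch rather than the page's own snippet: it shows offline rendering via plotly.offline and, for the hosted workflow, plotly.plotly, as exposed by pre-4.0 plotly releases. The trace data and filenames are arbitrary.

```python
# Rough sketch for a pre-4.0 plotly release; trace data and filenames are arbitrary.
import plotly
import plotly.plotly as py
from plotly.graph_objs import Scatter

trace = Scatter(x=[1, 2, 3], y=[3, 1, 6])

# Offline, inside a Jupyter/IPython notebook: render the figure inline
plotly.offline.init_notebook_mode()
plotly.offline.iplot([trace])

# Offline, standalone HTML: write a local file and open it in the browser
plotly.offline.plot([trace], filename="temp-plot.html")

# Hosted on Plotly (uses the credentials set with set_credentials_file)
py.plot([trace], filename="basic-line")
```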

Numpy and Scipy Documentation

Creating interactive crime maps with Folium You can see this Domino project here. I get very excited about a nice map, but when it comes to creating maps in Python, I have struggled to find the right library in the ever-changing jungle of Python libraries. After some research I discovered Folium, which makes it easy to create Leaflet maps in Python. This blog post outlines how I used Folium to visualize a data set about crime in San Francisco. Folium Folium is a powerful Python library that helps you create several types of Leaflet maps. By default, Folium creates a map in a separate HTML file. Data For this example I needed some interesting data that contains locations. Script My Jupyter notebook contains only a few lines of code. When run, it creates a map with location markers that are clustered (clustered_marker = True) if they are close together. You save a map as an HTML file by using map.create_map(path='map.html') instead of display(map). Choropleth map Well, that was fun!

Wikipedia: Building a self-service reporting tool
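Going back to the Folium crime-maps post above, here is a rough sketch of the workflow it describes, written against the older folium API the post references (simple_marker with clustered_marker, and create_map); newer folium releases replace these with Marker/MarkerCluster and save. The coordinates are placeholders, not the actual crime data set.

```python
# Sketch of the older folium API referenced in the post; coordinates are placeholders.
import folium

sf_map = folium.Map(location=[37.77, -122.42], zoom_start=12)

# Add a few markers that folium clusters together when they are close to each other
for lat, lon in [(37.78, -122.41), (37.76, -122.43), (37.77, -122.42)]:
    sf_map.simple_marker([lat, lon], clustered_marker=True)

# Write the Leaflet map to a standalone HTML file
sf_map.create_map(path="map.html")
```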

Designing your Big Data platform Introduction Companies operate in a difficult economic climate that forces them to maximize profits and cut spending. They need to target their customers as precisely as possible, understand their distribution channels, sell their offerings successfully, and satisfy their shareholders. It is important to understand how a traditional data warehouse (DW) works. The question, however, is whether these DWs can cope with the Big Data phenomenon. Limitations of traditional data warehouses Traditional relational database solutions are not necessarily any better suited than DWs for processing most of these data sets. Clearly, existing DW environments, designed several decades ago, lack the capacity to capture and process the new data formats within an acceptable processing time. The different approaches to building a Big Data platform Conclusion

Recent Changes Map

28 Jupyter Notebook tips, tricks and shortcuts Jupyter Notebook Jupyter Notebook, formerly known as the IPython Notebook, is a flexible tool that helps you create readable analyses, as you can keep code, images, comments, formulae and plots together. In this post, we've collected some of the top Jupyter Notebook tips to quickly turn you into a Jupyter power user! (This post is based on a post that originally appeared on Alex Rogozhnikov's blog, 'Brilliantly Wrong'.) Jupyter is quite extensible, supports many programming languages and is easily hosted on your computer or on almost any server; you only need ssh or http access. The Jupyter interface. Project Jupyter was born out of the IPython project as the project evolved to become a notebook that could support multiple languages, hence its historical name, the IPython Notebook. When working with Python in Jupyter, the IPython kernel is used, which gives us handy access to IPython features from within our Jupyter notebooks (more on that later!) 1. The command palette. 2.
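As a taste of the IPython conveniences mentioned above, the cell below uses a couple of common magics and the ? help shortcut; this is notebook-cell syntax, so it runs in Jupyter with the IPython kernel rather than as a plain Python script, and the numpy calls are just filler.

```python
# Run in a Jupyter notebook cell (IPython kernel); %timeit, ? and ! are IPython features.
import numpy as np

%timeit np.arange(1000).sum()   # quick micro-benchmark of a single statement
np.arange?                      # pop up the docstring for np.arange
!ls                             # run a shell command from inside the notebook
```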

SymPy

Step by step Kaggle competition tutorial – Datanice Kaggle is a Data Science community where thousands of Data Scientists compete to solve complex data problems. In this article we are going to see how to work through a Kaggle competition step by step. The contest explored here is the San Francisco Crime Classification contest; here, the objectives are set by Kaggle. For data exploration I like to use the IPython Notebook, which lets you run your scripts line by line: import pandas as pd df = pd.read_csv('train.csv') len(df) #884262 df.head() We have around 880k data points in our training set, covering about ten years of crime. A simple pandas function that helps spot outliers in the data is df.describe(). Looking at this description, it seems we have some outliers in the data. plt.plot(df.X, df.Y, 'o', markersize=7) Here we don't have any missing data, but it's very important to look for missing values in your data: empty = df.apply(lambda col: pd.isnull(col)) In the visualization below, every line represents a category of crime.
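For convenience, here are the exploration steps from the excerpt collected into one runnable sketch; it assumes the competition's train.csv sits in the working directory and that matplotlib is installed, and the comments are paraphrases of the post rather than its own wording.

```python
# Consolidated exploration sketch; assumes the Kaggle train.csv file is available locally.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("train.csv")

print(len(df))        # number of rows in the training set
print(df.head())      # peek at the first records
print(df.describe())  # summary statistics; extreme X/Y values hint at outliers

# Scatter plot of the coordinate columns used in the post
plt.plot(df.X, df.Y, "o", markersize=7)
plt.show()

# Check each column for missing values
empty = df.apply(lambda col: pd.isnull(col))
print(empty.sum())
```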

Setting up a Big Data project in the enterprise Big Data is an opportunity for the company. By using all the data coming from its social networks, its websites and its databases, a company can improve its knowledge of customers and prospects. It can also optimize its costs or innovate. But to set up a Big Data project, the company must also rethink how it operates, adopt suitable technical solutions and be ready to follow a new strategy. Big Data: what is at stake for the company? Big Data (also called Smart Data, Analytics…) refers to the digital data circulating on social networks and across the web. Every sector is affected: retail and e-commerce of course, but also healthcare, transport, local government, sport… For the company, the stakes of Big Data are significant, yet Big Data is still little exploited by companies. Setting up a Big Data strategy: the steps Learn more

12 Facts you didn't know about computers 12 Things You Probably Didn't Know About Computers. They book our hotels. They buy our books. They bring us the latest TV. 01 Over 6,000 new computer viruses are released every month. 02 The first computer mouse, waaaay back in 1964, was designed by a chap called Doug Engelbart and constructed out of wood. 03 The average human being blinks around 20 times in a minute. 04 The very first electro-mechanical computer was called the Z2, and was developed back in 1939. 05 It's estimated that by the end of 2012 there will be a rather ridiculous 17 billion devices that connect to the internet. 06 The internet might be a veritable feast of information, but in actuality five in six online pages are created with, er, adults in mind. 07 Over one million domain names per month are registered online. 08 Facebook boasts over 800 million users. 11 Organised crime syndicates are responsible for 20.70% of online viruses. 12 The original engineers who designed the IBM PC were nicknamed 'The Dirty Dozen'.
