
Bokeh Docs
Related: Dataviz - Python, bigdata, Jupyter

Interactive Plotting in IPython Notebook (Part 1/2): Bokeh | PyScience Summary: In this post I will talk about interactive plotting packages that support the IPython Notebook and let you zoom, pan, resize, or even hover and read values off your plots directly from the notebook. This post focuses on Bokeh, while the next post will be about Plotly. I will also provide some very rudimentary examples that should allow you to get started straight away. Interactive Plots: +1 for convenience. Anyone who has delved into exploratory data analysis that requires presenting results will inevitably have reached the point of fiddling with plotting settings just to make the result legible (and far more work is needed to make it attractive). With interactive plotting, the days of the static plot are numbered. Bokeh: Bokeh is a package by Continuum Analytics, the authors of the Anaconda distribution I discussed in a previous post. It is a Python interactive visualization library that targets modern web browsers for presentation. Usage
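Since the point of the post is how little code Bokeh needs inside a notebook, here is a minimal sketch of that workflow; the sample data, title, and tool list are invented for illustration and are not taken from the post:

# A minimal Bokeh example for a Jupyter/IPython notebook (the data is invented).
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

output_notebook()  # render plots inline in the notebook instead of a new page

p = figure(title="Sample interactive plot",
           tools="pan,wheel_zoom,box_zoom,reset,hover")
p.circle([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=12, alpha=0.6)
show(p)

The resulting figure carries its own toolbar, so zooming, panning, and hovering work without any extra configuration.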

List of Physical Visualizations In his experiments with visualizing data from Tripit to look back at his own (and other people's) travel, Cemre Güngör came up with a new system for creating data sculptures from one's travels over time. This work is of particular interest in that it shows an excellent example of how a physical visualization design process unfolds, with many questions unique to physicalization adding to the challenge of designing an effective visual representation, i.e., contrasting materials and treatments […]

Getting Started with Plotly for Python Plotly for Python can be configured to render locally inside Jupyter (IPython) notebooks, locally inside your web browser, or remotely in your online Plotly account. Remote hosting on Plotly is free for public use; for private use, view the paid plans. Offline Use, Standalone HTML: offline mode saves an HTML file locally and opens it inside your web browser. Learn more by calling help:

import plotly
help(plotly.offline.plot)

Inside Jupyter/IPython Notebooks: learn more about offline mode. Hosting on Plotly: Plotly provides a web service for hosting graphs. In the terminal, copy and paste the following to install the Plotly library and set your user credentials:

$ pip install plotly
$ python -c "import plotly; plotly.tools.set_credentials_file(username='DemoAccount', api_key='lr1c37zw81')"

You'll need to replace 'DemoAccount' and 'lr1c37zw81' with your own Plotly username and API key. To host your graph in your Plotly account, in your terminal, enter:
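For the notebook case specifically, a minimal offline-mode sketch looks roughly like the following; the data is invented, and the calls reflect the plotly.offline API as it existed around the time of this guide:

# A minimal Plotly offline example for a Jupyter notebook (the data is invented).
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

init_notebook_mode(connected=True)   # inject plotly.js so figures render inline

trace = go.Scatter(x=[1, 2, 3, 4], y=[10, 15, 13, 17], mode='lines+markers')
iplot([trace])                       # draw the figure in the notebook output cell

Offline mode needs no account or API key; credentials are only required for the hosted web service described above.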

Creating interactive crime maps with Folium You can see this Domino project here. I get very excited about a nice map, but when it comes to creating maps in Python I have struggled to find the right library in the ever-changing jungle of Python libraries. After some research I discovered Folium, which makes it easy to create Leaflet maps in Python. This blog post outlines how I used Folium to visualize a data set about crime in San Francisco. Folium: Folium is a powerful Python library that helps you create several types of Leaflet maps; by default, Folium renders a map into a separate HTML file. Data: for this example I needed some interesting data that contains locations. Script: my Jupyter notebook contains only a few lines of code. When run, it creates a map with location markers that are clustered (clustered_marker=True) when close together. You can save a map as an HTML file by using map.create_map(path='map.html') instead of display(map). Choropleth map: well, that was fun!
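The post uses an older Folium API (clustered_marker, create_map); a comparable sketch with the current Folium interface might look like this, with the coordinates, labels, and file name invented for illustration:

# A small Folium sketch in the spirit of the post (sample data is invented;
# the post itself uses an older Folium API).
import folium
from folium.plugins import MarkerCluster

# Center the map roughly on San Francisco
crime_map = folium.Map(location=[37.77, -122.42], zoom_start=12)

cluster = MarkerCluster().add_to(crime_map)
incidents = [(37.7749, -122.4194, 'Theft'), (37.7849, -122.4094, 'Assault')]
for lat, lon, label in incidents:
    folium.Marker([lat, lon], popup=label).add_to(cluster)

crime_map.save('map.html')  # open this file in a browser, or display the map inline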

Designing your Big Data platform. Introduction: Companies operate in a difficult economic context that forces them to maximize their profits and cut their expenses. They need to target their customers as precisely as possible, understand their distribution channels, succeed in selling their offerings, and satisfy their shareholders. It is important to understand how a traditional data warehouse (DW) works. The question, however, is whether these DWs can cope with the Big Data phenomenon. Limitations of traditional Data Warehouses: traditional relational database solutions are not necessarily any better suited than DWs to handle most of these data sets. Clearly, existing DW environments, designed several decades ago, do not have the capacity to capture and process the new data formats within an acceptable processing time. The different approaches to building a Big Data platform. Conclusion.

28 Jupyter Notebook tips, tricks and shortcuts Jupyter Notebook The Jupyter notebook, formerly known as the IPython notebook, is a flexible tool that helps you create readable analyses, as you can keep code, images, comments, formulae and plots together. In this post we've collected some of the top Jupyter notebook tips to quickly turn you into a Jupyter power user! (This post is based on one that originally appeared on Alex Rogozhnikov's blog, 'Brilliantly Wrong'.) Jupyter is quite extensible, supports many programming languages and is easily hosted on your computer or on almost any server; you only need ssh or http access. The Jupyter interface. Project Jupyter was born out of the IPython project as the project evolved into a notebook that could support multiple languages, hence its historical name, the IPython notebook. When working with Python in Jupyter, the IPython kernel is used, which gives us handy access to IPython features from within our Jupyter notebooks (more on that later!). 1. The command palette.
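As a small taste of the IPython features mentioned above, two of the most commonly cited tips look like this; purely illustrative, and each snippet would be run in its own cell:

# Run each snippet in its own notebook cell (IPython magic commands).
%lsmagic   # list every line and cell magic the kernel provides

%%time
# %%time at the top of a cell reports how long the whole cell took to run
squares = [n ** 2 for n in range(1_000_000)]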

Step by step Kaggle competition tutorial – Datanice Kaggle is a Data Science community where thousands of Data Scientists compete to solve complex data problems. In this article we are going to see how to work through a Kaggle competition step by step. The contest explored here is the San Francisco Crime Classification contest, so the objectives are fixed by Kaggle. For data exploration I like to use the IPython Notebook, which lets you run your scripts line by line:

import pandas as pd
df = pd.read_csv('train.csv')
len(df)   # 884262
df.head()

We have roughly 880k data points in our training set, covering about ten years of crime. A simple pandas method that helps find outliers in the data is df.describe(). Looking at this description, it appears we have some outliers in the data.

plt.plot(df.X, df.Y, 'o', markersize=7)

Here we don't have any missing data, but it is always important to check for missing values:

empty = df.apply(lambda col: pd.isnull(col))

In the visualization below, every line represents a category of crime.
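Pulling those fragments together, a self-contained version of the exploration step might look like this; the column names X and Y come from the contest's train.csv, everything else is standard pandas/matplotlib:

# Exploratory look at the Kaggle San Francisco Crime training data.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('train.csv')       # training file downloaded from Kaggle
print(len(df))                      # number of reported incidents
print(df.describe())                # quick summary; extreme X/Y values hint at outliers

# Scatter the longitude (X) / latitude (Y) columns to spot bad coordinates
plt.plot(df.X, df.Y, 'o', markersize=2, alpha=0.3)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()

# Check for missing values column by column
print(df.isnull().sum())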

Setting up a Big Data project in the enterprise. Big Data is an opportunity for the company. By using all the data produced by its social networks, its websites and its databases, a company can improve its knowledge of customers and prospects. It can also optimize its costs, or innovate. But to set up a Big Data project, the company must also rethink the way it operates, adopt suitable technical solutions and be ready to follow a new strategy. Big Data: what is at stake for the company? Big Data (also called Smart Data, Analytics, and so on) refers to the digital data circulating on social networks and across the web. Every sector is concerned: retail and e-commerce of course, but also healthcare, transport, local authorities, sport... For the company, the stakes of Big Data are significant. Yet Big Data is still little exploited by companies. Setting up a Big Data strategy: the steps. Learn more.

Data visualization with Seaborn

Visualizing Summer Travels - Geoff Boeing This is a series of posts about visualizing spatial data. I spent a couple of months traveling in Europe this summer and collected GPS location data throughout the trip with the OpenPaths app. I explored different web mapping technologies such as CartoDB, Leaflet, Mapbox, and Tilemill to plot my travels. Here is the series of posts: My Python code is available in this GitHub repo. This series serves as an introduction and tutorial for these various technologies and methods. Interactive maps: here are some brief highlights. I also visualized this spatial data as an interactive map using the Leaflet JavaScript library, and by rolling my own set of web map tiles and rendering them with Tilemill and Mapbox. Exploring the data set: beyond mapping my GPS location points, I also visualized other aspects of this data set, and calculated the most isolated locations in it.
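As a flavor of the kind of exploration the series walks through, here is a minimal sketch that loads a CSV of OpenPaths-style latitude/longitude points and scatters them; the file name and column names are assumptions for illustration, not taken from the repo:

# Quick look at a set of GPS points exported as CSV (file/column names assumed).
import pandas as pd
import matplotlib.pyplot as plt

points = pd.read_csv('summer-travel-gps.csv')   # expects 'lat' and 'lon' columns
plt.figure(figsize=(8, 5))
plt.scatter(points['lon'], points['lat'], s=5, alpha=0.5)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Recorded GPS locations')
plt.show()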

The evolution of decision-support (BI) architectures with Big Data. We live in a remarkable era. Looking back at the history of computing, we learn that capacities, whether RAM, disk or CPU, have been great showcases of Moore's law in the everyday sense of the term ("something" that doubles every eighteen months). These efforts would be in vain if prices had not followed the opposite trend (divided by 200,000 over 30 years for disk, for example). Put that way, one might think our ambitions know no limit and that it is enough to upgrade the RAM, disk or CPU to absorb the explosion in the volume of data to process, which, broadly speaking, also follows Moore's law. Figure 1: hardware evolution, 2011. So where is the problem? Why are today's decision-support architectures, besides costing more and more, also unable to scale to terabytes or petabytes of data? In-memory architecture.

Advanced Jupyter Notebook Tricks — Part I I love Jupyter notebooks! They're great for experimenting with new ideas or data sets, and although my notebook "playgrounds" start out as a mess, I use them to crystallize a clear idea for building my final projects. Jupyter is so great for interactive exploratory analysis that it's easy to overlook some of its other powerful features and use cases. I wanted to write a blog post on some of the lesser known ways of using Jupyter, but there are so many that I broke the post into two parts. In Part 1, today, I describe how to use Jupyter to create pipelines and reports. The accompanying project to this blog can be found here on the Domino data science platform. You probably know that you can start notebooks with different kernels (e.g., R, Julia), not just Python. By running %lsmagic in a cell you get a list of all the available magics. Some of my favorites are: %env to list your environment variables, and ! to run shell commands from a cell. You can even use magics to mix languages in a single notebook. Pipelines
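A small illustration of those ideas in practice follows; run each snippet in its own cell, and note that the directory and notebook names are assumptions, not taken from the Domino project:

# Run each snippet in its own notebook cell (IPython magics / shell escapes).
%env                      # list the kernel's environment variables
!ls -lh data/             # a leading ! hands the line to the shell

%%bash
echo "an entire cell handed to bash -- one way to mix languages"

# Execute another notebook headlessly and keep the rendered output, a common
# building block for the report/pipeline use case described above.
!jupyter nbconvert --to html --execute nightly_report.ipynb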

How to Generate FiveThirtyEight Graphs in Python If you read data science articles, you may have already stumbled upon FiveThirtyEight's content and been impressed by their visualizations. Perhaps you wanted to make your own, asked Quora and Reddit how to do it, and received only vague answers. In this post, we'll help you: we'll start with a plain default plot and, by the end of the tutorial, arrive at a FiveThirtyEight-style graph. To follow along, you'll need at least some basic knowledge of Python. Introducing the dataset: we'll work with data describing the percentages of Bachelor's degrees conferred to women in the US from 1970 to 2011. If you want to follow along by writing the code yourself, you can download the data from Randal's blog. Import the pandas module. Assign the direct link to the dataset as a string to a variable named direct_link. Read in the data using read_csv() and assign the content to women_majors. Print information about the dataset using the info() method, then inspect the last rows with women_majors.tail().
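Spelled out as code, those steps look roughly like the sketch below. The URL is a placeholder for the direct link given in the post, and the 'Year' column name is an assumption about the dataset; the 'fivethirtyeight' style sheet is the one matplotlib ships with:

# Load the dataset and switch matplotlib to its built-in FiveThirtyEight style.
import pandas as pd
import matplotlib.pyplot as plt

direct_link = 'https://example.com/percent-bachelors-degrees-women-usa.csv'  # placeholder
women_majors = pd.read_csv(direct_link)
women_majors.info()
print(women_majors.tail())

plt.style.use('fivethirtyeight')              # style sheet shipped with matplotlib
women_majors.plot(x='Year', figsize=(9, 5))   # 'Year' column name is an assumption
plt.show()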

Operationalizing Spark Streaming (Part 1) | Expedia Engineering Blog For those looking to run Spark Streaming in production, this two-part article contains tips and best practices collected from the front lines during a recent exercise in taking Spark Streaming to production. For my use case, Spark Streaming serves as the core processing engine for a new real-time Lodging Market Intelligence system used across the Lodging Shopping stack on Expedia.com, Hotels.com and other brands. The system integrates with Kafka, S3, Aurora and Redshift and processes 500 msg/sec on average, with spikes up to 2,000 msg/sec. Sections in Part 1: Spark Overview and Standalone Clusters; Design Patterns for Performance; Spark Cluster and App Stability. Sections in Part 2 (coming soon): Direct Kafka Integration; Guaranteed message processing (at-least-once semantics, strong and weak microbatch transactionality); Operational monitoring. So, let's begin with Part 1: Spark Overview and Standalone Clusters; Design Patterns for Performance.
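To make the shape of such a job concrete, here is a minimal PySpark sketch of a Kafka-fed streaming application using the older direct-stream API (spark-streaming-kafka-0-8); the broker address, topic name, and batch interval are assumptions, not details of the Expedia system:

# Minimal PySpark Streaming sketch fed from Kafka (all names/values assumed).
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="streaming-sketch")
ssc = StreamingContext(sc, batchDuration=5)   # 5-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["lodging-events"],
    kafkaParams={"metadata.broker.list": "kafka:9092"})

# Each record is a (key, value) pair; count messages per micro-batch
stream.map(lambda kv: kv[1]) \
      .count() \
      .pprint()

ssc.start()
ssc.awaitTermination()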
