Interactive Plotting in IPython Notebook (Part 1/2): Bokeh Summary In this post I will talk about interactive plotting packages that support the IPython Notebook and allow you to zoom, pan, resize, or even hover and read values off your plots directly from an IPython Notebook. This post will focus on Bokeh while the next post will be about Plotly. I will also provide some very rudimentary examples that should allow you to get started straight away. Interactive Plots: +1 for convenience Anyone who’s delved into ‘exploratory’ data analysis requiring a depiction of their results will inevitably have come to the point where they need to fiddle with plotting settings just to make the result legible (much more work is required to make it attractive).
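As a rough illustration of the kind of rudimentary Bokeh example the post above refers to, here is a minimal sketch. It is not taken from the linked post: the data, the title, and the helper name build_plot are made up for illustration, and it assumes a reasonably recent Bokeh release is installed.

```python
from bokeh.plotting import figure


def build_plot():
    """Build a small interactive figure (hypothetical example data)."""
    x = [0, 1, 2, 3, 4]
    y = [v * v for v in x]

    # The tools string enables panning, zooming, resetting, and hover tooltips.
    p = figure(title="y = x^2", tools="pan,wheel_zoom,box_zoom,reset,hover")
    p.line(x, y)             # connect the points with a line
    p.scatter(x, y, size=8)  # draw a marker at each data point
    return p


plot = build_plot()

# In a notebook, the figure would typically be rendered inline with:
#   from bokeh.io import output_notebook, show
#   output_notebook()
#   show(plot)
```

The figure is built inside a function and shown separately so that the same object can be rendered inline in a notebook or saved to a standalone HTML file.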
Is Big Data Still a Thing? (The 2016 Big Data Landscape) – Matt Turck In a tech startup industry that loves its shiny new objects, the term “Big Data” is in the unenviable position of sounding increasingly “3 years ago”. While Hadoop was created in 2006, interest in the concept of “Big Data” reached fever pitch sometime between 2011 and 2014. This was the period when, at least in the press and on industry panels, Big Data was the new “black”, “gold” or “oil”. However, at least in my conversations with people in the industry, there’s an increasing sense of having reached some kind of plateau. 2015 was probably the year when the cool kids in the data world (to the extent there is such a thing) moved on to obsessing over AI and its many related concepts and flavors: machine intelligence, deep learning, etc. Beyond semantics and the inevitable hype cycle, our fourth annual “Big Data Landscape” (scroll down) is a great opportunity to take a step back, reflect on what’s happened over the last year or so and ponder the future of this industry.
StatsModels: Statistics in Python — statsmodels 0.8.0 documentation statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration. An extensive list of result statistics are available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org.
Creating interactive crime maps with Folium You can see this Domino project here. I get very excited about a nice map. But when it comes to creating maps in Python, I have struggled to find the right library in the ever-changing jungle of Python libraries. After some research I discovered Folium, which makes it easy to create Leaflet maps in Python. This blog post outlines how I used Folium to visualize a data set about crime in San Francisco.
Your First Machine Learning Project in Python Step-By-Step Do you want to do machine learning using Python, but you’re having trouble getting started? In this post you will complete your first machine learning project using Python. In this step-by-step tutorial you will: Download and install Python SciPy and get the most useful package for machine learning in Python. Load a dataset and understand its structure using statistical summaries and data visualization. Create 6 machine learning models, pick the best and build confidence that the accuracy is reliable.
Plotly : Python Reference scatter: import plotly.graph_objs as go; go.Scatter. A Scatter trace is a graph object with any of the named arguments or attributes listed below. The scatter trace type encompasses line charts, scatter charts, text charts, and bubble charts.
Step by step Kaggle competition tutorial – Datanice Kaggle is a Data Science community where thousands of Data Scientists compete to solve complex data problems. In this article we are going to see how to go through a Kaggle competition step by step. The contest explored here is the San Francisco Crime Classification contest.
List of Physical Visualizations This list currently has 254 entries. Recent additions: While data sculptures date back from the 1990s, the very first sculptures were Venus figurines: A Venus figurine is any Upper Paleolithic statuette portraying a woman with exaggerated physical features.
Visualizing Summer Travels - Geoff Boeing This is a series of posts about visualizing spatial data. I spent a couple of months traveling in Europe this summer and collected GPS location data throughout the trip with the OpenPaths app. I explored different web mapping technologies such as CartoDB, Leaflet, Mapbox, and Tilemill to plot my travels.
Getting Started with Plotly for Python Plotly for Python can be configured to render locally inside Jupyter (IPython) notebooks, locally inside your web browser, or remotely in your online Plotly account. Remote hosting on Plotly is free for public use. For private use, view our paid plans. Offline Use: Standalone HTML. Offline mode will save an HTML file locally and open it inside your web browser.
Advanced Jupyter Notebook Tricks - Part I - Data Science Blog by Domino by roos on November 3rd, 2015 I love Jupyter notebooks! They’re great for experimenting with new ideas or data sets, and although my notebook “playgrounds” start out as a mess, I use them to crystallize a clear idea for building my final projects. Jupyter is so great for interactive exploratory analysis that it’s easy to overlook some of its other powerful features and use cases.
28 Jupyter Notebook tips, tricks and shortcuts This post is based on a post that originally appeared on Alex Rogozhnikov’s blog, ‘Brilliantly Wrong’. We have expanded the post and will continue to do so over time - if you have a suggestion please let us know in the comments. Thanks to Alex for graciously letting us republish his work here.
Operationalizing Spark Streaming (Part 1) For those looking to run Spark Streaming in production, this two-part article contains tips and best practices collected from the front lines during a recent exercise in taking Spark Streaming to production. For my use case, Spark Streaming serves as the core processing engine for a new real time Lodging Market Intelligence system used across the Lodging Shopping stack on Expedia.com, Hotels.com and other brands. The system integrates with Kafka, S3, Aurora and Redshift and processes 500 msg/sec average with spikes up to 2000 msg/sec. The topics discussed are: Sections in Part 1
20 Big Data Repositories You Should Check Out - Data Science Central, by Mirko Krivanek, Aug 4, 2015 This is an interesting listing created by Bernard Marr. I would add the following great sources: