
Python Pandas: Tricks & Features You May Not Know

Pandas is a foundational library for analytics, data processing, and data science. It’s a huge project with tons of optionality and depth. This tutorial will cover some lesser-used but idiomatic Pandas capabilities that lend your code better readability, versatility, and speed, à la the Buzzfeed listicle. If you feel comfortable with the core concepts of Python’s Pandas library, hopefully you’ll find a trick or two in this article that you haven’t stumbled across previously. Note: The examples in this article are tested with Pandas version 0.23.2 and Python 3.6.6. 1. You may have run across Pandas’ rich options and settings system before. It’s a huge productivity saver to set customized Pandas options at interpreter startup, especially if you work in a scripting environment. The options use a dot notation such as pd.set_option('display.max_colwidth', 25), which lends itself well to a nested dictionary of options.
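As a rough illustration of the nested-dictionary idea, the sketch below flattens a dictionary into dotted option names and passes each one to pd.set_option. The helper name flatten_options and the specific option values are assumptions made for this example, not taken from the article.

```python
import pandas as pd


def flatten_options(nested, prefix=""):
    """Yield (dotted_name, value) pairs from a nested dict of options.

    Hypothetical helper written for this sketch; it is not part of pandas.
    """
    for key, value in nested.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the dotted prefix.
            yield from flatten_options(value, name)
        else:
            yield name, value


# Illustrative values only; pick whatever suits your own workflow.
OPTIONS = {
    "display": {
        "max_columns": None,   # None means "show all columns"
        "max_colwidth": 25,
        "max_rows": 14,
        "precision": 4,
    }
}

for name, value in flatten_options(OPTIONS):
    pd.set_option(name, value)  # e.g. pd.set_option("display.max_rows", 14)
```

One way to run this at interpreter startup is to put it in a file referenced by the PYTHONSTARTUP environment variable, so every interactive session picks up the same settings.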

Interactive Data Visualization in Python With Bokeh Bokeh prides itself on being a library for interactive data visualization. Unlike popular counterparts in the Python visualization space, like Matplotlib and Seaborn, Bokeh renders its graphics using HTML and JavaScript. This makes it a great candidate for building web-based dashboards and applications. However, it’s an equally powerful tool for exploring and understanding your data or creating beautiful custom charts for a project or report. Using a number of examples on a real-world dataset, the goal of this tutorial is to get you up and running with Bokeh. You’ll learn how to: transform your data into visualizations using Bokeh; customize and organize your visualizations; and add interactivity to your visualizations. So let’s jump in. From Data to Visualization Building a visualization with Bokeh involves the following steps: Let’s explore each step in more detail. Prepare the Data Any good data visualization starts with, you guessed it, data. Determine Where the Visualization Will Be Rendered

101 NumPy Exercises for Data Analysis (Python) - Machine Learning Plus The goal of the numpy exercises is to serve as a reference as well as to get you to apply numpy beyond the basics. The questions come in four levels of difficulty, from L1 (the easiest) to L4 (the hardest). If you want a quick refresher on numpy, the numpy basics and the advanced numpy tutorials might be what you are looking for. 1. Difficulty Level: L1 Q. import numpy as np print(np. You must import numpy as np for the rest of the code in these exercises to work. To install numpy, it’s recommended to use the installation provided by anaconda. 2. Q. Desired output: #> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) arr = np.arange(10) arr #> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 3. Q. np.full((3, 3), True, dtype=bool) #> array([[ True, True, True], #> [ True, True, True], #> [ True, True, True]], dtype=bool) # Alternate method: np.ones((3,3), dtype=bool) 4. Q. Input: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) #> array([1, 3, 5, 7, 9]) 5. Q. arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
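The snippets for exercises 2 to 4 can be gathered into one runnable sketch. The question texts are cut off in the excerpt above, so reading exercise 4 as "extract the odd numbers" is an assumption inferred from the listed output array([1, 3, 5, 7, 9]); the boolean-mask solution is one possible approach, not necessarily the one given in the original.

```python
import numpy as np

# Exercise 2: a 1-D array holding the numbers 0 through 9.
arr = np.arange(10)

# Exercise 3: a 3x3 boolean array filled with True.
all_true = np.full((3, 3), True, dtype=bool)
all_true_alt = np.ones((3, 3), dtype=bool)  # alternate method from the excerpt

# Exercise 4 (assumed task): keep only the odd values using a boolean mask.
odds = arr[arr % 2 == 1]

print(arr)
print(all_true)
print(odds)
```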

Pandas Tricks - Combine Data In Different Ways | CODE FORESTS Introduction If you have used pandas for your data analysis work, you may already have some idea of how powerful and flexible it is for data processing. Often there is more than one way to solve a problem, and choosing the best approach becomes another tough decision. For instance, in one of my previous articles, I tried to summarize 20 ways to filter records in pandas, which is definitely not a complete list of all the possible solutions. In this article, I will discuss the different ways to merge/combine data in pandas and when you should use them, since combining data is probably one of the necessary steps you perform before starting your data analysis. Prerequisites If you have not yet installed pandas, you may use the below command to install it from PyPI: And import the module at the beginning of your code: Let’s dive into the code examples. Combine Data with Append vs Concat df1.append(df2, ignore_index=True) You would see the output as per below:
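A note for readers on current pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the df1.append(df2, ignore_index=True) call above only runs on older versions. A minimal sketch of the equivalent pd.concat call follows; the contents of df1 and df2 are made up for illustration, since the original data is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical sample frames; the article's actual data is not in the excerpt.
df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["c", "d"]})

# Stack the rows of df2 under df1 and renumber the index from 0,
# which matches what df1.append(df2, ignore_index=True) used to do.
combined = pd.concat([df1, df2], ignore_index=True)

print(combined)
```

Because pd.concat takes a list, the same call also scales to three or more frames without chaining.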

Beginner's Guide: creating clean Python development environments · Tjelvar Olsson 09 May 2015 Introduction Code interacts with its environment. For example, you can only run a Python script if you have Python installed on the system. It therefore becomes important for you as a developer / computational scientist to understand and control the environment in which your code operates. In this post I will illustrate a work flow for creating clean Python development environments. Example: developing a Python package In the previous post I illustrated how you could use a static code generator (cookiecutter) to create a basic template to develop a Python package. Now suppose that we wanted to develop a Python package named “awesome”. $ cookiecutter gh:tjelvar-olsson/cookiecutter-pypackage Cloning into 'cookiecutter-pypackage'... remote: Counting objects: 48, done. remote: Compressing objects: 100% (37/37), done. remote: Total 48 (delta 13), reused 37 (delta 8), pack-reused 0 Unpacking objects: 100% (48/48), done. This creates the directory awesome. That’s not very good!

3 Awesome Visualization Techniques for every dataset Visualizations are awesome. However, a good visualization is annoyingly hard to make. Moreover, it takes time and effort to present these visualizations to a bigger audience. We all know how to make bar plots, scatter plots, and histograms, yet we don’t pay much attention to beautifying them. This hurts our credibility with peers and managers. Also, I find it essential to reuse my code. In this post, I am going to talk about 3 cool visual tools: Categorical Correlation with Graphs, Pairplots, and Swarmplots and Graph Annotations using Seaborn. In short, this post is about useful and presentable graphs. I will be using data from the FIFA 19 complete player dataset on Kaggle, which has detailed attributes for every player registered in the latest edition of the FIFA 19 database. Since the dataset has many columns, we will only focus on a subset of categorical and continuous columns. Categorical Correlation with Graphs: In simple terms, correlation is a measure of how two variables move together. 1.

Merge, join, concatenate and compare — pandas 1.2.4 documentation pandas provides various facilities for easily combining together Series or DataFrame with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations. In addition, pandas also provides utilities to compare two Series or DataFrame and summarize their differences. Concatenating objects The concat() function (in the main pandas namespace) does all of the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes. Before diving into all of the details of concat and what it can do, here is a simple example: Like its sibling function on ndarrays, numpy.concatenate, pandas.concat takes a list or dict of homogeneously-typed objects and concatenates them with some configurable handling of “what to do with the other axes”: Without a little bit of context many of these arguments don’t make much sense.
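To make the "set logic on the other axes" concrete, here is a small sketch (not the documentation's own example, which is missing from the excerpt) that concatenates two frames with partially overlapping columns. The frame names and values are invented for illustration.

```python
import pandas as pd

# Two hypothetical frames that share only column "B".
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# Default join="outer": union of the columns, missing cells become NaN.
outer = pd.concat([left, right], ignore_index=True)

# join="inner": intersection of the columns, so only "B" survives.
inner = pd.concat([left, right], join="inner", ignore_index=True)

# Passing a dict labels each piece; the keys become an outer index level.
keyed = pd.concat({"left": left, "right": right})

print(outer)
print(inner)
print(keyed)
```

The rows are always stacked along axis 0 here; the join argument only decides what happens to the column labels on the other axis.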

Data Science with Python in Visual Studio Code – Python at Microsoft This post was written by Rong Lu, a Principal Program Manager working on Data Science tools for Visual Studio Code. Today we’re very excited to announce the availability of Data Science features in the Python extension for Visual Studio Code! With the addition of these features, you can now work with data interactively in Visual Studio Code, whether it is for exploring data or for incorporating machine learning models into applications, making Visual Studio Code an exciting new option for those who prefer an editor for data science tasks. These features are currently shipping as experimental. Exploring data and experimenting with ideas in Visual Studio Code. Now, let’s take a closer look at how Visual Studio Code works in these two scenarios. Exploring data and experimenting with ideas in Visual Studio Code Above is an example of a Python file that simply loads data from a csv file and generates a plot that outlines the correlation between data columns. A few things to note: Try it out today

In Python NumPy what is a dimension and axis?

40 Examples to Master Pandas. A comprehensive practical guide | by Soner Yıldırım Pandas is one of the most widely-used data analysis and manipulation libraries. It provides numerous functions and methods to clean, process, manipulate, and analyze data. The best way to get comfortable working with Pandas is through practice. I previously wrote a practical guide that contains 30 examples. In this article, I will enrich the examples to cover a broader scope together with the previous article. The 40 examples in this article will include not only the basic functions and techniques but also some extreme cases. Most of the examples include the functions and methods that were not discussed in the previous article. We will be using a marketing and a grocery data set to do the examples.

Setting Up Sublime Text 3 for Full Stack Python Development Sublime Text 3 (ST3) is a lightweight, cross-platform code editor known for its speed, ease of use, and strong community support. It’s an incredible editor right out of the box, but the real power comes from the ability to enhance its functionality using Package Control and creating custom settings. In this article, we’ll look at how to set up Sublime Text for full stack Python development (from front to back), enhance the basic functionality with custom themes and packages, and use many of the commands, features, and keyword shortcuts that make ST3 so powerful. Note: This tutorial assumes you’re using a Mac and are comfortable with the terminal. If you’re using Windows or Linux, many of the commands will vary, but you should be able to use Google to find the answers quickly given the info in this tutorial. Before we start, let’s address what I mean exactly by “full stack.” In today’s world of HTML5 and mobile development, JavaScript is literally everywhere. Features Then repeat step one.

A Visual Intro to NumPy and Data Representation – Jay Alammar – Visualizing machine learning one concept at a time The NumPy package is the workhorse of data analysis, machine learning, and scientific computing in the Python ecosystem. It vastly simplifies manipulating and crunching vectors and matrices. Some of Python’s leading packages rely on NumPy as a fundamental piece of their infrastructure (examples include scikit-learn, SciPy, pandas, and tensorflow). Beyond the ability to slice and dice numeric data, mastering NumPy will give you an edge when dealing with and debugging advanced use cases in these libraries. In this post, we’ll look at some of the main ways to use NumPy and how it can represent different types of data (tables, images, text, etc.) before we can serve them to machine learning models. Creating Arrays We can create a NumPy array (a.k.a. the mighty ndarray) by passing a Python list to np.array(). There are often cases when we want NumPy to initialize the values of the array for us. Once we’ve created our arrays, we can start to manipulate them in interesting ways.
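The array-creation steps described above can be sketched in a few lines. The specific values, the use of np.ones / np.zeros, and the seeded random generator are illustrative assumptions rather than the article's own listing.

```python
import numpy as np

# Create an ndarray by passing a Python list to np.array().
data = np.array([1, 2, 3])

# Let NumPy initialize the values for us.
ones = np.ones(3)                 # three 1.0 values
zeros = np.zeros(3)               # three 0.0 values
rng = np.random.default_rng(0)    # seeded generator, so runs are repeatable
noise = rng.random(3)             # three floats drawn from [0, 1)

# Once created, arrays can be combined elementwise without a loop.
total = data + ones

print(data, ones, zeros, noise, total)
```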
