Data Science

Intro to Data Structures — pandas 0.17.1 documentation. We’ll start with a quick, non-comprehensive overview of the fundamental data structures in pandas to get you started.
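As a rough illustration of that overview, the sketch below builds the two structures most readers meet first, a Series and a DataFrame, and shows label alignment when two Series are added. The labels and values are made up for the example; only `Series`, `DataFrame`, and the `np`/`pd` import aliases come from pandas and NumPy themselves.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array (values here are illustrative).
s = pd.Series(np.array([1.5, 2.5, 3.5]), index=["a", "b", "c"])

# A DataFrame is a two-dimensional table whose columns can have different dtypes.
df = pd.DataFrame(
    {"price": [10.0, 12.5, 9.0], "qty": [3, 1, 4]},
    index=["a", "b", "c"],
)

# Arithmetic aligns on labels, not on position: labels present in only
# one operand produce NaN in the result.
t = pd.Series([10.0, 20.0], index=["b", "d"])
aligned = s + t

print(df.dtypes)
print(aligned)
```

Only label "b" exists in both Series, so it is the only position in `aligned` that holds a number.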

The fundamental behavior of data types, indexing, and axis labeling / alignment applies across all of the objects. To get started, import numpy and load pandas into your namespace.

Grouping & Summarizing Data in R.

Tools for making LaTeX tables in R.

Simple But Powerful Excel Tricks for Analyzing Data. Introduction. I’ve always admired the immense power of Excel. This software is not only capable of basic data computations; you can also perform data analysis with it. It is widely used for many purposes, including financial modeling and business planning.

It can be a good stepping stone for people who are new to the world of data analysis. Even before learning R or Python, it is advisable to have knowledge of Excel. It has a few drawbacks as well. I feel fortunate that my journey started with Excel. Note: If you think you are a master coder in data science, you won’t find this article useful.

Commonly used functions

1. VLOOKUP. Syntax: =VLOOKUP(Key to lookup, Source_table, column of source table, are you ok with relative match?) For the above problem, we can write the formula in cell “F4” as =VLOOKUP(B4, $H$4:$L$15, 5, 0). This returns the city name for Customer id 1; then copy the formula down for all the other Customer ids.
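To make the exact-match behavior of that formula concrete (a last argument of 0), here is a small Python sketch. The `vlookup` helper and the customer rows are hypothetical and exist only for illustration. The table has five columns to mirror a range like $H$4:$L$15, with the key in the first column and the city in the fifth.

```python
def vlookup(key, table, col_index):
    """Return the value in column `col_index` (1-based) of the first row
    whose first cell equals `key`, or None if no row matches.

    Mimics only the exact-match mode of VLOOKUP; Excel itself would show
    #N/A where this sketch returns None.
    """
    for row in table:
        if row[0] == key:
            return row[col_index - 1]
    return None


# Hypothetical source table: customer id, name, age, gender, city.
customers = [
    (1, "Asha", 34, "F", "Delhi"),
    (2, "Ravi", 41, "M", "Mumbai"),
    (3, "Meera", 29, "F", "Pune"),
]

# Looking up several ids plays the role of copying the formula down a column.
order_customer_ids = [1, 3, 1, 4]
cities = [vlookup(cid, customers, 5) for cid in order_customer_ids]
print(cities)
```

Id 4 is not in the table, so its lookup yields `None` rather than a city.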

Cheatsheet - 11 Steps for Data Exploration in R (with codes).

The Mod Function. What has modular arithmetic got to do with the real world?
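One way to make that question concrete is to look at what the mod and integer division operators actually compute. The sketch below uses Python, where `//` rounds toward negative infinity and `%` takes the sign of the divisor; C-family languages truncate toward zero instead, which `math.fmod` imitates here. The helper name `quotient_and_remainder` is made up for the example.

```python
import math


def quotient_and_remainder(a, n):
    """Return (a // n, a % n) and check the identity that links them."""
    q = a // n
    r = a % n
    assert q * n + r == a  # holds for every nonzero n
    return q, r


print(quotient_and_remainder(17, 5))    # positive operands
print(quotient_and_remainder(-17, 5))   # floor division: remainder stays non-negative
print(math.fmod(-17, 5))                # truncating remainder, sign follows the dividend

# Fixed-width hardware arithmetic is arithmetic mod 2**bits:
# an 8-bit register holding 250 wraps around when 10 is added.
wrapped = (250 + 10) % 256

# Reducing mod a power of two is the same as keeping the low bits.
value = 0b10110110
low_nibble_by_mod = value % 16
low_nibble_by_mask = value & 0b1111

print(wrapped, low_nibble_by_mod, low_nibble_by_mask)
```

The last two lines are where the hardware connection shows up: wrap-around in a fixed-width register and masking off low bits are both mod operations.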

The answer any experienced programmer should give you is "a lot". Not only is it the basis for many an algorithm, it is part of the hardware. Many programmers are puzzled by the mod (short for modulo) and integer division functions/operators found in nearly all languages. Modular arithmetic used to be something that every programmer encountered because it is part of the hardware of every machine. You find it in the way numbers are represented in binary and in machine code or assembly language instructions. Once you get away from the representation of numbers as bit strings and arithmetic via registers, many mod and remainder operations lose the immediate meaning so familiar to assembly language programmers.

Look for These 7 Characteristics Before Hiring a Data Scientist. Data is being collected in droves, but most of the time, people don’t know what to do with it.

That’s why data scientists are hot commodities in the startup world right now. In fact, between 2003 and 2013, employment in data industries grew about 21 percent -- nearly 16 percent more than overall employment growth. It’s a fairly new concept, but these people are so valuable because they understand the significance of data for your business and how you can use it. Using analytics, firms can discover patterns and stories in data, build the infrastructure needed to properly collect and store it, inform business decisions and guide strategy.

Access to sufficient and robust data is vital to sustained startup growth.

CheatSheet on Data Exploration using Pandas in Python. If someone asked me to name the 2 most important libraries in Python for data science, I’d probably name “pandas” and “scikit-learn”.

Pandas for its ability to read datasets into DataFrames, explore them, and make them ready for modeling / machine learning, and scikit-learn for actually learning from the features created in pandas. While there are quite a few cheat sheets summarizing what scikit-learn brings to the table, I have not come across one for pandas. Hence, we thought of creating a cheat sheet for common data exploration operations in Python using pandas.
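The cheat sheet itself is not reproduced here, so the following is only a guess at the kind of operations it might cover: a quick look at the data, summary statistics, missing-value counts, category frequencies, a simple imputation, and a group-wise summary. The DataFrame and its column names are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset; a real workflow would usually start from pd.read_csv(...).
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai"],
    "age": [25, 32, np.nan, 41, 29],
    "income": [30.0, 52.0, 45.0, np.nan, 38.0],
})

print(df.head())        # first rows
print(df.describe())    # summary statistics for numeric columns

missing = df.isnull().sum()           # missing values per column
counts = df["city"].value_counts()    # frequency of each category

# Fill missing ages with the median of the observed ages.
df["age"] = df["age"].fillna(df["age"].median())

# Mean income per city; missing incomes are skipped by default.
mean_income = df.groupby("city")["income"].mean()

print(missing)
print(counts)
print(mean_income)
```

Median imputation is just one simple option; whether it is appropriate depends on the data and on the model that follows.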

If you think we have missed anything in the cheat sheet, please feel free to mention it in the comments. A PDF version of the sheet is available for download (so that you can copy and paste the code). You can keep this cheat sheet handy while performing data exploration.

Running Randomized Evaluations: A Practical Guide.

From Deconstruction to Big Data: How Technology is Reshaping the Corporation. Evans affirms that we are undergoing a re-acceleration of technological change despite the global recession and that something sudden and dramatic is happening.

One important aspect of this is how Big Data is reshaping business and transforming internal organization and industry architecture. He goes on to explain that two information technology drivers are reshaping internal organization, business strategy, and the structures of industries. The first is the deconstruction of value chains: the breakup of vertically integrated businesses, as standards and interoperability replace managed interfaces. The second is the polarization of the economies of mass, meaning that in some activities economies of scale and experience are evaporating, while in others they are intensifying. He doesn’t consider Big Data an isolated or unique phenomenon, but rather an example of a wider and deeper set of trends reshaping the business world.

Introduction. Database. A computer database is a set of data that has been stored on a computer medium and organized and structured so that its content can easily be consulted and modified.

Take the example of a website with a news system and a membership system. We will use a MySQL database to store all of the site's data: the news (with the publication date, the title, the content, possibly the author, ...) and the members (their names, their emails, ...). All of this makes up our database for the site. But it is not enough for the database to exist. A database alone is therefore not enough; you also need: a system for managing the database; a language for sending instructions to the database (through the management system). The client-server paradigm. Most DBMSs are based on a client-server model.

7 Steps of Data Exploration & Preparation before model build - Part 1.
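The news-and-members example from the database introduction above can be sketched as follows. This is an illustration under stated assumptions, not the tutorial's own code: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, which means it shows only the "language for sending instructions" part (SQL) and not the client-server exchange. All table and column names, and the sample rows, are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Members: names and emails.
cur.execute(
    "CREATE TABLE members ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)

# News: publication date, title, content, and an optional author.
cur.execute(
    "CREATE TABLE news ("
    "id INTEGER PRIMARY KEY, published TEXT NOT NULL, title TEXT NOT NULL, "
    "content TEXT NOT NULL, author_id INTEGER REFERENCES members(id))"
)

cur.execute(
    "INSERT INTO members (name, email) VALUES (?, ?)",
    ("Alice", "alice@example.com"),
)
author_id = cur.lastrowid

cur.execute(
    "INSERT INTO news (published, title, content, author_id) VALUES (?, ?, ?, ?)",
    ("2016-01-15", "Site launch", "The site is now online.", author_id),
)
conn.commit()

# Each SQL statement is an instruction handed to the management system,
# which returns the matching rows.
rows = cur.execute(
    "SELECT news.title, news.published, members.name "
    "FROM news LEFT JOIN members ON members.id = news.author_id "
    "ORDER BY news.published"
).fetchall()
print(rows)
conn.close()
```

With MySQL, the same SQL would be sent over a network connection to a separate server process, which is the client-server arrangement the text refers to.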

The Hidden World of Facebook "Like Farms". Facebook has become the advertising outlet of choice for many of the world’s businesses and companies.

Whenever there is a new product to test, a service to announce, or an event to promote, many organisations turn to Facebook to post news of the development. To enable this, Facebook allows users to create pages devoted to specific topics. Visitors can then “like” the page to receive updates about the topic and connect with others who share similar interests.

The number of likes is therefore an important measure of the popularity of the page, and there is considerable prestige in having many likes. That is handy for Facebook, which allows businesses to promote their pages using adverts targeted at groups of users who may be interested in the content. However, there is another way to promote Facebook pages. Their approach is relatively straightforward.

Welcome · Advanced R.