
Finding Data on the Internet

By RevoJoe on October 6, 2011. Published on a community site for R sponsored by Revolution Analytics.

The following list of data sources has been modified as of 3/18/14. An (R) after a source means that the data are already in R format or that R commands exist for importing the data directly into R.

Economics: American Economic Ass.
Data Science Practice (data sets used in the book "Doing Data Science" by Rachel Schutt and Cathy O'Neil, O'Reilly 2014): datasets on the book site, Enron Email Dataset, GetGlue (time-stamped events: users rating TV shows), Titanic Survival Data Set, half a million Hubway rides
Finance: CBOE Futures Exchange, Google Finance (R), Google Trends, St Louis Fed (R), NASDAQ, OANDA (R), Quandl, Yahoo Finance (R)
Government
Health Care: Gapminder
Machine Learning
Networks: Stanford Large Network Dataset Collection
Public Domain Collections
Science
Time Series

http://www.inside-r.org/howto/finding-data-internet


Publicly Available Big Data Sets
Public data sets on Amazon AWS: Amazon provides the following data sets: ENSEMBL annotated genome data, US Census data, UniGene, and a Freebase dump. Data transfer is 'free' within the Amazon ecosystem (within the same zone).
InfoChimps: InfoChimps has a data marketplace with a wide variety of data sets.
Comprehensive Knowledge Archive Network: an open source data portal platform, with data sets available on datahub.io from ckan.org.
Stanford network data collection
Open Flights: crowd-sourced flight data
Flight arrival data

Journal of Statistics Education (JSE)
Current Issue: The November 2014 (Volume 22, Number 3) issue of JSE is now available. The table of contents can be accessed at: 2014 Table of Contents. This issue includes six regular articles, two Research on K-12 Statistics Education articles, two Teaching Bits, and an interview by Allan Rossman with Josh Tabor. As we normally do in our November issue, we have acknowledged all of the great referees who helped to review articles during the past year. We couldn't publish high-quality articles without the help of our many reviewers, and we are extremely thankful for their time and effort.

Mining Twitter for Airline Consumer Sentiment
Airlines, Consumers, and Twitter: Anyone who travels regularly recognizes that airlines struggle to deliver a consistent, positive customer experience. Through extensive interview and survey work, the American Customer Satisfaction Index quantifies this impression. As a group, airlines fall at the bottom of its industry rankings, below the Post Office and insurance companies.

IT Operations Analytics
In the fields of information technology and systems management, IT Operations Analytics (ITOA) is an approach or method applied to application software designed to retrieve, analyze and report data for IT operations. ITOA has been described as applying big data analytics to large datasets where IT operations can extract unique business insights.[1][2] In its Hype Cycle Report, Gartner rated the business impact of ITOA as being ‘high’, meaning that its use will see businesses enjoy significantly increased revenue or cost saving opportunities.[3] By 2017, Gartner predicts that 15% of enterprises will use IT operations analytics technologies to deliver intelligence for both business execution and IT operations.[2]

Create an SPSS data set
Notes on the Missing Values Codes: What are missing values codes, and why do you need them? Sometimes in the collection of data there are values that are lost or cannot be gathered. These are called "missing values." When such values occur, it is important for the program to know that the values are missing so that statistical calculations can take this into account. Missing values are usually designated by an impossible value. For example, the missing value code for the variable AGE may be -9, since it is impossible for AGE to have the value -9.

Extracting Time Series from Large Data Sets
Introduction: Analyzing time series data of all sorts is a fundamental business analytics task to which the R language is beautifully suited. In addition to the time series functions built into the base stats library, there are dozens of R packages devoted to time series. Some packages help with basic tasks such as creating date data types; others offer specialized functions for financial applications. When working with R, the difficult part isn’t finding the right analytical tool; often, it’s getting the time series data to begin with. This is especially true when the time series need to be extracted from time-stamped data embedded in very large data sets: data sets that are too large to be read into memory. In this example, we are going to use “data step” functions in Revolution Analytics’ RevoScaleR package to access a large data file, manipulate it, sort it, extract the data we need and aggregate records with monthly time stamps to form multiple, monthly time series.
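The idea of aggregating time-stamped records into several monthly series without loading the whole file into memory can be sketched in a few lines. The sketch below is a minimal illustration in Python using only the standard library. It is not the RevoScaleR workflow the article goes on to describe. The function name `monthly_series`, the column names (`date`, `carrier`, `delay`) and the sample records are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def monthly_series(rows, time_field, key_field, value_field):
    """Aggregate time-stamped records into one monthly series per key.

    `rows` is any iterable of dicts, so records can be streamed one at a
    time instead of being read into memory all at once.
    """
    totals = defaultdict(float)
    for row in rows:
        # Assumes ISO-style dates (YYYY-MM-DD); adjust the format as needed.
        stamp = datetime.strptime(row[time_field], "%Y-%m-%d")
        month = stamp.strftime("%Y-%m")
        totals[(row[key_field], month)] += float(row[value_field])

    # Sort by key and month so each series comes out in time order.
    series = defaultdict(list)
    for (key, month), total in sorted(totals.items()):
        series[key].append((month, total))
    return dict(series)


# Hypothetical sample data standing in for a large file on disk.
sample = io.StringIO(
    "date,carrier,delay\n"
    "2011-01-03,AA,10\n"
    "2011-01-17,AA,5\n"
    "2011-02-01,AA,7\n"
    "2011-01-09,UA,2\n"
)
result = monthly_series(csv.DictReader(sample), "date", "carrier", "delay")
print(result)
```

For a real file, passing `csv.DictReader(open(path, newline=""))` in place of the in-memory sample would keep memory use proportional to the number of (key, month) pairs rather than the number of records.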

Data Visualisation: What's the big deal?
The concept of using pictures to understand complex information, especially data, has been around for a very long time, centuries in fact. One of the most cited examples of statistical graphics is Charles Minard's map of Napoleon’s invasion of Russia. The map showed the size of the army and the path of Napoleon’s retreat from Moscow. It also included detailed information such as temperature and time scales, giving the audience an in-depth understanding of the event.

Statistics for the Health Sciences
Welcome to the Companion Website for Dancey, Reidy & Rowe, Statistics for the Health Sciences: A Non-Mathematical Introduction. Statistics for the Health Sciences is a highly readable and accessible textbook on understanding statistics for the health sciences, both conceptually and via the SPSS programme.

Revolution R Enterprise: Production-Grade Analysis for Business & Large-Scale Research
Industry’s Most Capable Big Data Big Analytics Platform: Revolution R Enterprise is the fastest, most cost-effective enterprise-class big data big analytics platform available today. Supporting a variety of big data statistics, predictive modeling and machine learning capabilities, Revolution R Enterprise is also 100% R. Revolution R Enterprise provides users with the best of both: cost-effective and fast big data analytics that are fully compatible with the R language, the de facto standard for modern analytics users. Offering high-performance, scalable, enterprise-capable analytics, Revolution R Enterprise supports a variety of analytical capabilities including exploratory data analysis, model building and model deployment.

50 external machine learning / data science resources and articles
Data Science Central, by Vincent Granville
