Finding Data on the Internet

A Community Site for R – Sponsored by Revolution Analytics
By RevoJoe on October 6, 2011

The following list of data sources has been modified as of 3/18/14. If an (R) appears after a source, this means that the data are already in R format or there exist R commands for directly importing the data from R.

Economics
American Economic Ass.

Data Science Practice
This section contains data sets used in the book "Doing Data Science" by Rachel Schutt and Cathy O'Neil (O'Reilly 2014).
Datasets on the book site
Enron Email Dataset
GetGlue (time stamped events: users rating TV shows)
Titanic Survival Data Set
Half a million Hubway rides

Finance

Government

Health Care
Gapminder

Machine Learning

Networks

Science

Where can I find large datasets open to the public?

Journal of Statistics Education (JSE) Home Page

Current Issue
The November 2014 (Volume 22, Number 3) issue of JSE is now available. The table of contents can be accessed at: 2014 Table of Contents. This issue includes six regular articles, two Research on K-12 Statistics Education articles, two Teaching Bits, and an interview by Allan Rossman with Josh Tabor. As we normally do in our November issue, we have acknowledged all of the great referees who helped to review articles during the past year. We couldn't publish high quality articles without the help of our many reviewers, and we are extremely thankful for their time and effort. We hope you enjoy this issue, and, as always, we welcome your feedback.

The JSE Webinar Series on CAUSEweb
The JSE webinar series continues to take place approximately once each month, on the third Tuesday of the month, from 12 – 1 p.m.

JSE on Facebook and Twitter
There is also a Twitter account for JSE that you can follow if you use Twitter (@JStatEd).

Paper Submissions and Author Guidelines

Publicly Available Big Data Sets :: Hadoop Illuminated

Public Data sets on Amazon AWS
Amazon provides the following data sets: ENSEMBL Annotated Genome data, US Census data, UniGene, Freebase dump. Data transfer is 'free' within the Amazon ecosystem (within the same zone).
AWS data sets

InfoChimps
InfoChimps has a data marketplace with a wide variety of data sets.
InfoChimps market place

Comprehensive Knowledge Archive Network
open source data portal platform
data sets available on from

Stanford network data collection

Open Flights
Crowd sourced flight data

Flight arrival data

Create an SPSS data set

Notes on the Missing Values Codes: What are missing values codes, and why do you need them? Sometimes in the collection of data there are values that are lost or cannot be gathered. These are called "missing values." When such values occur, it is important for the program to know that the values are missing so that statistical calculations may take this into account. Missing values are usually designated as an impossible value. For example, the missing values designated for the variable AGE may be -9, since it is impossible for the variable AGE to have the value -9.

IT Operations Analytics

In the fields of information technology and systems management, IT Operations Analytics (ITOA) is an approach or method applied to application software designed to retrieve, analyze and report data for IT operations. ITOA has been described as applying big data analytics to large datasets where IT operations can extract unique business insights.[1][2] In its Hype Cycle Report, Gartner rated the business impact of ITOA as being ‘high’, meaning that its use will see businesses enjoy significantly increased revenue or cost saving opportunities.[3] By 2017, Gartner predicts that 15% of enterprises will use IT operations analytics technologies to deliver intelligence for both business execution and IT operations.[2]

History
Due to the mainstream embrace of cloud computing and the increasing desire for businesses to adopt more Big Data practices, the ITOA industry has grown significantly since 2010.
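The missing-values convention described in the SPSS notes above (an impossible value such as -9 standing in for a missing AGE) can be illustrated with a small sketch. This is not SPSS syntax; it is a plain Python illustration, and the sample ages, the constant `MISSING_AGE`, and both helper functions are hypothetical names introduced here. It shows why the program needs to know the code: if -9 is treated as a real age, the average is distorted.

```python
from statistics import mean

# Assumed missing-value code for AGE, following the -9 example in the notes.
MISSING_AGE = -9


def recode_missing(values, missing_code):
    """Replace the missing-value code with None so it cannot enter calculations."""
    return [None if v == missing_code else v for v in values]


def mean_ignoring_missing(values):
    """Average only the observed (non-None) values; None if nothing was observed."""
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else None


# Hypothetical AGE column in which two respondents have no recorded age.
ages = [34, -9, 51, 27, -9, 40]

cleaned = recode_missing(ages, MISSING_AGE)
naive = mean(ages)                        # wrongly treats -9 as a real age
correct = mean_ignoring_missing(cleaned)  # uses only the four observed ages

print("naive mean:", naive)
print("mean with missing values excluded:", correct)
```

In SPSS itself the same effect is achieved by declaring -9 as a missing value for the variable, after which the statistical procedures leave those cases out of the calculation.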

Statistics for the Health Sciences

Welcome to the Companion Website for Dancey, Reidy & Rowe, Statistics for the Health Sciences: A Non-Mathematical Introduction.

Statistics for the Health Sciences is a highly readable and accessible textbook on understanding statistics for the health sciences, both conceptually and via the SPSS programme. The textbook takes students from the basics of research design, hypothesis testing and descriptive statistical techniques through to the more advanced inferential statistical tests they are likely to encounter. Exercises and tips throughout allow students to practice using SPSS, while the companion website provides further guidance on conducting statistical analyses, including: multiple choice questions for lecturers and students; full PowerPoint slides for lecturers; additional exercises using SPSS; and guidance for using SAS and R statistical software. This is an essential textbook for students studying beginner and intermediate level statistics across the health sciences.

Data Visualisation: What's the big deal? | Career and Hiring Insights | Aquent

The concept of using pictures to understand complex information, especially data, has been around for a very long time, centuries in fact. One of the most cited examples of statistical graphics is Napoleon’s invasion of Russia mapped by Charles Minard. The map showed the size of the army and the path of Napoleon’s retreat from Moscow. It also included detailed information like temperature and time scales, providing the audience with an in-depth understanding of the event. However, as with most things, it’s technology that has truly allowed data visualisation to take the stage and get noticed.

It’s no surprise that with big data there’s potential for BIG opportunity (someone pass me the shot glass), but many corporates are genuinely challenged when it comes to: understanding the data they have; finding value in it; and getting the wider business to buy in and just GET IT!!!

So how do you tackle this? How do you get people to comprehend this information quickly? One word: INSIGHT.

Downloadable Sample SPSS Data Files

Data Quality
Ensure that required fields contain data.
Ensure that the required homicide (09A, 09B, 09C) offense segment data fields are complete.
Ensure that the required homicide (09A, 09B, 09C) victim segment data fields are complete.
Ensure that offenses coded as occurring at midnight are correct.
Ensure that victim variables are reported where required and are correct when reported but not required.

Standardizing the Display of IBR Data: An Examination of NIBRS Elements
Time of Juvenile Firearm Violence
Time of Day of Personal Robberies by Type of Location
Incidents on School Property by Hour
Temporal Distribution of Sexual Assault Within Victim Age Categories
Location of Juvenile and Adult Property Crime Victimizations
Robberies by Location
Frequency Distribution for Victim-Offender Relationship by Offender and Older Age Groups and Location

Analysis Examples
FBI's Analysis of Robbery
FBI's Analysis of Motor Vehicle Theft Using Survival Model

50 external machine learning / data science resources and articles

Data Science Central, by Vincent Granville, Sep 24, 2015

Starred articles are candidates for the picture of the week.

Data files

These are SPSS data files for use in our lessons. Some are my data, a few might be fictional, and some come from DASL. DASL is a good place to find extra datasets that you can use to practice your analysis techniques. I'd really recommend doing this.

To use these files, click the links with your right mouse button and choose 'Save target as...'. Save the files to your computer's desktop and then double-click them to load them into SPSS ready for analysis.

Data exploration and differences between two groups
PsychBike.sav - data from bicycle overtaking project
stereograms.sav - data from people looking at SIRDS
healthdata.sav - data from people offered a CD of relaxation exercises
scents.sav - the time taken to complete a maze with and without a strong scent

Correlation, partial correlation and regression
distance.sav - These data show how far, on average, each person in the UK drives each year.

Multiple regression

Polynomial regression

Effect sizes
effectsize.sav

Cluster analysis

Factor analysis