Machine Learning Repository

Datasets for Data Mining and Data Science. See also Data repositories. AssetMacro, historical data of Macroeconomic Indicators and Market Data.

French National Election Study, 1995. Principal Investigator(s): Lewis-Beck, Michael S.; Mayer, Nonna; Boy, Daniel, et al. This national survey was conducted to study the attitudes and opinions of the French electorate during election year 1995. Information is provided on respondents' interest in politics, ideological leanings, voting behavior, party choice in the 1994 European elections, choice of presidential candidate in the first and second ballot of the 1995 French national elections, perceptions of the French presidential candidates' positions on the ideological spectrum and respondents...

Using R for statistical analyses - Introduction. See my books about R at my Publications Page: Statistics for Ecologists using R and Excel. Published December 2011. Beginning R: The Statistical Programming Language.

CVonline: Image Databases. Index by Topic. Another helpful site is the YACVID page. Action Databases; Biological/Medical; Face Databases.

Large Network Dataset Collection. Social networks; Networks with ground-truth communities; Communication networks; Citation networks; Collaboration networks; Web graphs.

Data Sets. The Pew Research Center's Internet Project is pleased to offer scholars access to raw data sets from our research. All uses of this data should reference the Pew Research Center as the source of the data and acknowledge that Pew Research bears no responsibility for interpretations presented or conclusions reached based on analysis of the data. Each data set is made available as a single compressed archive (.zip file). Pew Research is interested in learning about other ways that scholars use our data.

Graphs - R Cookbook. My book about data visualization in R is available! The book covers many of the same topics as the Graphs and Data Manipulation sections of this website, but it goes into more depth and covers a broader range of techniques. You can preview it at Google Books.

Datasets per Topic - TC-11. Description: This collection contains table structure ground truth data (rows, columns, cells, etc.) for document images containing tables in the UNLV and UW3 datasets. The ground truth is stored in XML format, which records row and column boundaries, bounding boxes of cells, and additional attributes such as row-spanning and column-spanning cells. Each XML ground truth file has the same basename as the corresponding image in the respective dataset. These XML files can then be used to generate color-encoded ground truth images in PNG format, which can be used directly by the pixel-accurate benchmarking framework described in [1].
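As a rough illustration of how such ground truth might be read, here is a minimal Python sketch. The element and attribute names (`Table`, `Cell`, `x0`, `startRow`, and so on) and the sample file name are hypothetical, chosen only for this example; the actual schema of the UNLV/UW3 ground truth files should be checked against the dataset's own documentation.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical ground-truth snippet: the element and attribute names below
# are illustrative assumptions, NOT the dataset's actual schema.
SAMPLE_XML = """
<Table x0="10" y0="20" x1="410" y1="220">
  <Cell x0="10" y0="20" x1="210" y1="120"
        startRow="0" endRow="0" startCol="0" endCol="0"/>
  <Cell x0="210" y0="20" x1="410" y1="220"
        startRow="0" endRow="1" startCol="1" endCol="1"/>
</Table>
"""


def ground_truth_path(image_path):
    """Return the XML path that shares the image's basename.

    Assumes the XML file sits next to the image; only the extension changes.
    """
    return Path(image_path).with_suffix(".xml")


def parse_cells(xml_text):
    """Extract each cell's bounding box and row/column span."""
    root = ET.fromstring(xml_text.strip())
    cells = []
    for cell in root.iter("Cell"):
        attrs = {key: int(value) for key, value in cell.attrib.items()}
        cells.append({
            "bbox": (attrs["x0"], attrs["y0"], attrs["x1"], attrs["y1"]),
            "row_span": attrs["endRow"] - attrs["startRow"] + 1,
            "col_span": attrs["endCol"] - attrs["startCol"] + 1,
        })
    return cells


cells = parse_cells(SAMPLE_XML)
# Cells covering more than one row or column are the "spanning" cells.
spanning = [c for c in cells if c["row_span"] > 1 or c["col_span"] > 1]
xml_path = ground_truth_path("scans/page_001.png")  # hypothetical image name
```

The basename lookup mirrors the naming rule stated in the description; everything else would need adapting to the real files.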

50 Resources for Getting the Most Out of Google Analytics Google Analytics is a very useful free tool for tracking site statistics. For most users, however, it never becomes more than just a pretty interface with interesting graphs. The resources below will help anyone, from the beginner to those who have been using Google Analytics for some time, learn how to get the most out of this great tool. For Beginners

Book on Principles of Data Analysis. Published by Cappella Archive, a micropublisher. (Why a micropublisher?) The text (700 kb) can be downloaded free, in your choice of A4, letter, or paperback size. The paperback can be ordered directly from the publisher or (slightly more expensively) from Amazon UK.

Common Google Universal Analytics Mistakes that kill your Analysis & Conversions. I have audited hundreds of web analytics accounts and profiles, and each account/view had at least one or two issues that seriously stood in the way of getting optimum results from my analysis. I have put all of these issues into five broad categories: Directional Issues, Data Collection Issues, Data Integration Issues, Data Interpretation Issues, and Data Reporting Issues.

Using the New Cohort Analysis in Google Analytics The cohort was the basic tactical unit of Roman Legions following the reforms of Gaius Marius in 107 BC. Initially a Roman legion consisted of ten cohorts, each consisting of 480 men. Today we use the term cohort to distinguish between groups of consumers to help us make them spend more money on things they probably don’t need. Progress? I guess I’d rather live in a world where we try and get people to spend more money on shoes, than die violently by taking a spear to my chest while fighting Carthaginians; but it’s close. And now Google Analytics has a fancy new Cohort Analysis Report that lets us analyze the death rates from the Second Punic War… Er… no… it helps us analyze the consumer/shoe thing.