
Data resources


Machine Learning Repository. Research-quality data sets.

Public Data Sets on AWS. Here are some examples of popular Public Data Sets:
- NASA NEX: a collection of Earth science data sets maintained by NASA, including climate change projections and satellite images of the Earth's surface.
- Common Crawl Corpus: a corpus of web crawl data composed of over 5 billion web pages.
- 1000 Genomes Project: a detailed map of human genetic variation.
- Google Books Ngrams: a data set containing Google Books n-gram corpora.
- US Census Data: US demographic data from the 1980, 1990, and 2000 US Censuses.
- Freebase Data Dump: a data dump of all the current facts and assertions in the Freebase system, an open database covering millions of topics.

The data sets are hosted in two possible formats: Amazon Elastic Block Store (Amazon EBS) snapshots and/or Amazon Simple Storage Service (Amazon S3) buckets.

If you have any questions or want to participate in our Public Data Sets community, please visit our Public Data Sets forum.

ArXiv.org help - arXiv Bulk Data Access.

Home - GEO - NCBI.

StatLib---Datasets Archive. If you have an interesting dataset, or a collection of data from a book, please consider submitting it; see the submission guidelines. Some of the entries are shar archives. If you don't know how to deal with a shar archive, contact StatLib for instructions. The datasets archive currently contains:
- NIST Statistical Reference Datasets (StRD): a pointer to a NIST site that contains reference datasets for the objective evaluation of the computational accuracy of statistical software.
- Agresti: data from "An Introduction to Categorical Data Analysis," by Alan Agresti, John Wiley, 1996, plus SAS code for various analyses.
- Aldrich_Nelson.zip: data used in Aldrich, J. and Nelson, F. (1984) "Linear Probability, Logit and Probit Models".
- Alr: data from "Applied Linear Regression," 2nd Edition, by Sanford Weisberg, John Wiley, 1985 (sandy@umnstat.stat.umn.edu) (36808 bytes).
- analcatdata
- Andrews
- Arsenic (arsenic.zip)
- backache

Datasets for Data Mining and Data Science. See also Data repositories.
- AssetMacro: historical data of macroeconomic indicators and market data.
- Awesome Public Datasets on GitHub, curated by caesar0301.
- AWS (Amazon Web Services) Public Data Sets: a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
- BigML: a big list of public data sources.

Large Network Dataset Collection.
- Social networks
- Networks with ground-truth communities
- Communication networks
- Citation networks
- Collaboration networks
- Web graphs
- Product co-purchasing networks
- Internet peer-to-peer networks
- Road networks
- Autonomous systems graphs
- Signed networks
- Location-based online social networks
- Wikipedia networks, articles, and metadata
- Temporal networks
- User Actions
- Memetracker and Twitter
- Online Communities
- Online Reviews
- Face-to-Face Communication Networks
- Graph classification datasets

Network types:
- Directed: directed network
- Undirected: undirected network
- Bipartite: bipartite network
- Multigraph: network has multiple edges between a pair of nodes
- Temporal: for each node/edge we know the time when it appeared in the network
- Labeled: network contains labels (weights, attributes) on nodes and/or edges

Network statistics.

Citing SNAP: We encourage you to cite our datasets if you have used them in your work.

Kaggle: The Home of Data Science.

Data resources.

Gapminder: Unveiling the beauty of statistics for a fact-based world view.

Open Data in Russia | Open data: government, commercial, and civic.

Moscow Government Open Data Portal.