
Datasets for Data Mining and Data Science

See also Data repositories.
AssetMacro, historical data of Macroeconomic Indicators and Market Data.
Awesome Public Datasets on GitHub, curated by caesar0301.
AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
BigML big list of public data sources.

http://www.kdnuggets.com/datasets/index.html

Related: Deep Learning, Big data

Machine Learning is Fun! Part 3: Deep Learning and Convolutional Neural Networks
Recognizing Objects with Deep Learning
You might have seen this famous xkcd comic before. The goof is based on the idea that any 3-year-old child can recognize a photo of a bird, but figuring out how to make a computer recognize objects has puzzled the very best computer scientists for over 50 years. In the last few years, we’ve finally found a good approach to object recognition using deep convolutional neural networks.

Large Network Dataset Collection
Social networks, networks with ground-truth communities, communication networks, citation networks, collaboration networks.

Finding Data on the Internet
By RevoJoe on October 6, 2011

Datasets Archive
If you have an interesting dataset, or collection of data from a book, please consider submitting the data. To submit a dataset, please see the submissions guidelines, via Some of the entries are shar archives.

Installation - TFLearn
TensorFlow Installation
TFLearn requires TensorFlow (version >= 0.9.0) to be installed. Select the correct binary to install, according to your system:

# Ubuntu/Linux 64-bit, CPU only, Python 2.7
$ export TF_BINARY_URL=
# Ubuntu/Linux 64-bit, GPU enabled, Python 2.7
# Requires CUDA toolkit 7.5 and CuDNN v5. For other versions, see "Install from sources" below.
$ export TF_BINARY_URL=
# Mac OS X, CPU only, Python 2.7:
$ export TF_BINARY_URL=
# Mac OS X, GPU enabled, Python 2.7:
$ export TF_BINARY_URL=
# Ubuntu/Linux 64-bit, CPU only, Python 3.4
$ export TF_BINARY_URL=
# Ubuntu/Linux 64-bit, GPU enabled, Python 3.4
# Requires CUDA toolkit 7.5 and CuDNN v5. For other versions, see "Install from sources" below.
$ export TF_BINARY_URL=
# Ubuntu/Linux 64-bit, CPU only, Python 3.5
$ export TF_BINARY_URL=
# Ubuntu/Linux 64-bit, GPU enabled, Python 3.5
# Requires CUDA toolkit 7.5 and CuDNN v5.
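
The TFLearn snippet states a minimum TensorFlow version (>= 0.9.0). As a rough illustration of how such a requirement could be checked before installing, here is a minimal sketch. It assumes plain dotted version strings, and the helper names `parse_version` and `meets_minimum` are hypothetical; they are not part of TFLearn or TensorFlow.

```python
import re

# Minimum TensorFlow version quoted in the TFLearn installation notes.
MIN_TENSORFLOW = (0, 9, 0)


def parse_version(text):
    """Turn a string such as '0.10.0rc0' into a tuple of ints, e.g. (0, 10, 0).

    Hypothetical helper: only the leading digits of the first three
    dot-separated pieces are used; missing pieces count as 0.
    """
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_TENSORFLOW):
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= minimum
```

In practice the installed string would typically come from `tensorflow.__version__`. Comparing tuples of integers rather than raw strings avoids ranking "0.10.0" below "0.9.0".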

50 Resources for Getting the Most Out of Google Analytics
Google Analytics is a very useful free tool for tracking site statistics. For most users, however, it never becomes more than just a pretty interface with interesting graphs. The resources below will help anyone, from the beginner to those who have been using Google Analytics for some time, learn how to get the most out of this great tool.

Data Sets
The Pew Research Center's Internet Project is pleased to offer scholars access to raw data sets from our research. All uses of this data should reference the Pew Research Center as the source of the data and acknowledge that the Pew Research Center bears no responsibility for interpretations presented or conclusions reached based on analysis of the data. Our data sets are made available as single compressed archive files (.zip files).

Public Data Sets on AWS
Here are some examples of popular Public Data Sets:
NASA NEX: A collection of Earth science data sets maintained by NASA, including climate change projections and satellite images of the Earth's surface
Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages
1000 Genomes Project: A detailed map of human genetic variation
Google Books Ngrams: A data set containing Google Books n-gram corpuses
US Census Data: US demographic data from 1980, 1990, and 2000 US Censuses
Freebase Data Dump: A data dump of all the current facts and assertions in the Freebase system, an open database covering millions of topics
The data sets are hosted in two possible formats: Amazon Elastic Block Store (Amazon EBS) snapshots and/or Amazon Simple Storage Service (Amazon S3) buckets. If you have any questions or want to participate in our Public Data Sets community, please visit our Public Data Sets forum.

Download and Setup
You can install TensorFlow either from our provided binary packages or from the GitHub source.
Requirements
The TensorFlow Python API supports Python 2.7 and Python 3.3+. The GPU version works best with CUDA Toolkit 8.0 and cuDNN v5.1.
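
The requirement quoted above (Python 2.7 or Python 3.3+) can be checked from the interpreter before attempting an install. The following is a minimal sketch of that rule only; the function name `python_supported` is hypothetical and is not part of TensorFlow.

```python
import sys


def python_supported(version=None):
    """Return True if (major, minor) is Python 2.7 or Python 3.3 and later.

    Hypothetical helper; `version` defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info[:2]
    major, minor = version[0], version[1]
    if major == 2:
        # Of the 2.x line, only 2.7 is listed as supported.
        return minor == 7
    return (major, minor) >= (3, 3)


if __name__ == "__main__":
    print("Interpreter supported:", python_supported())
```

Passing an explicit tuple, such as `python_supported((3, 2))`, makes it easy to test the rule without changing interpreters.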
