Large Network Dataset Collection

Social networks; Networks with ground-truth communities; Communication networks; Citation networks; Collaboration networks; Web graphs; Product co-purchasing networks; Internet peer-to-peer networks; Road networks; Autonomous systems graphs; Signed networks; Location-based online social networks; Wikipedia networks, articles, and metadata; Temporal networks; User Actions; Memetracker and Twitter; Online Communities; Online Reviews; Face-to-Face Communication Networks; Graph classification datasets

Network types
Directed: directed network
Undirected: undirected network
Bipartite: bipartite network
Multigraph: network has multiple edges between a pair of nodes
Temporal: for each node/edge we know the time when it appeared in the network
Labeled: network contains labels (weights, attributes) on nodes and/or edges

Network statistics

Citing SNAP
We encourage you to cite our datasets if you have used them in your work.

Machine Learning Repository Graphs
Please contact Christian Sommer for comments and questions, or if you have other data sets. Last update: April 2010. Used for shortest path queries; DIMACS means 9th DIMACS Implementation Challenge - Shortest Paths.

DBLP graph: The DBLP Computer Science Bibliography co-author graph, largest connected component
Web graph: WebGraph by the Laboratory for Web Algorithmics, link graph interpreted as undirected graph (in which case it is already connected)
Router topology: CAIDA's Router-Level Topology Measurements. "The [...] data file holds link directions corresponding to the traceroute directions." Second file (itdk0304_rlinks_undirected), interpreted as undirected graph, largest connected component
Citation graph: KDD competition, citation graph of the hep-th portion of the arXiv, hep-th citations tarball, interpreted as undirected graph, largest connected component
Database of Interacting Proteins
BioGRID
DIMACS format copied from DIMACS
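Several of the graphs listed above are described as "interpreted as undirected graph, largest connected component". As a rough illustration of what that preprocessing step could look like, here is a minimal sketch. It assumes the graph is already loaded as a list of (source, target) pairs; the function names `largest_connected_component` and `restrict_to` are hypothetical and are not taken from any of the listed pages.

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Return the node set of the largest component, ignoring edge direction."""
    # Build an undirected adjacency map: each edge is stored in both directions.
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best


def restrict_to(edges, nodes):
    """Keep only the edges whose endpoints both lie in `nodes`."""
    return [(u, v) for u, v in edges if u in nodes and v in nodes]


if __name__ == "__main__":
    # Toy directed edge list (made up for illustration).
    toy_edges = [(1, 2), (3, 2), (3, 4), (5, 6)]
    lcc = largest_connected_component(toy_edges)
    print(sorted(lcc))
    print(restrict_to(toy_edges, lcc))
```

For graphs with millions of edges, a dedicated graph library or a union-find structure would likely be a better fit than this in-memory sketch.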

Datasets for Data Mining and Data Science
See also Data repositories
AssetMacro, historical data of Macroeconomic Indicators and Market Data.
Awesome Public Datasets on GitHub, curated by caesar0301.
AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
BigML big list of public data sources.

Data + Design
Running your own study to collect data is not the only or best way to start your data analysis. Using someone else’s dataset and sharing your data is on the rise and has helped advance much of the recent research. Using external data offers several benefits:

Where to Find External Data
All those benefits sound great!

Public Data
Once you have a better idea of what you’re looking for in an external dataset, you can start your search at one of the many public data sources available to you, thanks to the open content and access movement that has been gaining traction on the Internet. If you decide to use a search engine (like Google) to look for datasets, keep in mind that you’ll only find things that are indexed by the search engine. If you’re not sure what to do with a particular type of data, try browsing through the Information is Beautiful awards for inspiration.

Non-Public Data
Of course, not all data is public.

Assessing External Data

Using External Data

umbrae/reddit-top-2.5-million

Common Google Universal Analytics Mistakes that kill your Analysis & Conversions
I have audited hundreds of web analytics accounts and profiles, and each account/view had at least one or two issues that seriously stood in the way of getting optimum results from my analysis. I have put all of these issues into five broad categories:
Directional Issues
Data Collection Issues
Data Integration Issues
Data Interpretation Issues
Data Reporting Issues
These are the most common mistakes that kill your analysis, reporting and conversions. To get optimum results from your analysis of Universal Analytics reports, you must aim to find and fix as many of these issues as possible. Failing to do so will almost always result in inaccurate analysis, interpretation and reporting.
1. These issues are not associated with Google Universal Analytics or any other analytics software you use, but are commonly found in analysts themselves and are reflected in the way they set up Google Analytics accounts, advanced segments, conversion segments, filters and custom reports.

PhysioBank Archive Index
This page lists all currently available databases in the PhysioBank archives, organized according to the types of signals and annotations contained in each database.

If you prefer, you can view separate lists of these databases organized by class:
Class 1 (completed reference databases)
Class 2 (archival copies of raw data that support published research, contributed by authors or journals)
Class 3 (other contributed collections of data, including works in progress)

We make class 2 and class 3 data available via PhysioNet as a service to the research community. On this page, listings within each group are ordered by class, and then alphabetically by the name of the database.

Multi-Parameter Databases
These databases include a variety of digitized physiologic signals in each recording.
[Class 1] MGH/MF Waveform Database.

ECG Databases
Unless specifically noted, each recording in these databases includes one or more digitized ECG signals and a set of beat annotations.

Using the New Cohort Analysis in Google Analytics
The cohort was the basic tactical unit of Roman Legions following the reforms of Gaius Marius in 107 BC. Initially a Roman legion consisted of ten cohorts, each consisting of 480 men. Today we use the term cohort to distinguish between groups of consumers to help us make them spend more money on things they probably don’t need. Progress? I guess I’d rather live in a world where we try to get people to spend more money on shoes than die violently by taking a spear to my chest while fighting Carthaginians; but it’s close. And now Google Analytics has a fancy new Cohort Analysis Report that lets us analyze the death rates from the Second Punic War… Er… no… it helps us analyze the consumer/shoe thing.

Ok, So What are Cohorts?
Cohorts are a way of grouping together people (or content), usually based on date. For our purposes, that means grouping them by their first session on a website.

What is Cohort Analysis?

The New Cohort Analysis Report

Lines and Triangle Charts
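To make the idea of grouping people by their first session concrete, here is a minimal sketch. It is not how Google Analytics computes its report; it assumes a simple list of (user, session date) records, and the function names `assign_cohorts` and `retention_by_cohort` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def assign_cohorts(sessions):
    """Map each user to the date of their first session (their cohort)."""
    first_seen = {}
    for user, day in sessions:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day
    return first_seen


def retention_by_cohort(sessions):
    """For each cohort date, count distinct users active N days after their first session."""
    first_seen = assign_cohorts(sessions)
    active = defaultdict(set)
    for user, day in sessions:
        cohort = first_seen[user]
        offset = (day - cohort).days  # day 0 is the first session itself
        active[(cohort, offset)].add(user)

    table = defaultdict(dict)
    for (cohort, offset), users in active.items():
        table[cohort][offset] = len(users)
    # Sort the day offsets so each row reads left to right like a cohort table.
    return {cohort: dict(sorted(row.items())) for cohort, row in table.items()}


if __name__ == "__main__":
    # Made-up session log for illustration.
    sessions = [
        ("a", date(2015, 1, 1)),
        ("b", date(2015, 1, 1)),
        ("a", date(2015, 1, 2)),
        ("c", date(2015, 1, 2)),
        ("b", date(2015, 1, 3)),
    ]
    for cohort, row in sorted(retention_by_cohort(sessions).items()):
        print(cohort, row)
```

Each row of the result corresponds to one cohort (a first-session date), and each column to the number of days since that first session, which is the shape a triangle-style cohort chart is built from.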

