Large Network Dataset Collection

Dataset categories: Social networks; Networks with ground-truth communities; Communication networks; Citation networks; Collaboration networks; Web graphs; Product co-purchasing networks; Internet peer-to-peer networks; Road networks; Autonomous systems graphs; Signed networks; Location-based online social networks; Wikipedia networks and metadata; Memetracker and Twitter; Online Communities; Online Reviews.

Network types
Directed: directed network
Undirected: undirected network
Bipartite: bipartite network
Multigraph: network has multiple edges between a pair of nodes
Temporal: for each node/edge we know the time when it appeared in the network
Labeled: network contains labels (weights, attributes) on nodes and/or edges

Network statistics

Citing SNAP
We encourage you to cite our datasets if you have used them in your work.
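The network types listed above can be illustrated with a minimal sketch. It assumes a network is given as a plain list of (source, target) pairs; the function name `describe_edges` and the toy data are hypothetical and are not part of the collection or of any SNAP library.

```python
from collections import Counter


def describe_edges(edges, directed):
    """Summarize a list of (source, target) pairs.

    In an undirected network (u, v) and (v, u) denote the same edge,
    so each pair is normalized by sorting its endpoints.
    """
    if directed:
        keys = [(u, v) for u, v in edges]
    else:
        keys = [tuple(sorted((u, v))) for u, v in edges]
    counts = Counter(keys)
    nodes = {n for pair in keys for n in pair}
    return {
        "nodes": len(nodes),
        "distinct_edges": len(counts),
        # A multigraph has more than one edge between some pair of nodes.
        "is_multigraph": any(c > 1 for c in counts.values()),
    }


# Hypothetical toy edge list, read once as directed and once as undirected.
toy_edges = [(1, 2), (2, 1), (2, 3)]
as_directed = describe_edges(toy_edges, directed=True)
as_undirected = describe_edges(toy_edges, directed=False)
```

Read as directed, the toy list has three distinct edges; read as undirected, (1, 2) and (2, 1) collapse into a repeated edge, so the same data would count as a multigraph.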

Related: Social Network Analysis, Data Repositories, Big data

Introduction to Social Network Methods: Table of Contents. Robert A. Hanneman and Mark Riddle. Introduction to social network methods. Table of contents: About this book.

[myPersonality Project] If you're here because of the news coverage: This wiki is aimed at researchers, although you're welcome to look around and see what we do. We also encourage you to try which predicts your personality based on your Facebook Likes. 2013-04-22: Added Smiley data in the download section. myPersonality was a popular Facebook application that allowed users to take real psychometric tests, and allowed us to record (with consent!) their psychological and Facebook profiles.

Datasets for Data Mining and Data Science. See also Data repositories.
AssetMacro, historical data of Macroeconomic Indicators and Market Data.
Awesome Public Datasets on github, curated by caesar0301.
AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.

Techniques and Tools: How To Visualize Your Network. Earlier this month I was lucky enough to attend the CATechFest in LA, designed and expertly facilitated by Aspiration. What I really enjoy about events that Aspiration convenes is that they allow the time and depth for practitioners to share knowledge and strengthen connections. The participants are the content, and the design gets participants into small group discussions where we can explore topics related to our work that we are passionate about and want to learn more about. These discussions are not lectures or traditional panels and are participant driven. Ari Sahagun, a consultant who works with social justice groups on network visualizations, called for a group to discuss Network Mapping and Visualization.

Data, Data, Data: Thousands of Public Data Sources We love data, big and small and we are always on the lookout for interesting datasets. Over the last two years, the BigML team has compiled a long list of sources of data that anyone can use. It’s a great list for browsing, importing into our platform, creating new models and just exploring what can be done with different sets of data. In this post, we are sharing this list with you. Why? Well, searching for great datasets can be a time consuming task.

50 Resources for Getting the Most Out of Google Analytics Google Analytics is a very useful free tool for tracking site statistics. For most users, however, it never becomes more than just a pretty interface with interesting graphs. The resources below will help anyone, from the beginner to those who have been using Google Analytics for some time, learn how to get the most out of this great tool. For Beginners The following list of links will help you get started with Google Analytics from setup to understanding what data is being presented by Google Analytics. How to Use Google Analytics for Beginners – Mahalo’s how-to guide for beginners.

Meerkat Fact Sheet. Meerkat is a social network analysis application under development by Dr. Osmar Zaiane and his lab.

Machine Learning Repository: Bag of Words Data Set. Source: David Newman, newman '@' uci.edu, University of California, Irvine. Data Set Information: For each text collection, D is the number of documents, W is the number of words in the vocabulary, and N is the total number of words in the collection (below, NNZ is the number of nonzero counts in the bag-of-words). After tokenization and removal of stopwords, the vocabulary of unique words was truncated by only keeping words that occurred more than ten times. Individual document names (i.e. an identifier for each docID) are not provided for copyright reasons.
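To make the quantities D, W, N and NNZ concrete, here is a minimal sketch that computes them from a bag-of-words held in memory. It assumes the data is a list of (doc_id, word_id, count) triples with positive counts; that layout, the function name `bag_of_words_stats`, and the toy values are assumptions for illustration, not the repository's own loader.

```python
def bag_of_words_stats(triples):
    """Compute D, W, N and NNZ from (doc_id, word_id, count) triples.

    Assumes every triple has count > 0 and each (doc_id, word_id) pair
    appears at most once. D and W are counted from the ids that actually
    occur in the triples.
    """
    docs = set()
    words = set()
    total_words = 0  # N: sum of all counts
    nonzero = 0      # NNZ: number of nonzero (doc, word) entries
    for doc_id, word_id, count in triples:
        docs.add(doc_id)
        words.add(word_id)
        total_words += count
        nonzero += 1
    return {"D": len(docs), "W": len(words), "N": total_words, "NNZ": nonzero}


# Hypothetical toy collection: three documents over a three-word vocabulary.
toy_triples = [(1, 1, 2), (1, 3, 1), (2, 1, 4), (3, 2, 1)]
stats = bag_of_words_stats(toy_triples)
```

For the toy collection this gives D = 3, W = 3, N = 8 and NNZ = 4: N sums the counts, while NNZ only counts how many (document, word) entries are present.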

Common Google Universal Analytics Mistakes that kill your Analysis & Conversions. I have audited hundreds of web analytics accounts and profiles. And each account/view had at least one or two issues which seriously stood in my way of getting optimum results from my analysis. I have put all of these issues into five broad categories: Directional Issues, Data Collection Issues, Data Integration Issues, Data Interpretation Issues, and Data Reporting Issues. These are the most common mistakes that kill your analysis, reporting and conversions. In order to get optimum results from your analysis of Universal Analytics reports you must aim to find and fix as many of these issues as possible.

DM (2008) The application of network analysis to ancient transport geography: A case study of Roman Baetica. Introduction § 1 In many ways the Roman province of Baetica is an ideal subject for exploring new approaches to historic transport geography. This is not due to the completeness of its record (for it is not), but because it provides a remarkable breadth of pertinent data (Sillières 1990, 9-16). This paper, loosely based on a seminar hosted by the Digital Classicist at King’s College London, will briefly discuss the results of applying some as-yet relatively uncommon techniques to the archaeology and documentary record of transport in the area.