
Large Network Dataset Collection

Social networks; Networks with ground-truth communities; Communication networks; Citation networks; Collaboration networks; Web graphs; Product co-purchasing networks; Internet peer-to-peer networks; Road networks; Autonomous systems graphs; Signed networks; Location-based online social networks; Wikipedia networks, articles, and metadata; Temporal networks; User Actions; Memetracker and Twitter; Online Communities; Online Reviews; Face-to-Face Communication Networks; Graph classification datasets

Network types
Directed: directed network
Undirected: undirected network
Bipartite: bipartite network
Multigraph: network has multiple edges between a pair of nodes
Temporal: for each node/edge we know the time when it appeared in the network
Labeled: network contains labels (weights, attributes) on nodes and/or edges

Network statistics

Citing SNAP
We encourage you to cite our datasets if you have used them in your work.
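The network types listed above can be told apart from a plain edge list. The following is a minimal sketch, not part of the SNAP tooling: the function name `has_multi_edges` and the sample edges are assumptions made for illustration. It checks whether an edge list describes a multigraph, and shows how the answer depends on whether the edges are read as directed or undirected.

```python
from collections import Counter


def has_multi_edges(edges, directed=True):
    """Return True if some pair of nodes is joined by more than one edge.

    In the undirected reading, (u, v) and (v, u) count as the same pair.
    """
    counts = Counter(
        (u, v) if directed else tuple(sorted((u, v)))
        for u, v in edges
    )
    return any(count > 1 for count in counts.values())


# Hypothetical edge list: 1 -> 2 and 2 -> 1 are distinct directed edges,
# but they collapse onto the same pair when direction is ignored.
sample_edges = [(1, 2), (2, 1), (2, 3)]
print(has_multi_edges(sample_edges, directed=True))
print(has_multi_edges(sample_edges, directed=False))
```

Reading a directed file as undirected can therefore introduce duplicate edges, which is worth checking before computing statistics on it.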

Machine Learning Repository

OpeNER - Webservices

Input Tools
This collection of components is used to start OpeNER pipelines. For now, a language identifier is available.

Language Identifier
The language identifier receives plain text and outputs the language of the input text. The identified language can be used as a parameter to the OpeNER modules that require a language parameter. More information about the webservice can be found at its endpoint.

Basics
These components are the start of each OpeNER pipeline.

Tokenizer
The tokenizer receives plain text and a language parameter as input. More information about the webservice can be found at its endpoint.

POS Tagger
Part-of-speech tagging means identifying whether each word is a noun, a verb, etc. More information about the webservice can be found at its endpoint.

Tree Tagger
This tool implements a wrapper for TreeTagger that applies the tagger to KAF files and returns the result in KAF format as well.

NER/NED/Co-reference
Coreference

Graphs
Please contact Christian Sommer for comments and questions, or if you have other data sets. Last update: April 2010.
Used for shortest path queries. DIMACS means 9th DIMACS Implementation Challenge - Shortest Paths.

DBLP graph: the DBLP Computer Science Bibliography co-author graph, largest connected component.
Web graph: WebGraph by the Laboratory for Web Algorithmics; link graph interpreted as an undirected graph (in which case it is already connected).
Router topology: CAIDA's Router-Level Topology Measurements. "The [...] data file holds link directions corresponding to the traceroute directions." Second file (itdk0304_rlinks_undirected), interpreted as an undirected graph, largest connected component.
Citation graph: KDD competition, citation graph of the hep-th portion of the arXiv; hep-th citations tarball, interpreted as an undirected graph, largest connected component.
Database of Interacting Proteins
BioGRID
DIMACS format copied from DIMACS
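Several of the graphs above are described as "interpreted as undirected graph, largest connected component". A minimal sketch of that preprocessing step follows, assuming the input is a list of `(u, v)` pairs; the function name and the sample edges are illustrative and do not come from the original data sets.

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Return the node set of the largest connected component,
    treating every (u, v) pair as an undirected edge."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical directed links; direction is ignored when building adjacency.
sample_edges = [(1, 2), (3, 2), (4, 5)]
print(sorted(largest_connected_component(sample_edges)))
```

For graphs with millions of edges, a union-find structure or a graph library would likely be a better fit than this in-memory sketch.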

Datasets for Data Mining and Data Science
See also Data repositories
AssetMacro, historical data of Macroeconomic Indicators and Market Data.
Awesome Public Datasets on github, curated by caesar0301.
AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
BigML big list of public data sources.

stop
| A Spanish stop word list. Comments begin with vertical bar.

Data + Design
Running your own study to collect data is not the only or best way to start your data analysis. Using someone else’s dataset and sharing your data are on the rise and have helped advance much of the recent research. Using external data offers several benefits:

Where to Find External Data
All those benefits sound great!

Public Data
Once you have a better idea of what you’re looking for in an external dataset, you can start your search at one of the many public data sources available to you, thanks to the open content and access movement that has been gaining traction on the Internet. If you decide to use a search engine (like Google) to look for datasets, keep in mind that you’ll only find things that are indexed by the search engine. If you’re not sure what to do with a particular type of data, try browsing through the Information is Beautiful awards for inspiration.

Non-Public Data
Of course, not all data is public.

Assessing External Data
Using External Data
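The stop word list excerpt above notes that comments begin with a vertical bar. A minimal sketch of reading such a file follows, assuming that everything before the bar on a line is whitespace-separated stop words; the function name and the sample lines are illustrative, not copied from the original file.

```python
def parse_stop_words(text):
    """Collect stop words from text in which '|' starts a comment."""
    words = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # drop the comment part, if any
        words.update(content.split())    # remaining tokens are stop words
    return words


# Illustrative sample in the same style as the excerpt.
sample = """
 | A Spanish stop word list. Comments begin with vertical bar.
de             |  from, of
la             |  the, her
que
"""
print(sorted(parse_stop_words(sample)))
```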

50 Resources for Getting the Most Out of Google Analytics
Google Analytics is a very useful free tool for tracking site statistics. For most users, however, it never becomes more than just a pretty interface with interesting graphs. The resources below will help anyone, from the beginner to those who have been using Google Analytics for some time, learn how to get the most out of this great tool.

For Beginners
The following list of links will help you get started with Google Analytics, from setup to understanding what data it presents.
How to Use Google Analytics for Beginners – Mahalo’s how-to guide for beginners.

Tips & Tricks
If you’re already fairly familiar with Google Analytics and you’re ready to dig deeper and learn more about how to make use of the information it provides, this list of tips & tricks is for you.

Plugins, Hacks & Additions
Want to get even more out of Google Analytics by extending it with third-party plugins, additions and hacks?

indico | Documentation

Text Tags
Determine the topics in the phrase or document `str`.

Private cloud endpoint
POST

Arguments
- String | List - required - text to be analyzed
- String - optional - your indico API key
- String - optional - your private cloud subdomain
- Integer - optional (defaults to 1) - specify model version
- Integer - optional - only return this many of the most likely topics
- Float - optional (defaults to 0.) - only return topics with likelihood greater than this number
- Boolean - optional (defaults to False) - when False, the probabilities of all topics sum to 1; when True, topic probabilities are independent and are not constrained to sum to 1.

For an example of how to pass keyword arguments to the indico API in a post request, see the right hand sidebar.

Output
This function will return a dictionary with 111 key-value pairs.

Complete List of Tags
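The two filtering arguments described above (return only the most likely topics, and return only topics above a likelihood cutoff) can be illustrated on a plain dictionary of tag probabilities. This is a local sketch of the filtering idea only. It does not call the indico API, and the names `filter_topics`, `top_n`, `threshold` and the sample tags and scores are assumptions made up for illustration, since the excerpt does not show the argument names.

```python
def filter_topics(scores, top_n=None, threshold=0.0):
    """Keep topics with probability strictly above `threshold`,
    then keep at most `top_n` of the most likely ones."""
    kept = {tag: p for tag, p in scores.items() if p > threshold}
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return dict(ranked)


# Made-up scores standing in for the returned tag dictionary.
example_scores = {"sports": 0.6, "politics": 0.3, "cooking": 0.1}
print(filter_topics(example_scores, top_n=2))
print(filter_topics(example_scores, threshold=0.2))
```

With a cutoff of 0.2, the "cooking" entry is dropped because the comparison is strict ("greater than this number").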

umbrae/reddit-top-2.5-million

Common Google Universal Analytics Mistakes that kill your Analysis & Conversions
I have audited hundreds of web analytics accounts and profiles. Each account/view had at least one or two issues that seriously stood in the way of getting optimum results from my analysis. I have put all of these issues into five broad categories:
Directional Issues
Data Collection Issues
Data Integration Issues
Data Interpretation Issues
Data Reporting Issues
These are the most common mistakes that kill your analysis, reporting and conversions. To get optimum results from your analysis of Universal Analytics reports, you must aim to find and fix as many of these issues as possible. Failing to do so will almost always result in inaccurate analysis, interpretation and reporting.
1. These issues are not associated with Google Universal Analytics or any other analytics software you use. They are commonly found in analysts themselves and are reflected in the way they set up Google Analytics accounts, advanced segments, conversion segments, filters and custom reports. For example: 1. 2.
