Large Network Dataset Collection

Dataset categories: Social networks; Networks with ground-truth communities; Communication networks; Citation networks; Collaboration networks; Web graphs; Product co-purchasing networks; Internet peer-to-peer networks; Road networks; Autonomous systems graphs; Signed networks; Location-based online social networks; Wikipedia networks, articles, and metadata; Temporal networks; User Actions; Memetracker and Twitter; Online Communities; Online Reviews; Face-to-Face Communication Networks; Graph classification datasets.

Network types
Directed: directed network
Undirected: undirected network
Bipartite: bipartite network
Multigraph: network has multiple edges between a pair of nodes
Temporal: for each node/edge we know the time when it appeared in the network
Labeled: network contains labels (weights, attributes) on nodes and/or edges

Network statistics

Citing SNAP
We encourage you to cite our datasets if you have used them in your work.

Machine Learning Repository

OpeNER - Webservices

Input Tools
This collection of components is used to start OpeNER pipelines. For now, a language identifier is available.

Language Identifier
The language identifier receives plain text and outputs the language of the input text. The identified language can be used as a parameter to the OpeNER modules that require a language parameter. More information about the webservice can be found at its endpoint.

Basics
These components are the start of each OpeNER pipeline.

Tokenizer
The tokenizer receives plain text and a language parameter as input. More information about the webservice can be found at its endpoint.

POS Tagger
Part-of-speech tagging means identifying whether each word is a noun, a verb, etc. More information about the webservice can be found at its endpoint.

Tree Tagger
This tool implements a wrapper for TreeTagger, so that the tagger can be applied to KAF files and the result is also obtained in KAF format.

NER/NED/Co-reference
Coreference

Datasets for Data Mining and Data Science

Data repositories
AssetMacro: historical data of Macroeconomic Indicators and Market Data.
Awesome Public Datasets on github, curated by caesar0301.
AWS (Amazon Web Services) Public Data Sets: provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
BigML big list of public data sources.

stop
| A Spanish stop word list. Comments begin with a vertical bar.

50 Resources for Getting the Most Out of Google Analytics
Google Analytics is a very useful free tool for tracking site statistics. For most users, however, it never becomes more than just a pretty interface with interesting graphs. The resources below will help anyone, from beginners to those who have been using Google Analytics for some time, learn how to get the most out of this great tool.

For Beginners
The following list of links will help you get started with Google Analytics, from setup to understanding the data that Google Analytics presents.
How to Use Google Analytics for Beginners: Mahalo’s how-to guide for beginners.

Tips & Tricks
If you’re already fairly familiar with Google Analytics and you’re ready to dig deeper and make better use of the information it provides, this list of tips & tricks is for you.

Plugins, Hacks & Additions
Want to get even more out of Google Analytics by extending it with third-party plugins, additions, and hacks?
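Returning to the Spanish stop word list entry above: the only format rule the excerpt states is that comments begin with a vertical bar. A minimal Python sketch of a reader for such a file follows. It assumes one stop word at the start of each non-comment line; the function name `parse_stop_words` and the sample lines are hypothetical and not taken from the actual file.

```python
def parse_stop_words(text):
    """Return the stop words in `text`, ignoring everything after a '|'."""
    words = []
    for line in text.splitlines():
        # Drop the comment part (from the first vertical bar onward).
        content = line.split("|", 1)[0].strip()
        if content:
            # Assumption: the stop word is the first token on the line.
            words.append(content.split()[0])
    return words


# Hypothetical sample in the described format (not the real file contents).
sample = """\
 | A Spanish stop word list. Comments begin with vertical bar.
de             |  from, of
la             |  the, her
que            |  who, that
"""

stop_words = parse_stop_words(sample)
print(stop_words)
```

Lines that consist only of a comment contribute nothing, so header comments and blank lines are skipped without special handling.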

indico | Documentation

Text Tags
Determine the topics in the phrase or document `str`.

Private cloud endpoint
POST

Arguments
String | List - required - text to be analyzed
String - optional - your indico API key
String - optional - your private cloud subdomain
Integer - optional (defaults to 1) - specify model version
Integer - optional - only return this many of the most likely topics
Float - optional (defaults to 0.) - only return topics with likelihood greater than this number
Boolean - optional (defaults to False) - when False, the probabilities of all topics sum to 1; when True, topic probabilities are independent and are not constrained to sum to 1

For an example of how to pass keyword arguments to the indico API in a POST request, see the right-hand sidebar.

Output
This function will return a dictionary with 111 key-value pairs.

Complete List of Tags
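To make the two filtering arguments concrete, here is a small Python sketch that applies the same rules locally to a dictionary of topic likelihoods. It does not call the indico API. The names `filter_topics`, `top_n`, and `threshold` are hypothetical (the excerpt above does not preserve the real argument names), and the scores are made up for illustration.

```python
from typing import Dict, Optional


def filter_topics(
    scores: Dict[str, float],
    top_n: Optional[int] = None,
    threshold: float = 0.0,
) -> Dict[str, float]:
    """Keep topics with likelihood > threshold, then at most top_n of them."""
    # "Greater than this number" is read as a strict comparison.
    kept = {tag: p for tag, p in scores.items() if p > threshold}
    # Rank from most to least likely.
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return dict(ranked)


# Made-up topic names and likelihoods; a real response has 111 entries.
example_scores = {"topic_a": 0.62, "topic_b": 0.25, "topic_c": 0.10, "topic_d": 0.03}

filtered = filter_topics(example_scores, top_n=2, threshold=0.05)
print(filtered)
```

Applying the threshold before the cut-off means that fewer than `top_n` topics can come back when few topics clear the threshold. Whether the service applies the two filters in this order is an assumption.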

Common Google Universal Analytics Mistakes that kill your Analysis & Conversions

I have audited hundreds of web analytics accounts and profiles, and each account/view had at least one or two issues that seriously stood in the way of getting optimum results from my analysis. I have put all of these issues into five broad categories:

Directional Issues
Data Collection Issues
Data Integration Issues
Data Interpretation Issues
Data Reporting Issues

These are the most common mistakes that kill your analysis, reporting and conversions. To get optimum results from your analysis of Universal Analytics reports, you must aim to find and fix as many of these issues as possible. Failing to do so will almost always result in inaccurate analysis, interpretation and reporting.

1. These issues are not associated with Google Universal Analytics or any other analytics software you use. They are commonly found in analysts themselves and are reflected in the way they set up Google Analytics accounts, advanced segments, conversion segments, filters and custom reports.

TASS

Welcome to the home page of the Workshop on Semantic Analysis at SEPLN (TASS). TASS has been held since 2012, and its original aim was to further research on sentiment analysis in Spanish. TASS continues to foster sentiment analysis in Spanish, but it also wants to promote other tasks related to semantic analysis in Spanish, so the Organization of TASS invites the research community to propose new tasks in that area.

If you are interested in the previous editions of TASS or in downloading the corpora released, we encourage you to visit the web pages of the previous editions. To download any of the TASS corpora, you must accept the TASS data license by submitting the license form. After submitting the form, you will receive an email with the link to download the data.

If you have any questions or suggestions for TASS, please send an email to tass-tasks@googlegroups.com.

Using the New Cohort Analysis in Google Analytics

The cohort was the basic tactical unit of Roman Legions following the reforms of Gaius Marius in 107 BC. Initially a Roman legion consisted of ten cohorts, each consisting of 480 men. Today we use the term cohort to distinguish between groups of consumers to help us make them spend more money on things they probably don’t need. Progress? I guess I’d rather live in a world where we try and get people to spend more money on shoes, than die violently by taking a spear to my chest while fighting Carthaginians; but it’s close. And now Google Analytics has a fancy new Cohort Analysis Report that lets us analyze the death rates from the Second Punic War… Er… no… it helps us analyze the consumer/shoe thing.

Ok, So What are Cohorts?
Cohorts are a way of grouping together people (or content), usually based on date. For our purposes, that means grouping them by their first session on a website.

What is Cohort Analysis?
The New Cohort Analysis Report
Lines and Triangle Charts
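The grouping described above, users bucketed by the date of their first session, can be sketched in a few lines of Python. This is only an illustration of the idea, not how Google Analytics computes its report: the weekly granularity, the session log, and the names `week_start` and `cohort_table` are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical session log: (user_id, session_date).
sessions = [
    ("u1", date(2015, 3, 2)),
    ("u1", date(2015, 3, 10)),
    ("u2", date(2015, 3, 3)),
    ("u3", date(2015, 3, 9)),
    ("u3", date(2015, 3, 24)),
]


def week_start(d):
    """Return the Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def cohort_table(sessions):
    """Map each cohort (week of first session) to active-user counts
    per number of weeks elapsed since that first week."""
    # Each user's cohort is the week of their earliest session.
    first_week = {}
    for user, day in sorted(sessions, key=lambda s: s[1]):
        first_week.setdefault(user, week_start(day))

    # Collect the distinct users active in each (cohort, week offset) cell.
    active = defaultdict(lambda: defaultdict(set))
    for user, day in sessions:
        cohort = first_week[user]
        offset = (week_start(day) - cohort).days // 7
        active[cohort][offset].add(user)

    return {
        cohort: {offset: len(users) for offset, users in sorted(weeks.items())}
        for cohort, weeks in sorted(active.items())
    }


table = cohort_table(sessions)
for cohort, weeks in table.items():
    print(cohort, weeks)
```

Each row of the resulting table is one cohort and each key is the number of weeks since that cohort's first session. Dividing every count by the week-0 count would turn the cells into retention rates.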

An Introduction to Sentiment Analysis / Opinion Mining

In the last decade, sentiment analysis (SA), also known as opinion mining, has attracted increasing interest. It is a hard challenge for language technologies, and achieving good results is much more difficult than some people think. The task of automatically classifying a text written in a natural language as expressing a positive or negative feeling, opinion or subjectivity (Pang and Lee, 2008) is sometimes so complicated that even different human annotators disagree on the classification to be assigned to a given text. Each individual interprets a text differently, and that interpretation is also affected by cultural factors and personal experience.

Two approaches
The problem has been tackled mainly from two different approaches (Liu, 2012): computational learning techniques (Pang, Lee, and Vaithyanathan, 2002) and semantic approaches (Turney, 2002).

Pros and Cons
There are numerous national and international workshops for sentiment analysis evaluation and assessment.

Our solution
