
AlchemyAPI - Transforming Text Into Knowledge

4 free data tools for journalists (and snoops) - O'Reilly Radar. Note: The following is an excerpt from Pete Warden’s free ebook “Where are the bodies buried on the web? Big data for journalists.” There’s been a revolution in data over the last few years, driven by an astonishing drop in the price of gathering and analyzing massive amounts of information. The technology is also getting easier to use. What does this mean for journalists? Many of you will already be familiar with WHOIS, but it’s so useful for research that it’s still worth pointing out. You can also enter numerical IP addresses here and get data on the organization or individual that owns that server. Blekko: The newest search engine in town, Blekko counts the richness of the data it offers among its selling points. The first tab shows other sites that are linking to the current domain, in popularity order. The other handy tab is “Crawl stats,” especially the “Cohosted with” section, which tells you which other websites are running from the same machine. bit.ly: Then click on the ‘Info Page+’ link:

NodeXL: Network Overview, Discovery and Exploration for Excel. Call For Papers | Analytics and Decision Support for Ecosystems. Part of the Organizational Systems and Technology Track at HICSS 2016. Relevant top minitrack papers will be invited for "fast-track" submissions to the Journal of Enterprise Transformation (JET). URL of this document: MiniTrack page on HICSS 2016 website: Abstract submission (optional): April 1 through June 15, 2015; Paper submission (mandatory): June 15, 2015 at midnight (Hawaii); Notification of acceptance: August 16, 2015; Opening of registration: April; Conference dates: January 5-8, 2016. The ecosystem concept is increasingly used in explaining the complexities of interconnected business and innovation activities at various levels (regional, national, global). This minitrack calls for theory, research, and practice in analytics and decision support for ecosystem orchestration. In short, a wide range of stakeholders need support for decision making about business ecosystems. Topics include, but are not limited to: theory/models, methods, and applications.

tf–idf. One of the simplest ranking functions is computed by summing the tf–idf for each query term; many more sophisticated ranking functions are variants of this simple model. Motivation: Suppose we have a set of English text documents and wish to determine which document is most relevant to the query "the brown cow". A simple way to start out is by eliminating documents that do not contain all three words "the", "brown", and "cow", but this still leaves many documents. To distinguish them further, we might count the number of times each term occurs in each document (the term frequency). However, because the term "the" is so common, this will tend to incorrectly emphasize documents which happen to use the word "the" more frequently, without giving enough weight to the more meaningful terms "brown" and "cow". Mathematical details: tf–idf is the product of two statistics, term frequency and inverse document frequency. The inverse document frequency is a measure of whether the term is common or rare across all documents: $\mathrm{idf}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|}$, with $N$ the total number of documents in the corpus $D$ and $|\{d \in D : t \in d\}|$ the number of documents in which the term $t$ appears. Then tf–idf is calculated as $\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)$. Example of tf–idf: Idf is a bit more involved:
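The "the brown cow" ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: term frequency is taken as the raw count of a term in a document, idf uses the natural logarithm of (number of documents / number of documents containing the term), and the three-document corpus and the helper names (tokenize, tf, idf, score) are made up for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting is enough for this toy corpus.
    return text.lower().split()


def tf(term, doc_tokens):
    # Term frequency as the raw count of the term in one document.
    return Counter(doc_tokens)[term]


def idf(term, corpus_tokens):
    # Inverse document frequency: log(N / number of documents containing the term).
    n_containing = sum(1 for doc in corpus_tokens if term in doc)
    if n_containing == 0:
        return 0.0  # term absent from the corpus; treat it as contributing nothing
    return math.log(len(corpus_tokens) / n_containing)


def score(query, doc_tokens, corpus_tokens):
    # Simplest ranking function: sum tf * idf over the query terms.
    return sum(tf(t, doc_tokens) * idf(t, corpus_tokens) for t in tokenize(query))


# Hypothetical corpus, invented for illustration.
corpus = [
    "the brown cow jumped over the fence",
    "the the the quick fox",
    "the cow ate the grass",
]
corpus_tokens = [tokenize(d) for d in corpus]
scores = [score("the brown cow", d, corpus_tokens) for d in corpus_tokens]
best = max(range(len(corpus)), key=scores.__getitem__)
print(scores, "best document index:", best)
```

Because "the" occurs in every document, its idf is log(3/3) = 0, so the second document gains nothing from repeating it; the ranking is driven by "brown" and "cow" instead.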

4 Promising Curation Tools That Help Make Sense of the Web. Steven Rosenbaum is a curator, author, filmmaker and entrepreneur. He is the CEO of Magnify.net, a real-time video curation engine for publishers, brands, and websites. His book Curation Nation is slated to be published this spring by McGraw-Hill Business. As the volume of content swirling around the web continues to grow, we're finding ourselves drowning in a deluge of data. Where is the relevant material? The solution on the horizon is curation. In the past 90 days alone, there has been an explosion of new software offerings that are the early leaders in the curation tools category. 1. Storify co-founder Burt Herman worked as a reporter for the Associated Press during a 12-year career, six of those years in news management as a bureau chief and supervising correspondent. At the AP, editors sending messages to reporters asking them to do a story would regularly write, “Can u pls storify?” Storify is currently invite-only. 2. Scoop.it is often described as Tumblr without the blog.

Bridging Analytics and Game Design: Lessons from the Trenches | Events | mediaX. Event Description: Interactive media and games increasingly pervade and shape our society. In addition to their dominant roles in entertainment, video games play growing roles in education, arts, science and health. This seminar series brings together a diverse set of experts to provide interdisciplinary perspectives on these media regarding their history, technologies, scholarly research, industry, artistic value and potential future. Join us every Friday from April 3rd until June 5th, 12pm-1pm, in Shriram 104. Also listed as one-unit course BIOE196. Presenters: Nick Yee & Nicolas Ducheneaut, Bridging Analytics and Game Design: Lessons from the Trenches. Nick Yee and Nic Ducheneaut both have academic backgrounds that combine social science with computer science.

ActiveWarehouse: Extract-Transform-Load Tool. The ActiveWarehouse ETL component provides a means of getting data from multiple data sources into your data warehouse. The links in the side bar provide additional information on ETL. Here’s how to get rolling. Install the Gem: Get to your command line and type sudo gem install activewarehouse-etl on Linux or OS X, or type gem install activewarehouse-etl on Windows. ActiveWarehouse ETL depends on ActiveSupport, ActiveRecord, adapter_extensions and FasterCSV. You can also download the packages in Zip, Gzip, or Gem format from the ActiveWarehouse files section on RubyForge. Create Control Files: Create the ETL control files. Execute the etl command: Execute the etl command, passing the control file name as the argument. Right now the ETL component has the following functionality: fixed-width and delimited file parsing; file and database source; file and database destination; virtual source fields, which can be populated via output from Ruby code; support for pre- and post-processing code; transform pipeline.

Tagxedo - Tag Cloud with Styles

MALLET homepage. MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. MALLET includes sophisticated tools for document classification: efficient routines for converting text to “features”, a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics. In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers. Topic models are useful for analyzing large collections of unlabeled text. Many of the algorithms in MALLET depend on numerical optimization.
