Tagging

Del.icio.us

Reuters Wants The World To Be Tagged - ReadWriteWeb

As Richard MacManus recently predicted, in 2008 we'll witness the rise of semantic web services. From the native support for Microformats in Firefox 3, to the New York Times' use of rich header metadata, to this week's release of the Social Graph API by Google, semantics are starting to slip onto the web. The impact is being felt because large companies are really starting to focus on structured information. In the same vein, last week Reuters, an international business and financial news giant, launched an API called Open Calais. The API applies semantic markup to unstructured HTML documents, recognizing people, places, companies, and events. This technology is the next generation of the ClearForest offering, which Reuters acquired last year.

Open Calais API Basics

The idea behind Calais is simple: identify the interesting bits in a document and turn them into metadata. For any document submitted to Calais, entities are identified, extracted, and annotated.
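
To make "identified, extracted, and annotated" concrete, here is a minimal sketch of the idea in Python. It is not the Open Calais API and makes no claim about how Calais works internally: the `Entity` class, the `GAZETTEER` lookup table, and the `annotate` function are all hypothetical names invented for this illustration, and a plain dictionary lookup stands in for whatever language processing the real service performs.

```python
import re
from dataclasses import dataclass


@dataclass
class Entity:
    kind: str   # entity type, e.g. "Company" or "City"
    text: str   # the matched surface string
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


# Hypothetical lookup table; a real service would not rely on a fixed list.
GAZETTEER = {
    "Company": ["Reuters", "Google"],
    "City": ["Sydney", "Perth"],
}


def annotate(document: str) -> list[Entity]:
    """Return every gazetteer entry found in the document, in reading order."""
    entities = []
    for kind, names in GAZETTEER.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, document):
                entities.append(Entity(kind, name, match.start(), match.end()))
    return sorted(entities, key=lambda e: e.start)


if __name__ == "__main__":
    sample = "Reuters launched an API last week, and Google released one too."
    for entity in annotate(sample):
        print(entity.kind, entity.text, entity.start, entity.end)
```

The output of `annotate` is the metadata: each entity carries its type and its position in the text, which is enough to wrap the original passage in markup afterwards.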

Lorcan Dempsey's weblog: Serendipitous encounter through ta

The University of Michigan has introduced a social bookmarking application, MTagger. Here is Ken Varnum: More important than the tagging functionality itself is what MTagger will allow our faculty, staff, and students to do. MTagger brings a social component to research that we have not previously had. It will allow users to share knowledge about library resources with each other, to enable quick-and-dirty subject guides to be produced, and -- we hope -- to bring researchers together via their individual tag clouds.

As research moves online, chance meetings in the stacks of researchers with overlapping interests become even more rare. I very much like the way Ken describes the rationale for this initiative above, and the focus on social connection rather than retrieval.

Australian Museum Uses Open Calais to Tag Collection - ReadWrite

The Powerhouse Museum of Science and Design in Sydney, Australia has begun to utilize the Reuters Open Calais API (our coverage) to tag their collection. The museum's online collection database houses some 66,303 objects, so tagging them all by hand would be quite a task. By using the Open Calais web service, the museum is able to automate much of the process. That the museum has so much of its collection online is actually quite impressive in its own right. About 70% of the museum's electronically documented collection is online in the database, which went live in June 2006.

Museum objects are searchable, taggable (by humans) and painstakingly described. However, there are so many objects that, even though users can help to tag them, many haven't yet been tagged. The automatically generated tags at right were created by the API for some swimwear designed by Speedo for the 1991 Australian swimming team that competed at the World Swimming Championships in Perth.

Lorcan Dempsey's weblog: Tags

Stanford researchers collected data from del.icio.us and came to some pretty interesting conclusions about tagging. Of course, they are talking about tagging of web pages where the text of the tagged item is available for indexing.

Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year we have gathered what we believe to be the largest dataset from a social bookmarking site yet analyzed by academic researchers. Our dataset represents about forty million bookmarks from the social bookmarking site del.icio.us.

In general they found that users thought that tags were objective and relevant.

Result 11: Domains are often highly correlated with particular tags and vice versa. Conclusion: It may be more efficient to train librarians to label domains than to ask users to tag pages.
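
As a rough illustration of what a domain-to-tag correlation like the one in Result 11 could look like in data, the Python sketch below computes, for each domain, the share of its bookmarks that carry each tag. This is not the Stanford researchers' method, which the excerpt does not describe; the function name `tag_share_by_domain` and the `(url, tags)` input format are assumptions made up for this example.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def tag_share_by_domain(bookmarks):
    """Map each domain to {tag: fraction of that domain's bookmarks with the tag}.

    `bookmarks` is an iterable of (url, tags) pairs (a hypothetical format).
    """
    totals = Counter()                 # bookmarks seen per domain
    tag_counts = defaultdict(Counter)  # per-domain count of bookmarks using each tag
    for url, tags in bookmarks:
        domain = urlparse(url).netloc.lower()
        totals[domain] += 1
        # set() so a tag repeated within one bookmark is counted once
        tag_counts[domain].update(set(tags))
    return {
        domain: {tag: n / totals[domain] for tag, n in counts.items()}
        for domain, counts in tag_counts.items()
    }


if __name__ == "__main__":
    sample = [
        ("http://example.org/a", ["python", "code"]),
        ("http://example.org/b", ["python"]),
        ("http://example.com/x", ["news"]),
    ]
    for domain, shares in tag_share_by_domain(sample).items():
        print(domain, shares)
```

A domain whose bookmarks almost all share one tag is the kind of case where labelling the domain once would do the work of many individual page tags.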