Knowledge extraction Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodologically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge (reusing identifiers or ontologies) or the generation of a schema based on the source data. Overview After the standardization of knowledge representation languages such as RDF and OWL, much research has been conducted in the area, especially regarding transforming relational databases into RDF, identity resolution, knowledge discovery and ontology learning.
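To make the relational-database-to-RDF idea above concrete, here is a minimal sketch in Python. It is not a standard mapping algorithm, only an illustration: the namespace http://example.org/, the helper names row_to_triples and to_ntriples, and the choice to mint one identifier per row and one predicate per column are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace used only for this illustration.
BASE = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def row_to_triples(table: str, key: str, row: Dict[str, Optional[object]]) -> List[Triple]:
    """Turn one relational row into RDF-style triples.

    The row's primary key becomes part of the subject identifier, the table
    name becomes the class, and every non-null column becomes a predicate
    with a plain string literal as its object.
    """
    subject = f"<{BASE}{table}/{row[key]}>"
    triples: List[Triple] = [(subject, RDF_TYPE, f"<{BASE}{table}>")]
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is already in the subject; NULLs yield no triple
        predicate = f"<{BASE}{table}#{column}>"
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append((subject, predicate, f'"{escaped}"'))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples one per line, in N-Triples style."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    example_row = {"id": 7, "name": "Ada", "email": None}
    print(to_ntriples(row_to_triples("person", "id", example_row)))
```

A real extraction pipeline would normally reuse identifiers and ontology terms that already exist rather than minting new ones for every table and column, which is the distinction the definition above draws.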
What’s the law around aggregating news online? A Harvard Law report on the risks and the best practices [So much of the web is built around aggregation: gathering together interesting and useful things from around the Internet and presenting them in new ways to an audience. It’s the foundation of blogging and social media. But it’s also the subject of much legal debate, particularly among the news organizations whose material is often what’s being gathered and presented. Kimberley Isbell of our friends the Citizen Media Law Project has assembled a terrific white paper on the current state of the law surrounding aggregation: what courts have approved, what they haven’t, and where the (many) grey areas still remain. This should be required reading for anyone interested in where aggregation and linking are headed. You can get the full version of the paper (with footnotes) here; I’ve added some links for context.] During the past decade, the Internet has become an important news source for most Americans. What is a news aggregator? Can they do that? AFP v. Associated Press v. So is it legal?
Museums and the Web 2010: Papers: Miller, E. and D. Wood, Recollection: Building Communities for Distributed Curation and Data Sharing Background The National Digital Information Infrastructure and Preservation Program at the Library of Congress is an initiative to develop a national strategy to collect, archive and preserve the burgeoning amounts of digital content for current and future generations. It is based on an understanding that digital stewardship on a national scale depends on active cooperation between communities. These diverse collections are held in the dispersed repositories and archival systems of over 130 partner institutions where each organization collects, manages, and stores at-risk digital content according to what is most suitable for the industry or domain that it serves. NDIIPP partners understand through experience that aggregating and sharing diverse collections is very challenging. Early in 2009, a pilot project recognizing the specific characteristics of this community was initiated by the Library of Congress and Zepheira. Specific goals for the Recollection project are to: How It Works
Real-Time News Curation - The Complete Guide Part 4: Process, Key Tasks, Workflow I have received a lot of emails from readers asking me to illustrate more clearly what the typical tasks of a news curator actually are, and which tools someone would need to carry them out. In Parts 4 and 5 of this guide I am looking specifically at the workflow and the tasks involved, as well as at the attributes, qualities and skills that a newsmaster, or real-time news curator, should have. 1. Sequence your selected news stories to provide the most valuable information reading experience to your readers. There are likely more tasks and elements to the news curator workflow than I have been able to identify right here. Please feel free to suggest in the comment area what you think should be added to this set of tasks.
The Accidental Taxonomist: Taxonomy Trends and Future What are the trends in taxonomies, and where is the field going? The future of taxonomies turned out to be a unifying theme of last week’s annual Taxonomy Boot Camp conference, in Washington, DC, the premier event in taxonomies, from its opening keynote to its closing panel. “From Cataloguer to Designer” was the title of the opening keynote, an excellent presentation by consultant Patrick Lambe of Straits Knowledge. He said that there are new opportunities for taxonomists, especially in the technology space, if they change their mindset and their role from that of cataloguers, who describe the world as it is, to that of designers, who plan things as they could be. New trends involving taxonomies that he described include search-based applications, autoclassification, and knowledge graphs (such as the automatically curated index card of key information on a topic, as appears in some Google search results). New trends and technologies were discussed in individual presentations, too.
Knowledge tags Tagging was popularized by websites associated with Web 2.0 and is an important feature of many Web 2.0 services. It is now also part of some desktop software. Online and Internet databases and early websites deployed tags as a way for publishers to help users find content. In 1997, the collaborative portal "A Description of the Equator and Some Other Lands" produced by documenta X, Germany, coined the folksonomic term Tag for its co-authors and guest authors on its Upload page. Tagging has gained wide popularity due to the growth of social networking, photo sharing and bookmarking sites. Websites that include tags often display collections of tags as tag clouds. Many blog systems allow authors to add free-form tags to a post, along with (or instead of) placing the post into categories.
Intute: Encouraging Critical Thinking Online Encouraging Critical Thinking Online is a set of free teaching resources designed to develop students' analytic abilities, using the Web as source material. Two units are currently available, each consisting of a series of exercises for classroom or seminar use. Students are invited to explore the Web and find a number of sites which address the selected topic, and then, in a teacher-led group discussion, to share and discuss their findings. The exercises are designed so that they may be used either consecutively to form a short course, or individually. The resources encourage students to think carefully and critically about the information sources they use. The subject matter of the exercises is of relevance to a range of humanities disciplines (most especially, though by no means limited to, philosophy and religious studies), while the research skills gained will be valuable to all students. Teacher's Guide (Units 1 and 2) Printable version (PDF) Resources for Unit 1 Resources for Unit 2
Folksonomy An empirical analysis of the complex dynamics of tagging systems, published in 2007, has shown that consensus around stable distributions and shared vocabularies does emerge, even in the absence of a central controlled vocabulary. For content to be searchable, it should be categorized and grouped. While this was believed to require commonly agreed-upon sets of content-describing tags (much like the keywords of a journal article), recent research has found that, in large folksonomies, common structures also emerge on the level of categorizations. Accordingly, it is possible to devise mathematical models of collaborative tagging that allow for translating from personal tag vocabularies (personomies) to the vocabulary shared by most users. Origin Folksonomy is a type of collaborative tagging system in which the classification of data is done by users. Folksonomies consist of three basic entities: users, tags, and resources. There are two different groups of folksonomies.
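The three-entity structure described above (users, tags, resources) can be sketched as a list of (user, tag, resource) assignments. The Python below is only an illustration of that data model, not the mathematical models the research refers to: the function names and the min_users threshold for what counts as a "shared" tag are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# One tagging act: a user attaches a tag to a resource.
Assignment = Tuple[str, str, str]  # (user, tag, resource)


def personomies(assignments: List[Assignment]) -> Dict[str, Set[str]]:
    """Collect each user's personal tag vocabulary (their personomy)."""
    vocab: Dict[str, Set[str]] = defaultdict(set)
    for user, tag, _resource in assignments:
        vocab[user].add(tag)
    return dict(vocab)


def shared_vocabulary(assignments: List[Assignment], min_users: int = 2) -> List[str]:
    """Return the tags used by at least min_users distinct users, sorted.

    The threshold is an arbitrary stand-in for "shared by most users".
    """
    users_per_tag: Dict[str, Set[str]] = defaultdict(set)
    for user, tag, _resource in assignments:
        users_per_tag[tag].add(user)
    return sorted(tag for tag, users in users_per_tag.items() if len(users) >= min_users)


def tags_for_resource(assignments: List[Assignment], resource: str) -> Counter:
    """Count how often each tag was applied to one resource."""
    return Counter(tag for _user, tag, res in assignments if res == resource)


if __name__ == "__main__":
    data = [
        ("ann", "python", "r1"),
        ("bob", "python", "r1"),
        ("bob", "code", "r1"),
        ("ann", "snake", "r2"),
    ]
    print(personomies(data))
    print(shared_vocabulary(data))
    print(tags_for_resource(data, "r1"))
```

Keeping the raw assignments and deriving the per-user and per-resource views on demand means no information about who applied which tag is lost, which is what any translation between personal and shared vocabularies would need to start from.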
Folksonomy :: vanderwal.net This page is a static permanent web document. It has been written to provide a place to cite the coinage of folksonomy. This is in response to the request from many in the academic community to document the circumstances and date of the creation of the term folksonomy. The definition at creation is also part of this document. Background I have been a fan of ad hoc labeling and tagging systems since at least the late 1980s after watching a co-worker work his magic with Lotus Magellan (he would add his own ad hoc keywords or tags to the documents on his hard drive, paying particular attention to adding these tags to documents others created so as to add his context). In 2003 del.icio.us was started by Joshua Schachter and it included identity in its social bookmarking. Not long after del.icio.us, Flickr (a social photo sharing site) started including tags while it was still early in its product development. Creation of Folksonomy Term I am a fan of the word folk when talking about regular people.
Tags & Folksonomies - What are they, and why should you care? Tags, or folksonomies, are actually a lot simpler than much of the academic debate surrounding them. Put simply, they are a user-defined method for organizing data. I'm going to try to explain what they are and why they are important to marketers and web devs, and suggest some ways you might use them. Follow the title link above for the full post. First, Some Examples of Tags in Action There are only a few good, working examples of tagging in operation right now.
del.icio.us - a social bookmarking system
Flickr - a photo publishing / sharing site
Technorati Tags - a recent feature added to the popular blog search engine
MetaFilter Tags - another recently added feature to the original group blog
TagSurf - an experimental forum based on tags rather than the standard way of organizing topics
del.icio.us and Flickr were, as far as I'm aware, the first systems to use tagging, or at least the first to become popular because of it. So How does it Work? So What Makes Tags Important? Oh boy, starting to get the picture?