Lucene + SKOS = Search + thesaurus

Bernhard Haslhofer. SKOS + Linked Data. The inimitable Ed Summers has been working inside the Library of Congress, building examples and demonstrators of how LC could be getting themselves into the semantic web, the linked-data web.

SKOS + Linked Data

It appears he’s got fed up of waiting for the support, permission and infrastructure he so richly deserves to get this data out there, and he’s been and gone and done something smart outside. lcsh.info is now a home where you can find a copy of the Library of Congress Subject Headings available in SKOS. This is a great piece of work and fits in perfectly with the work I’ve been doing on Semantic Marc. After much discussion with Ed, he’s provided two URI schemes: the primary scheme is based on the LC Control Number, and the second is based on the natural-language term of the heading.

To integrate SKOS within DSpace, we added:

- index expansion: every concept reference is expanded in the Lucene search indexes:
  - to all translations of the preferred term
  - to all synonyms
  - to all aliases of the identification code (including the main concept identification code)
  - to all notations in other coding schemes for the concept
- auto-complete, to help the user select a term and provide the corresponding code to the search engine
- translation of the stored code for the concept reference, taking into account the user language
- faceted browsing
- improved result display and sort options
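The index expansion and the code-to-label translation listed above can be illustrated with a small, library-free sketch. It is not the ASKOSI or DSpace implementation: the `Concept` class, its field names, the helper functions and the sample codes are all hypothetical, chosen only to show which terms end up in the index for one concept reference.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """Hypothetical in-memory view of one SKOS concept."""
    code: str                                  # main identification code
    pref_labels: Dict[str, str]                # language -> preferred term
    alt_labels: List[str] = field(default_factory=list)   # synonyms
    aliases: List[str] = field(default_factory=list)      # other codes for the same concept
    notations: List[str] = field(default_factory=list)    # notations in other coding schemes


def expand_concept(concept: Concept) -> List[str]:
    """Return every term that should be indexed for a reference to `concept`."""
    terms = [concept.code]
    terms += list(concept.pref_labels.values())   # all translations of the preferred term
    terms += concept.alt_labels                   # all synonyms
    terms += concept.aliases                      # all aliases of the code
    terms += concept.notations                    # notations in other schemes
    # Remove exact duplicates while keeping the original order.
    seen = set()
    unique = []
    for term in terms:
        if term not in seen:
            seen.add(term)
            unique.append(term)
    return unique


def display_label(concept: Concept, user_lang: str, fallback: str = "en") -> str:
    """Translate a stored code into a label for the user's language."""
    if user_lang in concept.pref_labels:
        return concept.pref_labels[user_lang]
    # Fall back to another language, then to the raw code (an assumed policy).
    return concept.pref_labels.get(fallback, concept.code)


# Sample data, invented for illustration only.
belgium = Concept(
    code="GEO_BE",
    pref_labels={"en": "Belgium", "fr": "Belgique"},
    alt_labels=["Kingdom of Belgium"],
    aliases=["GEO_056"],
    notations=["BE"],
)
print(expand_concept(belgium))
print(display_label(belgium, "fr"))
```

With this kind of expansion, a document that stores only the code can still be found by a search on any translation or synonym, and the stored code can be rendered in the reader's language at display time.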

ASKOSI : SKOS for Solr

We had to modify the tokenizers to ensure that a code following the syntax ConceptScheme_ConceptId is indexed as one word and left untouched (no stemming, for instance). This work was presented at OAI7 in Geneva. SolR, a future project?

Lucene-skos - A SKOS analyzer for Lucene and Solr
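The ASKOSI tokenizer requirement described above (a ConceptScheme_ConceptId code indexed as one untouched word) can be sketched without any Lucene classes. This is only an illustration of the rule, not the modified Lucene tokenizer itself: the regular expressions, the assumption that a scheme name starts with an uppercase letter, and the toy `naive_stem` function are all hypothetical.

```python
import re

# Assumption: a concept code is "<Scheme>_<Id>", where the scheme starts with an
# uppercase letter and both parts are alphanumeric.
CODE_PATTERN = re.compile(r"^[A-Z][A-Za-z0-9]*_[A-Za-z0-9]+$")
WORD_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def naive_stem(word: str) -> str:
    """Toy stemmer: strip a trailing plural 's' from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def tokenize(text: str) -> list:
    """Split text into index tokens, passing concept codes through unchanged."""
    tokens = []
    for raw in WORD_PATTERN.findall(text):
        if CODE_PATTERN.match(raw):
            # Concept code: one token, no lowercasing, no stemming.
            tokens.append(raw)
        else:
            # Ordinary text: split on underscores, lowercase, then stem.
            for part in raw.split("_"):
                if part:
                    tokens.append(naive_stem(part.lower()))
    return tokens


print(tokenize("Maps of GEO_BE regions"))
```

Keeping the code as a single exact token matters because the search engine receives the same code from auto-complete; any stemming or splitting at index time would stop the two from matching.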