Faceted Classification

Web indexing. Metadata web indexing involves assigning keywords or phrases to web pages or web sites within a meta-tag field, so that the web page or web site can be retrieved with a search engine that is customized to search the keywords field.
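As a rough illustration of the keyword-field retrieval described above, the following sketch reads the `keywords` meta tag from a few pages and builds a keyword-to-URL lookup. The page contents, the `example.org` URLs, and the helper names `extract_keywords` and `build_index` are all hypothetical; a real search engine would do considerably more.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect the comma-separated terms from <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            # Normalize each term: trim whitespace, lowercase, skip empties.
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]


def extract_keywords(html):
    """Return the list of meta keywords found in one HTML document."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords


def build_index(pages):
    """Map each keyword to the sorted list of URLs whose meta tag carries it."""
    index = {}
    for url, html in pages.items():
        for kw in extract_keywords(html):
            index.setdefault(kw, set()).add(url)
    return {kw: sorted(urls) for kw, urls in index.items()}


# Hypothetical pages, used only to exercise the sketch.
PAGES = {
    "https://example.org/a": '<html><head><meta name="keywords" content="faceted classification, taxonomy"></head></html>',
    "https://example.org/b": '<html><head><meta name="Keywords" content="Taxonomy, web indexing"></head></html>',
}

INDEX = build_index(PAGES)
print(INDEX["taxonomy"])
```

Looking up a keyword in `INDEX` then returns every page whose meta-tag field was assigned that keyword, which is the retrieval step the snippet describes.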

Web indexing

This may or may not involve using keywords restricted to a controlled vocabulary list. This method is commonly used by search engine indexing.

Universal Decimal Classification. The Universal Decimal Classification (UDC) is a bibliographic and library classification developed by the Belgian bibliographers Paul Otlet and Henri La Fontaine at the end of the 19th century.

Universal Decimal Classification

UDC provides a systematic arrangement of all branches of human knowledge organized as a coherent system in which knowledge fields are related and inter-linked.[1][2][3][4] Since the first edition in French, "Manuel du Répertoire bibliographique universel" (1905), UDC has been translated and published in various editions in 40 languages.[8][9] UDC Summary, an abridged Web version of the scheme, is available in over 50 languages.[10] The classification has been modified and extended over the years to cope with increasing output in all areas of human knowledge, and is still under continuous review to take account of new developments.[11][12]

Faceted classification. Faceted classification is used in faceted search systems that enable a user to navigate information along multiple paths corresponding to different orderings of the facets.

Faceted classification

This contrasts with traditional taxonomies in which the hierarchy of categories is fixed and unchanging. In other words, once information is categorized using multiple facets, it can also be retrieved using multiple facets. Thus, a user would not be restricted to one identifying search term in order to retrieve an item.

Colon classification. Colon classification (CC) is a system of library classification developed by S. R. Ranganathan. It was the first ever faceted (or analytico-synthetic) classification. The first edition was published in 1933. Since then six more editions have been published.

Information Architecture Consulting by Peter Morville.

Controlled vocabulary. For example, in the Library of Congress Subject Headings (a subject heading system that uses a controlled vocabulary), authorized terms (subject headings in this case) have to be chosen to handle choices between variant spellings of the same concept (American versus British), choice among scientific and popular terms (Cockroaches versus Periplaneta americana), and choices between synonyms (automobile versus cars), among other difficult issues.
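The choice of authorized terms described above can be sketched as a lookup from variant terms to a single heading. The table below is purely illustrative: it borrows the kinds of variants the snippet mentions (spelling, scientific versus popular name, synonyms), but the chosen headings are assumptions and do not reproduce the actual Library of Congress Subject Headings. The names `VARIANTS`, `authorize`, and `index_document` are hypothetical.

```python
# Illustrative authority table: variant term -> authorized heading.
# These headings are assumptions, NOT actual LCSH entries.
VARIANTS = {
    "color": "Color",
    "colour": "Color",                       # British spelling variant
    "cockroaches": "Cockroaches",
    "periplaneta americana": "Cockroaches",  # scientific vs. popular term (simplified)
    "automobile": "Automobiles",
    "automobiles": "Automobiles",
    "cars": "Automobiles",                   # synonym
}


def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def authorize(term):
    """Return the authorized heading for a term, or None if it is unknown."""
    return VARIANTS.get(normalize(term))


def index_document(terms):
    """Split free-text terms into authorized headings and unmatched leftovers."""
    headings, unmatched = [], []
    for term in terms:
        heading = authorize(term)
        if heading is None:
            unmatched.append(term)
        elif heading not in headings:
            headings.append(heading)
    return headings, unmatched


print(index_document(["Colour", "cars", "Automobile", "beetles"]))
```

Because every variant resolves to one heading, a search on the authorized term retrieves items regardless of which variant the indexer or author originally used; unmatched terms are surfaced rather than silently indexed.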

Controlled vocabulary

Choices of authorized terms are based on the principles of user warrant (what terms users are likely to use), literary warrant (what terms are generally used in the literature and documents), and structural warrant (terms chosen by considering the structure and scope of the controlled vocabulary). Controlled vocabularies also typically handle the problem of homographs with qualifiers.

Taxonomy. Taxonomy may refer to: Science

Taxonomy

Faceted search. Facets correspond to properties of the information elements.

Faceted search

They are often derived by analysis of the text of an item using entity extraction techniques or from pre-existing fields in a database such as author, descriptor, language, and format. Thus, existing web-pages, product descriptions or online collections of articles can be augmented with navigational facets. The Association for Computing Machinery's Special Interest Group on Information Retrieval provided the following description of the role of faceted search for a 2006 workshop: The web search world, since its very beginning, has offered two paradigms: Navigational search uses a hierarchy structure (taxonomy) to enable users to browse the information space by iteratively narrowing the scope of their quest in a predetermined order, as exemplified by Yahoo!
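To make the idea of navigating by facets concrete, here is a minimal sketch in which each item carries pre-existing fields (author, language, format) that serve as facets. The items, field values, and the function names `refine` and `facet_counts` are hypothetical and stand in for whatever a real faceted search system would provide.

```python
from collections import Counter

# Hypothetical collection; each dict key other than "title" is a facet.
ITEMS = [
    {"title": "Item A", "author": "Smith", "language": "en", "format": "book"},
    {"title": "Item B", "author": "Smith", "language": "fr", "format": "article"},
    {"title": "Item C", "author": "Jones", "language": "en", "format": "article"},
    {"title": "Item D", "author": "Jones", "language": "en", "format": "book"},
]


def refine(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many of the given items fall under each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)


# Narrow by language first, then inspect what the format facet still offers.
english = refine(ITEMS, {"language": "en"})
print([item["title"] for item in english])
print(facet_counts(english, "format"))
```

Unlike a fixed hierarchy, the facets can be applied in any order: refining by language and then format yields the same items as refining by format and then language.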

Taxonomic database. A taxonomic database is a database created to hold information related to biological taxa (for example, groups of organisms organized by species name or other taxonomic identifier) for efficient data management and information retrieval as required.

Taxonomic database

Today, taxonomic databases are routinely used for the automated construction of biological checklists such as floras and faunas, both for print publication and online; to underpin the operation of web based species information systems; as a part of biological collection management (for example in museums and herbaria); as well as providing, in some cases, the taxon management component of broader science or biology information systems. They are also a fundamental contribution to the discipline of biodiversity informatics.

Categorization. There are many categorization theories and techniques.

Categorization

In a broader historical view, however, three general approaches to categorization may be identified: classical categorization, conceptual clustering, and prototype theory.

The classical view. The classical Aristotelian view claims that categories are discrete entities characterized by a set of properties which are shared by their members. In analytic philosophy, these properties are assumed to establish the conditions which are both necessary and sufficient to capture meaning. According to the classical view, categories should be clearly defined, mutually exclusive and collectively exhaustive.
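The classical requirements can be checked mechanically when each category is expressed as a membership condition. The sketch below treats a category as a predicate and reports items that fall into no category (the scheme is not collectively exhaustive) or into more than one (it is not mutually exclusive). The integer example and the names `classify` and `check_classical` are assumptions chosen for illustration.

```python
def classify(item, categories):
    """Return the names of all categories whose defining condition the item meets."""
    return [name for name, condition in categories.items() if condition(item)]


def check_classical(items, categories):
    """Return (gaps, overlaps): items in no category, and items in several."""
    gaps = [item for item in items if len(classify(item, categories)) == 0]
    overlaps = [item for item in items if len(classify(item, categories)) > 1]
    return gaps, overlaps


ITEMS = [-2, -1, 0, 1, 2]

# A scheme that satisfies the classical view for these items.
GOOD = {
    "negative": lambda x: x < 0,
    "zero": lambda x: x == 0,
    "positive": lambda x: x > 0,
}

# A flawed scheme: positives are uncovered and negatives are double-counted.
BAD = {
    "negative": lambda x: x < 0,
    "non_positive": lambda x: x <= 0,
}

print(check_classical(ITEMS, GOOD))
print(check_classical(ITEMS, BAD))
```

A scheme passes only when both returned lists are empty, that is, when every item satisfies exactly one defining condition.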

Knowledge representation and reasoning. Knowledge representation and reasoning (KR) is the field of artificial intelligence (AI) devoted to representing information about the world in a form that a computer system can utilize to solve complex tasks such as diagnosing a medical condition or having a dialog in a natural language.

Knowledge representation and reasoning

Knowledge representation incorporates findings from psychology about how humans solve problems and represent knowledge in order to design formalisms that will make complex systems easier to design and build. Knowledge representation and reasoning also incorporates findings from logic to automate various kinds of reasoning, such as the application of rules or the relations of sets and subsets. Examples of knowledge representation formalisms include semantic nets, frames, rules, and ontologies.

Examples of automated reasoning engines include inference engines, theorem provers, and classifiers. This hypothesis was not always taken as a given by researchers.

Systematics. Biological systematics is the study of the diversification of living forms, both past and present, and the relationships among living things through time.

Named entity recognition. Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify elements in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.

Most research on NER systems has been structured as taking an unannotated block of text, such as this one: Jim bought 300 shares of Acme Corp. in 2006.

Towards the World Brain « Trend Monitor 2.0. From highlighting pens to faceted clipmarks … By Jan Wyllie, Trend Monitor 2.0.

Content analysis. Content analysis is "a wide and heterogeneous set of manual or computer-assisted techniques for contextualized interpretations of documents produced by communication processes in the strict sense of that phrase (any kind of text, written, iconic, multimedia, etc.) or signification processes (traces and artifacts), having as ultimate goal the production of valid and trustworthy inferences."[1] On the other side, content analysis can also study traces (documents from past times) and artifacts (non-linguistic documents), which come from communication processes in a broad sense of that phrase, commonly referred to as "signification" in semiotics (in the absence of an intentional sender, semiosis is developed by abduction).[1] Over the years, content analysis has been applied to a variety of scopes.

Concordancer. Concordancers are also used in corpus linguistics to retrieve alphabetically or otherwise sorted lists of linguistic data from the corpus in question, which the corpus linguist then analyzes.
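A minimal sketch of what a concordancer produces is a keyword-in-context (KWIC) listing: every occurrence of a word together with a few words on either side, sorted for inspection. The tokenization, the default window width, the choice to sort by right-hand context, and the function name `kwic` are all assumptions made for this illustration.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, keyword, right) rows for each occurrence of keyword.

    Rows are sorted alphabetically by the right-hand context.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical two-sentence corpus.
CORPUS = "The cat sat on the mat. The dog saw the cat run."

for left, word, right in kwic(CORPUS, "cat", width=2):
    print(f"{left:>10} | {word} | {right}")
```

Sorting by the right-hand context groups together occurrences that are followed by the same words; sorting by `row[0]` instead would group them by what precedes the keyword.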

Hermeneutics. Hermeneutics /hɜrməˈnjuːtɪks/ is the theory of text interpretation, especially the interpretation of biblical texts, wisdom literature, and philosophical texts.[1][2] The terms "hermeneutics" and "exegesis" are sometimes used interchangeably.

Concordance (publishing). A concordance is more than an index; additional material, such as commentary, definitions, and topical cross-indexing, makes producing one a labor-intensive process, even when assisted by computers. Although an automatically generated index lacks the richness of a published concordance, the ability to combine the result of queries concerning multiple terms (such as searching for words near other words) has reduced interest in concordance publishing.

Knowledge retrieval. Knowledge retrieval seeks to return information in a structured form, consistent with human cognitive processes as opposed to simple lists of data items.

Knowledge management. Knowledge management (KM) is the process of capturing, developing, sharing, and effectively using organizational knowledge.[1] It refers to a multi-disciplined approach to achieving organisational objectives by making the best use of knowledge.[2] An established discipline since 1991 (see Nonaka 1991), KM includes courses taught in the fields of business administration, information systems, management, and library and information sciences.[3][4] More recently, other fields have started contributing to KM research; these include information and media, computer science, public health, and public policy.[5] Columbia University and Kent State University offer dedicated Master of Science degrees in Knowledge Management.[6][7][8] In 1999, the term personal knowledge management was introduced; it refers to the management of knowledge at the individual level.[14]

Content analysis.

Metaknowledge. Metaknowledge or meta-knowledge is knowledge about preselected knowledge.

Taxonomies 3.0 « Trend Monitor 2.0. Taxonomy specialist Jan Wyllie, author of one of Ark Group’s biggest-selling special reports, is writing an updated report intended for publication before the end of the year. IK interviewed him about his reasons for bringing out a Third Edition.

What’s new since the old report that makes it worth writing a follow-up?

The report, which was written four years ago, does include sections on folksonomies and tagging across the new user-made Web of blogs and wikis, which was at the beginning of what is now called Web 2.0. Now the millions who use the new free media of Web 2.0 just assign any descriptive words which come to mind, and hope to remember them, and that other people whom they would like to see their stuff will happen to use the same words.

Yet we know that taxonomies, especially faceted classification, add considerable meaning and value to the information retrieval experience.