Automated Libraries

Vitalie Scurtu's Official Page. Short Text Language Detection with Infinity-Gram. Language Detection Library for Java. Language Detection for twitter with 99.1% Accuracy. I released a newer prototype of Language Detection (Language Identification) with Infinity-Gram (ldig), a language detection prototype for twitter.

Language Detection for twitter with 99.1% Accuracy

It detects tweets in 17 languages with 99.1% accuracy (Czech, Danish, German, English, Spanish, Finnish, French, Indonesian, Italian, Dutch, Norwegian, Polish, Portuguese, Romanian, Swedish, Turkish and Vietnamese). ldig is specialized for noisy short text (more than 3 words) and is limited to Latin-alphabet languages, because input text can be separated into character-type blocks and detection among Latin-alphabet languages is the most difficult. My language-detection library (langdetect) is not good at short-text detection, so most users seem to have trouble with language detection for twitter. langdetect uses character 3-grams as features, which is insufficient for short-text detection.
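To illustrate why character 3-grams give little to work with on short text, here is a minimal sketch of n-gram feature extraction. It is not langdetect's or ldig's actual code; the function name `char_ngrams` is made up for this example.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count the overlapping character n-grams in text (illustrative helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# An 8-character message yields only 6 trigram occurrences,
# which is very little evidence for a classifier to use.
features = char_ngrams("good day")
print(sum(features.values()), "trigram occurrences")
```

A text of length k produces at most k - 2 trigrams, so a tweet of a few words contributes only a handful of features.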

I supposed that maximal substrings [Okanohara+ 09] make sufficient features for short text detection and prepared a twitter corpus with 17 languages. How to use the new Bing translator API with access tokens. Bing translator has changed its API recently, and it forces developers to use a more complicated method than the previous one, which used an AppID.

How to use the new Bing translator API with access tokens

The new API involves a temporary token, called an access token, which expires 10 minutes after you get it. The detailed steps for using the new API with access tokens are as follows (all parameters for the curl command should be URL-encoded): (1) sign up for API access at Azure Data Market, from which you can get your Client Secret (a string, in My Account -> Account Keys); (2) register the client application (that is, the app using the API), where you can create your own Client ID and Name, and you also have to type in a redirect URI, which should be a valid URL address (like " (3) to get an access token, make a POST request to:
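Because the token expires after 10 minutes, a client would normally cache it and request a new one only when the old one has aged out. The sketch below shows that pattern along with URL-encoding of request parameters. The class `TokenCache`, the `fetch_token` callback, and the form field names are assumptions made for illustration; the token endpoint and its exact parameters are not given above and are deliberately left out.

```python
import time
from urllib.parse import urlencode

# The text above states that an access token expires 10 minutes after issue.
TOKEN_LIFETIME_SECONDS = 10 * 60


class TokenCache:
    """Reuse an access token until its lifetime has elapsed (hypothetical helper)."""

    def __init__(self, fetch_token, lifetime=TOKEN_LIFETIME_SECONDS,
                 clock=time.monotonic):
        self._fetch_token = fetch_token  # caller-supplied function doing the POST
        self._lifetime = lifetime
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = self._token is None or now - self._fetched_at >= self._lifetime
        if expired:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token


def encode_form(params):
    """URL-encode request parameters, as the steps above require."""
    return urlencode(params)


# Placeholder field names and values, not the service's documented ones.
print(encode_form({"client_id": "my app", "client_secret": "a+b/c="}))
```

Passing the clock in as a parameter keeps the expiry logic testable without waiting ten real minutes.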

Searching by Title - Product Advertising API. The response is a set of items (up to 10 per page) represented in XML.

Searching by Title - Product Advertising API

Below is an example of two of the returned products with a subset of the catalog information: Getting Started with Microsoft Translator. Even as the world becomes smaller through the use of many modern communication methods, language differences still create a chasm between people.

Getting Started with Microsoft Translator

How can this barrier be overcome when creating software or websites? The answer lies in machine translation. This paper will help users get started using the Microsoft Translator API in software that they write using any of four methods: a Web widget, AJAX, HTTP, or SOAP. Contents The Translator Web Widget – Translating an Entire Site Using the AJAX API – Translating a Block of Text Using the HTTP API – Detecting the Language of a Block of Text Using the SOAP API – Connecting the Language List Summary The Translator Web Widget – Translating an Entire Site Easily use the Microsoft Translator Web Widget for basic website translation scenarios. 53 Books APIs: Google Books, Goodreads and SharedBook. Bing Translator for developers. Translation - Alternative to Google Translate API. PHP Spell Check. Php - GET Spell Checker/Correction API. Spell Check. Useful web resources related to automatic topic indexing.

Www.sigir.org/museum/pdfs/Relevance_Feedback_In_An_Automatic_Document_Retrieval_System/pdfs/p3-section_1.pdf. Information Processing & Management - Automatic index construction for multimedia digital libraries. Abstract Indexing remains one of the most popular tools provided by digital libraries to help users identify and understand the characteristics of the information they need.

Information Processing & Management - Automatic index construction for multimedia digital libraries

Despite extensive studies of the problem of automatic index construction for text-based digital libraries, the construction of multimedia digital libraries continues to represent a challenge, because multimedia objects usually lack sufficient text information to ensure reliable index learning. This research attempts to tackle the problem of automatic index construction for multimedia objects by employing Web usage logs and limited keywords pertaining to multimedia objects. Www.cis.strath.ac.uk/cis/research/publications/papers/strath_cis_publication_317.pdf. 0-www.ala.org.sapl.sat.lib.tx.us/acrl/sites/ala.org.acrl/files/content/conferences/pdf/marion.pdf. Indexing, from thesauri to the Semantic Web. Ils.unc.edu/mrc/wp-content/uploads/2012/02/hive_poster_iconference_2012.pdf. Automatic indexing. SIGIR, from its very onset in the early 1960's, has been concerned with the development of automatic information retrieval systems and with improving the effectiveness of automatic indexing.

Automatic indexing

Automatic indexing is a subsystem, or component, of an automatic information retrieval system. The term "automatic" implies that the process is to be accomplished by a set of computer programs rather than by the intellectual effort of skilled people. The essential questions that need to be answered are how automatic indexing can:
• Adequately represent the subject content of a document;
• Improve recall by increasing the number of relevant documents retrieved;
• Improve precision by decreasing the number of non-relevant documents retrieved.
In this tutorial, I will review the basic and improved procedures that have been devised to respond to each of these questions.
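Recall and precision, as used in the questions above, have standard definitions: recall is the share of relevant documents that were retrieved, and precision is the share of retrieved documents that are relevant. A minimal sketch, with a made-up function name and made-up document IDs:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, three relevant overall, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Retrieving more documents tends to raise recall and lower precision, which is why the two are usually reported together.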

[cs/9902022] Semi-Automatic Indexing of Multilingual Documents. Www.dsc.ufcg.edu.br/~ulrich/Artigos/SIM-RIDE MLIM.pdf. Garfield.library.upenn.edu/essays/V1p084y1962-73.pdf. Automatic Indexing: A Matter of Degree. By Marjorie M.K.

Automatic Indexing: A Matter of Degree

Hlava October 2002 First published in the Bulletin of the American Society for Information Science and Technology, Vol. 29 No. 1, October/November 2002. Table of Contents Definitions What Systems Are There? How Should They Be Applied? What Are the Strengths and Weaknesses? Picture yourself standing at the base of that metaphorical range, the Information Mountains, trailhead signs pointing this way and that: Taxonomy, Automatic Classification, Categorization, Content Management, Portal Management. In general, it’s been those venture-funded systems and their followers, the knowledge management people and the taxonomy people. We failed to keep up. Definitions The current challenge is to understand, in your own terms, what automatic indexing systems really do and whether you can use them with your own information collection.

These definitions are patterned after the forthcoming revision of the British National Standard for Thesauri, but do not exactly replicate that work. Summary. Automatic Indexing: A State-of-the-Art Report : UNT Digital Library. Automated Indexing: The Key to Information Retrieval in the 21st Century, Tony I. Obaseki. Introduction Global changes in physical infrastructure, population, technological development, and climate have contributed to an information explosion.

Automated Indexing: The Key to Information Retrieval in the 21st Century, Tony I. Obaseki

This is a major challenge to information managers, who are faced not only with the challenge of selecting, acquiring, and storing the information, but also with the perennial problem of how to make it available to potential users quickly and easily. The world is shifting from manual to automated practices. Automatic Indexing. Automatic indexing is indexing made by algorithmic procedures.

Automatic Indexing

The algorithm works on a database containing document representations (which may be full text representations or bibliographical records or partial text representations and in principle also value added databases). Automatic indexing may also be performed on non-text databases, e.g. images or music. In text databases the algorithm may perform string searching, but it is mostly based on searching the words in the single document representation as well as in the total database (via inverted files). The use of words is mostly based on stemming. Digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1347&context=libphilprac. Www.shieldsnetwork.com/LI842_Shields_Automatic_Indexing.pdf. Automated Digital Libraries: How Effectively Can Computers Be Used for the Skilled Tasks of Professional Librarianship?
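The inverted-file lookup and stemming mentioned in the Automatic Indexing passage above can be sketched as follows. This is a toy illustration under stated assumptions: `crude_stem` is a deliberately naive suffix stripper invented for this example, not a real stemming algorithm, and the documents are made up.

```python
import re
from collections import defaultdict


def crude_stem(word):
    """Strip a few common English suffixes (toy stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def build_inverted_index(docs):
    """Map each stemmed word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[crude_stem(word)].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing the stemmed query word."""
    return index.get(crude_stem(query.lower()), set())


docs = {
    1: "Indexing digital libraries",
    2: "An index of images",
    3: "Library music",
}
index = build_inverted_index(docs)
print(sorted(search(index, "indexes")))  # matches documents 1 and 2
```

Stemming lets "indexes", "indexing" and "index" share one entry, while the inverted file answers a query with a single lookup rather than a scan of every document.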

The cost of access to research information Libraries are expensive and research libraries are particularly expensive.

Automated Digital Libraries: How Effectively Can Computers Be Used for the Skilled Tasks of Professional Librarianship?

Even in the United States, few people can afford good access to primary scientific, medical, legal and scholarly information. Members of major universities have excellent library services. So do people who work in teaching hospitals, or for drug companies or rich law firms.