Web crawler

From Wikipedia, the free encyclopedia.

Operating on the same principle, some malicious robots (spambots) are used to archive resources or to harvest email addresses to which messages are then sent. In French, since 2013, the word "collecteur" may be used in place of "crawler"[1]. Some collectors also analyze content closely so as to bring back only part of the information it contains.

Indexing principles

To index new resources, a robot recursively follows the hyperlinks found from a starting (pivot) page. An exclusion file (robots.txt) placed at the root of a website gives robots a list of resources to ignore. Two characteristics of the Web complicate the crawler's work: the volume of data and bandwidth. The behavior of a web crawler results from the combination of the following principles:
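To make the link-following procedure and the robots.txt exclusion file concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production crawler: pages come from an in-memory dictionary instead of HTTP requests, the queue is processed breadth-first rather than by literal recursion, and the names `crawl`, `LinkExtractor`, `example-bot` and the `example.test` URLs are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, robots_txt="", user_agent="example-bot", max_pages=100):
    """Follow hyperlinks from `seed`, skipping URLs excluded by robots.txt.

    `fetch` is any callable returning the HTML of a URL, or None if the
    page is unavailable (an assumption standing in for real HTTP access).
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    seen = {seed}          # URLs already queued, to avoid loops
    queue = deque([seed])  # URLs waiting to be visited
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not rules.can_fetch(user_agent, url):
            continue  # resource listed in the exclusion file
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


# Hypothetical four-page site used only for demonstration.
pages = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="/private/x.html">X</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="b.html">B</a>',
    "http://example.test/b.html": "no links here",
    "http://example.test/private/x.html": '<a href="/c.html">C</a>',
}
robots = "User-agent: *\nDisallow: /private/\n"
visited = crawl("http://example.test/", pages.get, robots)
print(visited)
```

The `max_pages` cap is one simple way to bound the work when the volume of data is large; a real crawler would also need to limit its request rate to respect bandwidth.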

Automatic document indexing

In full generality, an index is a list of descriptors, each associated with a list of the documents and/or parts of documents to which that descriptor refers. This reference may be weighted. When a user searches for information, the system matches the request against the index to produce a list of answers. Upstream, the methods used to build an index automatically for a set of documents vary considerably with the nature of the documentary content to be indexed.

Text indexing

For a text, a very simple index to build automatically is the ordered list of all the words appearing in the documents, with the exact location of each of their occurrences; but such an index is bulky and, above all, hard to exploit. Automatic indexing therefore tends instead to look for the words that best correspond to the informational content of a document.
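The "very simple index" described above, an ordered list of every word with the location of each occurrence, can be sketched in a few lines of Python. This is only an illustration: the function name `build_positional_index`, the document identifiers, and the choice of word position (rather than character offset) as the "location" are assumptions, not part of the source.

```python
import re
from collections import defaultdict


def build_positional_index(documents):
    """Map each word to the list of (document id, word position) pairs.

    `documents` is a dict of document id -> text. Words are lowercased
    and the resulting index is ordered alphabetically by word.
    """
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            index[match.group()].append((doc_id, position))
    return dict(sorted(index.items()))


# Hypothetical two-document collection.
documents = {
    "d1": "the cat sat",
    "d2": "The dog sat on the cat",
}
index = build_positional_index(documents)
for word, occurrences in index.items():
    print(word, occurrences)
```

The index holds one entry per word occurrence, so it grows with the total length of the collection, and frequent words such as "the" dominate it. This is the bulk and poor exploitability that the text mentions.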

Search engine indexing

Popular engines focus on the full-text indexing of online, natural language documents.[1] Media types such as video and audio[2] and graphics[3] are also searchable. Meta search engines reuse the indices of other services and do not store a local index, whereas cache-based search engines permanently store the index along with the corpus. Unlike full-text indices, partial-text services restrict the depth indexed to reduce index size.

Indexing

The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query.

Index design factors

Major factors in designing a search engine's architecture include:

Merge factors
Storage techniques: How to store the index data, that is, whether information should be data compressed or filtered.
Index size: How much computer storage is required to support the index.
Lookup speed: How quickly a word can be found in the inverted index.
Maintenance: How the index is maintained over time.[5]
Fault tolerance
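As a rough illustration of the inverted index mentioned under lookup speed, and of how a partial-text service might restrict the depth indexed, the sketch below maps each word to the set of documents containing it. The names `build_inverted_index`, `search` and the `max_depth` parameter are hypothetical, and treating a multi-word query as a logical AND is an assumption made for the example.

```python
import re
from collections import defaultdict


def build_inverted_index(documents, max_depth=None):
    """Map each word to the set of document ids that contain it.

    If `max_depth` is given, only the first `max_depth` words of each
    document are indexed (a partial-text index, smaller but less complete).
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        words = re.findall(r"\w+", text.lower())
        if max_depth is not None:
            words = words[:max_depth]
        for word in words:
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


# Hypothetical three-document corpus.
documents = {
    1: "Search engines index documents",
    2: "An index speeds up search",
    3: "Documents are stored in a corpus",
}
full_index = build_inverted_index(documents)
partial_index = build_inverted_index(documents, max_depth=2)
print(search(full_index, "index search"))
print(search(partial_index, "corpus"))
```

Each word lookup is a single dictionary access, which is why the inverted index is fast to query. The partial index is smaller, but it misses words that occur beyond the indexed depth.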

Society of Indexers

History

The Society of Indexers was formally constituted at the premises of the National Book League in the UK on 30 March 1957 by G. Norman Knight and approximately 60 other people. Knight "count[ed] it as one of the achievements of the Society to have removed the intense feeling of solitude in which the indexer (of books and journals, at any rate) used to work."[1] Later, members in various areas of the world grouped together and formed societies which are now affiliated.

Publications

The Society started publishing its journal, The Indexer (ISSN 0019-4131 print, ISSN 1756-0632 online), in 1958; it continues today and is the official journal of all the indexing societies. The society newsletter, SIdelights, is published quarterly and is available only to society members.

Conferences

Conferences are held, usually annually and in the UK.

Tim Craven - Freeware

32-bit Windows packages

The self-extractors for these packages currently all require 16-bit support. In case of a "16-bit MS-DOS Subsystem" error message, consult the Microsoft help page at

In Windows Vista, running the self-extractors as administrator is recommended. C:\Users\username\Local\Temp\_INS0432.

An alternative to running a self-extractor as a program is to change the extension to , extract the contents, and run in the folder containing the extracted files.

Using XP compatibility mode may also help with some problems.

In Windows XP and Vista, the applications are best viewed with "Windows and Buttons" set to "Windows Classic Style".

There are no specifically 64-bit versions of these programs, nor are there likely to be.

Article on using TheW32: De Vorsey, K.L.; Elson, C.; Gregorev, N.P.; Hansen, J. 2006.

For sample XRefHT32 indexes, see

Java packages

Source code. Download Java from java.com

Lewisboro Library Index

Home :: The Society of Indexers

Indexing and abstracting service

The product is often an abstract journal or a bibliographic index, which may be a subject bibliography or a bibliographic database. Guidelines for indexing and abstracting, including the evaluation of such services, are given in the literature of library and information science.[3]

References

Manzer, B. M. (1977).

Back-of-the-book index

Perhaps the most advanced investigation of problems related to book indexes is made in the development of topic maps, which started as a way of representing the knowledge structures inherent in traditional back-of-the-book indexes.

Texts about the indexing of specialized books include history (Towery, 1998), law books (Kendrick & Zafran, 2001), medicine (Wyman, 1999), and psychology (Hornyak, 2002), among others.

Indexing software

Commercial software packages are available to help an indexer build a book index.[1]

Book indexing as a profession

In the United States, according to tradition, the index for a non-fiction book is the responsibility of the author, but most authors don't actually do it.

American Society for Indexing

The American Society for Indexing, Inc.

Further reading

Wu, Z. et al. (2013).

External links

American Society for Indexing