Apache Solr

Full-text search with Solr. Apache Lucene is a text-indexing engine that makes it possible to run natural-language searches with the help of various automatic manipulations of the text. The indexed text is stored under multiple representations, and so is the query text; search results are determined by comparing these variants.

Apache Solr extends Lucene by simplifying administration (through a RESTful interface) and by adding features: search filters, result manipulation, and so on. In this tutorial you will learn how to set up a Tomcat server with several Solr schemas. This article is aimed at programmers who have already built at least one search engine for a site or an application.

If you have never set up a search engine, or if you have never noticed any problems while using one, you may miss some of the subtleties, or even the point, of using Solr.

Auto-Suggest From Popular Queries Using EdgeNGrams. A popular feature of most modern search applications is auto-suggest, or auto-complete: as a user types a query into a text box, suggestions of popular queries are presented, and the list of suggestions is refined with each additional character.

There are several different approaches in Solr to provide this functionality, but we will be looking at an approach that involves using EdgeNGrams as part of the analysis chain. Two other approaches are to use either the TermsComponent (new in Solr 1.4) or faceting.

N-grams and Edge N-grams. An N-gram is an n-character substring of a longer sequence of characters. For example, the term “cash” is composed of the following n-grams:

- unigrams: “c”, “a”, “s”, “h”
- bigrams: “ca”, “as”, “sh”
- trigrams: “cas”, “ash”
- 4-grams: “cash”

N-grams can be useful when substrings of terms need to be searched.
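As an illustrative sketch (not part of the original article), the n-grams listed above, and the front-edge variant that the EdgeNGram approach relies on, can be generated with a few lines of Python:

```python
def ngrams(term, n):
    """Return all contiguous length-n substrings of `term`."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]

def edge_ngrams(term, min_size=1, max_size=None):
    """Return n-grams anchored at the front edge of `term`,
    in the spirit of what an edge n-gram token filter produces."""
    if max_size is None:
        max_size = len(term)
    return [term[:n] for n in range(min_size, min(max_size, len(term)) + 1)]

# The n-grams of "cash" from the text above:
print(ngrams("cash", 2))      # bigrams: ['ca', 'as', 'sh']
print(edge_ngrams("cash"))    # ['c', 'ca', 'cas', 'cash']
```

Indexing the edge n-grams of each term is what lets a prefix typed by the user ("ca") match the stored term ("cash") with an ordinary term lookup.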

An Edge n-gram is an n-gram built from one side or edge of a term. An Overview of the Process: configuring schema.xml.

What’s a “DisMax”? The term “dismax” gets tossed around on the Solr lists frequently, which can be fairly confusing to new users. It originated as a shorthand name for the DisMaxRequestHandler (which I named after the DisjunctionMaxQueryParser, which I named after the DisjunctionMaxQuery class that it uses heavily). In recent years, the DisMaxRequestHandler and the StandardRequestHandler were both refactored into a single SearchHandler class, and now the term “dismax” usually refers to the DisMaxQParser. Clear as Mudd, right?

Regardless of whether you use the DisMaxRequestHandler via the qt=dismax parameter, or use the SearchHandler with the DisMaxQParser via defType=dismax, the end result is that your q parameter gets parsed by the DisjunctionMaxQueryParser. The original goals of dismax (whichever meaning you might infer) have never changed: it supports a simplified version of the Lucene QueryParser syntax. For example:

defType = dismax
mm = 50%
qf = features^2 name^3
q = +"apache solr" search server

With me so far, right?

Presentation of August 10, 2011. Apache Solr: do only what matters. Apache SolrCloud, ZooKeeper cluster, LeaderLatch, Barrier and NodeCache, Overseer and OverseerCollectionProcessor. We do not want to have to write a separate init script for Solr. My unanswered questions on Solr. Unread articles. Miscellaneous. Running Apache Solr in a cloud.

I was having a problem with using wildcards. Try using EdgeNGrams. The string is not analyzed? No.

Solr Cell Project. A.k.a. the "Solr Cell" project!
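As a hedged sketch of how the dismax parameters above would be sent to Solr over HTTP: the parameter names come from the example in the text, while the host, port, and /solr/select path are assumptions (a default local example install), not something the original states.

```python
from urllib.parse import urlencode

# The dismax parameters from the example above. The base URL is an
# assumption (default local Solr example server), not from the text.
params = {
    "defType": "dismax",
    "mm": "50%",
    "qf": "features^2 name^3",
    "q": '+"apache solr" search server',
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With defType=dismax, the q value is handed to the DisjunctionMaxQueryParser rather than the full Lucene QueryParser, which is why end users can type it without worrying about escaping the richer Lucene syntax.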

Solr1.4. A common need of users is the ability to ingest binary and/or structured documents such as Office, Word, PDF and other proprietary formats. The Apache Tika project provides a framework for wrapping many different file-format parsers, such as PDFBox, POI and others. Solr's ExtractingRequestHandler uses Tika to allow users to upload binary files to Solr and have Solr extract text from them and then index it. Before getting started, there are a few concepts that are helpful to understand. Tika will automatically attempt to determine the input document type (Word, PDF, etc.) and extract the content appropriately. Now start the Solr example server:

cd example
java -jar start.jar

In a separate window, go to the docs/ directory (which contains some nice example docs), or the site directory if you built Solr from source, and send Solr a file via HTTP POST:

cd site/html
curl "…"
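The URL of the curl command above was truncated in this copy, so here is a hedged sketch of building such a request in Python instead. Everything in it is an assumption rather than the original command: /solr/update/extract is the conventional ExtractingRequestHandler path, localhost:8983 is the example server's default address, and literal.id and commit are illustrative parameters; doc1 and tutorial.html are hypothetical names.

```python
from urllib.parse import urlencode

# Hypothetical reconstruction of an ExtractingRequestHandler upload URL.
# Path, host, port, and parameter values are assumptions, not the
# original (truncated) command.
base = "http://localhost:8983/solr/update/extract"
params = {"literal.id": "doc1", "commit": "true"}
extract_url = base + "?" + urlencode(params)
print(extract_url)
# A curl equivalent would POST the file itself as a form field,
# e.g. with an option like: -F "myfile=@tutorial.html"
```

The literal.* parameters attach fixed field values (here a document id) alongside whatever content Tika extracts from the uploaded file.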