solr

From time to time, users on the Lucene mailing list ask a variant of the following question: Given a term match in a document, what’s the best way to get a window of words around that match? Getting a window of words around a match can be useful for a lot of things, including, to name a few: highlighting (although I’d recommend using Lucene’s Highlighter package for that), co-occurrence analysis, sentiment analysis, and question answering. Unfortunately, given how inverted indexes are structured, retrieving content around a match isn’t efficient without doing some extra work during indexing.
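One way to do that extra indexing work is to keep per-document term positions (for example, term vectors stored with positions) and invert them when a match is found. The Python below is a minimal sketch of that idea, not the article's code: it assumes you already have a term-to-positions map for the matching document, and the names `invert_positions` and `window_around` are hypothetical.

```python
from typing import Dict, List


def invert_positions(term_positions: Dict[str, List[int]]) -> Dict[int, str]:
    """Turn term -> [positions] (the shape of a positional term vector)
    into position -> term, so the neighbours of any position can be looked up."""
    by_position: Dict[int, str] = {}
    for term, positions in term_positions.items():
        for pos in positions:
            by_position[pos] = term
    return by_position


def window_around(term_positions: Dict[str, List[int]], match_pos: int, size: int) -> List[str]:
    """Return the terms from match_pos - size to match_pos + size, in position order.
    Positions with no stored term (for example, removed stopwords) are skipped."""
    by_position = invert_positions(term_positions)
    start = max(0, match_pos - size)
    return [by_position[p] for p in range(start, match_pos + size + 1) if p in by_position]


if __name__ == "__main__":
    # Positions as a simple whitespace analyzer might record them for
    # "the quick brown fox jumps over the lazy dog".
    doc = {"the": [0, 6], "quick": [1], "brown": [2], "fox": [3],
           "jumps": [4], "over": [5], "lazy": [7], "dog": [8]}
    print(window_around(doc, doc["fox"][0], 2))
    # → ['quick', 'brown', 'fox', 'jumps', 'over']
```

Inverting the map costs one pass over the document's terms, so if a document has many matches it is worth building `by_position` once and reusing it.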

Accessing words around a positional match in Lucene

http://searchhub.org/2009/05/26/accessing-words-around-a-positional-match-in-lucene/
diacritic

indexing

http://wiki.apache.org/solr/SolrRelevancyFAQ

SolrRelevancyFAQ - Solr Wiki

Relevancy is the quality of results returned from a query, encompassing both what documents are found and their relative ranking (the order in which they are returned to the user).
Should I use the standard or dismax Query Parser? The standard Query Parser uses SolrQuerySyntax to specify the query via the q parameter, and the query must be well formed or an error will be returned. It's good for specifying exact, arbitrarily complex queries.
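To make the difference concrete, here is a hedged Python sketch that only builds request URLs. The endpoint `http://localhost:8983/solr/select`, the field names `name`, `description` and `price`, and the helper names are assumptions; `q`, `defType` and `qf` are standard Solr request parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: adjust host, port and core name for your installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def standard_query_url(q: str) -> str:
    """Standard (Lucene syntax) parser: q must be well-formed query syntax."""
    return SOLR_SELECT + "?" + urlencode({"q": q})


def dismax_query_url(user_input: str, qf: str) -> str:
    """dismax parser: q is raw user input, searched across the fields listed in qf."""
    return SOLR_SELECT + "?" + urlencode({"defType": "dismax", "q": user_input, "qf": qf})


if __name__ == "__main__":
    # Exact, arbitrarily complex query: relies on the standard parser's syntax.
    print(standard_query_url('name:"digital camera" AND price:[100 TO 300]'))
    # Whatever the user typed, spread over two fields, with name boosted.
    url = dismax_query_url("digital camera", "name^2 description")
    print(parse_qs(urlsplit(url).query))
```

The design point is where the syntax lives: with the standard parser the client must produce valid query syntax, while with dismax the field list and boosts sit in `qf` and the user's text can be passed through as-is.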
On Wednesday 28 May 2008 01:37:57 Otis Gospodnetic wrote: If you have tokenized fields of variable size and you want the field length to affect the relevance score, then you do not want to omit norms. Omitting norms is good for fields where length is of no importance (e.g. gender="Male" vs. gender="Female"). Omitting norms saves you heap/RAM, one byte per doc per field without norms, I believe. I am also toying with the hypothesis that omitting the field norm may be a good idea for title fields in languages with compound words, which typically consist of only a few words. http://markmail.org/message/avmlqt4x26gyb5fb#query:omitNorms%20solr+page:1+mid:7xdlhtw74rkdwfkv+state:results
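Why field length matters, and what omitting norms saves, can be sketched with two small Python functions. This assumes Lucene's classic `DefaultSimilarity`, whose length norm is 1/sqrt(numTerms); it ignores index-time boosts and the lossy one-byte encoding Lucene actually stores. The one-byte-per-document-per-field figure is the email's own estimate, and the function names are hypothetical.

```python
import math


def classic_length_norm(num_terms: int) -> float:
    """Length normalisation in the style of Lucene's classic DefaultSimilarity:
    1 / sqrt(number of terms in the field). Shorter fields get larger norms,
    so a match in a short field scores higher than the same match in a long one."""
    return 1.0 / math.sqrt(num_terms)


def norms_heap_bytes(num_docs: int, fields_with_norms: int) -> int:
    """Rough heap cost of norms: one byte per document per field that keeps norms."""
    return num_docs * fields_with_norms


if __name__ == "__main__":
    print(classic_length_norm(4))    # short title field: 0.5
    print(classic_length_norm(400))  # long body field: 0.05
    # Ten million documents with five normed fields: about 50 MB of norms.
    print(norms_heap_bytes(10_000_000, 5))
```

For a field like gender, every document has the same single-term length, so the norm carries no ranking information and its byte per document is pure overhead.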

omitNorm

Faceted search is the dynamic clustering of items or search results into categories that let users drill into search results (or even skip searching entirely) by any value in any field. Each facet displayed also shows the number of hits within the search that match that category. Users can then “drill down” by applying specific constraints to the search results. Faceted search is also called faceted browsing, faceted navigation, guided navigation and sometimes parametric search. It’s easiest to understand what faceted search is through an example, appropriately from CNET Reviews, the first website to use Solr, even before it had been contributed to Apache by CNET. This example is actually faceted browsing because it started with all digital cameras and not a user search. http://searchhub.org/2009/09/02/faceted-search-with-solr/
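The counting and drill-down described above can be sketched over an in-memory result list. This is a toy Python illustration with made-up camera data, not how Solr computes facets internally; the `Doc` type, field names and helpers are hypothetical, and `drill_down` plays the role a filter query (`fq`) plays in a real request.

```python
from collections import Counter
from typing import Dict, List

Doc = Dict[str, str]


def facet_counts(docs: List[Doc], field: str) -> Dict[str, int]:
    """Number of documents in the current result set carrying each value of `field`."""
    return dict(Counter(d[field] for d in docs if field in d))


def drill_down(docs: List[Doc], field: str, value: str) -> List[Doc]:
    """Apply one facet constraint, keeping only documents whose `field` equals `value`."""
    return [d for d in docs if d.get(field) == value]


if __name__ == "__main__":
    cameras = [
        {"name": "A100", "manufacturer": "Canon", "resolution": "10 MP"},
        {"name": "B200", "manufacturer": "Nikon", "resolution": "12 MP"},
        {"name": "C300", "manufacturer": "Canon", "resolution": "12 MP"},
    ]
    print(facet_counts(cameras, "manufacturer"))  # {'Canon': 2, 'Nikon': 1}
    canon = drill_down(cameras, "manufacturer", "Canon")
    print(facet_counts(canon, "resolution"))      # {'10 MP': 1, '12 MP': 1}
```

Recomputing the counts after each constraint is what keeps the displayed numbers in step with the narrowed result set.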

Faceted Search with Solr | Enterprise Search support for Apache Lucene and Solr by Lucid Imagination

A popular feature of most modern search applications is the auto-suggest or auto-complete feature where, as a user types their query into a text box, suggestions of popular queries are presented. As each additional character is typed in by the user the list of suggestions is refined. There are several different approaches in Solr to provide this functionality, but we will be looking at an approach that involves using EdgeNGrams as part of the analysis chain. Two other approaches are to use either the TermsComponent (new in Solr 1.4) or faceting. http://searchhub.org/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
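The EdgeNGram idea can be sketched without Solr: index every leading prefix of each popular query, then look up what the user has typed so far as a single key. In Solr a common arrangement is a KeywordTokenizer followed by an EdgeNGramFilterFactory (with `minGramSize`/`maxGramSize`) on the index side only; the Python below is a hedged analogue of that, with hypothetical helper names, assumed default gram sizes of 1 and 25, and made-up popularity counts.

```python
from collections import defaultdict
from typing import Dict, List


def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 25) -> List[str]:
    """Front edge n-grams of the whole lowercased string,
    e.g. "iPod" -> ["i", "ip", "ipo", "ipod"]."""
    text = text.lower()
    return [text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)]


def build_suggest_index(popular: Dict[str, int], min_gram: int = 1,
                        max_gram: int = 25) -> Dict[str, List[str]]:
    """Map every edge n-gram to the popular queries it starts,
    most frequent query first (ties broken alphabetically)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for query in popular:
        for gram in edge_ngrams(query, min_gram, max_gram):
            index[gram].append(query)
    for queries in index.values():
        queries.sort(key=lambda q: (-popular[q], q))
    return dict(index)


def suggest(index: Dict[str, List[str]], typed: str, limit: int = 5) -> List[str]:
    """Look up what the user has typed so far; the typed text is not n-grammed itself."""
    return index.get(typed.lower(), [])[:limit]


if __name__ == "__main__":
    popular = {"ipod nano": 120, "ipod touch": 300, "iphone": 500, "imac": 80}
    index = build_suggest_index(popular)
    print(suggest(index, "ip"))    # ['iphone', 'ipod touch', 'ipod nano']
    print(suggest(index, "ipod"))  # ['ipod touch', 'ipod nano']
```

Each extra character the user types simply selects a longer gram, which is why the suggestion list narrows as they type.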

Auto-Suggest From Popular Queries Using EdgeNGrams

Re: termFreq always = 1 ?

http://www.mail-archive.com/solr-user@lucene.apache.org/msg14308.html
: Yes, this may be my problem,
:
: But is there any solution to have only one "men" keyword indexed when I've
: got something like this
SOLR-739 is working towards a new omitTf option for fields (taking advantage of a Lucene optimization for this case), but in the meantime the best options I can think of are:
1) a custom TokenFilter that keeps track of every token it's ever seen and removes *all* dups
2) a custom Similarity with a tf() func that returns a constant value regardless of the input (the termFreq stored in the index will be the same, but the scores will be equivalent)
-Hoss
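The two workarounds can be illustrated outside Lucene. The Python below is a hedged analogue, not Lucene's TokenFilter or Similarity API: `drop_all_duplicates` mimics what the custom filter would do to a token stream, `constant_tf` stands in for an overridden tf(), and `classic_tf` assumes the sqrt(freq) used by the classic DefaultSimilarity. All three names are hypothetical.

```python
import math
from typing import Iterable, Iterator, List


def drop_all_duplicates(tokens: Iterable[str]) -> Iterator[str]:
    """Option 1 analogue: emit each token only the first time it is seen,
    so every term in the field ends up with a frequency of 1."""
    seen = set()
    for token in tokens:
        if token not in seen:
            seen.add(token)
            yield token


def classic_tf(freq: int) -> float:
    """Classic DefaultSimilarity-style tf: the square root of the raw frequency."""
    return math.sqrt(freq)


def constant_tf(freq: int) -> float:
    """Option 2 analogue: a tf that ignores how often the term occurs, as long as it occurs."""
    return 1.0 if freq > 0 else 0.0


if __name__ == "__main__":
    tokens: List[str] = "men shoes men boots men".split()
    print(list(drop_all_duplicates(tokens)))  # ['men', 'shoes', 'boots']
    print(classic_tf(4), constant_tf(4))      # 2.0 1.0
```

The trade-off follows Hoss's note: the filter changes what is indexed (and, in a real analyzer, may disturb the positions phrase queries rely on), while the Similarity route leaves the stored termFreq untouched and only flattens the score.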