The Xapian Project

Lucene - Apache Lucene Core
Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. Apache Lucene is an open source project available for free download. Lucene offers powerful features through a simple API:

Scalable, High-Performance Indexing
- over 150GB/hour on modern hardware
- small RAM requirements -- only 1MB heap
- incremental indexing as fast as batch indexing
- index size roughly 20-30% the size of text indexed

Powerful, Accurate and Efficient Search Algorithms

Cross-Platform Solution
- Available as Open Source software under the Apache License, which lets you use Lucene in both commercial and Open Source programs
- 100%-pure Java
- Implementations in other programming languages available that are index-compatible

The Apache Software Foundation
The Apache Software Foundation provides support for the Apache community of open-source software projects.

DataparkSearch Engine - an open source search engine

Spelling Checker using Lucene
My initial interest in spell checking algorithms started when I had to fix some bugs in the spell checking code at work over a year ago. We use Jazzy, a Java implementation of GNU Aspell, as our spell checking library. Jazzy uses a combination of Metaphone and Levenshtein distance (aka edit distance) to match a misspelled word to a set of words in its dictionary. An alternative approach to spell checking is the use of n-grams. Basically, you break up each dictionary word into sequences of characters of size n, moving your pointer forward one character at a time, and store them in an index. When the user enters a misspelled word, you do the same thing with the input word, then match the n-grams generated to the n-grams in your dictionary. I first read about this approach in Tom White's article "Did you mean Lucene?". You see, the spell checking on most search sites is surfaced as a one-line "Did you mean..." component at the top of your search results.
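
The n-gram approach described in the excerpt above can be illustrated with a minimal Python sketch. This is not the code from the article, from Jazzy, or from Lucene's spellchecker; the function names (ngrams, build_index, suggest), the default n=3, and the ranking by shared n-gram count are all assumptions made for illustration.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """Return the character n-grams of word, sliding one character at a time."""
    word = word.lower()
    if len(word) < n:
        # Words shorter than n yield a single gram: the word itself.
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_index(dictionary, n=3):
    """Map each n-gram to the set of dictionary words that contain it."""
    index = defaultdict(set)
    for word in dictionary:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def suggest(index, misspelled, n=3, limit=3):
    """Rank dictionary words by how many distinct n-grams they share with the input."""
    counts = defaultdict(int)
    for gram in set(ngrams(misspelled, n)):
        # .get() avoids inserting empty entries into the defaultdict.
        for word in index.get(gram, ()):
            counts[word] += 1
    # Highest overlap first; ties are broken alphabetically (an arbitrary choice).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:limit]]
```

For example, with an index built from the words "lucene", "license", "search", "engine" and "cluster", calling suggest(index, "clustr") returns ["cluster"], because the two words share the trigrams "clu", "lus" and "ust". Raw overlap counting is the simplest possible ranking; a real spell checker would likely also weight by word frequency or re-rank the candidates by edit distance.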

mnoGoSearch - Internet search engine software

Welcome to Uclue

Open Source Search Results Clustering Engine
No. Carrot2 can add clustering of search results to an existing search engine. You can use an Open Source project called Nutch to crawl your website. Absolutely. No. The most important characteristic of Carrot2 algorithms to keep in mind is that they perform in-memory clustering. Yes. Yes. Please put a statement equivalent to "This product includes software developed by the Carrot2 Project" on your site and link it to Carrot2's website. The focus of the Carrot2 project is on clustering algorithms. Microsoft provides a search API for Bing with a free monthly limit of 5000 requests. We provide the search interface as a demo of the technology and we use partnership with a company called Comcepta (eTools) for providing a limited number of free search requests. If you wish to extend your query limits please install Carrot2 locally and contact Comcepta for custom query limit arrangements. Apologies for inconvenience.

sudo apt-get install libwebkitgtk-1.0-0

Apache UIMA - Apache UIMA

TEK: An Email-Based Web Browser
