
Algo


Damn Cool Algorithms, Part 1: BK-Trees

Posted by Nick Johnson | Filed under coding, tech, damn-cool-algorithms

This is the first post in (hopefully) a series of posts on Damn Cool Algorithms - essentially, any algorithm I think is really Damn Cool, particularly if it's simple but non-obvious.

BK-Trees, or Burkhard-Keller Trees, are a tree-based data structure engineered for quickly finding near-matches to a string, for example, as used by a spelling checker, or when doing a 'fuzzy' search for a term. The aim is to return, for example, "seek" and "peek" if I search for "aeek". What makes BK-Trees so cool is that they take a problem which has no obvious solution besides brute-force search, and present a simple and elegant method for speeding up searches substantially.

BK-Trees were first proposed by Burkhard and Keller in 1973, in their paper "Some approaches to best match file searching".
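To make the idea concrete, here is a minimal Python sketch of a BK-tree. It is not code from the post: the names `levenshtein`, `BKTree`, `add` and `search` are illustrative, and the node layout (a word plus a dict of children keyed by distance) is just one common way to build the structure. The search relies only on the triangle inequality: if the query is at distance d from a node, any match within `max_dist` must sit in a child subtree whose key lies between d - max_dist and d + max_dist.

```python
def levenshtein(a, b):
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


class BKTree:
    """Illustrative BK-tree; each node is a (word, {distance: child}) pair."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child  # descend along the edge labelled d

    def search(self, query, max_dist):
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only these subtrees can hold matches.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "seek", "peek", "books", "cake", "boo", "cape", "cart"]:
    tree.add(w)
print(tree.search("aeek", 1))
```

With this small made-up word list, searching for "aeek" with a maximum distance of 1 returns "peek" and "seek", matching the example in the text.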

The only copy of this online seems to be in the ACM archive, which is subscription only.

Data Structure Visualization

Simple algorithms

Clever Algorithms: Nature-Inspired Programming Recipes

Damn Cool Algorithms: Levenshtein Automata

Posted by Nick Johnson | Filed under python, coding, tech, damn-cool-algorithms

In a previous Damn Cool Algorithms post, I talked about BK-trees, a clever indexing structure that makes it possible to search for fuzzy matches on a text string based on Levenshtein distance - or any other metric that obeys the triangle inequality. Today, I'm going to describe an alternative approach, which makes it possible to do fuzzy text search in a regular index: Levenshtein automata.

Introduction

The basic insight behind Levenshtein automata is that it's possible to construct a finite state automaton that recognizes exactly the set of strings within a given Levenshtein distance of a target word.

Of course, if that were the only benefit of Levenshtein automata, this would be a short article.

Construction and evaluation

The diagram on the right shows the NFA for a Levenshtein automaton for the word 'food', with maximum edit distance 2. Because this is an NFA, there can be multiple active states.
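The NFA for 'food' can be simulated directly by tracking the set of active states. The sketch below is an illustration under stated assumptions rather than the post's own code: each state is a hypothetical pair (i, e), meaning "i characters of the target matched so far, with e edits used", and the helper names `epsilon_closure`, `step` and `accepts` are made up for this example.

```python
def epsilon_closure(states, n, k):
    """Add states reachable by deleting target characters (no input consumed)."""
    closed = set(states)
    for i, e in states:
        while i < n and e < k:
            i, e = i + 1, e + 1
            closed.add((i, e))
    return closed


def step(states, c, target, k):
    """Advance every active state on input character c."""
    n = len(target)
    nxt = set()
    for i, e in states:
        if i < n and target[i] == c:
            nxt.add((i + 1, e))          # exact match, no edit used
        if e < k:
            nxt.add((i, e + 1))          # c is an extra (inserted) character
            if i < n:
                nxt.add((i + 1, e + 1))  # c substitutes for target[i]
    return epsilon_closure(nxt, n, k)


def accepts(target, k, word):
    """True if word is within edit distance k of target."""
    n = len(target)
    states = epsilon_closure({(0, 0)}, n, k)
    for c in word:
        states = step(states, c, target, k)
        if not states:
            return False  # no active states left, so no match is possible
    return any(i == n for i, _ in states)


for candidate in ["food", "fod", "fxxd", "foodxx", "f", "foodxxx"]:
    print(candidate, accepts("food", 2, candidate))
```

Simulating the NFA this way costs time proportional to the number of active states per input character. Converting it to a DFA first, as is usual for finite automata, trades construction time for a single active state during matching.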