
Algorithms


Method and apparatus for suggesting completions for a partially entered data item based on previously-entered, associated data items - US Patent 5845300

The present invention relates generally to the field of computer-based applications requiring data entry and more particularly to the field of improving the data entry process by automatically completing a partially entered data item with a matching data item from a list of previously entered data items.
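As a rough illustration of completing a partially entered item from a list of previously entered items, here is a minimal sketch. It is not the patented method: it ignores the "associated data items" aspect entirely, and the class name `CompletionList` and its methods are hypothetical. It simply keeps the earlier entries sorted and uses binary search to find the first one that starts with the typed prefix.

```python
from bisect import bisect_left


class CompletionList:
    """Sorted store of previously entered items (hypothetical helper)."""

    def __init__(self):
        self._items = []

    def add(self, item):
        # Keep the list sorted and free of duplicates.
        i = bisect_left(self._items, item)
        if i == len(self._items) or self._items[i] != item:
            self._items.insert(i, item)

    def suggest(self, partial):
        """Return the first stored item starting with `partial`, or None."""
        # All items sharing a prefix are contiguous in sorted order, and the
        # first of them is the first item that compares >= the prefix.
        i = bisect_left(self._items, partial)
        if i < len(self._items) and self._items[i].startswith(partial):
            return self._items[i]
        return None


if __name__ == "__main__":
    history = CompletionList()
    for city in ["Seattle", "Sacramento", "San Diego"]:
        history.add(city)
    print(history.suggest("San"))  # San Diego
    print(history.suggest("X"))    # None
```

A sorted list keeps lookups logarithmic in the number of stored items; a real data-entry form would probably also rank candidates by recency or frequency, which this sketch does not attempt.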

In the field of data processing systems, data is entered into databases for the purposes of rapid searching, retrieving or processing. Since its inception, the field of data processing has been bottlenecked by the time-consuming and error-prone process of data entry. Therefore, it is desirable to improve the efficiency and reliability of entering data into a database system. One method to achieve this objective is the use of automatic completion algorithms to assist in the data entry process. Automatic data entry completion algorithms have appeared in various types of applications.

Weblog: What Makes a Good Autocomplete?

We’ve been working on a problem for the past couple of weeks: an optimal autocomplete algorithm.

Many of our users have said that while Enso is great, it requires a bit too much typing. We’re inclined to agree. Yet, figuring out the best solution is tricky: there are more autocomplete algorithms than bones in a school of lionfish.

Trie

A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn".
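A minimal sketch of a trie holding the example keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn" might look like the following. The class and method names are hypothetical, and the integer values are simply assigned in insertion order for illustration; only nodes where a complete key ends carry a value.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps one character to the child node
        self.value = None   # None means no complete key ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def _find(self, prefix):
        """Walk down from the root; return the node for `prefix` or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, key):
        node = self._find(key)
        return node.value if node is not None else None

    def keys_with_prefix(self, prefix):
        """Return every stored key that starts with `prefix`, sorted."""
        start = self._find(prefix)
        if start is None:
            return []
        found, stack = [], [(start, prefix)]
        while stack:
            node, key = stack.pop()
            if node.value is not None:
                found.append(key)
            for ch, child in node.children.items():
                stack.append((child, key + ch))
        return sorted(found)


if __name__ == "__main__":
    trie = Trie()
    # Values are arbitrary integers, assumed here to be 1..8.
    for value, key in enumerate(
        ["A", "to", "tea", "ted", "ten", "i", "in", "inn"], start=1
    ):
        trie.insert(key, value)
    print(trie.keys_with_prefix("te"))  # ['tea', 'ted', 'ten']
    print(trie.get("inn"))              # 8
    print(trie.get("t"))                # None
```

Because every key sharing a prefix lives under the same node, prefix lookup costs time proportional to the prefix length plus the size of the matching subtree, which is what makes tries a natural fit for autocomplete.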

Note that the figure for this example does not show all the children sorted alphabetically from left to right as they should be (at the root and at node 't'). In the example shown, keys are listed in the nodes and values below them. Each complete English word has an arbitrary integer value associated with it. A trie can be seen as a tree-shaped deterministic finite automaton. Each finite language is generated by a trie automaton, and each trie can be compressed into a deterministic acyclic finite state automaton. Though tries are usually keyed by character strings, they need not be.

Levenshtein distance

Several definitions of edit distance exist, using different sets of string operations.

One of the most common variants is called Levenshtein distance, named after the Soviet Russian computer scientist Vladimir Levenshtein. In this version, the allowed operations are the removal or insertion of a single character, or the substitution of one character for another. Levenshtein distance may also simply be called "edit distance", although several variants exist.

Formal definition and properties

Given two strings a and b on an alphabet Σ (e.g. the set of ASCII characters, the set of bytes [0..255], etc.), the edit distance d(a, b) is the weight of a minimum-weight series of edit operations that transforms a into b.
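For the Levenshtein variant, where insertion, removal and substitution each have weight 1, d(a, b) can be computed with the standard dynamic-programming recurrence. The sketch below is one common way to write it, keeping only two rows of the table; the function name is an assumption for illustration.

```python
def levenshtein(a, b):
    """Return the Levenshtein distance between sequences a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # remove ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(levenshtein("aeek", "seek"))       # 1
```

The running time is proportional to len(a) * len(b), and memory to len(b), since each row depends only on the row before it.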

Insertion of a single symbol x changes uv to uxv (ε→x).
Deletion of a single symbol x changes uxv to uv (x→ε).
Substitution of a single symbol x for a symbol y ≠ x changes uxv to uyv (x→y).

Additional primitive operations have been suggested.

Autocomplete

Autocomplete, or word completion, is a feature provided by many web browsers, e-mail programs, search engine interfaces, source code editors, database query tools, word processors, and command line interpreters.

Autocomplete is also available for, or already integrated in, general text editors. Autocomplete involves the program predicting a word or phrase that the user wants to type in without the user actually typing it in completely.

Damn Cool Algorithms: Levenshtein Automata

Posted by Nick Johnson | Filed under python, coding, tech, damn-cool-algorithms

In a previous Damn Cool Algorithms post, I talked about BK-trees, a clever indexing structure that makes it possible to search for fuzzy matches on a text string based on Levenshtein distance - or any other metric that obeys the triangle inequality. Today, I'm going to describe an alternative approach, which makes it possible to do fuzzy text search in a regular index: Levenshtein automata.

Introduction

The basic insight behind Levenshtein automata is that it's possible to construct a finite state automaton that recognizes exactly the set of strings within a given Levenshtein distance of a target word. We can then feed in any word, and the automaton will accept or reject it based on whether the Levenshtein distance to the target word is at most the distance specified when we constructed the automaton.
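To make the accept/reject behaviour concrete, here is a hedged sketch that simulates such an automaton lazily instead of building its states up front. This is a swapped-in technique, not necessarily the construction the post goes on to describe: each state is one row of the Levenshtein dynamic-programming table for the target word, with entries capped at max_dist + 1 so the number of distinct states stays finite. The class and method names are hypothetical.

```python
class LevenshteinAutomaton:
    """Accepts exactly the words within max_dist edits of target."""

    def __init__(self, target, max_dist):
        self.target = target
        self.max_dist = max_dist
        self._cap = max_dist + 1  # any larger distance is equivalent to this

    def start(self):
        # Entry i is the distance from the empty input to target[:i].
        return tuple(min(i, self._cap) for i in range(len(self.target) + 1))

    def step(self, state, ch):
        """Return the state reached after consuming one more input character."""
        new = [state[0] + 1]
        for i, tc in enumerate(self.target):
            cost = 0 if tc == ch else 1
            new.append(min(new[i] + 1, state[i] + cost, state[i + 1] + 1))
        return tuple(min(v, self._cap) for v in new)

    def is_match(self, state):
        return state[-1] <= self.max_dist

    def can_match(self, state):
        # If every entry exceeds max_dist, no continuation can be accepted.
        return min(state) <= self.max_dist

    def accepts(self, word):
        state = self.start()
        for ch in word:
            state = self.step(state, ch)
            if not self.can_match(state):
                return False
        return self.is_match(state)


if __name__ == "__main__":
    automaton = LevenshteinAutomaton("seek", 1)
    print(automaton.accepts("aeek"))  # True
    print(automaton.accepts("pack"))  # False
```

The `can_match` check is what makes this useful against an index: a traversal of a sorted term list or trie can abandon a whole branch as soon as the current state can no longer lead to acceptance.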

Damn Cool Algorithms, Part 1: BK-Trees

Posted by Nick Johnson | Filed under coding, tech, damn-cool-algorithms

This is the first post in (hopefully) a series of posts on Damn Cool Algorithms - essentially, any algorithm I think is really Damn Cool, particularly if it's simple but non-obvious.

BK-Trees, or Burkhard-Keller Trees, are a tree-based data structure engineered for quickly finding near-matches to a string, for example, as used by a spelling checker, or when doing a 'fuzzy' search for a term. The aim is to return, for example, "seek" and "peek" if I search for "aeek".

Algorithm Tutorials

Disjoint-set Data Structures

By vlad_D, TopCoder Member

Introduction

Many times the efficiency of an algorithm depends on the data structures used in the algorithm.

A wise choice of data structure can reduce an algorithm's execution time, the time needed to implement it, and the amount of memory it uses. During SRM competitions we are limited to 2 seconds of execution time and 64 MB of memory, so the right data structure can help you remain in competition. While some data structures have been covered before, in this article we'll focus on data structures for disjoint sets.
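For orientation, a disjoint-set structure is commonly written as a parent array with two operations, find and union. The sketch below is one such version, using union by rank and path compression; it is an assumption about a typical implementation, not the tutorial's own code, and the class name, the 1-based numbering and the merged pairs in the usage example are made up for illustration.

```python
class DisjointSet:
    """Disjoint sets over the elements 1..n."""

    def __init__(self, n):
        self.parent = list(range(n + 1))  # index 0 is unused
        self.rank = [0] * (n + 1)

    def find(self, x):
        """Return the representative of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, x, y):
        """Merge the sets containing x and y."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

    def groups(self):
        """Return the current sets as a sorted list of sorted lists."""
        by_root = {}
        for x in range(1, len(self.parent)):
            by_root.setdefault(self.find(x), []).append(x)
        return sorted(by_root.values())


if __name__ == "__main__":
    ds = DisjointSet(5)
    for a, b in [(1, 2), (2, 4), (5, 1)]:  # hypothetical pairs
        ds.union(a, b)
    print(ds.groups())  # [[1, 2, 4, 5], [3]]
```

With both optimizations, a sequence of find and union calls runs in nearly constant amortized time per operation, which is why this structure suits tight time limits.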

The problem

Let’s consider the following problem: in a room there are N persons, and we define two persons to be friends if they are directly or indirectly friends. In the end there are 2 groups of friends: one group is {1, 2, 4, 5}, the other is {3}.

The solution

This problem can be solved using BFS, but let’s see how to solve this kind of problem using data structures for disjoint sets.

FIND-SET(x) If (x !