approximate_string_matching

http://en.wikipedia.org/wiki/Bitap_algorithm The bitap algorithm (also known as the shift-or, shift-and or Baeza–Yates–Gonnet algorithm) is an approximate string matching algorithm. It tells whether a given text contains a substring that is "approximately equal" to a given pattern, where approximate equality is defined in terms of Levenshtein distance: if the substring and the pattern are within a given distance k of each other, the algorithm considers them equal.

Bitap algorithm - Wikipedia, the free encyclopedia
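
The excerpt above describes what bitap decides but not how. The following is a minimal sketch of the shift-and variant extended to k errors (the Wu–Manber style of update), assuming unit-cost insertions, deletions and substitutions. The function name `bitap_fuzzy_search` and its return convention (a list of end indices) are illustrative choices, not taken from the linked page.

```python
def bitap_fuzzy_search(text, pattern, k):
    """Return every index in `text` at which some substring ending there
    is within Levenshtein distance k of `pattern`."""
    m = len(pattern)
    if m == 0:
        return []
    full = (1 << m) - 1

    # masks[c] has bit i set when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # R[d] has bit i set when pattern[:i + 1] matches some suffix of the
    # text read so far with at most d edits. Before any text is read,
    # the first d pattern characters can be covered by d deletions.
    R = [((1 << d) - 1) & full for d in range(k + 1)]

    ends = []
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)
        prev_old = R[0]                      # R[d - 1] before this step
        R[0] = ((R[0] << 1) | 1) & mask      # exact extension only
        for d in range(1, k + 1):
            old = R[d]
            R[d] = ((((old << 1) | 1) & mask)  # characters match
                    | prev_old                 # extra character in text
                    | (prev_old << 1)          # substituted character
                    | (R[d - 1] << 1)          # pattern character skipped
                    | 1) & full                # first char with one edit
            prev_old = old
        if R[k] & (1 << (m - 1)):
            ends.append(pos)
    return ends


print(bitap_fuzzy_search("xxabdxx", "abc", 1))  # [3, 4]
print(bitap_fuzzy_search("xxabdxx", "abc", 0))  # []
```

With k = 1 the pattern "abc" is reported at index 3 (the substring "ab", one deletion) and at index 4 (the substring "abd", one substitution). Python integers are unbounded, so this sketch does not need the pattern to fit in a machine word, which a C implementation typically would.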

In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately.

Approximate string matching - Wikipedia, the free encyclopedia

http://en.wikipedia.org/wiki/Approximate_string_matching
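
To make the second sub-problem (dictionary matching) concrete, here is a minimal sketch that computes Levenshtein distance with the usual two-row dynamic programme and then filters a word list by a distance threshold. The names `levenshtein` and `dictionary_matches`, and the sample word list, are illustrative assumptions; a linear scan like this is only reasonable for small dictionaries.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,         # delete ca
                           cur[j - 1] + 1,      # insert cb
                           prev[j - 1] + cost)) # match or substitute
        prev = cur
    return prev[-1]


def dictionary_matches(query, words, k):
    """Return the words whose distance to `query` is at most k."""
    return [w for w in words if levenshtein(query, w) <= k]


print(levenshtein("kitten", "sitting"))  # 3
print(dictionary_matches("aple", ["apple", "maple", "ample", "grape"], 1))
# ['apple', 'maple', 'ample']
```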

Hamming distance - Wikipedia, the free encyclopedia

http://en.wikipedia.org/wiki/Hamming_distance Two example distances: 0100->1001 has distance 3 (red path); 0110->1110 has distance 1 (blue path)
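
The two distances quoted in the caption can be checked with a few lines of code. This is a minimal sketch; the function name `hamming` is an illustrative choice, and raising `ValueError` for unequal lengths is an assumption about how to handle a case where the distance is undefined.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


print(hamming("0100", "1001"))  # 3
print(hamming("0110", "1110"))  # 1
```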
http://en.wikipedia.org/wiki/Dice%27s_coefficient Dice's coefficient, named after Lee Raymond Dice [1] and also known as the Dice coefficient, is a similarity measure over sets.

Dice's coefficient - Wikipedia, the free encyclopedia
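
The excerpt stops before the formula. The standard definition is twice the size of the intersection divided by the sum of the two set sizes, and for strings it is commonly applied to sets of character bigrams. The sketch below follows that definition; the names `dice_coefficient` and `bigrams` are illustrative, and returning 1.0 for two empty sets is an assumption, since the formula is undefined there.

```python
def dice_coefficient(x, y):
    """Return 2 * |X & Y| / (|X| + |Y|) for two collections treated as sets."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0  # assumption: two empty sets count as identical
    return 2 * len(x & y) / (len(x) + len(y))


def bigrams(s):
    """Set of adjacent character pairs in a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


# "night" and "nacht" share only the bigram "ht" out of 4 + 4 bigrams.
print(dice_coefficient(bigrams("night"), bigrams("nacht")))  # 0.25
```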

This page exists because the original home page seems to have disappeared.

pylevenshtein - A fast implementation of Levenshtein Distance (and others) for Python - Google Project Hosting

http://code.google.com/p/pylevenshtein/

SimString - A fast and simple algorithm for approximate string matching/retrieval

http://www.chokkan.org/software/simstring/

7.4. difflib — Helpers for computing deltas — Python v2.7.2 documentation

http://docs.python.org/library/difflib.html#difflib.get_close_matches New in version 2.1. This module provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce difference information in various formats, including HTML and context and unified diffs.
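
Since the link points at `get_close_matches`, a short usage sketch may help. The call below mirrors the example in the Python documentation; `n` (maximum number of results) and `cutoff` (minimum similarity ratio) are shown with their documented defaults, and the variable names are illustrative.

```python
import difflib

candidates = ["ape", "apple", "peach", "puppy"]

# Return up to n candidates whose similarity ratio to the word is at
# least cutoff, best match first.
matches = difflib.get_close_matches("appel", candidates, n=3, cutoff=0.6)
print(matches)  # ['apple', 'ape']
```

Note that the ranking is based on `difflib.SequenceMatcher` ratios rather than on Levenshtein distance, so results can differ from an edit-distance search over the same list.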
http://www.sciencedirect.com/science/article/pii/0196677489900102

Journal of Algorithms : Fast parallel and serial approximate string matching

Department of Computer Science, School of Mathematical Sciences, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
Consider the string matching problem, where differences between characters of the pattern and characters of the text are allowed. Each difference is due to either a mismatch between a character of the text and a character of the pattern, or a superfluous character in the text, or a superfluous character in the pattern.
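
The three kinds of difference in the abstract map directly onto the three terms of the textbook dynamic-programming recurrence for this problem. The sketch below uses that simple O(mn) formulation to illustrate the problem statement only; it is not the faster serial or parallel algorithm the paper presents. The function name `k_differences_ends` and the sample strings are illustrative assumptions.

```python
def k_differences_ends(text, pattern, k):
    """Return each index in `text` where an occurrence of `pattern`
    with at most k differences ends."""
    m = len(pattern)
    # col[i]: fewest differences between pattern[:i] and any substring of
    # the text ending at the current position. An occurrence may start
    # anywhere, so col[0] is always 0.
    col = list(range(m + 1))
    ends = []
    for pos, ch in enumerate(text):
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            mismatch = col[i - 1] + (0 if pattern[i - 1] == ch else 1)
            extra_in_text = col[i] + 1          # superfluous text character
            extra_in_pattern = new[i - 1] + 1   # superfluous pattern character
            new[i] = min(mismatch, extra_in_text, extra_in_pattern)
        col = new
        if col[m] <= k:
            ends.append(pos)
    return ends


hits = k_differences_ends("the quick brwn fox", "brown", 1)
print(13 in hits)  # True: "brwn" ends at index 13 and is one difference away
```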