Directed Acyclic Word Graphs. A Directed Acyclic Word Graph, or DAWG, is a data structure that permits extremely fast word searches. The entry point into the graph represents the starting letter in the search. Each node represents a letter, and you can travel from the node to two other nodes, depending on whether the letter matches the one you are searching for. It's a directed graph because you can only move in a specific direction between two nodes. In other words, you can move from A to B, but you can't move from B to A.
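As a rough illustration of the two-way branching described above, here is a minimal Python sketch. The names `Node`, `child`, `next`, `eow`, `build`, and `contains` are assumptions made for this example, not part of any particular DAWG library. The sketch builds only the unminimized form of the graph (no sharing of common word endings), which is enough to show how a lookup walks the nodes.

```python
class Node:
    """One letter in the graph, with two outgoing links."""

    def __init__(self, letter):
        self.letter = letter
        self.child = None   # followed when the letter matches
        self.next = None    # followed when the letter does not match
        self.eow = False    # True if a word ends at this node


def insert(head, word):
    """Add word below the sentinel node head (head itself holds no letter)."""
    if not word:
        return
    parent = head
    for letter in word:
        node, prev = parent.child, None
        # Walk the "next" chain looking for this letter.
        while node is not None and node.letter != letter:
            prev, node = node, node.next
        if node is None:
            node = Node(letter)
            if prev is None:
                parent.child = node
            else:
                prev.next = node
        parent = node
    parent.eow = True


def build(words):
    """Return a sentinel head node containing all the given words."""
    head = Node(None)
    for word in words:
        insert(head, word)
    return head


def contains(head, word):
    """Return True if word was inserted, following child on a match, next otherwise."""
    node, last = head.child, None
    for letter in word:
        while node is not None and node.letter != letter:
            node = node.next
        if node is None:
            return False
        last, node = node, node.child
    return last is not None and last.eow
```

For example, after `head = build(["CAT", "CAN", "DO", "DOG"])`, calling `contains(head, "CAT")` follows C, then A, skips past T's sibling if needed, and checks the end-of-word flag on the final node.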
The description is a little confusing without an example, so imagine we have a DAWG containing the words CAT, CAN, DO, and DOG.

     C --Child--> A --Child--> N (EOW)
     |                         |
     |                        Next
    Next                       |
     |                         v
     |                         T (EOW)
     v
     D --Child--> O (EOW) --Child--> G (EOW)

Now, imagine that we want to see if CAT is in the DAWG. One of the tricks with making a DAWG is trimming it down so that words with common endings all end at the same node.

     D --Child--> O --Child--> G (EOW)
     |            ^
    Next          |
     |            |
     v            |
     L --Child----

Creating a DAWG.

Graph DB.

It’s pretty clear to computer science geeks that Directed Edge is supposed to be doing groovy things with graphs. In fact our recommendation engine, and some of the things that are unique about our approach to recommendations, are built on our super-fast graph database. When we went live yesterday with the latest version of our recommendations web services, another, much bigger thing happened behind the scenes for us: we cut over to the new version of our graph database.
Every time that Directed Edge gets mentioned in nerdier circles we get a lot of questions about this fabled graph-engine, so we thought we’d indulge our techie friends with some background info. When we first decided to build the Directed Edge engine, we’d built some in-memory and RDF-store based prototypes to start hashing out the recommendations algorithms, but the RDF stores couldn’t keep up performance-wise and staying memory-based obviously wasn’t an option for persistent data. So, on to geekery.

Column Locking.

Playing with Scala 4: Abstract types and Self-types.

Disjoint-set data structure.

MakeSet creates 8 singletons. After some operations of Union, some sets are grouped together.

Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
Union: Join two subsets into a single subset.

In order to define these operations more precisely, some way of representing the sets is needed.
One common approach is to select a fixed element of each set, called its representative, to represent the set as a whole.

Disjoint-set linked lists

A simple approach to creating a disjoint-set data structure is to create a linked list for each set. MakeSet creates a list of one element. With the head of the list as the representative, Find has to walk from the given element back to the head, which takes time linear in the length of the list. This can be avoided by including in each linked list node a pointer to the head of the list; then Find takes constant time, since this pointer refers directly to the set representative. The cost moves to Union, which must now update the head pointer of every node in the list it appends. When the length of each list is tracked, the required time can be improved by always appending the smaller list to the longer. We now explain the bound above.
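The linked-list scheme described above, with a head pointer per element and smaller-into-longer appends, can be sketched roughly as follows. This is an illustrative sketch rather than a reference implementation: the class name `DisjointSetLists` and its attributes are assumptions for this example, and Python lists and dictionaries stand in for hand-built linked nodes and per-node head pointers.

```python
class DisjointSetLists:
    """Disjoint sets stored as one list per set, with a head pointer per element."""

    def __init__(self):
        self.head = {}      # element -> representative (head of its list)
        self.members = {}   # representative -> elements of that set

    def make_set(self, x):
        """Create a one-element list for x if it is not already known."""
        if x not in self.head:
            self.head[x] = x
            self.members[x] = [x]

    def find(self, x):
        """Constant time: the head pointer is the representative."""
        return self.head[x]

    def union(self, x, y):
        """Append the smaller list to the longer one and return the representative."""
        a, b = self.find(x), self.find(y)
        if a == b:
            return a
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a     # make a the longer list
        for element in self.members[b]:
            self.head[element] = a      # repoint only the smaller list
        self.members[a].extend(self.members.pop(b))
        return a
```

Because only the smaller list is repointed, an element's head pointer changes only when its set at least doubles in size, which is the intuition behind tracking list lengths.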