Efficient substring searching There are many times when programmers need to search for a substring, for example when parsing text. This is commonly referred to as searching for a needle (the substring) in a haystack (the string to search in). The most straightforward way to do this is to use the search functions that your language provides: C: strchr()/memchr(), strstr()/memmem(); C++: string::find(); Ruby: String#index or regular expressions; Python: str.find() or regular expressions. However, those functions are usually implemented in a naive way: they go through every index in the haystack and compare the substring at that index with the given needle. In this article we’ll examine smarter algorithms, in particular Boyer-Moore and its variants. Before we move on, it should be noted that Python’s str.find(), Ruby regular expressions and glibc’s implementation of memmem() can actually use smarter algorithms when conditions are right, but that is beside the main point of this article. Smarter algorithms The code
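To make the contrast concrete, here is a minimal Python sketch of the naive scan described above next to Boyer-Moore-Horspool, one of the simpler Boyer-Moore variants. The function names `naive_find` and `horspool_find` are made up for this illustration and are not taken from the article; both return the index of the first match, or -1.

```python
def naive_find(haystack: str, needle: str) -> int:
    """Try every index in the haystack; return the first match or -1."""
    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i
    return -1


def horspool_find(haystack: str, needle: str) -> int:
    """Boyer-Moore-Horspool: skip ahead using a bad-character shift table."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # For each needle character except the last, record how far the window
    # may shift when that character sits under the window's last position.
    # Later occurrences overwrite earlier ones, giving the smaller (safe) shift.
    shift = {ch: m - 1 - idx for idx, ch in enumerate(needle[:-1])}
    i = 0
    while i <= n - m:
        if haystack[i:i + m] == needle:
            return i
        # A character that never occurs in the needle allows a full shift of m.
        i += shift.get(haystack[i + m - 1], m)
    return -1
```

The gain comes from the shift table: when the character under the end of the window does not occur in the needle at all, the search jumps a whole needle length instead of a single position.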
Damn Cool Algorithms: Levenshtein Automata Posted by Nick Johnson | Filed under python, coding, tech, damn-cool-algorithms In a previous Damn Cool Algorithms post, I talked about BK-trees, a clever indexing structure that makes it possible to search for fuzzy matches on a text string based on Levenshtein distance - or any other metric that obeys the triangle inequality. Today, I'm going to describe an alternative approach, which makes it possible to do fuzzy text search in a regular index: Levenshtein automata. Introduction The basic insight behind Levenshtein automata is that it's possible to construct a finite state automaton that recognizes exactly the set of strings within a given Levenshtein distance of a target word. We can then feed in any word, and the automaton will accept or reject it based on whether the Levenshtein distance to the target word is at most the distance specified when we constructed the automaton. Of course, if that were the only benefit of Levenshtein automata, this would be a short article. Indexing
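The accept/reject behaviour can be sketched in Python without building an explicit state table. The class below is an assumption-laden illustration, not the construction from the post: it simulates the automaton by using one row of the edit-distance table as the "state", so the class name and its methods are hypothetical.

```python
class LevenshteinAutomaton:
    """Simulated automaton: each state is a row of the edit-distance table."""

    def __init__(self, target: str, max_dist: int):
        self.target = target
        self.max_dist = max_dist

    def start(self):
        # Distance from the empty input to each prefix of the target.
        return tuple(range(len(self.target) + 1))

    def step(self, state, ch):
        # Consume one input character and compute the next row.
        new_state = [state[0] + 1]
        for i, t in enumerate(self.target):
            cost = 0 if t == ch else 1
            new_state.append(min(new_state[i] + 1,    # skip a target character
                                 state[i] + cost,     # match or substitute
                                 state[i + 1] + 1))   # skip the input character
        return tuple(new_state)

    def is_match(self, state):
        return state[-1] <= self.max_dist

    def can_match(self, state):
        # If every entry exceeds the limit, no continuation can be accepted.
        return min(state) <= self.max_dist

    def accepts(self, word: str) -> bool:
        state = self.start()
        for ch in word:
            state = self.step(state, ch)
            if not self.can_match(state):
                return False
        return self.is_match(state)
```

For example, `LevenshteinAutomaton("food", 1).accepts("fod")` is true, while `"fd"` is rejected because it is two edits away. The `can_match` check is what lets a caller abandon a word (or a whole branch of an index) early.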
OLAP cubes, outdated BI technology? As businesses demand that more employees have access to the benefits of expansive, real-time data analysis, it seems that the latency and complexity associated with OLAP cubes will soon see them relegated to the pages of technological history by a new breed of operational Business Intelligence (BI) tools: tools that cater for pervasive and virtually instantaneous data analysis and reporting through in-memory analytics. But they’ll survive, for now. What do OLAP cubes offer? The ability of OLAP cubes to facilitate multifaceted data analysis in response to complex business queries will see them maintain some degree of usefulness as businesses accumulate ever larger and more complex data volumes. Because OLAP cubes can be made up of more than three dimensions (a hypercube), they enable in-depth analysis, allowing users to gain comprehensive and valuable business insights. OLAP cubes can also perform data analysis without internet connectivity. In-memory analytics
Create UML diagrams online in seconds, no special tools needed.
Damn Cool Algorithms, Part 1: BK-Trees - Nick's Blog Posted by Nick Johnson | Filed under coding, tech, damn-cool-algorithms This is the first post in (hopefully) a series of posts on Damn Cool Algorithms - essentially, any algorithm I think is really Damn Cool, particularly if it's simple but non-obvious. BK-Trees, or Burkhard-Keller Trees, are a tree-based data structure engineered for quickly finding near-matches to a string, for example, as used by a spelling checker, or when doing a 'fuzzy' search for a term. The aim is to return, for example, "seek" and "peek" if I search for "aeek". BK-Trees were first proposed by Burkhard and Keller in 1973, in their paper "Some approaches to best match file searching". Before we can define BK-Trees, we need to define a couple of preliminaries. Now we can make a particularly useful observation about the Levenshtein Distance: It forms a Metric Space. These three criteria, basic as they are, are all that's required for something such as the Levenshtein Distance to qualify as a Metric Space.
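A rough Python sketch of a BK-tree may help show why the metric property matters: the triangle inequality lets a search skip every child whose edge distance falls outside the window around the query's distance to the current node. The names `levenshtein` and `BKTree` and the tuple-based node layout are choices made for this illustration, not the post's own code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # drop a character of a
                           cur[j - 1] + 1,             # drop a character of b
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


class BKTree:
    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None  # each node is (word, {edge_distance: child_node})

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return sorted (distance, word) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only edges in [d - max_dist, d + max_dist]
            # can lead to words within max_dist of the query.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

With a tree holding "seek", "peek" and a handful of unrelated words, `search("aeek", 1)` returns exactly those two, each at distance 1.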
Visualising sorting algorithms This is another one of my rare technical posts, as opposed to news of which countries I've been visiting. If you're in computer science, you've probably seen an animation of sorting algorithms, maybe heard a rendition, or seen a visual representation. I have, somewhat by accident, discovered a different way to visualise a sorting algorithm: plot points for memory accesses, with address on the X axis and time (counted by accesses) on the Y axis, and different colours for reads and writes. It produces some rather pretty pictures. Ye olde bubblesort. Insertion sort - the version optimized for mostly sorted content. Shellsort, clearly showing the phases. Selection sort: Heapsort: the solid lines at the top are the heap-building phase, while the rest shows the extraction. Divide-and-conquer algorithms have a pretty fractal nature. Mergesort: this diagram is twice as wide as the others because it uses temporary storage on the right.
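One way to collect the data for such a picture is to wrap the array so that every read and write is logged. The sketch below is a guess at how this could be done in Python, not the author's actual tooling; `TracedList` and `bubble_sort` are names invented here, and the wrapper only supports plain non-negative integer indices.

```python
class TracedList:
    """List wrapper that logs every element read and write."""

    def __init__(self, values):
        self._data = list(values)
        self.log = []  # entries are (time, address, kind); kind is "r" or "w"

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        self.log.append((len(self.log), index, "r"))
        return self._data[index]

    def __setitem__(self, index, value):
        self.log.append((len(self.log), index, "w"))
        self._data[index] = value


def bubble_sort(a):
    """Plain bubble sort; works on any object with len() and item access."""
    n = len(a)
    for end in range(n - 1, 0, -1):
        for i in range(end):
            left, right = a[i], a[i + 1]
            if left > right:
                a[i], a[i + 1] = right, left
```

After `bubble_sort(TracedList(data))`, each log entry gives one point: address on the X axis, time (the access count) on the Y axis, and the kind picks the colour. Drawing the points is left to whatever plotting library is at hand.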
Software Development AntiPatterns Good software structure is essential for system extension and maintenance. Software development is a chaotic activity; therefore the implemented structure of systems tends to stray from the planned structure as determined by architecture, analysis, and design. Software refactoring is an effective approach for improving software structure. The resulting structure does not have to resemble the original planned structure. The structure changes because programmers learn constraints and approaches that alter the context of the coded solutions. For example, the solution for the Spaghetti Code AntiPattern defines a software development process that incorporates refactoring. Development AntiPatterns utilize various formal and informal refactoring approaches. The Blob: Procedural-style design leads to one object with a lion’s share of the responsibilities, while most other objects only hold data or execute simple processes.
Google Code Jam - Rotate It’s time for a basic finger exercise. The Google Code Jam Rotate problem is trivial, so relax and fire up your IDE. I was a bit lazy, so there is no reading of the input sets, just a two-dimensional array and two functions. Rotating As the Google solution pointed out, there is actually no need to really rotate the two-dimensional array. Just push everything to the right, as if gravity pulled to the right. So here is the “gravity from the right” code:
    // N (the board size) is declared outside this excerpt.
    public static void fakeRotate(char[][] board) {
        for (int i = 0; i < N; i++) {
            for (int j = N - 1; j >= 0; j--) {
                if (board[i][j] != '.') {
                    // push to right
                    int m = 1;
                    while ((j + m) < N && board[i][j + m] == '.') {
                        board[i][j + m] = board[i][j + (m - 1)];
                        board[i][j + (m - 1)] = '.';
                        m++;
                    }
                }
            }
        }
    }
Checking for a winner Now we have everything ready to look for a winner. Progressing this way, I only need to check in four directions.
    public static void checkForWinner(char[][] board) { boolean redWins = false; if (board[i][j] !
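For comparison, here is a compact Python sketch of the same two steps (the post's own code is Java). It is an illustration under assumptions rather than a port: `push_right` rebuilds each row instead of swapping cells one at a time, and `has_k_in_row`, its `k` parameter and the `EMPTY` constant are names introduced here. The board is assumed to be a square list of lists of single characters.

```python
EMPTY = "."


def push_right(board):
    """Slide every piece to the right edge of its row, keeping their order."""
    for row in board:
        pieces = [c for c in row if c != EMPTY]
        row[:] = [EMPTY] * (len(row) - len(pieces)) + pieces


def has_k_in_row(board, piece, k):
    """Check for k equal pieces in a line, scanning four directions."""
    n = len(board)
    # right, down, down-right, down-left: the opposite directions are
    # covered by starting the scan from the other end of the line.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(n):
        for c in range(n):
            for dr, dc in directions:
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if not (0 <= end_r < n and 0 <= end_c < n):
                    continue  # the line would leave the board
                if all(board[r + s * dr][c + s * dc] == piece for s in range(k)):
                    return True
    return False
```

Four directions are enough because every line of k pieces has one end from which it runs right, down, or diagonally down.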
10 Great Tips for Writing Better And More Comprehensive CSS There are many different coding styles: some do not like indentation, some like to capitalize certain things, and others like to add more than one element on a line. The main train of thought is that they are all after one common thing: organization and better code. Without imposing my own coding style, we’ll discuss ten tips for writing better CSS. Let us know what you think in the comment section! See you there ;) Comments Commenting throughout your style sheet significantly helps you locate certain code blocks of CSS quickly and efficiently. Indentation Indentation is another key to keeping your code neatly organized and easy to flip through. Shorthand Code I have seen many style sheets consist of non-shorthand code. One Line per Rule One of the most irritating things to see is multiple rules written out on a single line as if the person who coded it was running out of lines and crammed everything onto several. Hacks Should Stay Out Meaningful Names for Classes and IDs Alphabetical Order
Coin Tosses, Binomials and Dynamic Programming Today someone asked about the probability of outcomes in relation to coin tosses on Stack Overflow. It’s an interesting question because it touches on several areas that programmers should know, from maths (probability and counting) to dynamic programming. Dynamic programming is a divide-and-conquer technique that is often overlooked by programmers. It can sometimes be hard to spot situations where it applies, but when it does apply it will typically reduce an algorithm from exponential complexity (which is impractical in all but the smallest of cases) to a polynomial solution. I will explain these concepts in one of the simplest forms: the humble coin toss. Bernoulli Trials A Bernoulli trial is an event (or experiment) with exactly two possible outcomes, success or failure, decided at random. What constitutes success is arbitrary. Assume a fair coin (p = 0.5). If you ignore the actual values and reduce it to the number of permutations: Divide the number of desired outcomes by the total number of outcomes and you have your probability.
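As a small worked example, the Python sketch below computes the probability of exactly k heads in n tosses two ways: with the closed-form binomial count, and with a dynamic-programming table built one toss at a time. Both function names are invented for this illustration; the point is that the table needs on the order of n² steps, whereas listing every sequence of tosses would need 2ⁿ.

```python
from math import comb


def binomial_probability(n: int, k: int, p: float = 0.5) -> float:
    """Closed form: count the sequences with k heads, weight each one."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def dp_probability(n: int, k: int, p: float = 0.5) -> float:
    """Dynamic programming over tosses instead of enumerating sequences."""
    # dist[j] = probability of exactly j heads after the tosses so far.
    dist = [1.0] + [0.0] * n
    for _ in range(n):
        # Update from high j to low j so dist[j - 1] is still the old value.
        for j in range(n, 0, -1):
            dist[j] = dist[j] * (1 - p) + dist[j - 1] * p
        dist[0] *= (1 - p)
    return dist[k]
```

For a fair coin, 2 heads in 3 tosses comes out as 3 desired outcomes out of 8, or 0.375, from either function.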
10 Common Mistakes Made by API Providers Twitter was one of the first to see what happened when traffic to the site came more from the API than the Web. It now has more than 65 million tweets per day, most coming from services that use the Twitter API. Twitter has made numerous changes to fix its API. But there is still a lot for providers to learn. Considering this, we asked developers and service providers to help us prepare a list of 10 common mistakes made by API providers. Our group of commentators includes Adam DuVander, executive editor at Programmable Web; Mike Pearce, a developer out of the United Kingdom who writes a lot about scrum and Agile; Mashery's Clay Loveless; and Sonoa Systems' Sam Ramji. 1. "Databases fail, backend dependencies get slow, and/or someone somewhere along the line doesn't escape output properly. 2. "Sometimes we see providers expecting the API alone to attract developers. 3. "APIs are about bringing the scale of the Internet to bear on your business - but what if that scale actually happens?