
Hadoop
EC2
In computer science, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are.
MinHash
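As a rough illustration of the idea in the MinHash entry, here is a minimal sketch in Python. It is not taken from any of the linked pages: the function names, the choice of salted BLAKE2b as the family of hash functions, and the default of 128 hashes are all assumptions made for the example. The fraction of positions where two signatures agree serves as an estimate of the Jaccard similarity of the underlying sets.

```python
import hashlib


def minhash_signature(items, num_hashes=128):
    """Build a MinHash signature for a non-empty set of strings.

    Each of the num_hashes salted hash functions stands in for one random
    permutation; we keep the minimum hash value seen under each one.
    (Salted BLAKE2b is an assumption of this sketch, not a requirement.)
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")
        min_value = min(
            int.from_bytes(
                hashlib.blake2b(
                    item.encode("utf-8"), digest_size=8, salt=salt
                ).digest(),
                "big",
            )
            for item in items
        )
        signature.append(min_value)
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    a = {f"w{i}" for i in range(0, 80)}
    b = {f"w{i}" for i in range(40, 120)}
    print("exact:   ", exact_jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

With more hash functions the estimate tightens around the exact value, at the cost of a longer signature per set.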
TokenNGramTokenizerFactory (LingPipe API)
Following on from Breck’s straightforward LingPipe-based application of Jaccard distance over sets (defined as one minus the size of their intersection divided by the size of their union) in his last post on deduplication, I’d like to point out a really nice textbook presentation of how to scale the process of finding similar documents using Jaccard distance.
Scaling Jaccard Distance for Document Deduplication: Shingling, MinHash and Locality-Sensitive Hashing « LingPipe Blog
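For readers who want to see shingling and Jaccard distance on a small scale, the following is a minimal sketch in Python. It is not the LingPipe implementation: the helper names are hypothetical, the shingles are whitespace-separated word n-grams (an assumption; character n-grams are an equally common choice), and Jaccard distance is taken here as one minus the ratio of intersection size to union size.

```python
def shingles(text, n=2):
    """Return the set of word n-grams (shingles) in a text.

    Tokenization by lowercasing and splitting on whitespace is an
    assumption of this sketch.
    """
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def jaccard_distance(set_a, set_b):
    """One minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(set_a, set_b)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps"
    doc2 = "the quick brown fox sleeps"
    print(jaccard_distance(shingles(doc1), shingles(doc2)))
```

In this example the two documents share three of five distinct two-word shingles, so the similarity is 3/5 and the distance is 2/5. Comparing every pair of documents this way is quadratic in the number of documents, which is the scaling problem that MinHash and locality-sensitive hashing address.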
GettingStartedWithHadoop - Hadoop Wiki
Note: for the 1.0.x series of Hadoop the following articles will probably be easiest to follow: The below instructions are primarily for the 0.2x series of Hadoop. Hadoop can be downloaded from one of the Apache download mirrors.
How to read all files in a directory in HDFS using Hadoop filesystem API - Hadoop and Hive
Install Hadoop and Hive on Ubuntu Lucid Lynx
If you've got a need to do some map reduce work and decide to go with Hadoop and Hive, here's a brief tutorial on how to get it installed.
Using Hadoop’s DistributedCache - Nube Technologies
While working with Map Reduce applications, there are times when we need to share files globally with all nodes on the cluster. This can be a shared library to be accessed by each task, a global lookup file holding key value pairs, jars or archives containing executable code.
Map Reduce Secondary Sort Does It All | Mawazo
I came across a question on Stack Overflow recently about calculating web chat room statistics using Hadoop Map Reduce.
A6
CS246: Mining Massive Data Sets
Mining Massive Data Sets Winter 2011 Course information:
Graph partitioning in MapReduce with Cascading - Ware Dingen
29 January 2012
package forma;
Hadoop input format for swallowing entire files.
Found New API Revised Classes of the Hadoop Definitive Guide Examples here by Oct 11
Is the cluster set up correctly?

