
Compression, Searching


SNA LinkedIn. Zoie. Kamikaze.

What is Kamikaze?

Kamikaze is a utility package for effectively compressing sorted integer arrays and performing highly efficient operations on the compressed arrays. It represents each compressed integer array as an integer set and calls it a docIdSet (the docIdSet concept is similar to that used in Lucene). Kamikaze achieves extremely fast decompression with a decent compression ratio on sorted arrays (or docIdSets).
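To make the idea of compressing a sorted integer array concrete, here is a minimal sketch in Python. It is not Kamikaze's API and not its actual compression scheme: it assumes a plain gap (delta) encoding followed by a variable-length byte encoding, and the function names `encode_sorted` and `decode_sorted` are hypothetical. It only shows why sorted docId lists compress well: the gaps between neighbours are small numbers that fit in few bytes.

```python
from typing import Iterable, List


def encode_sorted(doc_ids: Iterable[int]) -> bytes:
    """Store each value as the gap from its predecessor, packed as a varint.

    Assumes non-negative integers in non-decreasing order (hypothetical
    helper for illustration, not part of Kamikaze).
    """
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        if gap < 0:
            raise ValueError("input must be sorted and non-negative")
        # Varint: 7 payload bits per byte, high bit set when more bytes follow.
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
        prev = doc_id
    return bytes(out)


def decode_sorted(data: bytes) -> List[int]:
    """Rebuild the original sorted values by summing the decoded gaps."""
    doc_ids: List[int] = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation bit set: keep reading this gap
        else:
            prev += gap
            doc_ids.append(prev)
            gap = 0
            shift = 0
    return doc_ids
```

With this sketch, the six values `[3, 7, 8, 150, 1000, 1001]` take 8 bytes instead of the 24 bytes needed for six 32-bit integers, and `decode_sorted` restores them exactly.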

Kamikaze can efficiently find the intersection or the union of N compressed arrays (or docIdSets), quickly detect whether a given integer exists in the compressed arrays (or docIdSets), etc.

Why is Kamikaze useful?

Traditionally, compression techniques are used to save storage space on disk.

Where can Kamikaze be used?

Search indexes, graph algorithms, and certain sparse matrix representations make heavy use of compressed integer arrays.

The Magic of Kamikaze: P4Delta Compression

Kamikaze @LinkedIn
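As a rough illustration of the set operations mentioned above (intersection, union, and membership tests over N docIdSets), the following Python sketch works on plain sorted lists. It is an assumption-laden stand-in: Kamikaze performs these operations on its compressed representation, whereas this code uses uncompressed lists, and the names `intersect_all`, `union_all`, and `contains` are hypothetical.

```python
import heapq
from bisect import bisect_left
from functools import reduce
from typing import List, Sequence


def intersect_two(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Two-pointer merge: advance whichever side holds the smaller value."""
    result: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def intersect_all(sets: Sequence[Sequence[int]]) -> List[int]:
    """Intersect N sorted lists, starting from the shortest to shrink work early."""
    if not sets:
        return []
    ordered = sorted(sets, key=len)
    return reduce(intersect_two, ordered[1:], list(ordered[0]))


def union_all(sets: Sequence[Sequence[int]]) -> List[int]:
    """Union N sorted lists with a k-way merge, dropping duplicates."""
    result: List[int] = []
    for doc_id in heapq.merge(*sets):
        if not result or result[-1] != doc_id:
            result.append(doc_id)
    return result


def contains(doc_ids: Sequence[int], target: int) -> bool:
    """Binary search for a single value in one sorted list."""
    pos = bisect_left(doc_ids, target)
    return pos < len(doc_ids) and doc_ids[pos] == target
```

All three helpers rely on the inputs being sorted, which is the same property that makes the arrays compressible in the first place.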