
Kyoto Cabinet: a straightforward implementation of DBM. Copyright (C) 2009-2012 FAL Labs. Last Update: Fri, 04 Mar 2011 23:07:26 -0800. Overview: Kyoto Cabinet is a library of routines for managing a database.

Kyoto Cabinet: a straightforward implementation of DBM

The database is a simple data file containing records, each of which is a pair of a key and a value. Every key and value is a variable-length sequence of bytes. Kyoto Cabinet runs very fast. It is written in C++ and provides APIs for C++, C, Java, Python, Ruby, Perl, and Lua. Documents: the following are documents of Kyoto Cabinet. Packages: the following are the source packages of Kyoto Cabinet. Source packages of the core library (C/C++). Binary packages for Windows (C/C++/Java). Information.

Performance comparison: key/value stores for language model counts - Brendan O'Connor's Blog. I’m doing word and bigram counts on a corpus of tweets.

Performance comparison: key/value stores for language model counts - Brendan O'Connor's Blog

I want to store and rapidly retrieve them later for language model purposes. So there’s a big table of counts that get incremented many times. The easiest way to get something running is to use an open-source key/value store; but which? There’s recently been some development in this area so I thought it would be good to revisit and evaluate some options.

Here are timings for a single counting process: iterate over 45,000 short text messages, tokenize them, then increment counters for their unigrams and bigrams. Eventually, I’ll want a purely in-memory, distributed table.

More details on the options: Python dictionary: defaultdict(int) is the simplest and most obvious implementation. I can’t say this evaluation tells us too much about the server systems, since it’s all for a single process, which really isn’t their use case.
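The counting process described above can be sketched with the defaultdict(int) option. This is only an illustration, not the blog's benchmark code: the tokenize helper (lowercase, split on whitespace) and the function name count_ngrams are assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def count_ngrams(messages):
    # Missing keys start at 0, so "+= 1" works without an existence check.
    unigrams = defaultdict(int)
    bigrams = defaultdict(int)
    for message in messages:
        tokens = tokenize(message)
        for token in tokens:
            unigrams[token] += 1
        # Pair each token with its successor to form bigrams.
        for left, right in zip(tokens, tokens[1:]):
            bigrams[(left, right)] += 1
    return unigrams, bigrams


if __name__ == "__main__":
    unigrams, bigrams = count_ngrams(["the cat sat", "the cat ran"])
    print(unigrams["the"], bigrams[("the", "cat")])
```

A disk-backed key/value store would replace the two dictionaries with a get, add, and set on each key, which is where the timing differences between the options come from.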

We've taken a good look at Tokyo Cabinet's Hash Database, but there's a lot more to the library than just that.

Shades of Gray: Tokyo Cabinet's Key-Value Database Types

Tokyo Cabinet supports three other kinds of databases. In addition, each database type accepts various tuning parameters that can be used to change its behavior. Each database type and setting involves different tradeoffs, so you really have a lot of options for turning Tokyo Cabinet into exactly what you need. Let's look into some of those options now.

Tokyo Cabinet: a modern implementation of DBM. Copyright (C) 2006-2011 FAL Labs. Last Update: Thu, 05 Aug 2010 15:05:11 +0900. BTW, do you know Kyoto Cabinet?

Tokyo Cabinet: a modern implementation of DBM

Actually, it is a more powerful and convenient library than Tokyo Cabinet. By now, Kyoto Cabinet surpasses Tokyo Cabinet in every aspect. I strongly recommend that you use Kyoto Cabinet. Overview: Tokyo Cabinet is a library of routines for managing a database. Tokyo Cabinet is developed as the successor of GDBM and QDBM for the following purposes: improves space efficiency: smaller size of database file; improves time efficiency: faster processing speed; improves parallelism: higher performance in multi-thread environment; improves usability: simplified API; improves robustness: database file is not corrupted even under catastrophic situation; supports 64-bit architecture: enormous memory space and database file are available.

Tokyo Cabinet is written in C and provides APIs for C, Perl, Ruby, Java, and Lua. Documents: Fundamental Specifications.

Tokyo {Cabinet, [Py]Tyrant}

Tokyo Cabinet: Beyond Key-Value Store. By Ilya Grigorik on February 13, 2009. It took Ruby some time to go from an infant research project by Matz to a language we've all come to know so well.

Tokyo Cabinet: Beyond Key-Value Store

Now, another Japanese developer (Mikio Hirabayashi) has all the potential to repeat this cycle with his new database project: Tokyo Cabinet. Developed and sponsored by Mixi Inc. (the Japanese Facebook), it is an incredibly fast and feature-rich database library. In fact, given the maturity of the project, it is surprising how little information is available on it outside of Japan.

Reliable and efficient key. I've had good luck with the Tokyo Cabinet/pytc solution.

Reliable and efficient key

It's very fast (a bit faster than the shelve module backed by anydbm in my implementation), both for reading and writing (though I too do far more reading). The problem for me was the spartan documentation on the Python bindings, but there's enough example code around to figure out how to do what you need to do. Additionally, Tokyo Cabinet is quite easy to install (as are the Python bindings), doesn't require a server (as you mention) and seems to be actively supported. You can open files in read-only mode, allowing concurrent access, or read/write mode, preventing other processes from accessing the database. I was looking at various options over the summer, and the advice I got then was this: try out the different options and see what works best for you. (That said, it'd be useful to others if you shared what ended up working the best for you, and why you chose that solution over others!)
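To make the read-only versus read/write distinction concrete, here is a rough sketch using Python's standard-library dbm module as a stand-in. It is not the pytc API, and dbm's locking behavior depends on the backend and should not be taken to match Tokyo Cabinet's. The function names write_counts and read_count are made up for the example.

```python
import dbm
import os
import tempfile


def write_counts(path, counts):
    # "c" opens the database read/write, creating it if it does not exist.
    with dbm.open(path, "c") as db:
        for key, value in counts.items():
            # Keys and values are stored as bytes.
            db[key.encode("utf-8")] = str(value).encode("utf-8")


def read_count(path, key):
    # "r" opens the existing database read-only.
    with dbm.open(path, "r") as db:
        raw = db.get(key.encode("utf-8"))
        return int(raw.decode("utf-8")) if raw is not None else 0


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "counts")
    write_counts(path, {"the": 3, "cat": 1})
    print(read_count(path, "the"), read_count(path, "dog"))
```

The same pattern (one writer that builds the file, many readers that open it read-only) is what the answer above describes for Tokyo Cabinet; only the library calls would differ.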