


Language detection with Google's Compact Language Detector

Google's Chrome browser has a useful translate feature: it detects the language of the page you've visited and, if it differs from your local language, offers to translate it. Wonderfully, Google has open-sourced most of Chrome's source code, including the embedded CLD (Compact Language Detector) library that's used to detect the language of any UTF-8 encoded content.

It looks like CLD was extracted from the language detection library used in Google's toolbar. It turns out the CLD part of the Chromium source tree is nicely standalone, so I pulled it out into a new, separate Google Code project, making it possible to use CLD directly from any C++ code. I also added a basic initial Python binding (one method!) and ported the small C++ unit test (verifying detection of known strings for 64 different languages) to Python (it passes!). So detecting language is now very simple from Python:

    import cld
    topLanguageName = cld.detect(bytes)[0]
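Since CLD works on UTF-8 encoded content, text usually has to be encoded before it is handed to the detector. Here is a minimal sketch of a small wrapper around that one call. It relies only on what is stated above: detect takes bytes, and the first element of its result is the top language name. The names top_language_name, load_cld_detect and fake_detect are hypothetical helpers invented for illustration, and fake_detect is only a stand-in so the sketch still runs when the cld binding is not installed.

```python
def top_language_name(text, detect):
    """Return the first element of detect()'s result for `text`.

    `text` may be a str (encoded to UTF-8 here) or already-encoded bytes.
    `detect` is any callable taking bytes and returning a sequence whose
    first element is the language name, as cld.detect is described to do.
    """
    data = text.encode("utf-8") if isinstance(text, str) else text
    return detect(data)[0]


def load_cld_detect():
    """Return cld.detect if the cld binding is importable, else None."""
    try:
        import cld  # the Python binding described above; may not be installed
    except ImportError:
        return None
    return cld.detect


def fake_detect(data):
    """Hypothetical stand-in, NOT the real detector: a single keyword check."""
    return ("FRENCH",) if b"bonjour" in data.lower() else ("ENGLISH",)


if __name__ == "__main__":
    # Prefer the real binding when available, otherwise use the stand-in.
    detect = load_cld_detect() or fake_detect
    print(top_language_name("Bonjour tout le monde", detect))
```

Passing the detector in as an argument keeps the wrapper testable without the compiled binding; with cld installed, the call becomes top_language_name(text, cld.detect).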