Hadoop
Text Analytics and Natural language
FileMap is a lightweight system for applying Unix-style file processing tools to large amounts of data stored in files. It provides full map-reduce functionality without requiring that you switch your processing to any particular language or runtime environment, install any special software, or have root access on your storage and processing nodes.
Inspired by Google's MapReduce and Starfish for Ruby, octo.py is a fast-n-easy MapReduce implementation for Python. Octo.py doesn't aim to meet all your distributed computing needs, but its simple approach is amenable to a large proportion of parallelizable tasks. If your code has a for-loop, there's a good chance that you can make it distributed with just a few small changes.
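To make the for-loop remark concrete, here is a minimal sketch of a word count whose loop body has been split into a map step and a reduce step. It does not use octo.py's actual API: `DOCUMENTS`, `map_fn`, `reduce_fn`, and `run_local` are hypothetical names chosen for illustration, and a local thread pool stands in for the remote workers a real framework would provide.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input (document name -> text); not taken from octo.py's docs.
DOCUMENTS = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog",
    "c.txt": "the fox and the dog",
}


def map_fn(item):
    """Map step: the former loop body, emitting (word, 1) for one document."""
    _name, text = item
    return [(word, 1) for word in text.split()]


def reduce_fn(word, counts):
    """Reduce step: combine every value emitted for a single key."""
    return word, sum(counts)


def run_local(documents, workers=4):
    """Local stand-in for a cluster: threads play the role of worker nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each document is mapped independently, so the work can be spread out.
        mapped = list(pool.map(map_fn, documents.items()))

    # Group the emitted values by key before reducing.
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            grouped[word].append(count)

    return dict(reduce_fn(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_local(DOCUMENTS))
```

Because `map_fn` touches only its own document and `reduce_fn` touches only its own key, neither step depends on where it runs, which is what lets a framework move them onto other machines.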
Warning: Some of this text is out of date and refers to an older version of Galago.
Status: Beta
MySpace Qizmt is a MapReduce framework for developing and executing distributed computation applications on large clusters of Windows servers. The MySpace Qizmt project develops open-source software for reliable, scalable, easy-to-use distributed computation. MySpace Qizmt core features include:
One night at the pub we discussed whether one could replace Hadoop (a massive and comprehensive implementation of MapReduce) with a single bash script, an awk command, sort, and a sprinkling of netcat.
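The idea works because sorting can act as the shuffle: once the mapper's output is sorted by key, all values for a key sit next to each other and a reducer can sum them in one pass. The sketch below mirrors that map | sort | reduce shape in Python rather than bash. It is not the script from that discussion; `mapper`, `reducer`, and `pipeline` are hypothetical names, and the netcat step that would ship data between machines is left out.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map: emit one (word, 1) pair per word, as an awk one-liner might."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(sorted_pairs):
    """Reduce: sum the counts of adjacent pairs that share a key.

    groupby only merges neighbouring items, so the input must be sorted.
    """
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def pipeline(lines):
    """map | sort | reduce, with sorted() standing in for the sort command."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for word, count in pipeline(["b a", "a c a"]):
        print(word, count)
```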
Cloud MapReduce was initially developed at Accenture Technology Labs.
one simple way might be to simply add TRACE level log messages at every collect() call with the current values of every index plus the spill number [...] That could be an interesting visualization.