Scalable distributed computing/storage
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time, this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL. Hive is an open source volunteer project under the Apache Software Foundation. Previously it was a subproject of Hadoop, but it has now graduated to become a top-level project of its own. We encourage you to learn about the project and contribute your expertise.
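As a rough illustration of "projecting structure onto data", the sketch below applies a schema to raw tab-separated text at read time and then runs an aggregation comparable to a simple HiveQL GROUP BY. It is plain Python, not Hive; the sample data, the column names, and the `visits` table named in the comment are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical raw file contents: tab-separated lines with no header.
RAW = "alice\t/home\t120\nbob\t/home\t80\nalice\t/search\t45\n"

# The structure is declared separately from the data and applied when reading.
SCHEMA = [("user", str), ("page", str), ("millis", int)]


def read_rows(raw_text):
    """Project SCHEMA onto each raw line, yielding typed dict rows."""
    for fields in csv.reader(io.StringIO(raw_text), delimiter="\t"):
        yield {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}


def views_per_page(rows):
    # Comparable in spirit to: SELECT page, COUNT(*) FROM visits GROUP BY page
    return dict(Counter(row["page"] for row in rows))


result = views_per_page(read_rows(RAW))
print(result)
```

The raw text is never rewritten; only the reader knows the column names and types, which is the idea behind applying structure at query time.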
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures. A wide variety of companies and organizations use Hadoop for both research and production.
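The "simple programming model" is usually map/reduce. The following is a minimal single-process sketch of that model applied to word counting; it does not use Hadoop's API, and the function names are illustrative assumptions. In a real cluster the map and reduce calls would run on different machines and the framework would perform the grouping step.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be spread across machines without shared state.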
Murder is a method of using BitTorrent to distribute files to a large number of servers within a production environment. This allows for scalable and fast deploys in environments of hundreds to tens of thousands of servers where centralized distribution systems wouldn't otherwise function. A "murder" normally refers to a flock of crows, which in this case applies to a bunch of servers doing something. For the impatient, gem install murder and add these lines to your Capfile:

    require 'murder'
    set :deploy_via, :murder
    after 'deploy:setup', 'murder:distribute_files'
    before 'murder:start_seeding', 'murder:start_tracker'
    after 'murder:stop_seeding', 'murder:stop_tracker'
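To see why peer-to-peer distribution scales where a central server does not, here is a deliberately idealized model in Python. It assumes every machine that already holds the file can upload one full copy per round, and it ignores piece-level exchange, bandwidth differences, and tracker overhead, so it is not a description of how Murder or BitTorrent actually schedules transfers.

```python
def central_rounds(servers):
    """One origin uploads a full copy to one server per round."""
    return servers


def swarm_rounds(servers):
    """Every holder uploads to one new server per round, so holders double."""
    holders = 1  # the initial seeder
    served = 0
    rounds = 0
    while served < servers:
        new = min(holders, servers - served)
        served += new
        holders += new
        rounds += 1
    return rounds


for n in (10, 1000, 10000):
    print(n, central_rounds(n), swarm_rounds(n))
```

Under these assumptions the central approach grows linearly with the number of servers, while the swarm grows roughly with the base-2 logarithm of it.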
MirrorBrain is an open source framework to run a content delivery network using mirror servers. It solves a challenge that many popular open source projects face: a flood of download requests, often orders of magnitude more than a single site could practically handle. A central (and probably the most obvious) part is a “download redirector” which automatically redirects requests from web browsers or download programs to a mirror server near them. Choosing a suitable mirror for a user's request is the key, and MirrorBrain uses geolocation and global routing data to make a sensible choice and to achieve load balancing for the mirrors at the same time.
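The idea of a redirector that picks a nearby mirror while spreading load can be sketched as follows. This is not MirrorBrain's actual selection logic: the mirror list, the capacity weights, and the `slack_km` tolerance are hypothetical, and great-circle distance stands in for the geolocation and routing data that a real deployment would consult.

```python
import math
import random

# Hypothetical mirrors: (name, latitude, longitude, capacity weight).
MIRRORS = [
    ("frankfurt.example.org", 50.11, 8.68, 3),
    ("amsterdam.example.org", 52.37, 4.90, 1),
    ("oregon.example.org", 45.52, -122.68, 2),
]


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def candidates(client_lat, client_lon, slack_km=500.0):
    """Return (name, weight) for mirrors within slack_km of the closest one."""
    ranked = sorted(
        (distance_km(client_lat, client_lon, lat, lon), name, weight)
        for name, lat, lon, weight in MIRRORS
    )
    best = ranked[0][0]
    return [(name, weight) for dist, name, weight in ranked if dist - best <= slack_km]


def choose_mirror(client_lat, client_lon, rng=random):
    """Pick among nearby mirrors at random, in proportion to their weights."""
    names, weights = zip(*candidates(client_lat, client_lon))
    return rng.choices(names, weights=weights, k=1)[0]


# A client near Berlin is sent to one of the two European mirrors.
print(choose_mirror(52.52, 13.40))
```

Choosing at random among all sufficiently close mirrors, rather than always the single closest one, is what keeps one well-placed mirror from absorbing every request in its region.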