The computing industry is seeing dramatic growth in the use of "shared nothing" database architectures, in which each node is self-sufficient and functions independently of the others (the Hadoop Distributed File System, for example). For the sake of performance, these architectures dedicate storage resources to each node instead of sharing disk, which avoids contention among nodes for shared disk resources such as SAN and NAS. While these computing architectures are best known in the context of Web-based applications and development activities, they are no longer confined to the Web.
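To make the "no shared disk" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how HDFS or any particular product places data: the `Node` and `Cluster` classes, the node names, and the hash-based routing are all hypothetical, and an in-memory dictionary stands in for each node's dedicated disk.

```python
import hashlib


class Node:
    """A self-sufficient node that owns its storage outright (hypothetical)."""

    def __init__(self, name):
        self.name = name
        # Stands in for the node's dedicated disk; no other node touches it.
        self._local_store = {}

    def put(self, key, value):
        self._local_store[key] = value

    def get(self, key):
        return self._local_store.get(key)

    def count(self):
        return len(self._local_store)


class Cluster:
    """Routes each key to exactly one owning node (assumed hash partitioning)."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def _owner(self, key):
        # A stable hash picks the owner, so reads and writes for a key
        # always land on the same node and never contend for shared storage.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._owner(key).put(key, value)

    def get(self, key):
        return self._owner(key).get(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(1000):
    cluster.put(f"record-{i}", i)
```

Every record lives on exactly one node, so adding nodes adds both storage and I/O capacity without introducing a shared device that all nodes must queue for. Real systems also have to handle replication and rebalancing, which this sketch leaves out.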
Bixo Labs has merged with Scale Unlimited and now provides complete consulting and training services for a wide range of big data problems, including web crawling, data mining and search.
Why Did Bixo Labs Merge With Scale Unlimited?
During client engagements, we repeatedly saw the need for mentoring and training to ensure a smooth hand-off of projects to internal team members. In addition, Bixo Labs has already been teaching Hadoop classes under contract with Scale Unlimited. Given our increased focus on mentoring and training, it made sense for Bixo Labs to acquire and merge with Scale Unlimited, so that we can provide complete consulting solutions that include bringing our clients’ internal staff up to speed on the open source technologies we use, such as Hadoop, Solr and Cascading.
How do you query hundreds of gigabytes of new data each day streaming in from over 600 hyperactive servers? If you think this sounds like the perfect battleground for a head-to-head skirmish in the great MapReduce Versus Database War, you would be correct. Bill Boebel, CTO of Mailtrust (Rackspace’s mail division), has generously provided a fascinating account of how they evolved their log processing system from an early amoeba’ic approach of text files stored on each machine, to a Neanderthalic relational database solution that just couldn’t compete, and finally to a Homo sapien’ic Hadoop-based solution that works wisely for them and has virtually unlimited scalability potential. Rackspace faced a now-familiar problem: lots and lots of data streaming in. Where do you store all that data?