Big Data

Shared storage in a 'shared nothing' environment

The computing industry is seeing dramatic growth in the use of "shared nothing" database architectures, where each node is self-sufficient and functions independently of the others (the Hadoop Distributed File System, for example).

For the sake of performance, these architectures avoid contention among nodes for shared disk resources (SAN and NAS) by dedicating storage resources to each node, i.e. no shared disk. While these computing architectures are best known in the context of Web-based applications and development activities, they are no longer confined to the Web. EMC Greenplum, IBM Netezza, and ParAccel are all examples of shared-nothing database architectures that are being used increasingly in "big data" business analytics applications and within corporate data centers.
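The idea of dedicating storage to each node can be illustrated with a minimal Python sketch. It is not taken from any of the products named above: the `Node` and `Cluster` classes, the hash-based placement rule, and the `order-N` keys are all hypothetical, and an in-memory dictionary stands in for each node's local disk.

```python
import hashlib


class Node:
    """One shared-nothing node: it alone owns and reads its rows."""

    def __init__(self, name):
        self.name = name
        self.rows = {}  # stands in for the node's dedicated local storage

    def put(self, key, value):
        self.rows[key] = value

    def local_sum(self):
        # Each node aggregates only the data it owns; no shared disk is touched.
        return sum(self.rows.values())


class Cluster:
    """Hypothetical coordinator that routes keys and merges partial results."""

    def __init__(self, node_count):
        self.nodes = [Node(f"node-{i}") for i in range(node_count)]

    def owner(self, key):
        # A stable hash means a given key always lands on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self.owner(key).put(key, value)

    def total(self):
        # Scatter-gather: every node computes locally, then results are merged.
        return sum(node.local_sum() for node in self.nodes)


cluster = Cluster(node_count=4)
for i in range(1000):
    cluster.put(f"order-{i}", i)

print(cluster.total())
```

Because every key has exactly one owner, nodes never contend for the same storage; the only shared step is merging the small per-node results.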

That brings us to shared storage, seen in the context of shared nothing as a single point of contention.

Elastic Web Mining

Scale Unlimited is based in Nevada City, California and provides consulting and training services for big data analytics, search, and web mining.

The company was founded in 2008 by Stefan Groschupf, Chris Wensel, and Ken Krugler, three of the world’s leading experts in scalable, reliable data analytics, workflow design and web mining. All are well-known community members and contributors to key open source projects, including Hadoop, Bixo, Cascading, Solr, Lucene, Katta and Tika. Solutions from Scale Unlimited are built using these and other widely used and well-supported open source packages, providing maximum flexibility with no commercial lock-in.

Inspiration

Scale Unlimited solves three major problems that the founders experienced first-hand at previous startups and consulting projects. First, processing big data requires a workflow system that is efficient, reliable and scalable. With Scale Unlimited, solutions are built using Hadoop and Cascading-based workflows.

Mobclix Selects Aster Data to Move Analytics to the Cloud

[repost] How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data « New IT Farmer

How do you query hundreds of gigabytes of new data each day streaming in from over 600 hyperactive servers?

If you think this sounds like the perfect battleground for a head-to-head skirmish in the great MapReduce Versus Database War, you would be correct. Bill Boebel, CTO of Mailtrust (Rackspace’s mail division), has generously provided a fascinating account of how they evolved their log processing system from an early amoeba’ic text-file-stored-on-each-machine approach, to a Neanderthalic relational database solution that just couldn’t compete, and finally to a Homo sapien’ic Hadoop-based solution that works wisely for them and has virtually unlimited scalability potential. Rackspace faced a now familiar problem.

Lots and lots of data streaming in. Where do you store all that data? Facing exponential growth, they spent about 3 months building a new log processing system using Hadoop (an open-source implementation of Google File System and MapReduce), Lucene and Solr.

Welcome to Hive!

Big Data 2011 by GigaOM - Infrastructure - Web- Eventbrite
