Cluster

> > >

In-Memory Database

Aeron: Do we really need another messaging system? Do we really need another messaging system? We might if it promises to move millions of messages a second, at small microsecond latencies between machines, with consistent response times, to large numbers of clients, using an innovative design. And that’s the promise of Aeron (the Celtic god of battle, not the chair, though tell that to the search engines), a new high-performance open source message transport library from the team of Todd Montgomery, a multicast and reliable protocol expert, Richard Warburton, an expert on compiler optimizations, and Martin Thompson, the pasty faced performance gangster.

The claims are Aeron is already beating the best products out there on throughput and latency matches the best commercial products up to the 90th percentile. Aeron can push small 40 byte messages at 6 million messages a second, which is a very difficult case. Here’s a talk Martin gave on Aeron at Strangeloop: Aeron: Open-source high-performance messaging. That sums up Aeron is nutshell. Homepage | Celery: Distributed Task Queue. Several Solutions for Queue & Worker Systems. Almost every dating site we’ve built has a job queue of some kind. We enqueue email sends, statistic updates, logging, fraud detection and more. Anything that doesn’t immediately impact the response we’re preparing for the user can be run in the background. Over the years we’ve tried several different queue/worker systems and we’d like to share our findings to help others decide on a technology.

The solutions distinguish themselves in a few key areas: Scalability There are several separate scalability concerns to consider, such as the throughput of the queue, the number of the workers it can support and the maximum queue size. Priority Sending a password reset email shouldn’t wait until a batch of hundreds of thousands of “Your Daily Picks” emails have been processed. Administrative Tools Some solutions provide web based administration tools to check queue lengths, contents of the queue and to resubmit failed jobs. Scheduling Not all jobs should be executed immediately. Redundancy Beanstalk. Gone Fishin': Building Super Scalable Systems: Blade Runner Meets Autonomic Computing In The Ambient Cloud. All in all this is still my favorite post and I still think it's an accurate vision of a future.

Not everyone agrees, but I guess we'll see... "But it is not complicated. [There's] just a lot of it. " -- Richard Feynman on how the immense variety of the world arises from simple rules. Contents: We have not yet begun to scale. The world is still fundamentally disconnected and for all our wisdom we are still in the earliest days of learning how to build truly large planet-scaling applications.

Today 350 million users on Facebook is a lot of users and five million followers on Twitter is a lot of followers. Tomorrow the numbers foreshadow a new Cambrian explosion of connectivity that will look as different as the image of a bare lifeless earth looks to us today. If you aren't Google, or a very few other companies, how can you possibly compete? What are we looking at here? Notice how global, how plentiful, and how fast the flashes flicker. For a while I thought this might be true. Radical? The Perils of Asynchrony.

Every time you come across anything more than a rudimentary system that has some moderately serious performance needs, someone somewhere on the team considers using asynchronous processing to help reduce (perceived) response time. That person needs to be identified and quickly locked in a padded room . . . . Just kidding! Often that person is me and often I end up writing the code and relearning why doing things in an asynchronous way (that also meets a certain "near guarantee" SLA including DR and HA needs) is very very hard. So it was with some chagrin that I was tasked with coding some infrastructure components to implement a Task Queue for my current team. The goal was to satisfy a need to improve response time of the application when it was doing some back-end tasks that required 100ms or more. The tasks themselves aren't time critical but they are important e.g. replicating data to a remote site.

Anyway the default solutions in Java for Asynchronous processing are JMS in a Nutshell. In Memory Data Grid Technologies. After winning a CSC Leading Edge Forum (LEF) research grant, I (Paul Colmer) wanted to publish some of the highlights of my research to share with the wider technology community. What is an In Memory Data Grid? It is not an in-memory relational database, a NOSQL database or a relational database.

It is a different breed of software datastore. In summary an IMDG is an ‘off the shelf’ software product that exhibits the following characteristics: The data model is distributed across many servers in a single location or across multiple locations. All servers can be active in each site.All data is stored in the RAM of the servers.Servers can be added or removed non-disruptively, to increase the amount of RAM available.The data model is non-relational and is object-based. There are also hardware appliances that exhibit all these characteristics. There are six products in the market that I would consider for a proof of concept, or as a starting point for a product selection and evaluation:

Srinath's Blog :My views of the World: List of Known Scalable Architecture Templates. For most Architects, "Scale" is the most illusive aspect of software architectures. Not surprisingly, it is also one of the most sort-out goals of todays software design. However, computer scientists do not yet know of a single architecture that can scale for all scenarios. Instead, we design scalable architectures case by case, composing known scalable patterns together and trusting our instincts. Simply put, building a scalable system has become more an art than a science. We learn art by learning masterpieces, and scale should not be different! In this post I am listing several architectures that are known to be scalable.

Often, architects can use those known scalable architectures as patterns to build new scalable architectures. LB (Load Balancers) + Shared nothing Units - This model includes a set of units that does not share anything with each other fronted with a load balancer that routes incoming messages to a unit based on some criteria (round-robin, based on load etc.). 35+ Use Cases for Choosing Your Next NoSQL Database.

Hazelcast

Load Balancing FAQ. Rundeck.org - Open Source Automation. MySQL at Facebook. No SQL ? NOSQL Patterns. Over the last couple years, we see an emerging data storage mechanism for storing large scale of data. These storage solution differs quite significantly with the RDBMS model and is also known as the NOSQL. Some of the key players include ...GoogleBigTable, HBase, HypertableAmazonDynamo, Voldemort, Cassendra, RiakRedisCouchDB, MongoDB These solutions has a number of characteristics in commonKey value storeRun on large number of commodity machinesData are partitioned and replicated among these machinesRelax the data consistency requirement.

(because the CAP theorem proves that you cannot get Consistency, Availability and Partitioning at the the same time)The aim of this blog is to extract the underlying technologies that these solutions have in common, and get a deeper understanding on the implication to your application's design. I am not intending to compare the features of these solutions, nor to suggest which one to use. API model The basic form of API access is Data replication.

Filesystem

Brewer's CAP Theorem. On Friday 4th June 1976, in a small upstairs room away from the main concert auditorium, the Sex Pistols kicked off their first gig at Manchester’s Lesser Free Trade Hall. There’s some confusion as to who exactly was there in the audience that night, partly because there was another concert just six weeks later, but mostly because it’s considered to be a gig that changed western music culture forever. So iconic and important has that appearance become that David Nolan wrote a book, I Swear I Was There: The Gig That Changed the World, investigating just whose claim to have been present was justified. Because the 4th of June is generally considered to be the genesis of punk rock6. We know three chords but you can only pick two The Sex Pistols had shown that barely-constrained fury was more important to their contemporaries than art-school structuralism, giving anyone with three chords and something to say permission to start a band.

Brewer’s (CAP) Theorem The Significance of the Theorem. ProActive - Professional Open Source Middleware for Parallel, Distributed, Multi-core Programming. Fura - Welcome. Clustering with the Shoal Framework. Shoal is an open source, Java-based generic clustering framework. It can be used in your applications to add clustering functionalities like load balancing, fault tolerance, or both. Applications using Shoal can share data, communicate via messages with other cluster nodes across the network, and notify of relevant events like the joining, shutdown, and failure of a node or group of nodes. You can take appropriate measures and perform monitoring tasks when these events occur; Shoal forwards a signal to your code to track these notifications.

Shoal is the clustering framework used by the " project to implement its application server clustering. Shoal is a lightweight component; you can embed Shoal not only in Java EE applications, but in SE applications too. In this article, we'll cover the Shoal architecture and its basic concepts. Learning Shoal's Basic Concepts and Architecture Understanding Shoal's Architecture Understanding Shoal's Design. TIBCO Silver™ TIBCO Cloud<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" id="ooyalaPlayer_9picp_hgq65rj1" width="640" height="360" codebase=" name="movie" value=" /><param name="bgcolor" value="#000000" /><param name="allowScriptAccess" value="always" /><param name="allowFullScreen" value="true" /><param name="flashvars" value="embedType=noscriptObjectTag&embedCode=JmM3htYjrB-nYGji774bVOKv3RPjjfjT&videoPcode=FiYm06Tu_hQLSGqTeVTwSS1L4vV7" /><embed src=" The promise of cloud is real: it allows for greater efficiency and faster time-to-market of new capabilities at reduced cost.

TIBCO Cloud makes this possible – providing a clear path to fast results that enables the business to get ideas up and running without having to secure extra monetary or resource support. IceGrid. As a mature and proven distributed computing technology, Ice provides a host of solutions for accelerating the development and deployment of large-scale distributed applications, including a secure router and bidirectional protocol for firewall traversal, and a data distribution service. IceGrid is a service that expedites the creation of a new class of applications. Grid computing has become increasingly more important to mainstream information technology endeavors as large SMP servers are replaced by many networked commodity servers.

Grids are being used today to solve pressing scientific and corporate challenges. By assembling available resources into an ad-hoc grid, an organization has a powerful computational tool at its disposal. From an engineering perspective, the challenge is building an application that leverages this new-found wealth of computing power in an efficient, secure and manageable fashion.

Grid applications take many forms. Location Deployment Resource Allocation. Running Hadoop On Ubuntu Linux (Single-Node Cluster) @ Michael G. Noll. In this tutorial I will describe the required steps for setting up a pseudo-distributed, single-node Hadoop cluster backed by the Hadoop Distributed File System, running on Ubuntu Linux. Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System (GFS) and of the MapReduce computing paradigm. Hadoop’s HDFS is a highly fault-tolerant distributed file system and, like Hadoop in general, designed to be deployed on low-cost hardware. It provides high throughput access to application data and is suitable for applications that have large data sets. The main goal of this tutorial is to get a simple Hadoop installation up and running so that you can play around with the software and learn more about it.

This tutorial has been tested with the following software versions: Ubuntu Linux 10.04 LTS (deprecated: 8.10 LTS, 8.04, 7.10, 7.04) Hadoop 1.0.3, released May 2012 Sun Java 6 Disabling IPv6. Shoal Java Clustering Framework Overview — Project Kenai. Shoal – A Generic Clustering Framework About Shoal Framework Shoal Capabilities The Shoal Framework, written in Java, can be used as an in-process component for cluster communications and cluster event management. As an in-process component, Shoal's core service, the Group Management Service (GMS), provides the ability for the process (JVM) to become a communicating member of a predefined cluster. As a result, within each process, clients implementing Shoal's client API can get notifications of cluster events.

By becoming a communicating cluster member, the process gets the ability to be notified of members' joining, failures, and planned shutdowns among cluster members ( see Shoal Group Event Notifications), be notified of the process's selection as a recovery member for any recovery operations on another member on occurrence that other member's failure. Membership in Multiple Groups A single process can be a member of multiple groups to represent different groups. Shoal GMS Use Cases. Java Fast Sockets: Enabling high-speed Java communications on high performance clusters. If you've ever tried to write a high-performance Java program that does network communication you know that it's hard to do this in an efficient manner.

Researchers at the University of A Coruña in Spain have created a library called Java Fast Sockets (JFS) that dramatically increases throughput while reducing latency compared to the normal Java socket and NIO APIs. And they did it while retaining compatibility with Java sockets. In a paper published late last year they wrote: This paper presents Java Fast Sockets (JFS), an optimized Java socket implementation on clusters for high performance computing.

Current socket libraries do not efficiently support high-speed cluster interconnects and impose substantial communication overhead. (click image for full size) The open source library works by eliminating copies and unnecessary serialization steps. For more information: Tribes - The Tomcat Cluster Communication Module - Apache Tribes - Introduction. JGroups - The JGroups Project. "Goodbye Google App Engine" - TheServerSide.com. Carlos Ble has posted "Goodbye App Engine," meant to be a quick summary for friends about why his company switched away from GAE. It's a good summary of the problems with GAE and PAAS. It has a laundry list of problems: python 2.5 instead of 3, no https, no C (so pure python libs only), request time limits, a really ugly database (slow, no joins), memcached limitations, just a huge set of things that on the surface look like things you can handle but they end up making you feel like you've been stung to death by gnats.

He ported away from GAE in one week, and finishes by saying "developing on GAE introduced such a design complexity that working around it pushes us 5 months behind schedule" and listed it as a 15 thousand euro mistake. Ouch. Some people say it's Carlos' fault for not knowing everything about GAE going in, or for ignoring the risk disclaimers that GAE puts out; Carlos points out that stability was a crucial problem, which a risk disclaimer wouldn't make obvious. Hadoop Study Reveals Usage Stats, Benefits, and Challenges. EC2 Pricing. Java Heterogenous distributed computing. A Simple Introduction To Playing With Big Data. GlusterFS File System - Wikip. Global File System - Wikip.

Distributed caching

Scalability. Clustering a Java Web Application With Amazon ElasticLoad Balanc. Load Balancing for Web Application Performance and Scalability. Enterprise Java Community: Can Java EE Deliver The Asynchronous. Yale Researchers Create Hadoop-database Cross. SemiSpace: an open source JavaSpaces inspired tuple space. TCP Tuning Guide - Linux TCP Tuning. The Role of Caching in Large Scale Architecture | Architects Zon.

Clustering Your Comet Application Using Atmosphere. How to Scale Tomcat in the Cloud with RabbitMQ and JMX | TomcatE.