
The LMAX Architecture
LMAX is a new retail financial trading platform, and as a result it has to process many trades with low latency. The system is built on the JVM platform and centers on a Business Logic Processor that can handle 6 million orders per second on a single thread. The Business Logic Processor runs entirely in memory using event sourcing, and is surrounded by Disruptors - a concurrency component that implements a network of queues that operate without needing locks. During the design process the team concluded that recent directions in high-performance concurrency models using queues are fundamentally at odds with modern CPU design.

Over the last few years we keep hearing that "the free lunch is over"[1] - we can't expect increases in individual CPU speed. So I was fascinated to hear about a talk at QCon London in March last year from LMAX.

Overall Structure

Figure 1: LMAX's architecture in three blobs

At a top level, the architecture has three parts: Business Logic Processor
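The lock-free queue idea at the heart of the Disruptor can be sketched with sequence counters instead of locks. The following is a minimal single-producer/single-consumer ring buffer of my own devising, not LMAX's actual code - the real Disruptor adds batching, cache-line padding, and pluggable wait strategies:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of a Disruptor-style ring buffer: a single producer and a
// single consumer coordinate through monotonically increasing sequence
// counters, so no locks are needed. Names and structure are illustrative.
final class SpscRing {
    private final long[] slots;
    private final int mask;                            // capacity must be a power of two
    private final AtomicLong head = new AtomicLong(0); // next slot to read
    private final AtomicLong tail = new AtomicLong(0); // next slot to write

    SpscRing(int capacityPowerOfTwo) {
        slots = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == slots.length) return false; // buffer full
        slots[(int) (t & mask)] = value;
        tail.set(t + 1); // single producer: publish by advancing the sequence
        return true;
    }

    Long poll() {
        long h = head.get();
        if (h == tail.get()) return null; // buffer empty
        long v = slots[(int) (h & mask)];
        head.set(h + 1);
        return v;
    }
}

public class RingDemo {
    public static void main(String[] args) {
        SpscRing ring = new SpscRing(8);
        for (long i = 1; i <= 3; i++) ring.offer(i);
        long sum = 0;
        Long v;
        while ((v = ring.poll()) != null) sum += v;
        System.out.println(sum); // 1 + 2 + 3 = 6
    }
}
```

The point of the design is that producer and consumer each write only their own counter, so the hot path is a plain array store plus a sequence increment - no contended lock.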


Systems Research Group – NetOS: Practical lock-free data structures

Introduction

Through careful design and implementation it's possible to build data structures that are safe for concurrent use without needing to manage locks or block threads. These non-blocking data structures can increase performance by allowing extra concurrency and can improve robustness by avoiding some of the problems caused by priority inversion in local settings, or machine and link failures in distributed systems. The best overall introduction to our non-blocking algorithms is the paper Concurrent programming without locks, currently under submission, which covers our designs for multi-word compare-and-swap, word-based software transactional memory and object-based software transactional memory. The papers Language support for lightweight transactions and Exceptions and side-effects in atomic blocks cover the integration of a software transactional memory with a managed run-time environment.

Source code
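The simplest well-known non-blocking structure of this kind is the Treiber stack, which needs only single-word compare-and-swap. A rough Java sketch (names mine; the group's designs such as MCAS and STM are considerably more general):

```java
import java.util.concurrent.atomic.AtomicReference;

// Treiber-style lock-free stack: each operation reads the current top,
// computes the new top, and retries if another thread won the CAS race.
// No thread ever blocks, so priority inversion cannot stall the structure.
final class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T value) {
        Node<T> oldTop, newTop;
        do {
            oldTop = top.get();
            newTop = new Node<>(value, oldTop);
        } while (!top.compareAndSet(oldTop, newTop)); // retry on contention
    }

    T pop() {
        Node<T> oldTop;
        do {
            oldTop = top.get();
            if (oldTop == null) return null; // empty stack
        } while (!top.compareAndSet(oldTop, oldTop.next));
        return oldTop.value;
    }
}

public class StackDemo {
    public static void main(String[] args) {
        LockFreeStack<Integer> s = new LockFreeStack<>();
        s.push(1);
        s.push(2);
        s.push(3);
        System.out.println(s.pop()); // 3 (LIFO order)
    }
}
```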

A First RESTful Example - Java Web Services: Up and Running, 2nd Edition [Book]

As befits a first example, the implementation is simple but sufficient to highlight key aspects of a RESTful web service. The implementation consists of a JSP (Java Server Pages) script and two backend JavaBeans that the JSP script uses to get the data returned to the client (see Figure 1-6). The data is composed of sage corporate predictions. Here is a sample:

E (programming language)

Here is a recursive function for computing the factorial of a number, written in E. Functions are defined using the def keyword. In the first line, :int is a guard that constrains the argument and result of the function. A guard is not quite the same thing as a type declaration; guards are optional and can specify constraints. The first :int ensures that the body of the function will only have to handle an integer argument. Without the second :int above, the function would not be able to return a value.
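The E code itself is not reproduced in the excerpt, but the guard idea maps loosely onto static types plus an explicit precondition check in other languages. A rough Java analog (names mine, and a Java type is a weaker notion than an E guard, which can express arbitrary constraints):

```java
public class Factorial {
    // Rough analog of E's :int guards: the parameter and return types
    // constrain the values statically, and the explicit check rejects
    // arguments the recursion is not defined for.
    static int factorial(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0");
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 120
    }
}
```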

Running Time Graphs

The graph below compares the running times of various algorithms:

Linear -- O(n)
Quadratic -- O(n^2)
Cubic -- O(n^3)
Logarithmic -- O(log n)
Exponential -- O(2^n)
Square root -- O(sqrt n)

Comparison of algorithms in terms of the maximum problem size they can handle:

MORAL: Cheaper, faster computers mean bigger problems to solve. Bigger problems to solve mean efficiency is more important.

The basic shape of a polynomial function is determined by the highest valued exponent in the polynomial (called the order of the polynomial). Multiplicative constants do not affect the fundamental shape of a curve.
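The "maximum problem size" comparison can be made concrete with a little arithmetic: for a fixed budget of steps, invert each growth function to find the largest n it admits. A small sketch (the budget of one million steps is an arbitrary illustration):

```java
public class Growth {
    // For a hypothetical budget of steps, the largest solvable problem
    // size n is the inverse of the algorithm's growth function.
    public static void main(String[] args) {
        long budget = 1_000_000; // illustrative step budget

        System.out.println("O(n):   n = " + budget);
        System.out.println("O(n^2): n = " + Math.round(Math.sqrt(budget)));          // 1000
        System.out.println("O(n^3): n = " + Math.round(Math.cbrt(budget)));          // 100
        System.out.println("O(2^n): n = " + (long) (Math.log(budget) / Math.log(2))); // 19
    }
}
```

Going from one million to one billion steps multiplies the O(n) size by 1000 but only adds about 10 to the O(2^n) size, which is the moral stated above in numbers.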

Solaris Troubleshooting and Performance Tuning

Disk I/O Components

What we blithely call a "Disk I/O" is actually made up of several components, each of which may have an impact on overall performance. These layers may be broken down as follows for a typical I/O operation:

Neo4j Internals: File Storage

NOTE: This post is quite outdated; stuff has changed since I wrote this. While you can somewhat safely ignore the alterations for increased address space of entities, the Property store has changed in a fundamental way. Please find the new implementation here.

Inter-socket communication with less than 2 microseconds latency

Non-blocking I/O through selectors is the part of networking that I like the most. The Java NIO API is not easy, but once you understand the reactor pattern and abstract away its complexities you end up with a powerful and re-usable network multiplexer. The classic one-thread-per-socket approach does not scale, has a lot of overhead, and almost always leads to complex code. It does not scale because threads have to compete for limited CPU cores. Having 32 threads competing for 4 logical processors in a quad-core CPU does not make your code any faster, but instead sends its latencies through the roof.
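The reactor pattern mentioned above boils down to one Selector multiplexing readiness events for many channels on a single thread. A minimal self-contained sketch (a java.nio.channels.Pipe stands in for real sockets so the example needs no network):

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Sketch of the reactor idea: register non-blocking channels with one
// Selector, then block in select() until any of them is ready, instead of
// dedicating a thread per connection.
public class ReactorSketch {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        pipe.source().register(selector, SelectionKey.OP_READ);

        pipe.sink().write(ByteBuffer.wrap(new byte[] {42})); // simulate incoming data

        selector.select(); // wakes when some registered channel is ready
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isReadable()) {
                ByteBuffer buf = ByteBuffer.allocate(16);
                pipe.source().read(buf);
                buf.flip();
                System.out.println("read byte: " + buf.get()); // 42
            }
        }
        selector.close();
    }
}
```

In a real server the registered channels would be SocketChannels from an accepting ServerSocketChannel, and the event loop would dispatch each ready key to a handler; the structure is the same.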

Virtual Panel: Using Java in Low Latency Environments

Java is increasingly being used for low latency work where previously C and C++ were the de facto choice. InfoQ brought together four experts in the field to discuss what is driving the trend, and some of the best practices when using Java in these situations. The participants: Peter Lawrey is a Java consultant interested in low latency and high throughput systems.

PRODUCTISE.IN: The Successor to FOSS.IN… Update: Please see this post for updated information about this event This is possibly the fastest that Team FOSS.IN has ever put together an event. As promised in my last post, here is some information about the new event series that we are putting together.

Memory ordering

Memory ordering is a set of properties of modern microprocessors that characterizes how they may reorder memory operations; it is a form of out-of-order execution. Memory reordering can be used to fully utilize different cache and memory banks. On most modern uniprocessors, memory operations are not executed in the order specified by the program code, but in single-threaded programs all operations appear, from the programmer's point of view, to have been executed in the order specified, with all inconsistencies hidden by hardware.
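On the JVM, the programmer's handle on this is the Java Memory Model rather than raw hardware ordering. A small sketch of the standard message-passing idiom (names mine): a volatile write/read pair creates a happens-before edge, so the plain write that precedes it is guaranteed to be visible.

```java
// A volatile flag publishes a plain field safely across threads: once the
// reader observes ready == true, the JMM guarantees it also sees
// payload == 42. With a plain boolean flag, no such guarantee exists.
public class OrderingDemo {
    static int payload;              // plain field
    static volatile boolean ready;   // volatile: ordering + visibility

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            payload = 42;            // happens-before the volatile write below
            ready = true;
        });
        Thread reader = new Thread(() -> {
            while (!ready) { }       // spin on the volatile flag
            System.out.println(payload); // guaranteed 42, never a stale 0
        });
        reader.start();
        writer.start();
        reader.join();
        writer.join();
    }
}
```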

Offloading data from the JVM heap (a little experiment)

Last time, I wrote about the possibility of using Linux shared memory to offload cacheable/reference data from the JVM. To that end I wrote a small Java program to see if it was practical. The results were better (some even stranger) than I had expected. Here's what the test program does:

Create a bunch of java.nio.ByteBuffers that add up to 96MB of storage
Write ints starting from the first buffer, all the way to the last one - that's writing a total of 96MB of some contrived data
For each test, the buffer creation, writing and deletion is done 24 times (JIT warm up)
For each such test iteration, measure the memory (roughly) used in the JVM heap, the time taken to create those buffers and the time taken to write 96MB of data

Obviously, there are things here that sound fishy to you - like why use ByteBuffers instead of just writing to an OutputStream or why write to the buffers in sequence. Interpretation of the results:
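The steps above can be sketched roughly as follows. This is my reconstruction, not the author's program: the 96MB total comes from the text, but the per-buffer size, direct allocation, and variable names are assumptions:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of the experiment: allocate direct ByteBuffers totalling 96MB
// (stored outside the JVM heap), then write ints through all of them and
// time the write pass. Buffer count/size split is illustrative.
public class OffHeapWrite {
    static final int BUF_SIZE = 8 * 1024 * 1024; // 8MB per buffer (assumed)
    static final int BUF_COUNT = 12;             // 12 x 8MB = 96MB total

    public static void main(String[] args) {
        List<ByteBuffer> buffers = new ArrayList<>();
        for (int i = 0; i < BUF_COUNT; i++)
            buffers.add(ByteBuffer.allocateDirect(BUF_SIZE)); // off-heap storage

        long start = System.nanoTime();
        int next = 0;
        for (ByteBuffer buf : buffers) {
            buf.clear();
            while (buf.remaining() >= Integer.BYTES)
                buf.putInt(next++); // contrived sequential data
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("wrote " + (long) BUF_COUNT * BUF_SIZE / (1024 * 1024)
                + "MB in " + elapsedMs + "ms");
    }
}
```

A real harness would repeat the allocate/write/release cycle 24 times for JIT warm-up, as the post describes, and sample heap usage between iterations; only the heap-side bookkeeping of a direct buffer lives in the JVM heap, which is the effect being measured.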