
The LMAX Architecture

LMAX is a new retail financial trading platform. As a result it has to process many trades with low latency. The system is built on the JVM platform and centers on a Business Logic Processor that can handle 6 million orders per second on a single thread. The Business Logic Processor runs entirely in-memory using event sourcing. The Business Logic Processor is surrounded by Disruptors - a concurrency component that implements a network of queues that operate without needing locks. Over the last few years we keep hearing that "the free lunch is over"[1] - we can't expect increases in individual CPU speed. So I was fascinated to hear about a talk at QCon London in March last year from LMAX. Given the shift to multi-core thinking, this kind of demanding performance would naturally suggest an explicitly concurrent programming model - and indeed this was their starting point. Overall Structure Figure 1: LMAX's architecture in three blobs At a top level, the architecture has three parts

The Disruptor – Lock-free publishing In case you’ve been living on another planet, we recently open-sourced our high performance message passing framework. I’m going to give a quick run-down on how we put messages into the ring buffer (the core data structure within the Disruptor) without using any locks. Before going any further, it’s worth a quick read of Trish’s post, which gives a high-level overview of the ring buffer and how it works. The salient points from this post are: The ring buffer is nothing but a big array. All “pointers” into the ring buffer (otherwise known as sequences or cursors) are Java longs (64 bit signed numbers) and count upward forever. (Don’t panic – even at 1,000,000 messages per second, it would take the best part of 300,000 years to wrap around the sequence numbers). These pointers are then “mod’ed” by the ring buffer size to figure out which array index holds the given entry. Basic ring buffer structure The ring buffer maintains two pointers, “next” and “cursor”: Claiming a slot Summary
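
The sequence-to-slot mapping described above can be sketched in a few lines of Java. This is only an illustration of the arithmetic, not the Disruptor's own code: the class name RingIndexSketch and its methods are hypothetical, and the sequence is assumed to be non-negative.

```java
// Hypothetical sketch of the sequence-to-index arithmetic; not the Disruptor API.
final class RingIndexSketch {
    private final int size;

    RingIndexSketch(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
    }

    // Sequences count upward forever; the slot is the sequence modulo the buffer size.
    // Assumes sequence >= 0.
    int indexFor(long sequence) {
        return (int) (sequence % size);
    }

    // Whole years until a signed 64-bit sequence would overflow at a given message rate,
    // using 365-day years.
    static long yearsUntilWrap(long messagesPerSecond) {
        long secondsPerYear = 365L * 24 * 60 * 60;
        return Long.MAX_VALUE / messagesPerSecond / secondsPerYear;
    }

    public static void main(String[] args) {
        RingIndexSketch ring = new RingIndexSketch(8);
        for (long seq = 6; seq <= 9; seq++) {
            System.out.println("sequence " + seq + " -> slot " + ring.indexFor(seq));
        }
        System.out.println(yearsUntilWrap(1_000_000L) + " years to wrap at 1,000,000 msg/s");
    }
}
```

With a buffer of 8 slots, sequences 8 and 9 land back on slots 0 and 1, which is the wrap-around the excerpt describes. The overflow estimate comes out a little under 300,000 years at one million messages per second.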

Solaris Troubleshooting and Performance Tuning Disk I/O Components What we blithely call a "Disk I/O" is actually made up of several components, each of which may have an impact on overall performance. These layers may be broken down as follows for a typical I/O operation: POSIX: Application calls a POSIX library interface. McDougall, Mauro and Gregg suggest that the best way to see if I/O is a problem at all is to look at the amount of time spent on library and system calls via DTrace. For example, the DTrace Toolkit's procsystime utility tracks time spent on each system call. If the system call statistics reveal a problem, we should look at the raw disk I/O performance. Physical Disk I/O The primary tool to use in troubleshooting disk I/O problems is iostat. sar -d provides useful historical context. vmstat can provide information about disk saturation. To start, use iostat -xn 30 during busy times to look at the I/O characteristics of your devices. Disk Utilization Disk Saturation Reducing sd_max_throttle is a temporary quick fix.

Seeing in the Dark: The Rise of Dark Pools, and the Danger Below the Surface :: TabbFORUM - Where Capital Markets Speak How Dark Pools Work A trading pool is a venue where buyers and sellers can be brought together to trade a position in equities, foreign exchange, futures, etcetera. This was traditionally done on the floor of an exchange like the New York Stock Exchange and facilitated by market makers (Patterson, 2012: 7). With the developments in computer technology, this system began to be seen as arcane and has been largely replaced by electronic trading. Electronic trading can be seen as disruptive technology that has had an incredible impact on markets (Neil Crammond, 2013). The move to off-exchange trading has facilitated dark trading. The extent of the lack of knowledge is revealed in a NY Sun article that reported Erik Sirri, Head of the SEC’s Division of Market Regulation as stating that the “SEC believes the [dark pools] are available to all market participants.” “They have spent hundreds of millions of dollars on systems.

A First RESTful Example - Java Web Services: Up and Running, 2nd Edition [Book] As befits a first example, the implementation is simple but sufficient to highlight key aspects of a RESTful web service. The implementation consists of a JSP (Java Server Pages) script and two backend JavaBeans that the JSP script uses to get the data returned to the client (see Figure 1-6). The data is composed of sage corporate predictions. Here is a sample: There is an Ant script (see An Ant script for service deployment) that automates the deployment of this and other service examples. In the predictions service, each prediction has an associated human predictor. How the Predictions Web Service Works When the predictions service is deployed to a web server such as Tomcat, the server translates the JSP script predictions.jsp (see Example 1-6) into a servlet instance. Example 1-6. As requests come to the JSP script, the script first checks the request’s HTTP method. String verb = request.getMethod();if (! out.println(preds.getPredictions()); Example 1-7. Example 1-8. return toXML();}
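
The excerpt says the script first checks the request's HTTP method. A minimal sketch of that kind of check follows, written as plain Java so it runs without a servlet container. The class VerbCheckSketch, its method, and the choice to accept only GET are assumptions made for illustration; the excerpt does not show which verbs the book's service accepts.

```java
// Hypothetical sketch: maps an HTTP verb to a status code for a read-only service.
// Accepting only GET is an assumption, not something the excerpt confirms.
final class VerbCheckSketch {
    static final int OK = 200;
    static final int METHOD_NOT_ALLOWED = 405;

    static int statusFor(String verb) {
        // Literal first, so a null verb is rejected rather than throwing.
        return "GET".equalsIgnoreCase(verb) ? OK : METHOD_NOT_ALLOWED;
    }

    public static void main(String[] args) {
        for (String verb : new String[] {"GET", "POST", "DELETE"}) {
            System.out.println(verb + " -> " + statusFor(verb));
        }
    }
}
```

In a JSP or servlet the verb would come from request.getMethod(), as in the excerpt, and the rejection would typically be sent as an error response instead of returned as a number.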

Exchange Quality Execution Benefits for FX Trading | LMAX Exchange The unique LMAX Exchange business model addresses the fundamental changes happening within the FX market, and solves two key industry problems: lack of transparency of the true cost of OTC traded FX, and lack of precise, consistent and reliable FX trade execution. LMAX Exchange delivers conflict-free, neutral execution and transparent cost of trade to both the buy-side and sell-side. LMAX Exchange is not a market-maker, and unlike some ECNs, the open order book is driven by streaming, non ‘last look’ limit orders supplied by top tier banks and institutional liquidity providers. LMAX Exchange - precise, consistent and reliable Unparalleled execution quality - average trade latency is 4 ms. With an internal latency of 500 microseconds, we deliver consistent and precise quality of execution which radically improves participants’ best execution objectives. Award-winning FX trading venue - recognised for excellence and innovation. LMAX Exchange - connectivity and access

Offloading data from the JVM heap (a little experiment) Last time, I wrote about the possibility of using Linux shared memory to offload cacheable/reference data from the JVM. To that end I wrote a small Java program to see if it was practical. The results were better (some even stranger) than I had expected. Here's what the test program does: Create a bunch of java.nio.ByteBuffers that add up to 96MB of storage. Write ints starting from the first buffer, all the way to the last one - that's writing a total of 96MB of some contrived data. For each test, the buffer creation, writing and deletion is done 24 times (JIT warm up). For each such test iteration, measure the memory (roughly) used in the JVM heap, the time taken to create those buffers and the time taken to write 96MB of data. Obviously, there are things here that sound fishy to you - like why use ByteBuffers instead of just writing to an OutputStream or why write to the buffers in sequence. Interpretation of the results:
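
The steps of the test program can be sketched roughly as follows. This is not the author's program: the class OffHeapWriteSketch, the split into 96 buffers of 1MB each, and the use of direct (off-heap) buffers via ByteBuffer.allocateDirect are assumptions, and the timing is a single rough pass without the 24 warm-up iterations or the heap measurement.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the experiment: allocate buffers, then fill them with ints in sequence.
final class OffHeapWriteSketch {
    // Assumption: direct buffers, whose storage lives outside the JVM heap.
    static List<ByteBuffer> allocate(int bufferCount, int bytesPerBuffer) {
        List<ByteBuffer> buffers = new ArrayList<>(bufferCount);
        for (int i = 0; i < bufferCount; i++) {
            buffers.add(ByteBuffer.allocateDirect(bytesPerBuffer));
        }
        return buffers;
    }

    // Writes consecutive ints from the first buffer to the last; returns bytes written.
    static long fillWithInts(List<ByteBuffer> buffers) {
        long written = 0;
        int value = 0;
        for (ByteBuffer buffer : buffers) {
            buffer.clear();
            while (buffer.remaining() >= Integer.BYTES) {
                buffer.putInt(value++);
                written += Integer.BYTES;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<ByteBuffer> buffers = allocate(96, 1024 * 1024); // 96 x 1MB = 96MB
        long created = System.nanoTime();
        long bytes = fillWithInts(buffers);
        long done = System.nanoTime();
        System.out.printf("create: %d ms, write %d bytes: %d ms%n",
                (created - start) / 1_000_000, bytes, (done - created) / 1_000_000);
    }
}
```

Swapping allocateDirect for ByteBuffer.allocate gives a heap-backed variant for comparison.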

Disruptor by LMAX-Exchange Download here... Discussion, Blogs & Other Useful Links Presentations Introduction to the Disruptor Read This First To understand the problem the Disruptor is trying to solve, and to get a feel for why this concurrency framework is so fast, read the Technical Paper. And now for some words from our sponsors... What is the Disruptor? LMAX aims to be the fastest trading platform in the world. The Disruptor is the result of our research and testing. This is not a specialist solution, it's not designed to work only for a financial application. It works in a different way to more conventional approaches, so you use it a little differently than you might be used to. If you prefer real, live people explaining things instead of a dry paper or content-heavy website, there's always the presentation Mike and Martin gave at QCon San Francisco. What's the big deal? It's fast. Note that this is a log-log scale, not linear. Great What do I do next?

Virtual Panel: Using Java in Low Latency Environments Java is increasingly being used for low latency work where previously C and C++ were the de-facto choice. InfoQ brought together four experts in the field to discuss what is driving the trend, and some of the best practices when using Java in these situations. The participants: Peter Lawrey is a Java consultant interested in low latency and high throughput systems. Martin Thompson is a high performance and low latency specialist, with over two decades working with large scale transactional and big-data systems, in the automotive, gaming, financial, mobile, and content management domains. Todd L. Dr Andy Piper recently joined Push Technology as Chief Technology Officer, from Oracle. The Questions: What do we mean by low latency? Q1: What do we mean by low latency? Lawrey: A system with a measured latency requirement which is too fast to see. Q3. Q4. Lawrey: Java allows you to write, test and profile your application with limited resources more effectively. Q5. Q6. Q7. Q9. Q10. Todd L.

Fragmentation Needed: Dispatches From The Trading Floor - MoldUDP My pricing networks post has gotten a lot of feedback. Because of its popularity, I've decided to write up a case study detailing one of the interesting problems I was asked to solve. The Incident One morning around 10:00, a pricing support guy cornered me in the hallway: "Hey, did something happen at 9:34 this morning? We lost some data on the NASDAQ ITCH feed... Did you notice anything?" When I got back to my desk, I found that pricing support had left some feed handler logs in my inbox. Background At that time, the NASDAQ ITCH data feed was delivered as a stream of IP multicast packets containing UDP datagrams. MoldUDP is a simple encapsulation protocol for small messages which are intended to be delivered in sequential order. The MoldUDP packet header includes: The sequence number of the first message in this packet. The count of messages in this packet. A "session ID" which allows receiving systems to distinguish multiple MoldUDP flows from one another. What Went Wrong? No matter.
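
The two header fields listed above, the first sequence number and the message count, are enough for a receiver to notice lost data. The sketch below shows that reasoning only. It does not parse the MoldUDP wire format, and the class MoldGapSketch and its methods are hypothetical. One tracker per session ID is assumed.

```java
// Hypothetical sketch: detects missed messages from (first sequence, message count) pairs.
// Assumes one instance per MoldUDP session ID; does not parse real packets.
final class MoldGapSketch {
    private long expectedNext;

    MoldGapSketch(long firstExpectedSequence) {
        this.expectedNext = firstExpectedSequence;
    }

    // Returns how many messages were skipped before this packet (0 if none).
    long onPacket(long firstSequence, int messageCount) {
        long missed = 0;
        if (firstSequence > expectedNext) {
            missed = firstSequence - expectedNext;
        }
        long endExclusive = firstSequence + messageCount;
        // Duplicate or older packets must not move the expectation backwards.
        if (endExclusive > expectedNext) {
            expectedNext = endExclusive;
        }
        return missed;
    }

    long expectedNext() {
        return expectedNext;
    }

    public static void main(String[] args) {
        MoldGapSketch tracker = new MoldGapSketch(1);
        System.out.println("missed: " + tracker.onPacket(1, 3)); // messages 1..3
        System.out.println("missed: " + tracker.onPacket(4, 2)); // messages 4..5
        System.out.println("missed: " + tracker.onPacket(9, 1)); // 6..8 never arrived
    }
}
```

A feed handler that sees a non-zero result would typically log the gap or look for the missing messages on the redundant copy of the feed.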

Effective Java Recently, I've re-read the awesome Java book Effective Java by Joshua Bloch. The book contains 78 independent items, discussing various aspects of programming in Java. Something like mini-design patterns with emphasis on their pros and cons. A few notes from each item as a refresher. Item 1: Consider static factory methods instead of constructors Static factory methods have more informative names than constructors. The same parameter list can be used by several factory methods. They are not required to create new objects and can return a cached instance. Static factory methods can return an object subtype. Reduced verbosity for generics due to type inference. Classes without a public/protected constructor can't be subclassed, but that is good, because it encourages you to "favor composition over inheritance". Hard to distinguish from other static methods. Item 2: Consider a builder when faced with many constructor parameters Item 3: Enforce the singleton property with a private constructor or an enum type enum Singleton { INSTANCE} Item 7: Avoid finalizers
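
Two of the notes above, a static factory that can return a cached instance and an enum-based singleton, can be illustrated with a short sketch. The names Percentage and HitCounter are made up for this example and do not come from the book.

```java
// Hypothetical example of Item 1: a named static factory that may return a cached instance.
final class Percentage {
    private static final Percentage ZERO = new Percentage(0);
    private final int value;

    // Private constructor: callers must go through the factory, and the class cannot be subclassed.
    private Percentage(int value) {
        this.value = value;
    }

    static Percentage of(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("value must be between 0 and 100");
        }
        // No new object is created for the common zero case.
        return value == 0 ? ZERO : new Percentage(value);
    }

    int value() {
        return value;
    }
}

// Hypothetical example of Item 3: the enum type guarantees a single instance.
enum HitCounter {
    INSTANCE;

    private int hits;

    int hit() {
        return ++hits;
    }
}
```

Calling Percentage.of(0) twice returns the same object, which a public constructor could not do, and HitCounter.INSTANCE is the only instance the JVM will ever create for that type.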

Fragmentation Needed: Pricing and Trading Networks: Down is Up, Left is Right My introduction to enterprise networking was a little backward. I started out supporting trading floors, backend pricing systems, low-latency algorithmic trading systems, etc... I got there because I'd been responsible for UNIX systems producing and consuming multicast data at several large financial firms. Inevitably, the firm's network admin folks weren't up to speed on matters of performance tuning, multicast configuration and QoS, so that's where I focused my attention. It amazes me how little I knew in those days. More incredible is how my ignorance of "normal" ways of doing things (AVVID, SONA, Cisco Enterprise Architecture, multi-tier designs, etc...) gave me an advantage over folks who had been properly indoctrinated. The trading floor is a weird place, with funny requirements. Redundant Application Flows The first thing to know about pricing systems is that you generally have two copies of any pricing data flowing through the environment at any time. Losing One Packet Is Bad