
Scaling Twitter: Making Twitter 10000 Percent Faster
Update 6: Some interesting changes from Twitter's Evan Weaver: everything is in RAM now, the database is a backup; peaks at 300 tweets/second; every tweet is followed by an average of 126 people; vector cache of tweet IDs; row cache; fragment cache; page cache; keep separate caches; GC makes Ruby resistant to optimization, so they went with Scala; Thrift and HTTP are used internally; 100s of internal requests for every external request; rewrote the MQ but kept the interface the same; 3 queues are used to load balance requests; extensive A/B testing for backwards compatibility; switched to the C memcached client for speed; optimize the critical path; it is faster to get cached results from network memory than to recompute them locally.

Update 5: Twitter on Scala. A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners. Twitter started as a side project and blew up fast, going from 0 to millions of page views within a few terrifying months.
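The layered caches listed above (a vector cache of tweet IDs in front of a row cache of tweet bodies) are easy to picture in code. A minimal sketch in JavaScript, where `cache` and `db` are hypothetical in-memory stand-ins for memcached and the database; the API shape is an assumption, not Twitter's actual code:

    // Sketch of layered caching: a vector cache maps a user to tweet IDs,
    // and a row cache maps each ID to the tweet itself. `cache` and `db`
    // are hypothetical stand-ins, not Twitter's real interfaces.
    var cache = new Map();
    var db = {
      timelineIds: function (userId) { return [101, 102]; },          // fake data
      tweetById: function (id) { return { id: id, text: 'tweet ' + id }; }
    };

    function timelineFor(userId) {
      // Vector cache: the list of tweet IDs making up this timeline.
      var key = 'timeline:' + userId;
      var ids = cache.get(key);
      if (!ids) {
        ids = db.timelineIds(userId);           // expensive recompute
        cache.set(key, ids);
      }
      // Row cache: individual tweet bodies, keyed by ID.
      return ids.map(function (id) {
        var tweet = cache.get('tweet:' + id);
        if (!tweet) {
          tweet = db.tweetById(id);
          cache.set('tweet:' + id, tweet);
        }
        return tweet;
      });
    }

Keeping the vector cache and row cache separate means editing one tweet invalidates a single tweet key rather than every cached timeline that references it, which is the point of "keep separate caches".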

How does Twitter search work?

What can I use to profile C++ code in Linux?

Type and search: how to improve the performance

NIC Performance Analysis: Reflections on the Intel 8257X Chip Manuals - 从Java到C再到Linux - CSDN博客 Intro: After "OpenVPN Performance" I went on to read about hardware solutions, hoping to pick up ideas for further improving my design. Work made it both convenient and necessary, so I read the feature-description sections of the datasheets for Intel's 82571EB, 82574L, and 82575 Ethernet chips (since I don't plan to write a driver myself, I skipped the register and memory details; more to the point, I don't believe a driver of mine would beat Intel's engineers'). I got a lot out of them; below are some excerpts and reflections.

1. Overheads of networked applications
1) Protocol-stack processing overhead: the OS implementation cost at each layer of the layered model.
2) Memory-copy overhead: copies between the NIC, kernel memory, and user-space memory.
3) System-level overhead: interrupts, cache management, system calls.

2. Partial solutions
1) DMA, which targets memory copies. It must lock the bus, during which the CPU behaves "as if it had been unplugged" (in the words of Intel Microprocessors); if the NIC's data engine is less efficient than the CPU, then beyond saving one CPU-mediated copy, performance actually drops.
2) Hard-interrupt load balancing, which targets multi-CPU utilization (the hope being that every CPU helps process the protocol stack). It brings soft-interrupt load balancing with it (softirqs are raised on the CPU that took the hard interrupt). Done badly it achieves nothing: one CPU runs hot while the others sit idle. Done well, it reorders packets of order-dependent protocols; this is the conflict between CPU parallelism and TCP's serial nature.
3) TOE (TCP offload engine), which targets the high latency of PCI bus access. It is clearly a poor fit for workloads dominated by small packets, so doing heavy TCP processing on the NIC lowers send/receive rates.

From the 82575 manual, on interrupt latency: "For cases where the 82575 is connected to a small number of clients, it is desirable to initiate the interrupt as soon as possible with minimum latency."

On header split, the 82575 manual also has a good description: "This feature consists of splitting or replicating a packet's header to a different memory space." If that is unclear, Microsoft's MSDN describes it as well (the original post includes a screenshot here).

XMLHttpRequest (XHR) Uses Multiple Packets for HTTP POST? | Joseph Scott A recent Think Vitamin article, The Definitive Guide to GET vs POST, mentioned something that I hadn't seen before about XMLHttpRequest (XHR). Their Rule #4 states: when using XMLHttpRequest, browsers implement POST as a two-step process (sending the headers first and then the data). This means that GET requests are more responsive, something you need in AJAX environments. The claim is that even the smallest XHR will be sent using two packets if the request is done over HTTP POST instead of HTTP GET. I don't remember ever having heard this claim before. Let me first say that performance issues for POST vs. GET were not something I had looked at closely before, and I wasn't the only one who wanted to find out more about XHR POST using multiple packets. Simon Willison did some looking around and found more links on the subject. So I updated my install of Wireshark on Windows XP, turned off all of the packet reassembly options for HTTP decoding, and started testing browsers.
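The claim is straightforward to test yourself. A minimal sketch of the kind of request being captured: send the same tiny payload over GET and over POST, then compare packet counts in Wireshark (the `/echo` endpoint is a placeholder, not from the article):

    // Send the same small payload via GET and via POST; under the
    // two-packet claim, only the POST splits headers and body on the wire.
    function send(method, callback) {
      var xhr = new XMLHttpRequest();
      var url = method === 'GET' ? '/echo?d=hi' : '/echo';
      xhr.open(method, url, true);
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4) callback(xhr.status);
      };
      if (method === 'POST') {
        xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
        xhr.send('d=hi');   // body may leave in a second packet
      } else {
        xhr.send(null);     // GET: headers only
      }
    }

    send('GET', function (s) { console.log('GET done', s); });
    send('POST', function (s) { console.log('POST done', s); });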

c++ - Overloading operators in derived class

Building Fast Client-side Searches Yesterday we released a new people selector widget (which we've been calling Bo Selecta internally). This widget downloads a list of all of your contacts, in JavaScript, in under 200ms (this is true even for members with 10,000+ contacts). In order to get this level of performance, we had to completely rethink how we send data from the server to the client.

Server Side: Cache Everything. To make this data available quickly from the server, we maintain and update a per-member cache in our database, where we store each member's contact list in a text blob — this way it's a single quick DB query to retrieve it.

Testing the Performance of Different Data Formats. Despite the fact that our backend system can deliver the contact list data very quickly, we still don't want to unnecessarily fetch it for each page load. With this goal in mind, we started testing various data formats, recording the average amount of time it took to download and parse each one.
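The "going custom" direction the post heads toward is worth sketching. Assuming the server emits the contact list as one delimited text blob (the field layout below is invented for illustration, not Flickr's actual format), client-side parsing becomes a couple of string splits rather than a full JSON or XML parse:

    // Parse a hypothetical newline/tab-delimited contact blob.
    // String.split is cheap compared to parsing a large JSON or XML
    // payload, which is the trade-off the article is measuring.
    function parseContacts(blob) {
      var contacts = [];
      var rows = blob.split('\n');
      for (var i = 0; i < rows.length; i++) {
        if (!rows[i]) continue;           // skip blank trailing line
        var f = rows[i].split('\t');      // username \t real name
        contacts.push({ username: f[0], realname: f[1] });
      }
      return contacts;
    }

    var list = parseContacts('alice\tAlice Example\nbob\tBob Example\n');
    // list[0].username === 'alice'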

Multi-Core HTTP Server with NodeJS NodeJS has been garnering a lot of attention of late. Briefly, NodeJS is a server-side JavaScript runtime implemented using Google's highly performant V8 engine. It provides an (almost) completely non-blocking I/O stack which, when combined with JavaScript closures and anonymous functions, makes it an excellent platform for implementing high-throughput web services. In case you missed it, NodeJS author Ryan Dahl (@ryah) gave an excellent talk at a Bayjax event hosted by Yahoo! Here at Yahoo! we have been doing performance characterization of NodeJS vs. other web stacks in various workloads: a straight HTTP proxy at different latencies, a scatter/gather HTTP proxy, etc. Of course, I would be remiss if I didn't mention that Mail is hiring!

The case for multi-core. But all is not sunshine and lollipops in NodeJS land. While Node is relatively solid, it does still crash occasionally, adversely impacting availability if you're running only a single NodeJS process. A single process also uses only one core, so taking advantage of multiple cores means running several Node processes, for instance behind a load balancer.
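One self-contained way to run a Node process per core is the built-in cluster module. The article itself discusses fronting multiple processes with a load balancer; this sketch just shows the same idea in one file, with the master respawning crashed workers:

    // Fork one worker per CPU core; the master restarts any worker that
    // dies, addressing both throughput and the availability concern.
    var cluster = require('cluster');
    var http = require('http');
    var os = require('os');

    if (cluster.isMaster) {
      for (var i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }
      cluster.on('exit', function (worker) {
        console.log('worker ' + worker.process.pid + ' died, restarting');
        cluster.fork();                  // keep the pool at full strength
      });
    } else {
      http.createServer(function (req, res) {
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('handled by pid ' + process.pid + '\n');
      }).listen(8000);                   // workers share the listening socket
    }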
