Amazon Architecture

This is a wonderfully informative Amazon update based on Joachim Rohde's discovery of an interview with Amazon's CTO. You'll learn about how Amazon organizes their teams around services, the CAP theorem of building scalable systems, how they deploy software, and a lot more. Many new additions from the ACM Queue article have also been included. Amazon grew from a tiny online bookstore to one of the largest stores on earth.

Site:

Information Sources
Early Amazon by Greg Linden
How Linux saved Amazon millions
Interview Werner Vogels - Amazon's CTO
Asynchronous Architectures - a nice summary of Werner Vogels' talk by Chris Loosley
Learning from the Amazon technology platform - A Conversation with Werner Vogels
Werner Vogels' Weblog - building scalable and robust distributed systems

Platform
Linux, Oracle, C++, Perl, Mason, Java, JBoss, Servlets

The Stats
More than 55 million active customer accounts.

Public Data Sets on Amazon Web Services (AWS)

Here are some examples of popular Public Data Sets:

NASA NEX: A collection of Earth science data sets maintained by NASA, including climate change projections and satellite images of the Earth's surface
Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages
1000 Genomes Project: A detailed map of human genetic variation
Google Books Ngrams: A data set containing Google Books n-gram corpuses
US Census Data: US demographic data from the 1980, 1990, and 2000 US Censuses
Freebase Data Dump: A data dump of all the current facts and assertions in the Freebase system, an open database covering millions of topics

The data sets are hosted in two possible formats: Amazon Elastic Block Store (Amazon EBS) snapshots and/or Amazon Simple Storage Service (Amazon S3) buckets. If you have any questions or want to participate in our Public Data Sets community, please visit our Public Data Sets forum.

11 Top Open-source Resources for Cloud Computing

Open-source software has been on the rise at many businesses during the extended economic downturn, and one of the areas where it is starting to offer companies a lot of flexibility and cost savings is in cloud computing. Cloud deployments can save money, free businesses from vendor lock-ins that could really sting over time, and offer flexible ways to combine public and private applications. The following are 11 top open-source cloud applications, services, educational resources, support options, general items of interest, and more.

Eucalyptus. Red Hat’s Cloud. Traffic Server. Cloudera. Puppet. Enomaly. Joyent. Zoho. Globus Nimbus. Reservoir. OpenNebula.

It’s good to see open-source tools and resources competing in the cloud computing space.

Scaling Twitter: Making Twitter 10000 Percent Faster | High Scal

Update 6: Some interesting changes from Twitter's Evan Weaver: everything in RAM now, database is a backup; peaks at 300 tweets/second; every tweet followed by average 126 people; vector cache of tweet IDs; row cache; fragment cache; page cache; keep separate caches; GC makes Ruby optimization resistant so went with Scala; Thrift and HTTP are used internally; 100s of internal requests for every external request; rewrote MQ but kept interface the same; 3 queues are used to load balance requests; extensive A/B testing for backwards compatibility; switched to C memcached client for speed; optimize critical path; faster to get the cached results from the network memory than recompute them locally.

Update 5: Twitter on Scala. A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners.

Twitter started as a side project and blew up fast, going from 0 to millions of page views within a few terrifying months.
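
The cache layering in Update 6 (a vector cache of tweet IDs in front of a row cache, with the database as a backup) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the dictionaries, the `stats` counter, and the `load_row` and `timeline` helpers are assumptions made for illustration, not Twitter's actual code or data model.

```python
from typing import Dict, List

# Hypothetical stand-in for the backing database: tweet ID -> tweet text.
DATABASE: Dict[int, str] = {1: "first tweet", 2: "second tweet", 3: "third tweet"}

# Counts how often a lookup falls through to the database.
stats: Dict[str, int] = {"db_reads": 0}

# Vector cache: user ID -> ordered list of tweet IDs in that user's timeline.
vector_cache: Dict[int, List[int]] = {42: [3, 2, 1]}

# Row cache: tweet ID -> tweet row, filled lazily on a miss.
row_cache: Dict[int, str] = {}


def load_row(tweet_id: int) -> str:
    """Return one tweet row, consulting the row cache before the database."""
    if tweet_id not in row_cache:
        stats["db_reads"] += 1
        row_cache[tweet_id] = DATABASE[tweet_id]
    return row_cache[tweet_id]


def timeline(user_id: int) -> List[str]:
    """Read the ID list from the vector cache, then resolve each ID to a row."""
    return [load_row(tweet_id) for tweet_id in vector_cache.get(user_id, [])]
```

Calling `timeline(42)` twice touches the stand-in database only on the first call; the second call is served entirely from the two caches, which is the point of keeping the caches separate.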

Amazon Web Services Developer Community : Amazon EC2 Announces General Availability, SLA, and Windows

Dear AWS Developers,

We are excited to announce that Amazon Elastic Compute Cloud (Amazon EC2) is now Generally Available and includes a Service Level Agreement (SLA). AWS is also releasing, available today, a public beta of Amazon EC2 running Microsoft Windows Server and Microsoft SQL Server. In addition, we're giving you a sneak peek at some upcoming features that will make Amazon EC2 even easier to operate. Please see details below on these announcements.

Amazon EC2 today is entering General Availability (GA), after just over two years of operation in beta and the addition of many highly requested features. Also beginning today, customers can employ Amazon EC2 running Windows Server or SQL Server with all of the performance, reliability, and scalability benefits of Amazon EC2.

We are excited to share these announcements with you, and invite you to visit aws.amazon.com/ec2 for full details.

BOINC

Hypervisor

A hypervisor or virtual machine monitor (VMM) is a piece of computer software, firmware or hardware that creates and runs virtual machines. A computer on which a hypervisor is running one or more virtual machines is defined as a host machine. Each virtual machine is called a guest machine. The hypervisor presents the guest operating systems with a virtual operating platform and manages the execution of the guest operating systems.

Classification: Type-1 and type-2 hypervisors

In their 1974 article "Formal Requirements for Virtualizable Third Generation Architectures" Gerald J.

Type-1: native or bare-metal hypervisors. These hypervisors run directly on the host's hardware to control the hardware and to manage guest operating systems.

Type-2: hosted hypervisors. These hypervisors run on a conventional operating system just as other computer programs do.

However, the distinction between these two types is not necessarily clear.

7 Scaling Strategies Facebook Used to Grow to 500 Million Users

Robert Johnson, a director of engineering at Facebook, celebrated Facebook's monumental achievement of reaching 500 million users by sharing the scaling principles that helped reach that milestone. In case you weren't suitably impressed by the 500 million user number, Robert ratchets up the numbers game with these impressive figures:

1 million users per engineer
500 million active users
100 billion hits per day
50 billion photos
2 trillion objects cached, with hundreds of millions of requests per second
130TB of logs every day

How did Facebook get to this point?

People Matter Most.

These principles are not really new, but I think when you see them all laid out together like this it's easy to see how they all work together to make a self-reinforcing virtuous circle. Will these principles be enough to grow the next 500 million users?
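
To put the quoted figures in perspective, here is a small back-of-the-envelope calculation. It uses only the numbers quoted above; the derived values are daily averages rather than peaks, and the sketch assumes decimal terabytes (1 TB = 10^12 bytes), which the source does not specify.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the article.
active_users = 500_000_000
users_per_engineer = 1_000_000
hits_per_day = 100_000_000_000
log_bytes_per_day = 130 * 10**12  # assumes decimal terabytes

# Derived values (averages, not peak rates).
implied_engineers = active_users // users_per_engineer           # 500
avg_hits_per_second = hits_per_day / SECONDS_PER_DAY             # roughly 1.16 million
hits_per_user_per_day = hits_per_day / active_users              # 200.0
avg_log_bytes_per_second = log_bytes_per_day / SECONDS_PER_DAY   # roughly 1.5 GB/s

print(f"engineers implied:      {implied_engineers}")
print(f"average hits/second:    {avg_hits_per_second:,.0f}")
print(f"hits per user per day:  {hits_per_user_per_day:.0f}")
print(f"average log bytes/sec:  {avg_log_bytes_per_second:,.0f}")
```

So the "1 million users per engineer" figure implies an engineering team of roughly 500 people, and 100 billion hits per day averages out to more than a million hits every second.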

The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition (2nd Edition): Frederick P. Brooks: Books

Supercomputer Predicts Civil Unrest

In Isaac Asimov's "Foundation" series, the future of masses of people can be predicted with "psychohistory," a method of predicting future political and social trends, using a device called the "Prime Radiant." In the 1950s, there wasn't the math or the computational power available to make such a thing reality. Now there might be. Supercomputers, such as the Nautilus at the University of Tennessee's Center for Remote Data Analysis and Visualization, may have brought the world closer to Asimov's vision, though it is still early days.

Leetaru used a database of 100 million news articles spanning the period from 1979 to early 2011. It isn't just the tone of the articles, however; it's also the change in tone over time. Another pattern the supercomputer was able to tease out was evidence of Osama bin Laden living in Pakistan.

OpenStack Open Source Cloud Computing Software

Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month

You may have read somewhere that Facebook has introduced a new Social Inbox integrating email, IM, SMS, text messages, and on-site Facebook messages. All in all, they need to store over 135 billion messages a month. Where do they store all that stuff? Facebook's Kannan Muthukkaruppan gives the surprise answer in The Underlying Technology of Messages: HBase. HBase beat out MySQL, Cassandra, and a few others. Why a surprise? HBase is a scaleout table store supporting very high rates of row-level updates over massive amounts of data.

Facebook chose HBase because they monitored their usage and figured out what they really needed:

A short set of temporal data that tends to be volatile
An ever-growing set of data that rarely gets accessed

Makes sense. Some key aspects of their system:

I wouldn't sleep on the idea that Facebook's existing experience with HDFS/Hadoop/Hive was a big adoption driver for HBase.
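
The two access patterns described above (a small, volatile set of recent data and an ever-growing set that is rarely read) can be pictured with a toy hot/cold split. This is only an illustration of the workload shape. It is not how HBase or Facebook's messaging system is implemented, and the `MessageStore` class, its method names, and the `recent_limit` parameter are all hypothetical.

```python
from collections import deque
from typing import Deque, List


class MessageStore:
    """Toy hot/cold split: a small recent window plus an append-only archive."""

    def __init__(self, recent_limit: int = 3) -> None:
        # Hot set: bounded, so old entries fall out as new ones arrive.
        self.recent: Deque[str] = deque(maxlen=recent_limit)
        # Cold set: grows forever and is rarely read.
        self.archive: List[str] = []

    def write(self, message: str) -> None:
        # Every message is archived; only the newest few stay in the hot window.
        self.archive.append(message)
        self.recent.append(message)

    def latest(self) -> List[str]:
        """Serve the common read path from the small, volatile window."""
        return list(self.recent)


store = MessageStore(recent_limit=3)
for i in range(5):
    store.write(f"msg-{i}")

print(store.latest())      # the three newest messages
print(len(store.archive))  # every message ever written
```

The write path dominates in this shape of workload: every message is appended, while reads mostly hit the small recent window, which fits the description of a store tuned for very high rates of row-level updates.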

The Attention Economy: Understanding the New Currency of Business: Books: Thomas H. Davenport, John C. Beck

Prof Promises Supercomputer on Every Desktop | Wired Enterprise

Virginia Tech researcher Wu Feng hopes his work on the HokieSpeed supercomputer will help make supercomputing more accessible (Photo: Virginia Tech)

When Wu Feng looks at an iPad, he sees something more than a great way to play Fruit Ninja. To him, Apple’s sleek device looks more like a compute node on a supercomputer of the future: 1.5 gigaflops of computer power just waiting to be harnessed. Feng, an associate professor of computer science at Virginia Tech, hopes to one day bring the supercomputer to an entirely new audience. He’s known for building very small supercomputers.

By today’s standards, HokieSpeed is not exactly an elite supercomputer. But Feng and his fellow researchers are developing techniques and software to allow HokieSpeed to take maximum advantage of its 2,500 Xeon central processing units and its 185,000 Nvidia graphics chips. “A lot of the capability, right now, of personal computers is just latent,” Feng says.

What might a small-business supercomputer look like?