
How SmugMug survived the Amazonpocalypse « SmugMug's Don MacAskill tl;dr: Amazon had a major outage last week, which took down some popular websites. Despite using a lot of Amazon services, SmugMug didn’t go down because we spread across availability zones and designed for failure to begin with, among other things. Sorry about that, that was probably our fault for deploying SkyNet there in the first place. We’ve been getting a lot of questions about how we survived (SmugMug was minimally impacted, and all major services remained online during the AWS outage) and what we think of the whole situation. We’re heavy AWS users with many petabytes of storage in their Simple Storage Service (S3) and lots of Elastic Compute Cloud (EC2) instances, load balancers, etc. I wish I could say we had some sort of magic bullet that helped us stay alive. First, all of our services in AWS are spread across multiple Availability Zones (AZs). Second, we designed for failure from day one.
The AWS Outage: The Cloud's Shining Moment So many cloud pundits are piling on to the misfortunes of Amazon Web Services this week as a response to the massive failures in the AWS Virginia region. If you think this week exposed weakness in the cloud, you don't get it: it was the cloud's shining moment, exposing the strength of cloud computing. In short, if your systems failed in the Amazon cloud this week, it wasn't Amazon's fault. You either deemed an outage of this nature an acceptable risk or you failed to design for Amazon's cloud computing model. The strength of cloud computing is that it puts control over application availability in the hands of the application developer and not in the hands of your IT staff, data center limitations, or a managed services provider. The AWS outage highlighted the fact that, in the cloud, you control your SLA, not AWS. The Dueling Models of Cloud Computing The Amazon model is the "design for failure" model. Most cloud providers follow some variant of the "design for failure" model.
5 Lessons We’ve Learned Using AWS In my last post I talked about some of the reasons we chose AWS as our computing platform. We’re about one year into our transition to AWS from our own data centers. We’ve learned a lot so far, and I thought it might be helpful to share with you some of the mistakes we’ve made and some of the lessons we’ve learned. 1. Dorothy, you’re not in Kansas anymore. If you’re used to designing and deploying applications in your own data centers, you need to be prepared to unlearn a lot of what you know. Many examples come to mind, such as hardware reliability. Another example: in the Netflix data centers, we have a high capacity, super fast, highly reliable network. 2. When designing customer-facing software for a cloud environment, it is all about managing down expected overall latency of response. Your best bet is to build your systems to expect and accommodate failure at any level, which introduces the next lesson. 3. One of the first systems our engineers built in AWS is called the Chaos Monkey.
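The "design for failure" idea in the excerpts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Netflix or SmugMug implement it: `Instance`, `chaos_monkey`, `call_with_failover`, and the `zone-a`/`zone-b`/`zone-c` names are all hypothetical, and no real AWS API is called. The sketch kills a random instance and shows that a caller which tries replicas in other zones still gets an answer.

```python
import random


class Instance:
    """Hypothetical stand-in for one server running in one availability zone."""

    def __init__(self, name, zone):
        self.name = name
        self.zone = zone
        self.alive = True

    def handle(self, request):
        # A dead instance behaves like an unreachable host.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name}@{self.zone} served {request}"


def chaos_monkey(instances, rng):
    """Terminate one randomly chosen live instance; return it (or None)."""
    live = [inst for inst in instances if inst.alive]
    if not live:
        return None
    victim = rng.choice(live)
    victim.alive = False
    return victim


def call_with_failover(instances, request):
    """Try each replica in turn; fail only if every replica is down."""
    last_error = None
    for inst in instances:
        try:
            return inst.handle(request)
        except ConnectionError as err:
            last_error = err  # remember the failure and try the next replica
    raise RuntimeError("all instances failed") from last_error


def make_fleet():
    """Three replicas spread over three (made-up) zones."""
    return [
        Instance("web-1", "zone-a"),
        Instance("web-2", "zone-b"),
        Instance("web-3", "zone-c"),
    ]


if __name__ == "__main__":
    fleet = make_fleet()
    victim = chaos_monkey(fleet, random.Random())
    print(f"chaos monkey terminated {victim.name}")
    print(call_with_failover(fleet, "GET /"))
```

The point of the sketch is the shape of the caller, not the failure injection: because `call_with_failover` never assumes a particular replica is up, losing any single instance (or zone) is invisible to the user.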
Lessons From a Cloud Failure: It’s Not Amazon, It’s You | Epicenter Amazon’s cloud hosted Web Services experienced a catastrophic failure last week, knocking hundreds of sites off the web. Some developers saw the AWS outage as a warning about what happens when we rely too much on the cloud. But the real failure of Amazon’s downtime is not AWS, but the sites that use it. That’s not to say that Amazon didn’t fail rather spectacularly, taking out huge sites like Quora, Reddit, FourSquare and Everyblock, but as Paul Smith of Everyblock admits, while Amazon bears some of the responsibility, Everyblock failed as well: Frankly, we screwed up. But perhaps the most instructive lesson comes from those sites that were not affected, notably Netflix, SimpleGeo and SmugMug. Among Netflix’s suggestions is to always design for failure: “we’ve sometimes referred to the Netflix software architecture in AWS as our Rambo Architecture. To ensure that each system can stand on its own, Netflix uses something it calls the Chaos Monkey (no relation).
Explaining Cloud Computing to Your Pointy Haired Boss Introduction to Cloud Computing Cloud computing is all the rage today. Each new day brings some announcement from yet another company desperately trying to get on the bandwagon. You see this every time there is a major movement in our industry. Because of all of this, your pointy haired boss is likely to come up to you someday, expect you to know all about cloud computing, and ask you to look into it. Defining Cloud Computing There are also several industry definitions, and these are good, but like all industry standards, there are many, and most are hard to understand. To help you with this problem I am going to break it down into a simple definition. "Cloud computing is just another place to run your applications. There is a spectrum of computing that you pick from when you decide where and how to deploy your applications. Generally speaking, the further you move from left to right, the less control you have over the environment and the more you gain in economies of scale.
Google App Engine vs. Amazon Web Services: The Developer Challenges Sure, cloud computing platforms free developers from scalability and deployment issues, allowing them to spend more time actually writing web applications and services. But when choosing between two prominent platforms, Google App Engine and Amazon Web Services (AWS), which platform is best for you? The choice can be simple if you fall into one of the following two scenarios: If your application can be architected to run within the limited Google App Engine runtime environment, then take advantage of Google's lower hosting costs. If you need a more flexible cloud deployment platform, then AWS is a good fit for your needs. Of course, you need to consider more issues than these when making a choice. In my development work, I've had the opportunity to use both and I can tell you that they require very different architecture decisions and present different sets of challenges. I assume you have a good understanding of Google App Engine and AWS. Google App Engine Challenges
Java Cloud Development: What Developers Need to Know If you are a Java developer and your organization is jumping on the cloud computing bandwagon, you have to change the way you build and deploy applications. In this article, I will examine what is in store for you with each cloud delivery model and with both public and private cloud scenarios. Cloud Computing Delivery Models: IaaS, PaaS and SaaS The delivery model for cloud infrastructure can be broadly categorized as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS). Infrastructure as a Service (IaaS) Cloud computing vendors provide infrastructure services such as computers, storage devices, and routers to deploy your application. If you use Infrastructure as a Service, you may have to deal with installation and configuration of the software platforms such as application servers, databases, and so on. Platform as a Service (PaaS) The cloud vendor provides the application platform such as middleware, database, messaging system, and so on.
Three Ways Developers Can Leverage Cloud Computing The rise of cloud technologies has given the developer immense resources not known in the past. You can now bring unprecedented richness to your application by hooking into cloud resources. In this article I will outline three key ways in which the developer can leverage the cloud. PaaS: Cloud as Development Workspace No longer do you need to install all sorts of SDKs and IDEs on your desktop. One up-and-coming PaaS provider is Heroku. And beyond just offering a development environment in the cloud, Heroku is also your portal to the world. Other vendors of PaaS include Amazon with Amazon Web Services (AWS) and Google with the Google App Engine. Google offers a model where application development itself is still something done locally. Cloud APIs Most major internet services such as Facebook, LinkedIn, Amazon, Salesforce and Twitter offer APIs so that programmers can create software that taps into their offerings. Software Development Life Cycle in the Cloud