Amazon kills the Internet. Amazon breaks lonely hearts – so lonely hearts decide to break Amazon. It was a dark and stormy evening in Virginia on 29 June.
And somewhere out there, a number of lonely souls were left wondering why they could not pay for a date. According to WhatsYourPrice.com, a dating website in the US which allows people to bid for first dates, the online dating community is one of the most boisterous when it comes to downtime (come on, who needs downtime when there is dating to be done!). When Amazon’s cloud went down in the East-Coast 2 region as a result of severe thunderstorms which brought down connections to its main power supplier, it was thought its backup generator would kick in.
But it failed. And thousands of singles (at least we hope they were singles) raced to complain. It was not the first time such an incident had occurred. The dating service requires 100% uptime and a good reputation.
Amazon suffers another outage. Amazon has experienced more problems at its Northern Virginia data center, with some customers in its US-EAST-1 Region experiencing downtime until 12:32 PDT yesterday.
Cloud application platform Heroku experienced downtime, along with Reddit and Netflix, among others, according to company status reports. Reddit said the issue “appears to be the network-related”. Services were lost for several hours, according to a blog post on Systems Watch, a site designed to “provide transparency into modern day network computing”.
How an AWS Outage Can Cause Issues Even When You're in Multi-Availability Zones. Today Amazon Web Services experienced multiple degradations of service in one availability zone in the Virginia region (us-east-1), which caused sites like Reddit, Netflix, and many more to lose service for several hours.
Most posts and comments on the topic, on Twitter and in various articles, bash developers or systems folks for architecting in a single zone. We have spoken with, and seen first-hand, a lot of companies that were affected by a so-called AWS single-zone issue despite being architected across multiple availability zones. How is this possible? One of the most common complaints is that the glue that holds Amazon Web Services together is its management API stack. Some use the programmatic API libraries directly, others the AWS Management Console, and some RightScale; however, they are all layers built on the AWS API platform, and when that goes, all bets are off.
AWS API Errors and AWS Management Console Errors.
Down Goes The Internet… Again. Amazon EC2 Outage Takes Down Foursquare, Instagram, Quora, Reddit, Etc.
Amazon EC2 outage: summary and lessons learned. Last Thursday's Amazon EC2 outage was the worst in cloud computing's history.
It made the front page of many news sites, including the New York Times, probably because many people were shocked by how many web sites and services rely on EC2. Seeing so much affected was a very vivid illustration of how pervasive cloud computing has become. I will try to summarize what happened, what worked and didn't work, and what to learn from it.
Widespread Application Outage. Starting last Thursday, Heroku suffered the worst outage in the nearly four years we've been operating.
Large production apps using our dedicated database service may have experienced up to 16 hours of operational downtime. Some smaller apps using shared databases may have experienced up to 60 hours of operational downtime. Code deploys were unavailable across some parts of the platform for almost 76 hours, over three days. In short: this was an absolute disaster.
Amazon EC2 outage calls 'availability zones' into question. Network World - For cloud customers willing to pony up a little extra cash, Amazon has an enticing proposition: spread your application across multiple availability zones for a near-guarantee that it won't suffer from downtime.
"By launching instances in separate Availability Zones, you can protect your applications from failure of a single location," Amazon says in pitching its Elastic Compute Cloud service. Customers who build applications in just one availability zone are more likely to suffer outages. But what happens when multiple availability zones go dark at the same time? We found out today when an outage forced websites such as Foursquare, Reddit, Quora and Hootsuite offline.
Cloud Failure, FUD, and The Whole AWS Outage… « Composite Code. Ok.
First a few facts. AWS has had a data center problem that has been ongoing for a couple of days. AWS has NOT been forthcoming with much useful information. AWS still has many data centers and cloud regions/etc. up and live, able to keep their customers up and live. Many people have NOT built their architecture to be resilient in the face of an issue such as this. It all points to the mantra to “keep a backup”, but many companies have NOT done that. Cloud Services are absolutely more reliable than comparable hosted services, dedicated hardware, dedicated virtual machines, or other traditional modes of compute + storage. Cloud Services are currently the technologically superior option for compute + storage. Now a few personal observations and attitudes toward this whole thing. If your site is down because of a single point of failure, that is your bad architectural design, plain and simple.
I’m honestly not trying to defend AWS either.
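Several arguments in these clippings, the Network World question about multiple availability zones going dark at once, the Multi-Availability Zones point that every tool sits on the same AWS API layer, and the Composite Code warning about single points of failure, come down to simple series/parallel availability arithmetic. The sketch below is illustrative only. The function names are hypothetical, the per-zone and shared-dependency figures are made-up numbers rather than AWS SLAs, and the parallel formula assumes zones fail independently, which is precisely the assumption a correlated outage breaks.

```python
# Hypothetical illustration: the function names and availability figures
# below are made up for this sketch and are not AWS SLAs.

HOURS_PER_YEAR = 365 * 24


def parallel(zone_avail: float, zones: int) -> float:
    """Availability of `zones` redundant copies, ASSUMING each zone fails
    independently: the system is down only when every zone is down."""
    return 1 - (1 - zone_avail) ** zones


def with_shared_dependency(zone_avail: float, zones: int, shared_avail: float) -> float:
    """A multi-zone deployment in series with one component that every zone
    relies on (e.g. a regional management API): both must be up."""
    return shared_avail * parallel(zone_avail, zones)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    zone = 0.999     # hypothetical availability of one zone
    shared = 0.9995  # hypothetical availability of the shared dependency
    for n in (1, 2, 3):
        independent = parallel(zone, n)
        coupled = with_shared_dependency(zone, n, shared)
        print(
            f"{n} zone(s): independent {independent:.9f}, "
            f"with shared dependency {coupled:.6f} "
            f"(~{downtime_hours_per_year(coupled):.1f} h/year down)"
        )
```

Under these assumptions, a second zone cuts expected downtime sharply, but a third zone barely changes the result. The total can never exceed the availability of the shared dependency (0.9995 here), so whatever every zone relies on becomes the real single point of failure.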