Summary of the December 24, 2012 Amazon ELB Service Event in the US-East Region

We would like to share more details with our customers about the event that occurred with the Amazon Elastic Load Balancing Service (“ELB”) earlier this week in the US-East Region. While the service disruption only affected applications using the ELB service (and only a fraction of the ELB load balancers were affected), the impacted load balancers experienced significant disruption for a prolonged period of time. The service disruption began at 12:24 PM PST on December 24th, when a portion of the ELB state data was logically deleted. This data is used and maintained by the ELB control plane to manage the configuration of the ELB load balancers in the region (for example, tracking all the backend hosts to which traffic should be routed by each load balancer). The data was deleted by a maintenance process that was inadvertently run against the production ELB state data. This process was run by one of a very small number of developers who have access to this production environment.
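To make the role of that state data concrete, here is a purely hypothetical sketch: the class, field names, instance IDs, and values below are invented for illustration and are not Amazon's actual schema. It shows the kind of per-load-balancer record a control plane might keep, and why losing such records leaves the control plane unable to tell a load balancer where its traffic should go.

    # Hypothetical illustration only; these names and fields are invented and
    # are not Amazon's actual ELB schema.
    from dataclasses import dataclass, field

    @dataclass
    class LoadBalancerState:
        name: str
        listeners: list                                     # e.g. [("HTTP", 80, 80)]
        backend_hosts: list = field(default_factory=list)   # instances that should receive traffic

    state_store = {
        "my-elb": LoadBalancerState(
            name="my-elb",
            listeners=[("HTTP", 80, 80)],
            backend_hosts=["i-0123456789abcdef0", "i-0fedcba9876543210"],
        ),
    }

    # Deleting entries from a store like this is the kind of loss the summary
    # describes: the control plane no longer knows which backend hosts each
    # load balancer should route traffic to.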

Why EBS was a bad idea - blog dot lusis

Since I just tweeted about this and I know people would want an explanation, I figured I'd short-circuit 140-character hell and explain why I think EBS was the worst thing Amazon ever did to AWS. First time I've had to do this, but: the following is my personal opinion and in no way reflects any policy or position of my employer. I remember when EC2 was first unleashed. At the time I was working at Roundbox Media (later Roundbox Global, because we had an office in Costa Rica!). I was asked frequently if we could possibly host some of our production stuff there. It was pretty much a no-go from the start: no persistent IPs and no persistent storage. Sure, we could bake a bunch of stuff into the AMI root, but the ephemeral storage wasn't big enough to hold a fraction of our data. After I left RBX in Feb of 2008, I didn't get an opportunity to work with AWS for a year or so, and by then quite a bit had changed. For Amazon, EBS is NOT a bad thing. The problem is it's not.

Broken PMTUD on Amazon EC2

While at Amazon re:invent I had the opportunity to complain to some Amazonians again about an EC2 bug which has been annoying me for a long time: the default firewall ruleset is broken. I discovered this three years ago while debugging odd problems experienced by a Tarsnap user — sending a small amount of traffic worked fine, but as soon as large amounts of traffic started moving around, the TCP connection got stuck — and I've been complaining from time to time ever since; but somehow face-to-face communications tend to produce better results than mere emails.
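For readers who want to see the failure mode directly, here is a rough diagnostic sketch in Python (Linux only; the host, port, and payload size are placeholders chosen for illustration, not anything from the original post). It forces "don't fragment" behaviour on a TCP socket, pushes more data than fits in a typical 1500-byte MTU, and then asks the kernel what path MTU it has learned; when "fragmentation needed" replies are silently dropped, the large send tends to stall rather than adapt.

    # Rough diagnostic sketch, Linux only. Host, port, and sizes are placeholders.
    import socket

    # Linux socket-option values; Python only exposes these names where the OS defines them.
    IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
    IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
    IP_MTU = getattr(socket, "IP_MTU", 14)

    s = socket.create_connection(("example.com", 80), timeout=10)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)  # set DF on outgoing packets

    try:
        # Small sends succeed even with broken PMTUD; large sends need full-size
        # segments, and those are the packets that get black-holed.
        s.sendall(b"X" * 64000)
        print("large send completed")
    except socket.timeout:
        print("large send stalled; a dropped 'fragmentation needed' reply is one likely cause")

    print("kernel's current path-MTU estimate:", s.getsockopt(socket.IPPROTO_IP, IP_MTU))
    s.close()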

As most standards-aware network administrators know, blocking ICMP is evil. Sure, there are certain ICMP packets which should be blocked; but as RFC 2979 points out, ICMP "fragmentation needed" packets are required for Path MTU Discovery to work and should not be among them. And yet this is exactly what the default EC2 firewall does: it blocks all incoming ICMP packets, including ICMP "fragmentation needed" packets. The fix for EC2 users is easy, and I recommend applying it universally: run ec2-authorize default -P icmp -t 3:4.
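The ec2-authorize command comes from the legacy EC2 API tools; a roughly equivalent rule can be added today with boto3, sketched below. The sketch assumes a security group literally named "default" and an allow-from-anywhere source (0.0.0.0/0), both of which you may want to tighten. In EC2 security group terms, the ICMP type goes in FromPort and the ICMP code in ToPort, so type 3, code 4 is the "fragmentation needed" message the post is talking about.

    # A rough modern equivalent of "ec2-authorize default -P icmp -t 3:4" using
    # boto3. Assumptions: a security group named "default", an allow-from-anywhere
    # source, and AWS credentials/region configured in the usual way.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupName="default",
        IpPermissions=[
            {
                "IpProtocol": "icmp",
                "FromPort": 3,   # ICMP type 3: Destination Unreachable
                "ToPort": 4,     # ICMP code 4: Fragmentation Needed (and DF set)
                "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
            }
        ],
    )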