EC2 is built for scalability and distributed systems. The article seems to put a negative spin on something that is the author's own fault.
In most traditional hosting environments you have e.g. one web server. If this one server goes down your website stops working. A solution would be to have two web servers behind a load balancer. If one web server goes down, the other takes over and your site continues to work.
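To make the two-servers-behind-a-load-balancer idea concrete, here is a minimal sketch of the failover decision only. The function name `pick_backend`, the server names, and the health-check callable are all hypothetical and do not correspond to any real load balancer's API.

```python
from typing import Callable, Optional, Sequence


def pick_backend(
    backends: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first backend that passes its health check, or None.

    `backends` is an ordered list of hypothetical server names;
    `is_healthy` stands in for whatever health check a real load
    balancer would run (an HTTP ping, a TCP connect, and so on).
    """
    for backend in backends:
        if is_healthy(backend):
            return backend
    # Every backend failed its check: the site is down.
    return None


# Illustration: web1 has failed, so traffic goes to web2.
down = {"web1"}
chosen = pick_backend(["web1", "web2"], lambda name: name not in down)
```

With only one entry in `backends`, a single failure makes the function return `None`, which is the single-server situation described above.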
A lot of people who are hosting on EC2 place all their application components in the Virginia data centers (because it's the cheapest data center, for reasons the article points out). If the Virginia data center is down, nothing works any more. However, EC2 gives you the option to distribute your website over multiple data centers in case of an event like this.
If you choose not to take advantage of this architecture, you're no different from someone running a website on a single web server, i.e. a single point of failure. With EC2 you have the ability to set up a website that never goes down.
Of course, distributing your web site over multiple data centers can be costly. But I guess it's pick and choose, not bitch and moan.
You're confusing AWS regions with AWS availability zones. You're not technically wrong - it is possible to have your application spread across regions and doing so would have protected against this outage - but doing so is slow and expensive, and the actual failover to a different region is difficult. Amazon explicitly recommends that if you distribute across multiple availability zones within the same region, you should be robust to the majority of outages, which should only take out one AZ. The current AWS outage is affecting all AZs in the US-East region, which Amazon claims should never happen.
This earlier post [1] (HN discussion [2]) discusses this in more detail.
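A rough way to see why a multi-AZ setup helps only when zone failures are independent is sketched below. The probabilities are made-up illustration values, not AWS figures, and `all_zones_down` is a hypothetical helper that assumes a very simple model: each zone fails independently with the same probability, plus an optional shared failure that takes out the whole region at once.

```python
def all_zones_down(p_zone: float, n_zones: int, p_shared: float = 0.0) -> float:
    """Probability that every zone in a region is down at the same time.

    p_zone   -- assumed chance that a single zone is down on its own
    n_zones  -- number of zones the application is spread across
    p_shared -- assumed chance of a region-wide fault hitting all zones
    """
    independent = p_zone ** n_zones
    # Either the shared fault happens, or it does not and all zones
    # happen to fail independently at the same time.
    return p_shared + (1.0 - p_shared) * independent


# Hypothetical numbers: 1% per-zone downtime, two zones.
no_shared_fault = all_zones_down(0.01, 2)           # about 0.0001
with_shared_fault = all_zones_down(0.01, 2, 0.001)  # about 0.0011
```

Under these assumed numbers, even a small chance of a region-wide fault dominates the result, which is why an outage spanning all AZs in a region undermines the multi-AZ recommendation.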
No, I'm not confusing regions with availability zones. What I'm saying is that AWS gives you all the components to set up a website that never goes down. If you don't take advantage of this then there is no one else to blame but yourself. Of course doing this can be very expensive and it's a choice you have to make.
Some people keep saying things like: "Well, Amazon promised us that zones don't have a single point of failure". Well, sucked in I guess. Apparently they do.
Ok, come on. Now you're just trolling. Amazon also promises not to share your credit card information. It should be obvious that trust is necessary in business.
Well, US East (North Virginia) is actually 4 datacenters. As pointed out in the article, multiple availability zones (=multiple datacenters) are affected by the outage, meaning that even if you deploy on multiple datacenters, you can go down.
Now, if you're talking about deploying to e.g. both US East and US West, I totally agree: it would be a good thing to do. But EC2 does not give you that option - not easily, at least, because there is no convenient way to move volumes or snapshots between regions.
Setting up HA between close datacenters (e.g. 50 miles from each other) is easy, because the latency remains low. Setting up HA between datacenters coast to coast is a whole different story, and the only help brought by EC2 is the fact that you can use the same API to deploy your machines here and there.
With EC2 you have the ability to set up a website that never goes down.
I find it hard to believe that you have any practical experience to back your claim. Today's incident affected sites that scrupulously respected all HA best practices.
At least according to the blog post, it seems like part of the issue may be a stampede of requests due to the complete outage of a single AZ. If every single service hosted in the Amazon cloud failed over to another region in the case of a multi-AZ outage, how can we be sure that that stampede of requests wouldn't take out that other region? Could Amazon handle the entirety of the US East workload being dumped onto Europe/US West?
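The stampede concern can be put as back-of-the-envelope arithmetic. The sketch below is an illustration only: the region names are labels, the load and capacity numbers are invented, and `utilization_after_failover` is a hypothetical helper that assumes the failed region's load is spread over the survivors in proportion to their capacity.

```python
from typing import Dict


def utilization_after_failover(
    load: Dict[str, float],
    capacity: Dict[str, float],
    failed: str,
) -> Dict[str, float]:
    """Return load/capacity for each surviving region after `failed` goes down.

    Assumption: the displaced load is split across the surviving
    regions in proportion to each region's capacity.
    """
    survivors = [region for region in load if region != failed]
    surviving_capacity = sum(capacity[region] for region in survivors)
    displaced = load[failed]

    result = {}
    for region in survivors:
        share = displaced * capacity[region] / surviving_capacity
        result[region] = (load[region] + share) / capacity[region]
    return result


# Invented numbers in arbitrary units, not real AWS figures.
load = {"us-east": 100.0, "us-west": 40.0, "eu-west": 40.0}
capacity = {"us-east": 150.0, "us-west": 60.0, "eu-west": 60.0}
after = utilization_after_failover(load, capacity, failed="us-east")
# A value above 1.0 means the surviving region is overloaded.
```

With these made-up figures each surviving region ends up at 1.5 times its capacity, so a failover that everybody performs at once would only work if the other regions kept a large amount of idle headroom.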
Just to be clear. DotCloud is in fact designed to withstand instances randomly crashing.
So far, however, it has not been designed for instances randomly crashing across multiple datacenters. I will add that neither is the canonical high-availability design recommended by Amazon.
Why not? That's the whole point of cloud computing, that you can cater, or have the ability to cater, for situations like this. If you don't take advantage of those abilities, well, then that is something you have to sort out with yourself.
Choosing to be on AWS or any other cloud provider means you accept some risk of things going down. Build to fail and when that fails, it all comes down to whether or not you can do it better and how much cost you're willing to bear to get to your goal of HA. For me, I know AWS can do a better job at hosting than I can and can accept multiple AZs going down.