Last week, Amazon Web Services experienced a lengthy outage of its Elastic Block Store (EBS) that took down popular websites such as Reddit, as well as online service providers such as Heroku that lacked an alternative location to host their databases. Many observers say the outage exposes the gap between what users expect from an online service and what cloud computing can actually deliver.
One company that suffered from the outage was Architectural Overflow, a seller of blueprints and architectural planning services to builders and architects. The company used Heroku as its application platform, which spared it from having to run or maintain its own servers. After the outage, Lee Buescher, Architectural Overflow's CEO, says his confidence in Heroku has been badly shaken.
"I assumed incorrectly they knew what they were doing," he admitted. "I've never been out that long. I host other [Ruby on Rails] applications and we have never had outages." While he sympathizes with Heroku's problems, he says the lengthy outage, and the issues it caused, should never have happened. "I'm paying for it, so yeah, I expect it to work." Buescher has started looking for alternatives to Heroku, and he's not alone.
The outage began on April 21, knocking out EBS at Amazon's flagship data center in Ashburn, Virginia. The worst was over about 12 hours after it started, when Amazon stated that functionality had been restored in all but one of its Availability Zones. Even so, it wasn't until April 25 that Amazon Web Services reported the US-EAST Region to be operating normally.
Amazon drew heavy criticism from its customers over its lack of communication during the outage. Although the company posted regular updates on its status page, it made no public statement about the problem. That seems negligent when you consider that the outage may have knocked out thousands of businesses and websites during the 12 hours it was at its worst.
Buescher noted that he has received much better communication from some of the other online services he uses, such as the software-as-a-service (SaaS) ERP provider NetSuite. "They have an outage and they're communicating, even though their stuff isn't business critical," he observed. Amazon's silence left some customers desperate: at least one AWS customer falsely claimed in Amazon's forums that the outage threatened the lives of hundreds of cardiac patients, because he felt he could not get through to the company any other way. AWS staff never responded to his post. Other users on Amazon's forums discussed alternative providers and services that might offer better customer service and business support.
The take-home lesson is that cloud computing may not yet be ready for mission-critical business applications. If you are thinking about using cloud computing for your business, you need a plan in place for outages; in fact, you should expect to deal with them from time to time. And you can't assume that a cloud provider will get it right all the time, even one as large and reputable as Amazon.