Some Amazon-hosted web sites experienced a 40-minute outage on the morning of 12/28/2008 (9:30am PST to 10:10am PST) when network connectivity to one of Amazon’s “EC2 Availability Zones” dropped.  About 40 minutes into the outage, just before it was fixed, Amazon posted a note on their dashboard page:

“One of our EC2 Availability Zone is experiencing reduced connectivity to the Internet at present. We are shifting traffic to another provider and expecting fully restored connectivity shortly.”

A couple of observations:

  1. This frustrating kind of outage, while usually infrequent, can happen at any datacenter.  This one gets put in the “this better not happen very often” mental bucket.
  2. I am disappointed with how long it took Amazon to post any kind of announcement.  It took me more than a few minutes to truly convince myself that it was Amazon’s issue and not a burp in our own operations.  I checked one other site, userscripts.org, that I know is hosted in EC2, and that site didn’t seem to be affected.  Apparently it is hosted in a different “EC2 Availability Zone”.
  3. I’m pleased with how our own monitoring of the website worked.  Most of our monitoring is done with Nagios from within the Amazon cloud.  Since the issue was connectivity TO the cloud, that set of monitoring didn’t notice anything wrong.  From its point of view, everything was operating normally.  Somewhat surprisingly, the monitoring from RightScale, the service we use to help us host our site on EC2, also reported everything as normal.  Fortunately, we have a second set of monitoring hosted on a Linode to monitor the health of the main Nagios monitor.  It alarmed as it was supposed to (~$20/month well spent), and at 9:30am PST I knew our site was down (though there wasn’t much I could do about it).

And a note or two:

  1. With infinite time and funds we’d probably have the site load balanced between multiple clouds.  That will happen some day.
  2. Always have a second set of monitoring completely independent from your main site operations.  It can just be a heartbeat monitor but it is worth the time and money to set up.
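
For anyone wondering what “just a heartbeat monitor” might look like, here is a minimal sketch in Python. It is not what we actually run on the Linode; the names `is_up`, `HeartbeatMonitor`, and `run`, the three-failure threshold, and the 60-second interval are all assumptions picked for illustration. The idea is simply to probe the site from a machine outside your main hosting and alert once after several consecutive failures, so a single dropped request doesn’t page anyone.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=10.0):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, DNS failures, refused connections,
        # timeouts, and malformed URLs.
        return False


class HeartbeatMonitor:
    """Tracks consecutive probe failures (hypothetical helper, for illustration)."""

    def __init__(self, probe, failures_before_alarm=3):
        self.probe = probe  # zero-argument callable returning True when healthy
        self.failures_before_alarm = failures_before_alarm
        self.consecutive_failures = 0

    def tick(self):
        """Run one probe; return True only on the tick that reaches the threshold."""
        if self.probe():
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures == self.failures_before_alarm


def run(url, interval_seconds=60):
    """Probe forever; printing stands in for whatever alerting you actually use."""
    monitor = HeartbeatMonitor(lambda: is_up(url))
    while True:
        if monitor.tick():
            print(f"ALERT: {url} failed {monitor.failures_before_alarm} checks in a row")
        time.sleep(interval_seconds)
```

To try it, call `run()` with your own site’s URL from a host that lives outside your primary provider.  In practice you would swap the `print` for an email or SMS, and the whole point is lost if this script runs in the same cloud it is watching.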
