Like most websites, we at FanSnap make heavy use of MySQL as a database engine.   Despite it’s popularity, it definitely has it’s quirks.  Full disclosure, I am by no means a DBA though I often get to play one at work.  Last we week we implemented some much needed monitoring to alarm if some expected database activity didn’t happen.  Basically, if a timestamp field called updated_at doesn’t have a value less than 2 hours old, our Nagios monitoring system will send out an alert email.  Seemed like a simple query to me:

select now() - updated_at from the_table order by updated_at desc limit 1

and as long as “now() – updated_at” was less then 7200, Nagios wouldn’t complain.

Since updated_at could be updated a fairly random intervals, simple tests directly against the database seemed to prove the above query worked.  Well… Looking at slice of Nagios monitoring results showed something odd:

Time of Monitor   TimeDiff
========================
12/2/08 9:38       118
12/2/08 9:43       618
12/2/08 9:48       231
12/2/08 9:53       731
12/2/08 9:58       418
12/2/08 10:06      5199
12/2/08 10:11      88
12/2/08 10:16      588

Within the 5 minutes between monitoring checks at 9:48 and 9:53 the difference went up 500 seconds.  Huh?  There aren’t 100 seconds in an minute.  It wasn’t a fluke since it the same 500 seconds would show up in other intervals such as at 10:11 and 10:16.  Even more prenounced is the jump in seconds between 9:58 and 10:06.  It seems if there were no updates over an hour boundry the number would jump significantly.   Reading a litle documentation and realizing I was only interested in the number of seconds I converted both numbers to seconds using unix_timesteamp():

select unix_timestamp() - unix_timestamp(updated_at) from the_table order by updated_at desc limit 1

This worked much better. I scratched my head a bit and turned to FanSnap’s senior engineering team and queried “Does MySQL know how to do date arithmetic?” I was quickly admonished for not using DATE_SUB() and my pleas referencing pg 489 of the MySQL Reference Manual where it says, and I quote,

“In MySQL Version 3.23, you can use + and – instead of DATE_ADD() and DATE_SUB() if the expression on the right side is a date or datetime column.”

were dismissed as Noobish.  What made this harder to understand was the SQL query didn’t fail with any errors.  It went along its merry way returning results that almost looked like they made sense.

Can any MySQL experts enlighten me?

MySQL, Peace and Happiness,
Mike

About eight weeks ago, after two-plus years of service, my tenure at Flock, a social web browser built on Mozilla’s Firefox technology, was unfortunately and unexpectedly terminated for “corporate restructuring” reasons.  Fortunately I have been given an opportunity to work with a very experienced engineering team at FanSnap, an event ticket search engine that has just launched it’s first beta.

At this gig I have experienced the future of web service hosting and it is Cloud Computing.  The term has morphed with a few definitions over time but specifically I mean renting virtual machine time via a service like Amazon Web Services. More than the ability to scale rapidly as demand goes up (also scale back if demand goes down) at an unbelievably cheap price, it is the structure that it enforces you to adhere to that unleashes it’s power.  The temporal nature of these virtual machines (they can come ago at any time) creates an environment where the cost of not following system administration best practices is high enough to make you do it.  How many times have you tweaked a configuration scriptWhat does this mean?

  1. The configuration of record is NOT what is currently running on your systems. Rather it is the scripts, packages, and configurations that are stored in a repository and deployed to build your systems that is the Truth.
  2. Back up your data.  You can’t be paranoid enough with traditional data center hosting about the safety, integrity and availability of your data.  Being in The Cloud only heightens this.

The Source Of Truth
Doing the work up front to have #1 in place, rapidly pays itself off.  With the Truth as set of configurations and packages, we are able to spin up servers and have them in service in 15 to 30 minutes.  I’m looking forward to implementing auto-scaling on our site that will automatically put servers into production based on certain load conditions.

Cloud computing also creates a bit of mind twist for us old timers. Say you want to do a code update on your servers. Traditionally you would take your systems out of service (in some strategy that hopefully does not make your website go dark), update the code and put them back into service.  With the virtual machines, you can spin up new ones, validate and if all is ok, flip, or migrate depending on your code update strategy, the traffic over to these new machines.  Finally, decommission the old servers when they are no longer serving traffic. This also provides a very quick rollback path if needed.   Just back out the change to the load balancer that flipped the traffic to the new pool of servers and the old code is back in production.

We’ve also spun up high powered VMs for short term, CPU intensive tasks and spun them back down when we were done.  Instead of spending $5K on a machine, we can rent it for a few dollars for a few hours.

It’s The Data, Stupid
While having the ability for machines to come and go at will can make you giddy, making sure your data doesn’t disappear down a virtual black hole can keep you up at night.  What we’ve implemented, and this will likely evolve over time, is having two MySQL master/slave pairs (a primary and a standby) all kept in sync with MySQL Replication in different “Amazon Availability Zones” .  The standby pair is also regularly snapshotting its LVM volume to the S3 repository.  Finally, in case of a complete Amazon melt down, we use MySQL replication to create a copy of our database off the cloud and onto a physical(ish) server.

Log files from the web and app tiers are also sync’d to S3 for future processing and debugging.

Cloud Computing is here.  There are still valid concerns about stability and availability (then again, I’ve been hosted in datacenters such as Rackspace that have gone dark as well).  Monitoring is a pain in the arse since a server that was here one day may not exist the next.  Nevertheless, It is clearly the future of web service hosting.  The economies and associated processes are just too compelling.  We’re probably paying $.25 on the dollar of a fully managed solution such as Rackspace (I’m not dissing, Rackspace.  I found their service to be top notch when I used them).  It will be interesting to watch as the technology and market evolve over the next couple years.  I expect the trend of providing computing power as just another utility to become more realized over time as more AWS like services enter the market and the tools to use these services continue to mature.