Monday, December 27, 2010

An EC2 Conundrum

Whenever I would lecture on Amazon's EC2, I would point out that Amazon's internal infrastructure is (effectively) EC1, and when they need capacity, it comes from EC2. In the past few weeks, I've seen this firsthand. Of course, this is Amazon's peak period, so a resource crunch should be expected as part of the normal patterns of business activity (PBA), but one aspect of it still caught me by surprise.

The Amazon cloud, known as Amazon Web Services (AWS), is billed on three meters:
* VM Resources, such as CPU and memory
* Storage, either block volumes or web-based files
* Bandwidth, both into and out of the cloud

I have a set of VMs that I launch as needed, so I am not always billed for cycles or bandwidth. When I need a VM, I don't want to have to upload all the supporting applications and data, or go through a complex configuration procedure. The solution was to grab an Elastic Block Store (EBS) volume, which looks like a disk device to a Linux VM. I provision the VM, connect the volume, log in, and mount the device, which holds a set of scripts that rebuild the application server in less than a minute.
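The mount-and-rebuild step can be sketched roughly as follows. This is only an illustration, not my actual scripts: the device name `/dev/sdf`, the mount point `/mnt/appdata`, and the script name `rebuild.sh` are all hypothetical, and the sketch assumes the volume has already been attached to the VM.

```python
import subprocess


def bootstrap_commands(device, mount_point, rebuild_script):
    """Return the commands to run on the VM once the volume is attached."""
    return [
        ["mkdir", "-p", mount_point],          # make sure the mount point exists
        ["mount", device, mount_point],        # mount the attached volume
        [f"{mount_point}/{rebuild_script}"],   # run the rebuild script stored on it
    ]


def bootstrap(device, mount_point, rebuild_script, dry_run=True):
    """Print the commands (dry run) or execute them in order, stopping on failure."""
    commands = bootstrap_commands(device, mount_point, rebuild_script)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical names; a dry run only prints what would be executed.
    bootstrap("/dev/sdf", "/mnt/appdata", "rebuild.sh")
```

With `dry_run=True` nothing is executed, so the sequence can be checked before running it as root on a real VM.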

Here's where I got burned: the EBS volume is actually a LUN on a SAN, which resides in a data center somewhere in the world. Amazon has four regions: Virginia, California, Ireland, and Singapore. I picked Virginia. But in Virginia, they have four data centers, called availability zones, labeled A, B, C, and D. My volume is in Virginia "B". Unfortunately, as of about 21 December, they have insufficient capacity in VA-B to launch a VM.

This means I've got stuff on a disk, somewhere across the Potomac, that I can't get to, because I don't have a machine to access it. I could launch a VM in VA-C or VA-D, but there is no native mechanism to allow VMs to mount disks that live in another data center. Thus the conundrum: How do we protect against this situation?
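The constraint can be modeled in a few lines. This is a toy sketch, not an AWS API call: the `Volume` class, the zone labels such as `va-b`, and the volume ID are made up for illustration. The only rule it encodes is the one described above, that a volume can be mounted only by a VM in the same availability zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    volume_id: str  # hypothetical identifier
    zone: str       # availability zone the volume lives in


def usable_zones(volume, zones_with_capacity):
    """Zones where a VM can be launched AND the volume can be mounted."""
    return [zone for zone in zones_with_capacity if zone == volume.zone]


def is_stranded(volume, zones_with_capacity):
    """True when no zone with spare capacity can reach the volume."""
    return not usable_zones(volume, zones_with_capacity)


if __name__ == "__main__":
    my_volume = Volume("vol-example", "va-b")
    # Capacity exists only in C and D, so the data in B is unreachable.
    print(is_stranded(my_volume, ["va-c", "va-d"]))
```

However much capacity the other zones have, the answer depends only on whether the volume's own zone is on the list.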

The answer is obvious: clustered replication. Two EBS volumes in different data centers, with one VM acting as the master node, and another VM acting as the replication node. Unfortunately, this doubles the cost of the system... From $15 a month to $30 a month. Not really that much... and that assumes my data is critically important, which it isn't.
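As a rough sketch of that trade-off, assuming one replica volume per zone and that the monthly cost scales linearly with the number of replicas (the zone labels and function names are illustrative, not anything Amazon provides):

```python
def pick_launch_zone(replica_zones, zones_with_capacity):
    """Return the first zone that holds a replica and has capacity, else None."""
    for zone in replica_zones:
        if zone in zones_with_capacity:
            return zone
    return None


def monthly_cost(cost_per_replica, replica_count):
    """Assumes cost grows linearly with the number of replicated systems."""
    return cost_per_replica * replica_count


if __name__ == "__main__":
    # One copy in VA-B only: nowhere to launch when B is full.
    print(pick_launch_zone(["va-b"], {"va-c", "va-d"}))
    # Copies in VA-B and VA-C: fail over to C, at twice the monthly cost.
    print(pick_launch_zone(["va-b", "va-c"], {"va-c", "va-d"}))
    print(monthly_cost(15, 2))
```

The second replica buys a zone to fall back on; the price is the second $15.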

But you'd think Amazon would have provided a way to prevent this from happening. After all, it's not like my paying twice as much on a monthly basis is something they'd actually want to happen.
