Sunday, February 20, 2011

MySQL Replication - Pt 2

Through the wonders of technology, I've already been advised that my proof of concept is full of crap because:
* Replication does not provide automated fail over.
* After a fail over, there is no way to sync back to master.

Bullet one is true, but it is an excuse. Without replication, a catastrophic failure means getting a backup server online with data as old as my last backup or rsync. (And remember: rsync'ing MySQL is notoriously unreliable.) With replication, I'm within a few transactions. And don't bother me with your silliness about moving the RAID disks to a spare machine you had laying around.

Bullet two is only moderately true. The simple answer is to switch slave to read only, dump the database, transfer it to repaired master, and restore. Now we're synced, so we stop slave, and restore the original relationship. Of course, there is no way of knowing how long we'll be in read only.

As it turns out, we can make a few changes to deal with bullet two. As described in 5.0 16.3.6 we swap the relationship, allow the repaired system to sync as a slave. Once the repaired system has ingested the writes, we restore the relationship. Planned outage should be minutes.

On slave, add to /etc/my.cnf:
log-bin=mysql-bin
On master, add to /etc/my.cnf:
report-host=sync_back
Effectively, we have added the ability for the slave to be master and master to be slave. If we add these at build time, the slave does not needed to be restarted at time of failure to become a master. Remember to add a replication user to the slave.

At time of master failure, we want to write to slave. From slave's MySQL console:
mysql> STOP SLAVE;
mysql> RESET MASTER;
Switch front ends to write to slave. Once master is repaired issue the CHANGE TO MASTER command described in the previous post, modified as needed. Master will slave the missed writes. Once caught up, we fail back to master, and on slave issue START SLAVE.

So nah!

No comments:

Post a Comment