Minimize Downtime for MySQL Upgrades: How to Work Around RDS Limitations

Kishore Nallan
Kishore Nallan / November 30, 2017

If you've worked on a web application backed by a relational database, you will understand how frequently the underlying schema changes. You might add or remove columns, change the type of existing columns, or add new indices as your business evolves.

While modern ORMs make such schema changes fairly simple, the resulting migrations on our production database at Zapier was becoming painful. We were using MySQL 5.5, and ALTER statements on 5.5 involve a "lock and copy" of the entire table. With a large database having several hundred GB of data, this meant an interruption to our production traffic lasting for at least 30 minutes. To reduce the impact on our customers, we had to schedule these migrations during a planned maintenance window in the middle of the night every few weeks.

In addition to being a painful process, the scheduled downtime also slowed our engineering team's momentum. Features that required a schema migration took longer to ship as we had to buffer and plan these periodic migrations in advance.

As a company which otherwise shipped code to production multiple times a day, this was simply not good enough. We decided to upgrade to MySQL 5.6, which has online DDL migrations that run ALTER statements on-the-fly without locking the tables. Here's how we switched, and what we learned from upgrading to a new MySQL version.

The Naive Way to Upgrade MySQL

Amazon RDS recommends the following process for upgrading a MySQL 5.5 instance:

  1. Create a new read replica of the MySQL 5.5 database in production
  2. Modify the replica's MySQL version to 5.6 to upgrade it
  3. Schedule a maintenance window to promote the new 5.6 read-replica to a master
  4. Enable multi-AZ on the new 5.6 master, create additional 5.6 replicas
  5. Point our systems to the new 5.6 master and replicas

Although RDS makes the process simpler than it would be with a self-hosted MySQL set-up, it's still a very time-consuming process because of the amount of data and replicas involved. Can we do better?

Create a Better MySQL Upgrade with "Master-Master" Replication

To keep the maintenance window for the upgrade small, we decided to explore if we can get the new MySQL 5.6 master and replicas replicating off the MySQL 5.5 master database and running in parallel before the maintenance window begins.

This way, switching to the 5.6 instances or rolling back would be simple. In addition, we can also canary live traffic to the new databases well in advance to identify any performance bottlenecks that might derail the migration.

Here's what we wanted to achieve:

               |  5.5   |
               |        |  (Production set-up)
               |        |
               |        |
                   |    replication
       |               |                        |
   +---+----+      +---+----+               +---+----+
   |  5.5   |      |  5.5   |               |  5.6   |
   |--------|      |--------|               |--------|
   |        |      |        |               |        |  (New 5.6 set-up)
   |        |      |        |               |        |
   |        |      |        |               |        |
   +--------+      +--------+               +---+----+
                                                | replication
                                          |               |
                                          |               |
                                      +---+----+      +---+----+
                                      |  5.6   |      |  5.6   |
                                      |--------|      |--------|
                                      |        |      |        |
                                      |        |      |        |
                                      |        |      |        |
                                      +--------+      +--------+

Note how the 5.6 master is actually replication off the 5.5 master. This sounds great, but unfortunately RDS console does not support a master-master replication directly. So, we had to find another way to do it.

In our exploration, we came across a mysql.rds_set_external_master command that would configure a RDS instance to replicate from an instance of MySQL running external to Amazon RDS.

That sounded promising—if we can replicate from an external MySQL instance, we can do the same from an RDS instance right?

It seemed like it should, but we ran into a small problem. The mysql.rds_set_external_master command required a user with replication slave permissions to be present on the MySQL 5.5 master instance. When we created a new mysql56_repl user and tried to give it the required permission, we encountered an error:

mysql> CREATE USER 'mysql56_repl'@'%' IDENTIFIED BY '<Password>';

Query OK, 0 rows affected (0.00 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO 'mysql56_repl'@'%' IDENTIFIED BY '<Password>';

ERROR 1045 (28000): Access denied for user 'muser'@'%' (using password: YES)

The master user on RDS did not have the privileges to assign this permission on MySQL 5.5. Thankfully, when we contacted AWS support, they were able to grant the replication slave permission to the mysql56_repl user from their end.

With that out of the way, we were able to create a completely parallel set-up of MySQL 5.6 master and replicas, replicating off the 5.5 MySQL DB on production. If you are looking to do just that, here's what you should do, step-by-step:

  1. Launch a new read-replica of the MySQL 5.5 DB instance.
  2. When the read replica is available, upgrade the replica to MySQL 5.6.
  3. Pause replication on this 5.6 replica.
  4. Login to the 5.6 replica and run SHOW SLAVE STATUS\G.
  5. Note the Master_Log_File and Read_Master_Log_Pos values. We need this for running the mysql.rds_set_external_master configuration later.
  6. Create a snapshot of the MySQL 5.6 replica. You don't need this 5.6 replica anymore and can terminate it.
  7. Create a new RDS master MySQL 5.6 instance from the snapshot you just created.
  8. Login to the old 5.5 RDS master instance and create replication user and grant replication slave permission as we described earlier.
  9. Make sure that the security group of your 5.5 RDS master instance allows egress.
  10. Login to the new RDS master and configure it to replicate off the 5.5 master database this way:
    CALL mysql.rds_set_external_master (
    <host name of the 5.5 master DB>,
    <port of the 5.5 master DB>,
    <replication user name>,
    <replication user password>,
    <value of `Master_Log_File` noted earlier>,
    <value of `Read_Master_Log_Pos` noted earlier>
  11. Use the mysql.rds_start_replication to start the replication on the new 5.6 master instance. If all goes well, here's what
    you will see:
    mysql> CALL mysql.rds_start_replication;
    | Message |
    | Slave running normally. |
    1 row in set (1.18 sec)
  12. Your new 5.6 master is now a slave as well, by replicating asynchronously off the 5.5 master. You can create read replicas of this 5.6 master if you want to mirror your production set-up.

At this point, you can start canarying production read traffic to the MySQL master and replica to verify that everything is fine. Switching over to the new instances would now be as simple as updating your configuration to point to the new hosts, and will no longer involve a lengthy maintenance window anymore.

Load Comments...

Comments powered by Disqus