Tuesday, July 29, 2008

MySQL High Availability, Sandbox + Proxy

I've been doing a lot of research and testing lately in regard to getting a high availability MySQL solution in place. Namely, I've been looking into moving from a master/slave topology to a multi-master replication ring. This article is not intended to be a comprehensive howto; rather, a cursory overview of some of the available technologies and how they fit together.

Multi-master replication is one area where MySQL leads the pack of open-source DB solutions. Until 5.x, there wasn't really a satisfying solution in regard to avoiding primary-key collissions across masters. Fortunately, this key obstacle has since been resolved with the addition of a few simple commands:

server 1:
replicate-same-server-id = 0
auto-increment-increment = 2
auto-increment-offset = 1


server 2:
server-id = 2
replicate-same-server-id = 0
auto-increment-increment = 2
auto-increment-offset = 2


By defining a unique increment and offest value per server, primary key collisions on inserts are avoided when the binlog is processed. Telling MySQL not to replicate the same server ID means that in a circular replication topology, once an event has made it's way around the ring, it stops at the server from which it originated.

Setting up a test environment with multiple MySQL servers used to be a pain. Luckily, a relatively new project on the scene allows one to easily setup an arbitrary number of MySQL install instances on a single machine. Enter MySQL sandbox. Setting up a replication ring consisting of two masters is a matter of downlading the tarball of the version you want to sandbox and running the following command:

./make_replication_sandbox --master_master /path/to/mysql/tarball.gz

Passing the appropriate command-line arguments allows you to specify the number of servers you want sandboxed, concurrent versions you want installed, the topology, etc...

The last piece of software I want to mention is MySQL proxy. Although still in pre-beta, proxy is one of the more impressive pieces of software I've seen lately. As most proxies are, it's a piece of middleware, in this case, delegating wire requests between a client and MySQL server. An embedded lua interpreter allows fine-grained control over all aspects of communication. Creative uses include things like re-writing query syntax and providing a SQL interface to the Unix shell; however, the real attractiveness of proxy in my opinion is that it provides two very valuable facilities not baked into MySQL by default. The first is load balancing, and the second is failover. Consider an invocation such as the following:

#!/bin/sh
LUA_PATH="/usr/share/mysql-proxy/?.lua" /usr/sbin/mysql-proxy \
--proxy-address=:3306 \
--proxy-lua-script=/usr/share/mysql-proxy/rw-splitting.lua \
--proxy-backend-addresses=127.0.0.1:19101 \
--proxy-backend-addresses=127.0.0.1:19102 \
--proxy-read-only-backend-addresses=127.0.0.1:19103

This would tell proxy to listen on port 3306 in place of your standard MySQL server. Two read-write backends are given to proxy (ports 19101 and 19102) and one read-only backend (port 19103) is provided. Proxy is smart enough to delegate read-write requests to the masters and read-only requests to the slave. From my tests, requests seem to be sent to the first master by default until load picks up. From there, they seem to be load balanced in a round-robin type fashion. In the event that a server fails, proxy will gracefully remove a backend from the server pool until it comes back online. In regard to concurrency, it can scale well into 1000 connections from my own tests; although, real-world usage obviously depends on your hardware configuration among other things. Used in concert with MySQL sandbox, the possibilities are left to your imagination.

Obviously, I've only scratched the surface of each of these technologies. Really digging into their capabilities and learning their inherent limitations is left as an exercise to the reader. I leave you with the following links.

Dual-Master MySQL Replication Done Right


Advanced Replication Techniques

MySQL Sandbox

MySQL Proxy

2 comments:

Anonymous said...

How does the proxy determine if a query is "readonly" or "readwite"?

For instance:
SELECT create_new_session(?,?) bound with $username, $password

How does the proxy know if this select query actually changes anything in the database, or if "create_new_session" just did somethign repeatable like md5suming some user data form a table?

Travis Whitton said...

Ah, very good question. Obviously, deploying something like MySQL proxy in a production environment requires some careful consideration. Intimate knowledge of the intricacies of your schema is pretty much a requirement.

As I said in the post, proxy does most of it's work via an embedded lua interpreter. In the case of read-write splitting, it's accomplished using a lua script included with the distribution, rw-splitting.lua.

The script already takes certain query types into account that could cause issue such as SELECT SQL_CALC_FOUND_ROWS and SELECT LAST_INSERT_ID.

Before using proxy in production, you'd have to determine what queries are volatile and account for them in the splitting module. This isn't as hard as it sounds because a builtin tokenizer provides easy access to any portion of the query, but it obviously does require some work.

I guess it really boils down to whether the benefits provided by proxy outweigh the extra effort for your application. Until now, query splitting at the company I work for has been done in software. We have two codebases (PHP and Java), so taking advantage of splitting would require maintaining that logic in two locations. With proxy, we can simplify that to a single location. YMMV.