Bulletproofing MySQL replication with checksums. What and Why?
MySQL’s replication solution evolved as a statement-based technology. Instead of sending actual block changes, MySQL just has to log committed transactions and reapply them on the slave side. This affords a wonderful array of topologies and different uses, but has its drawbacks. The biggest occur when data does not get updated or changed in the same way on the slave. If you’re new to MySQL or coming from the Oracle world, you might expect that this would flag an error, but the slave can drift without any warning.

Common causes include: mixed transactional and non-transactional tables; use of non-deterministic functions such as uuid(); stored procedures and functions; UPDATE with a LIMIT clause. There are others, but suffice it to say that if you want to rely on your slave being consistent, you need to check it!

The solution – mathematical checksums
If you’re a seasoned Linux user, you’re probably familiar with the md5sum command. It turns out that MySQL can checksum tables too.
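To make the checksum idea concrete, here is a minimal conceptual sketch in Python. It is not how MySQL or any particular tool computes table checksums; it only illustrates the principle, in the spirit of md5sum, that two copies of a table with the same rows produce the same digest and a single changed value produces a different one. The function name `table_checksum` and the sample rows are hypothetical, and the first column is assumed to be the primary key.

```python
import hashlib


def table_checksum(rows):
    """Return a hex digest that summarises every row of one table copy.

    Assumption for this sketch: each row is a tuple whose first element
    is the primary key.
    """
    digest = hashlib.sha256()
    # Sort by primary key so the order in which a server returns rows
    # does not affect the result.
    for row in sorted(rows, key=lambda r: r[0]):
        # repr() keeps column boundaries and types unambiguous.
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical table contents as read from each server.
master_rows = [(1, "alice", 10), (2, "bob", 20)]
slave_rows = [(2, "bob", 20), (1, "alice", 10)]    # same data, different order
drifted_rows = [(1, "alice", 10), (2, "bob", 21)]  # one value has drifted

print(table_checksum(master_rows) == table_checksum(slave_rows))    # True
print(table_checksum(master_rows) == table_checksum(drifted_rows))  # False
```

Comparing one short digest per table is far cheaper than shipping every row between servers, which is why checksums suit this kind of consistency check.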
Enter Percona’s pt-table-checksum tool, formerly part of Maatkit.

Multi-Master Replication Manager for MySQL [MMM for MySQL Wiki]

MySQL skip duplicate replication errors.
Normally MySQL replication will stop whenever there is an error running a query on the slave.
It stops so that we can identify the problem, fix it, and keep the data consistent with the master that sent the query. You can skip such errors, even though this is not recommended, as long as you know exactly which queries are failing and why. For example, you can skip just the one query that is hanging the slave. There might be cases where you will want to skip more queries, such as all the duplicate errors reported in the output of show slave status;. If you are sure that skipping those errors will not leave your slave inconsistent and you want to skip them ALL, you can add a setting for that to your my.cnf. In my example, 1062 is the error you would want to skip, which MySQL defines as:

Error: 1062 SQLSTATE: 23000 (ER_DUP_ENTRY)
Message: Duplicate entry ‘%s’ for key %d
Restarting MySQL master-master replication.
If your MySQL (5.0+) replication is broken, there are two ways to fix it: the easy way, and the right way.
Run commands starting with $ on Unix. Run commands starting with mysql> in the MySQL client.

The easy way: Skip the problem
If you hit both databases at the same time with the same INSERT, each will create its own record and try to replicate it to the other, which already has that record, causing a duplicate error. In a simple case like that, you just want to skip the offending statement:

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

More details on skipping MySQL duplicate errors

Most of the time, you skip one statement and replication breaks again straight away, because there’s a whole queue of problem statements coming up.
The right way: Rebuild
First make sure your site is only using the master server. We have two database servers:
Good Server: The good one, with the correct data.
Rebuilding Server: The one we are fixing.

[Maia-users] Two-way replication with MySQL 5.x.

16.3.1.1 Replication and.

A Better MySQL Replication Heartbeat.
If you’ve used MySQL replication you’ve probably discovered that slave machines can lag behind the master.
Replication can also break completely, requiring hours (or days) for the slave to catch up. Monitoring is required to catch issues before the slaves get too far behind. Jeremy Zawodny has suggested a heartbeat mechanism to monitor the delay between the master and the slave (I’m not sure if he came up with this solution). His suggestion is to periodically insert a row into a heartbeat table on the master. There are a few problems with this solution.

A new solution
You can get MySQL to do the hard work for you by taking advantage of the difference in behavior between SYSDATE and CURRENT_TIMESTAMP.
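As a rough illustration of how such a heartbeat row could be read, the Python sketch below computes the lag from two timestamps. It assumes statement-based replication with default settings, where CURRENT_TIMESTAMP is replayed on the slave with the master's original statement time while SYSDATE() is evaluated when the slave actually runs the statement. The column names `master_ts` and `slave_ts`, the function name, and the sample values are hypothetical and do not come from the original article.

```python
from datetime import datetime


def replication_lag_seconds(master_ts, slave_ts):
    """Return how far the slave is behind, in seconds.

    master_ts: value the heartbeat statement stored via CURRENT_TIMESTAMP
               (assumed to carry the master's statement time).
    slave_ts:  value the same statement stored via SYSDATE()
               (assumed to be the slave's clock when it replayed the row).
    """
    lag = (slave_ts - master_ts).total_seconds()
    # Small clock skew between servers can make the difference negative,
    # so clamp it at zero.
    return max(lag, 0.0)


# Hypothetical heartbeat row as read on the slave.
row = {
    "master_ts": datetime(2024, 1, 1, 12, 0, 0),
    "slave_ts": datetime(2024, 1, 1, 12, 0, 7),
}

print(replication_lag_seconds(row["master_ts"], row["slave_ts"]))  # 7.0
```

Because both values live in the same row on the slave, a monitor only needs to query the slave; the result is only as accurate as the clock synchronisation between the two servers.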