使用MySQL5.6和Xtrabackup的小心一个bug,http://bugs.mysql.com/bug.php?id=70307,这个bug在5.6.23中已经修复。
Xtrabackup备份的时候执行flushs tables with read lock和show slave status会有可能和SQL Thread形成死锁,导致SQL Thread一直被卡主,STOP也没有用,Kill我们测试会丢失数据,只有Restart Server才行。
原因是SQL Thread的DML操作完成之后,持有rli->data_lock锁,commit的时候等待MDL_COMMIT,而flush tables with read lock之后执行的show slave status会等待rli->data_lock;修复方法是rli->data_lock锁周期只在DML操作期间持有。
重现步骤:
一、创建表
CREATE TABLE test (
id int(10) NOT NULL AUTO_INCREMENT,
age int(11) DEFAULT '0',
PRIMARY KEY (id),
KEY idx_age (age)
) ENGINE=InnoDB
二、master上执行update test set value=sleep(20)+53 where id=1;(增加sleep(20)是为了模拟方便,所以需要是statement的binlog format,row格式不行)
三、等同步到slave,并且正在执行时;执行flush tables with read lock;show slave status;就会阻塞住。
官方详细的解释和说明:
Bug#19843808: DEADLOCK ON FLUSH TABLES WITH READ LOCK + SHOW SLAVE STATUS Problem: If a client thread on an slave does FLUSH TABLES WITH READ LOCK; then master does some updates, SHOW SLAVE STATUS in the same client will be blocked. Analysis: Execute FLUSH TABLES WITH READ LOCK on slave and at the same time execute a DML on the master. Then the DML should be made to stop at a state "Waiting for commit lock". This state means that sql thread is holding rli->data_lock and waiting for MDL_COMMIT lock. Now in the same client session where FLUSH TABLES WITH READ LOCK was executed issue SHOW SLAVE STATUS command. This command will be blocked waiting for rli->data_lock causing a dead lock. Once this happens it will not be possible to release the global read lock as "UNLOCK TABLES" command has to be issued in the same client where global read lock was acquired. This causes the dead lock. Fix: Existing code holds the rli->data_lock for the whole duration of commit operation. Instead of holding the lock for entire commit duration the code has been restructured in such a way that the lock is held only during the period when rli object is being updated.