一、问题提出 问题是由姜大师提出的、问题如下: 表: mysql> show create table c \G *************************** 1. row *************************** Table: c Create Table: CREATE TABLE `c` ( `a` int(11) NOT NULL AUTO_INCREMENT, `b` int(11) DEFAULT NULL, PRIMARY KEY (`a`), UNIQUE KEY `b` (`b`) ) ENGINE=InnoDB 1 row in set (0.01 sec) 开启两个会话不断的执行 replace into c values(NULL,1); 会触发死锁。问死锁触发的原因。
我使用的环境: MYSQL 5.7.14 debug版本、隔离级别RR、自动提交,很显然这里的c表中的可以select出来的记录始终是1条 只是a列不断的增大,但是这里实际存储空间确不止1条,因为从heap no来看二级索引中,heap no 已经到了 7,也就是有至少7(7-1)条记录,只是其他记录标记为del并且被purge线程放到了page free_list中。
MySQL thread id 2, OS thread handle 140737154311936, query id 642 localhost root cleaning up
---lock strcut(1):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
TABLE LOCK table `test`.`c4` trx id 184771 lock mode IX
---lock strcut(2):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
RECORD LOCKS space id 413 page no 4 n bits 72 index id2 of table `test`.`c4` trx id 184771 lock_mode X(LOCK_X)
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0:len 4;hex 80000014;asc;;
1:len 4;hex 80000014;asc;;
---lock strcut(3):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
RECORD LOCKS space id 413 page no 3 n bits 72 index PRIMARY of table `test`.`c4` trx id 184771 lock_mode X(LOCK_X) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0:len 4;hex 80000014;asc;;
1:len 6;hex 00000002d1bd;asc;;
2:len 7;hex a600000e230110;asc # ;;
3:len 4;hex 80000014;asc;;
---lock strcut(4):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
RECORD LOCKS space id 413 page no 4 n bits 72 index id2 of table `test`.`c4` trx id 184771 lock_mode X(LOCK_X) locks gap before rec(LOCK_GAP)
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0:len 4;hex 8000001e;asc;;
1:len 4;hex 8000001e;asc;;
正常的版本只有
点击(此处)折叠或打开
---TRANSACTION 184771, ACTIVE 45 sec
4 lock struct(s), heap size 1160, 3 row lock(s)
MySQL thread id 2, OS thread handle 140737154311936, query id 642 localhost root cleaning up
部分后面的都是我加上的,其实修改很简单,innodb其实自己写好了只是没有开启,我开启后加上了序号来表示顺序。 上面是一个 select * from c where id2= 20 for update; b列为辅助索引的所有4 lock struct(s),可以看到有了这些信息分析 不那么难了。 这里稍微分析一下 表结构为: mysql> show create table c4; +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+ | c4 | CREATE TABLE `c4` ( `id1` int(11) NOT NULL, `id2` int(11) DEFAULT NULL, PRIMARY KEY (`id1`), KEY `id2` (`id2`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) 数据为: mysql> select * from c4; +-----+------+ | id1 | id2 | +-----+------+ | 1 | 1 | | 10 | 10 | | 20 | 20 | | 30 | 30 | +-----+------+ 4 rows in set (0.00 sec) 语句为: select * from c where id2= 20 for update; RR模式 从锁结构链表来看,这个语句在辅助索引分别锁定了 id2:20 id1:20 LOCK_X|LOCK_ORDINARY 也就是NEXT KEY LOCK 同时锁定了 id2:30 id1:30 LOCK_X|LOCK_GAP也就是gap lock不包含这一列 那么画个图容易理解黄色部分为锁定部分:
修改出现的问题:修改源码打印出所有lock struct 在线上显然是不能用的。因为打印出来后show engine innodb status 会非常 长,甚至引发其他问题,但是测试是可以,其次修改了打印死锁事物锁链表到日志后,每次只要遇到死锁信息可以打印 到日志,但是每次MYSQLD都会挂掉,但是不影响分析了。
三、预备知识(自我理解) 1、 Precise modes: #define LOCK_ORDINARY 0 /*!< this flag denotes an ordinary next-key lock in contrast to LOCK_GAP or LOCK_REC_NOT_GAP */ 默认是LOCK_ORDINARY及普通的next_key_lock,锁住行及以前的间隙 #define LOCK_GAP 512 /*!< when this bit is set, it means that the lock holds only on the gap before the record; for instance, an x-lock on the gap does not give permission to modify the record on which the bit is set; locks of this type are created when records are removed from the index chain of records */ 间隙锁,锁住行以前的间隙,不锁住本行 #define LOCK_REC_NOT_GAP 1024 /*!< this bit means that the lock is only on the index record and does NOT block inserts to the gap before the index record; this is used in the case when we retrieve a record with a unique key, and is also used in locking plain SELECTs (not part of UPDATE or DELETE) when the user has set the READ COMMITTED isolation level */ 行锁,锁住行而不锁住任何间隙 #define LOCK_INSERT_INTENTION 2048 /*!< this bit is set when we place a waiting gap type record lock request in order to let an insert of an index record to wait until there are no conflicting locks by other transactions on the gap; note that this flag remains set when the waiting lock is granted, or if the lock is inherited record */ 插入意向锁,如果插入的记录在某个已经锁定的间隙内为这个锁 2、参数innodb_autoinc_lock_mode的值为1,也许不能保证replace into的顺序。 3、infimum和supremum 一个page中包含这两个伪列,页中所有的行未删除(删除未purge)的行都连接到这两个虚列之间,其中 supremum伪列的锁始终为next_key_lock。 4、heap no 此行在page中的heap no heap no存储在fixed_extrasize 中,heap no 为物理存储填充的序号,页的空闲空间挂载在page free链表中(头插法)可以重用, 但是重用此heap no不变,如果一直是insert 则heap no 不断增加,并非按照KEY大小排序的逻辑链表顺序,而是物理填充顺序 5、n bits 和这个page相关的锁位图的大小如果我的表有9条数据 还包含2个infimum和supremum虚拟列 及 64+11 bits,及75bits但是必须被8整除为一个字节就是 80 bits 6、隐含锁(Implicit lock)和显示锁(explict) 锁有隐含和显示之分。隐含锁通常发生在 insert 的时候对cluster index和second index 都加隐含锁,如果是UPDATE(DELETE)对cluster index加显示锁 辅助 索引加隐含锁。目的在于减少锁结构的内存开销,如果有事务需要和这个隐含锁而不兼容,这个事务需要帮助 insert或者update(delete)事物将隐含 锁变为显示锁,然后给自己加锁,通常insert主键检查会给自己加上S锁,REPLACE、delete、update通常会给自己加上X锁。
我们可以隐隐约约看到row_ins_sec_index_entry_low和row_ins_clust_index_entry_low回检查是否有重复的行 分别代表是二级索引和聚集索引的相关检查,因为就这个案例主键不可能出现重复值,而二级索引这个例子中肯定是 重复的,索引row_ins_sec_index_entry_low触发了等待,其实我们知道这里的锁方式如下列子: ---lock strcut(2):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK chain! for this Trx RECORD LOCKS space id 406 page no 4 n bits 72 index b of table `test`.`c` trx id 177891 lock_mode X(LOCK_X) waiting(LOCK_WAIT) Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 4; hex 80000001; asc ;; 1: len 4; hex 80000006; asc ;; LOCK_X|LOCK_ORDINARY|LOCK_WAIT:需要X的next_key lock处于等待状态他需要锁定(infimum,{1,6}]这个区间。 这也是死锁发生的关键一个环节。
MySQL thread id 5, OS thread handle 140734658983680, query id 4646 localhost root update
replace into c values(null,1)
---lock strcut(1):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
TABLE LOCK table `mysqlslap`.`c` trx id 289636 lock mode IX
---lock strcut(2):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X)
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0:len 8;hex 73757072656d756d;asc supremum;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0:len 4;hex 80000001;asc;;
1:len 4;hex 80001221;asc!;;
---lock strcut(3):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X) locks gap before rec(LOCK_GAP)
Record lock, heap no 20 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0:len 4;hex 80000001;asc;;
1:len 4;hex 80001220;asc;;
---lock strcut(4):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X)
---lock strcut(5):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X)
Record lock, heap no 20 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0:len 4;hex 80000001;asc;;
1:len 4;hex 80001220;asc;;
---lock strcut(6):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
RECORD LOCKS space id 407 page no 3 n bits 104 index PRIMARY of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X) locks rec butnot gap(LOCK_REC_NOT_GAP)
Record lock, heap no 8 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0:len 4;hex 80001221;asc!;;
1:len 6;hex 000000046b64;asc kd;;
2:len 7;hex 30000001f00c97;asc 0 ;;
3:len 4;hex 80000001;asc;;
---lock strcut(7):(Add by gaopeng)In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X) locks gap before rec(LOCK_GAP) insert intention(LOCK_INSERT_INTENTION) waiting(LOCK_WAIT)
Record lock, heap no 20 PHYSICAL RECORD: n_fields 2; compact format; info bits 32