# 使用show processlist语句查看会话状态信息,发现DDL语句的state列值为Waiting for table metadata lock,表示在等待MDL元数据锁。根据MySQL 5.7及其之后的版本中的online ddl特性,该语句应该立即执行完成(它只会修改元数,因为这里只是修改了字段长度,并没有修改字段的其他属性),因此,故障现象确认
admin@localhost : (none) 11:48:22> show processlist;
+----+-------+-----------+--------+---------+------+---------------------------------+-----------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------+-----------+--------+---------+------+---------------------------------+-----------------------------+
| 27 | admin | localhost | sbtest | Sleep | 123 | | NULL |
| 28 | admin | localhost | sbtest | Query | 102 | Waiting for table metadata lock | alter table sbtest1 modify column `pad` varchar(70) COLLATE utf8_bin NOT NULL DEFAULT '' |
| 29 | admin | localhost | NULL | Query | 0 | starting | show processlist |
| 30 | admin | localhost | NULL | Sleep | 93 | | NULL |
+----+-------+-----------+--------+---------+------+---------------------------------+-------------------------------+
4 rows in set (0.00 sec)
首先,我们查看主机负载信息,通过下图我们可以看到,主机基本处于空载状态,毫无压力
然后,我们查看数据库的活跃会话数量及其状态,我们可以看到数据库中并没有大量会话,也不存在正在执行的DML语句在操作表sbtest1,也不存在同时有其他会话同时使用DDL在操作相同的表,但这里无法确认是否存在未提交的事务
# 反复多执行几次show processlist语句
admin@localhost : (none) 11:49:10> show processlist;
+----+-------+-----------+--------+---------+------+---------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------+-----------+--------+---------+------+---------------------------------+------------------------------------+
| 27 | admin | localhost | sbtest | Sleep | 149 | | NULL |
| 28 | admin | localhost | sbtest | Query | 128 | Waiting for table metadata lock | alter table sbtest1 modify column `pad` varchar(70) COLLATE utf8_bin NOT NULL DEFAULT '' |
| 29 | admin | localhost | NULL | Query | 0 | starting | show processlist |
| 30 | admin | localhost | NULL | Sleep | 119 | | NULL |
+----+-------+-----------+--------+---------+------+---------------------------------+---------------------------------------+
4 rows in set (0.00 sec)
先查看information_schema中记录的事务信息
# 发现并没有事务存在...
admin@localhost : sbtest 05:49:17> select * from information_schema.innodb_trx\G
Empty set (0.00 sec)
# 也可以顺便使用sys.innodb_lock_waits视图确认是否存在一些事务锁等待
admin@localhost : performance_schema 06:27:35> select * from sys.innodb_lock_waits\G
Empty set, 3 warnings (0.00 sec) # 查询结果为空
查看performance_schema下的MDL元数据锁记录信息
# WTF..居然为空
admin@localhost : sbtest 06:00:21> select * from performance_schema.metadata_locks;
Empty set (0.00 sec)
# 也可以顺便使用sys.schema_table_lock_waits视图查看表级别的锁等待
admin@localhost : performance_schema 06:28:12> select * from sys.schema_table_lock_waits\G
Empty set (0.00 sec) # 查询结果为空
查看performance_schema下的handle持有信息
# 发现表sbtest1的handle被thread_id=70的线程持有
admin@localhost : (none) 11:49:36> select * from performance_schema.table_handles where OWNER_THREAD_ID!=0;
+-------------+---------------+-------------+--------+-----------------+----------------+---------------+---------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | OBJECT_INSTANCE_BEGIN | OWNER_THREAD_ID | OWNER_EVENT_ID | INTERNAL_LOCK | EXTERNAL_LOCK |
+-------------+---------------+-------------+---------+-----------------+----------------+---------------+---------------+
| TABLE | sbtest | sbtest1 | 140049018564288 | 70 | 6 | NULL | NULL |
+-------------+---------------+-------------+----------+-----------------+----------------+---------------+---------------+
1 row in set (0.00 sec)
# 通过performance_schema.threads表查看是哪个线程(thread_id是数据库内部的线程ID,我们需要看到与之对应的processlist id)
admin@localhost : (none) 11:50:03> select thread_id,processlist_id,type from performance_schema.threads where processlist_id is not null and type='FOREGROUND';
+-----------+----------------+------------+
| thread_id | processlist_id | type |
+-----------+----------------+------------+
| 43 | 1 | FOREGROUND |
| 69 | 27 | FOREGROUND |
| 70 | 28 | FOREGROUND | # 发现processlist id为28
| 71 | 29 | FOREGROUND |
| 72 | 30 | FOREGROUND |
+-----------+----------------+------------+
5 rows in set (0.00 sec)
# 通过show processlist再次查看一下id号,额。。发现id列为28的居然就是执行DDL语句被hang住那个会话,好吧,白忙活了
admin@localhost : (none) 11:50:26> show processlist;
+----+-------+-----------+--------+---------+------+---------------------------------+------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------+-----------+--------+---------+------+---------------------------------+--------------------------------------+
| 27 | admin | localhost | sbtest | Sleep | 228 | | NULL |
| 28 | admin | localhost | sbtest | Query | 207 | Waiting for table metadata lock | alter table sbtest1 modify column `pad` varchar(70) COLLATE utf8_bin NOT NULL DEFAULT '' |
| 29 | admin | localhost | NULL | Query | 0 | starting | show processlist |
| 30 | admin | localhost | NULL | Sleep | 198 | | NULL |
+----+-------+-----------+--------+---------+------+---------------------------------+-----------------------------------------+
4 rows in set (0.00 sec)
通过show engine innodb status查看下锁与事务信息
# 仍然没有发现有效的锁信息
admin@localhost : performance_schema 06:14:13> show engine innodb status;
......
------------
TRANSACTIONS
------------
Trx id counter 11559
Purge done for trx's n:o < 11557 undo n:o < 0 state: running but idle
History list length 60
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421628104988048, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421628104987136, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421628104985312, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421628104984400, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421628104986224, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
......
排查到这里,视乎已经没招了,也许我们还可以用mysqladmin debug命令试试看
# 执行debug命令,执行该命令之后,一些锁信息可能会被debug出来打印到错误日志中
[root@physical-machine ~]# mysqladmin debug
# 很遗憾,在错误日志中仍然没有发现有效的锁信息
[root@physical-machine ~]# vim /data/mysqldata1/log/error.log
等等,我们好像忽略了点什么,既然DDL语句在等待MDL元数据锁,为啥在performance_schema.metadata_locks表中没有记录?查看一下MDL事件采集器试试看?
# 喔。。发现居然MDL锁信息的采集器开关并没有打开,难怪metadata_locks表中无法记录MDL元数据锁信息
admin@localhost : performance_schema 06:30:16> select * from performance_schema.setup_instruments where name like '%/mdl';
+----------------------------+---------+-------+
| NAME | ENABLED | TIMED |
+----------------------------+---------+-------+
| wait/lock/metadata/sql/mdl | NO | NO |
+----------------------------+---------+-------+
1 row in set (0.00 sec)
# 现在,我们启用mdl的采集器
admin@localhost : sbtest 07:18:52> call sys.ps_setup_enable_instrument('sql/mdl');
+-----------------------+
| summary |
+-----------------------+
| Enabled 6 instruments |
+-----------------------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
然后,我们重新查询information_schema下的事务和锁信息
# 查看information_schema.innodb_trx表,仍然没有事务信息
admin@localhost : sbtest 07:17:03> select * from information_schema.innodb_trx\G
Empty set (0.00 sec)
# 查看事务锁等待信息,仍然为空
admin@localhost : sbtest 07:17:30> select * from sys.innodb_lock_waits\G
Empty set, 3 warnings (0.01 sec)
查看performance_schema下的MDL元数据锁记录信息
# 查看metadata_locks表中的MDL锁信息,咦,有信息了!不过,有点乱啊(我们这里只查询sbtest库下的sbtest1表就可以了,因为我们的操作也只涉及到这张表)
admin@localhost : (none) 11:52:46> select * from performance_schema.metadata_locks where OBJECT_SCHEMA='sbtest' and OBJECT_NAME='sbtest1';
+-------------+---------------+-------------+-----+--------+---------------+-------------+--------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+---------------+-------------+-----+--------+---------------+-------------+--------+-----------------+----------------+
# 从这行信息我们可以看到,表sbtest1上有一把SHARED_WRITE锁处于GRANTED状态,为thread_id=69的线程所持有。SHARED_WRITE类型的MDL锁是一把意向排他IX锁,通常在执行MDL或select ... for update时产生
| TABLE | sbtest | sbtest1 | 140048817276288 | SHARED_WRITE | TRANSACTION | GRANTED | | 69 | 11 |
# 从这行信息我们可以看到,表sbtest1上有一把SHARED_UPGRADABLE的锁处于GRANTED状态,为thread_id=70的线程持有,从上文中的信息我们可以知道,thread_id=70的process id为28,也就是执行DDL语句被阻塞的那个会话id。SHARED_UPGRADABLE类型的MDL锁是一把共享升级锁,一般在执行online DDL语句时会产生。它的作用是在执行online ddl期间允许相同表的DML但防止DDL
| TABLE | sbtest | sbtest1 | 140049018604784 | SHARED_UPGRADABLE | TRANSACTION | GRANTED | | 70 | 6 |
# 从这行信息我们可以看到,表sbtest1上有一把EXCLUSIVE处于PENDING状态,为thread_id=70的线程在等待,从上文中的信息我们可以知道,thread_id=70的process id为28,也就是执行DDL语句被阻塞的那个会话id。EXCLUSIVE类型的MDL锁是一把排它X锁,用于阻止其他线程读写元数据信息,一般在执行DDL时产生
| TABLE | sbtest | sbtest1 | 140049018564112 | EXCLUSIVE | TRANSACTION | PENDING | | 70 | 6 |
+-------------+---------------+-------------+-----+---------+---------------+-------------+--------+-----------------+----------------+
3 rows in set (0.00 sec)
# 通过上述信息我们可以得知,thread_id=70的线程需要获取的EXCLUSIVE锁与thread_id=69的线程所持有的SHARED_WRITE锁冲突了,但thread_id=69线程的process id是多少呢?查看一下performance_schema.threads表
admin@localhost : (none) 11:53:47> select thread_id,processlist_id,type from performance_schema.threads where processlist_id is not null and type='FOREGROUND';
+-----------+----------------+------------+
| thread_id | processlist_id | type |
+-----------+----------------+------------+
| 43 | 1 | FOREGROUND |
| 69 | 27 | FOREGROUND | # 发现thread_id=69的线程process id为27
| 70 | 28 | FOREGROUND |
| 71 | 29 | FOREGROUND |
| 74 | 32 | FOREGROUND |
+-----------+----------------+------------+
5 rows in set (0.01 sec)
# 到这里,我们知道了DDL语句就是被process id为27的线程阻塞的,但,还不是很直观,查看起来比较繁琐,我们还是直接使用sys.schema_table_lock_waits视图查看表级别的锁等待试试看吧,可以发现,该视图打印的信息看起来就很清晰了(可以清晰看到谁在等待,谁持有锁),不过,怎么有两行?以哪行为准呢?
admin@localhost : (none) 11:59:04> select * from sys.schema_table_lock_waits\G
*************************** 1. row ***************************
object_schema: sbtest
object_name: sbtest1
waiting_thread_id: 70
waiting_pid: 28
waiting_account: admin@localhost
waiting_lock_type: EXCLUSIVE
waiting_lock_duration: TRANSACTION
waiting_query: alter table sbtest1 modify col ... E utf8_bin NOT NULL DEFAULT ''
waiting_query_secs: 744
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 69
blocking_pid: 27
blocking_account: admin@localhost
blocking_lock_type: SHARED_WRITE
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 27 # 这一行表锁等待信息提示kill 掉process id为27的线程
sql_kill_blocking_connection: KILL 27
*************************** 2. row ***************************
object_schema: sbtest
object_name: sbtest1
waiting_thread_id: 70
waiting_pid: 28
waiting_account: admin@localhost
waiting_lock_type: EXCLUSIVE
waiting_lock_duration: TRANSACTION
waiting_query: alter table sbtest1 modify col ... E utf8_bin NOT NULL DEFAULT ''
waiting_query_secs: 744
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 70
blocking_pid: 28
blocking_account: admin@localhost
blocking_lock_type: SHARED_UPGRADABLE
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 28 # 这一行表锁等待信息提示kill 掉process id为28的线程
sql_kill_blocking_connection: KILL 28
2 rows in set (0.01 sec)
DBA侧可通过在执行DDL语句的会话中,会话级别设置lock_wait_timeout系统变量为一个较小的值,在超过该时间值之后,仍然无法获得所需的锁时,自动放弃DDL操作(请自行评估需求)
root@localhost : sbtest 04:37:43> set lock_wait_timeout=10;
Query OK, 0 rows affected (0.00 sec)
root@localhost : sbtest 04:37:47> alter table sbtest1 modify column `pad` varchar(70) COLLATE utf8_bin NOT NULL DEFAULT '';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
2)在会话1中,开启一个显式事务,并使用INSERT语句插入一行超过字段定义长度的数据行(模拟应用侧插入数据)
admin@localhost : sbtest:13: > begin;
Query OK, 0 rows affected (0.00 sec)
admin@localhost : sbtest:17: > insert into sbtest1 values(2,2,'40393031789-25132409365-58213491013-66541287984-65586459874-05762316127-59922091522-12151119251-49498591378-18011532520','test-29736863337-73672352543-26439979097-89323822066-87557735686');
ERROR 1406 (22001): Data too long for column 'pad' at row 1
3)在会话2中,将报错字段pad的长度加长到70个字符(模拟DBA侧使用DDL语句修改列长度定义)
# 查看表结构中的字段定义长度,可以发现,报错的pad列定义长度为varchar类型的60个字符长度
root@localhost : sbtest 04:12:03> show create table sbtest1;
+---------+------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+---------+-------------------------------------------------------------------------------------------------------+
| sbtest1 | CREATE TABLE `sbtest1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`k` int(11) NOT NULL DEFAULT '0',
`c` char(120) COLLATE utf8_bin NOT NULL DEFAULT '',
`pad` varchar(60) COLLATE utf8_bin NOT NULL DEFAULT '', # pad字段的定义长度为60
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=20000001 DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
+---------+-------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
# 查看INSERT语句中,pad列给出数据字符串的字符长度与字节长度
## INSERT语句中给出的pad列值字符长度为64
root@localhost : sbtest 11:01:33> select char_length('test-29736863337-73672352543-26439979097-89323822066-87557735686');
+---------------------------------------------------------------------------------+
| char_length('test-29736863337-73672352543-26439979097-89323822066-87557735686') |
+---------------------------------------------------------------------------------+
| 64 |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
## INSERT语句中给出的pad列值字节长度为64
root@localhost : sbtest 11:02:19> select length('test-29736863337-73672352543-26439979097-89323822066-87557735686');
+----------------------------------------------------------------------------+
| length('test-29736863337-73672352543-26439979097-89323822066-87557735686') |
+----------------------------------------------------------------------------+
| 64 |
+----------------------------------------------------------------------------+
1 row in set (0.00 sec)
# 使用alter语句修改pad列定义长度为70
root@localhost : sbtest 04:12:47> alter table sbtest1 modify column `pad` varchar(70) COLLATE utf8_bin NOT NULL DEFAULT '';
## 发生阻塞
4)在会话3中,查看数据库中的会话状态信息(模拟DBA侧排查故障问题)
# 可以看到info列的alter语句的state列值为Waiting for table metadata lock,表示在等待MDL锁
admin@localhost : (none) 11:50:55> show processlist;
+----+-------+-----------+--------+---------+------+---------------------------------+--------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------+-----------+--------+---------+------+---------------------------------+--------------------------------+
| 27 | admin | localhost | sbtest | Sleep | 919 | | NULL |
| 28 | admin | localhost | sbtest | Query | 898 | Waiting for table metadata lock | alter table sbtest1 modify column `pad` varchar(70) COLLATE utf8_bin NOT NULL DEFAULT '' |
| 29 | admin | localhost | NULL | Query | 0 | starting | show processlist |
| 32 | admin | localhost | NULL | Sleep | 154 | | NULL |
+----+-------+-----------+--------+---------+------+---------------------------------+------------------------------------+
4 rows in set (0.00 sec)
PS:MDL元数据锁的类型有很多种,根据官方手册中对performance_schema.metadata_locks表的LOCK_TYPE字段的描述可得知,一共有9种(INTENTION_EXCLUSIVE、SHARED、SHARED_HIGH_PRIO、SHARED_READ、SHARED_WRITE、SHARED_UPGRADABLE、SHARED_NO_WRITE、SHARED_NO_READ_WRITE、EXCLUSIVE),但,官方手册中并未找到每一种MDL锁的具体含义和发生的场景,关于MDL锁更详细的信息可参考如下这两个链接
https://blog.csdn.net/finalkof1983/article/details/88063328
| 作者简介
罗小波·沃趣科技高级数据库技术专家
IT从业多年,主要负责MySQL 产品的数据库支撑与售后二线支撑。曾参与版本发布系统、轻量级监控系统、运维管理平台、数据库管理平台的设计与编写,熟悉MySQL体系结构,Innodb存储引擎,喜好专研开源技术,多次在公开场合做过线下线上数据库专题分享,发表过多篇数据库相关的研究文章。
亿速云「云数据库 MySQL」免部署即开即用,比自行安装部署数据库高出1倍以上的性能,双节点冗余防止单节点故障,数据自动定期备份随时恢复。点击查看>>
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。
原文链接:http://blog.itpub.net/28218939/viewspace-2658975/