当前位置: 首页 > news >正文

c++排查线程hang住_show status和set gtid_mode 导致线程死锁案例

特别感谢远在重庆的高鹏和我们组的小伙伴王航威一起分析本次案例,推荐横屏查看

一 现象

我们数据库组今年上半年的计划之一是将所有数据库实例打开GTID特性。在线上进行灰度开启GITD过程中遇到数据库hang。具体表现是执行如下命令时:

set global gtid_mode=OFF_PERMISSIVE;

数据库hang住,阻塞读写请求,新建连接失败,业务不可用。

二 复现

高频率执行show global status 和 set gtid_mode ,MySQL实例很容易就hang住了。

for i in `seq  1 50000`;
do
my 3307 -e 'show global status' > /tmp/1 ;
sleep 1
done
for i in `seq  1 50000`;
do
my 3307 -e 'SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;';
my 3307 -e 'SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;';
my 3307 -e 'SET @@GLOBAL.GTID_MODE = ON;';
my 3307 -e 'SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;';
my 3307 -e 'SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;';
my 3307 -e 'SET @@GLOBAL.GTID_MODE = OFF;';
done

三 分析

3.1 获取数据库hang的时候的堆栈

pstack $pidofmysql

核心堆栈信息分别是 thread 2 执行show  global status 命令 thread 4 执行set global gtid_mode 命令

f3be110a2a1f394106044ed1062dc5df.png

1251c13c5f2bb2db3aa1724c53583753.png

3.2 使用gdb 分析当时阻塞到什么地方

注意: 堆栈信息的执行顺序实际上是从下往上的,#n大的先执行,我们分析的时候也需要向上到这来,比如thread 4 从#23开始

3.3 show global status 的堆栈

(gdb) thread 4
[Switching to thread 4 (Thread 0x7f8675f0b700 (LWP 49624))]#0 0x00007f86758ef334 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb)
[Switching to thread 4 (Thread 0x7f8675f0b700 (LWP 49624))]#0 0x00007f86758ef334 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f86758ef334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f86758ea60e in _L_lock_995 () from /lib64/libpthread.so.0
#2 0x00007f86758ea576 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00000000019219b9 in native_mutex_lock (mutex=0x2e20e48) at /root/percona-server-5.7.22-22/include/thr_mutex.h:84
#4 0x0000000001921c21 in safe_mutex_lock (mp=0x2e20e20, try_lock=0 '\000', file=0x21f2168 "/root/percona-server-5.7.22-22/sql/sql_show.cc", line=3127)
at /root/percona-server-5.7.22-22/mysys/thr_mutex.c:109
#5 0x000000000162e890 in my_mutex_lock (mp=0x2e20e20, file=0x21f2168 "/root/percona-server-5.7.22-22/sql/sql_show.cc", line=3127)
at /root/percona-server-5.7.22-22/include/thr_mutex.h:180
#6 0x000000000162e98b in inline_mysql_mutex_lock (that=0x2e20e20, src_file=0x21f2168 "/root/percona-server-5.7.22-22/sql/sql_show.cc", src_line=3127)
at /root/percona-server-5.7.22-22/include/mysql/psi/mysql_thread.h:730
#7 0x0000000001636798 in show_status_array (thd=0x7f83a7c19000, wild=0x0, variables=0x2da0780, value_type=OPT_GLOBAL, status_var=0x7f8675f07ef0,
prefix=0x7f8675f07dc0 "Innodb", tl=0x7f83a7c27c58, ucase_names=false, cond=0x0) at /root/percona-server-5.7.22-22/sql/sql_show.cc:3127
#8 0x0000000001636700 in show_status_array (thd=0x7f83a7c19000, wild=0x0, variables=0x7f8673c60420, value_type=OPT_GLOBAL, status_var=0x7f8675f07ef0, prefix=0x21f2f84 "",
tl=0x7f83a7c27c58, ucase_names=false, cond=0x0) at /root/percona-server-5.7.22-22/sql/sql_show.cc:3116
#9 0x0000000001648deb in fill_status (thd=0x7f83a7c19000, tables=0x7f83a7c27c58, cond=0x0) at /root/percona-server-5.7.22-22/sql/sql_show.cc:8084
#10 0x000000000164b2e9 in do_fill_table (thd=0x7f83a7c19000, table_list=0x7f83a7c27c58, qep_tab=0x7f83a7c28d78) at /root/percona-server-5.7.22-22/sql/sql_show.cc:8791
#11 0x000000000164b7f3 in get_schema_tables_result (join=0x7f83a7c28698, executed_place=PROCESSED_BY_JOIN_EXEC) at /root/percona-server-5.7.22-22/sql/sql_show.cc:8921
#12 0x000000000161fc39 in JOIN::prepare_result (this=0x7f83a7c28698) at /root/percona-server-5.7.22-22/sql/sql_select.cc:909
#13 0x000000000157cb71 in JOIN::exec (this=0x7f83a7c28698) at /root/percona-server-5.7.22-22/sql/sql_executor.cc:124
#14 0x000000000161e7ff in handle_query (thd=0x7f83a7c19000, lex=0x7f83a7c1b630, result=0x7f83a7c285d0, added_options=0, removed_options=0)
at /root/percona-server-5.7.22-22/sql/sql_select.cc:185
#15 0x00000000015d044f in execute_sqlcom_select (thd=0x7f83a7c19000, all_tables=0x7f83a7c27c58) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:5445
#16 0x00000000015c8704 in mysql_execute_command (thd=0x7f83a7c19000, first_level=true) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:2892
#17 0x00000000015d14b6 in mysql_parse (thd=0x7f83a7c19000, parser_state=0x7f8675f0a5c0) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:5901
#18 0x00000000015c504a in dispatch_command (thd=0x7f83a7c19000, com_data=0x7f8675f0ad30, command=COM_QUERY) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:1490
#19 0x00000000015c3dd7 in do_command (thd=0x7f83a7c19000) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:1021
#20 0x000000000170c998 in handle_connection (arg=0x7f83ad7ff460) at /root/percona-server-5.7.22-22/sql/conn_handler/connection_handler_per_thread.cc:312
#21 0x0000000001943acc in pfs_spawn_thread (arg=0x7f8673d87120) at /root/percona-server-5.7.22-22/storage/perfschema/pfs.cc:2190
#22 0x00007f86758e8aa1 in start_thread () from /lib64/libpthread.so.0
#23 0x00007f8674852bcd in clone () from /lib64/libc.so.6
(gdb) f 7
#7 0x0000000001636798 in show_status_array (thd=0x7f83a7c19000, wild=0x0, variables=0x2da0780, value_type=OPT_GLOBAL, status_var=0x7f8675f07ef0,
prefix=0x7f8675f07dc0 "Innodb", tl=0x7f83a7c27c58, ucase_names=false, cond=0x0) at /root/percona-server-5.7.22-22/sql/sql_show.cc:3127
3127 mysql_mutex_lock(&LOCK_global_system_variables);

分析

根据线程执行的顺序来分析:

关键函数 fill_status 先获取mutex锁 mysql_mutex_lock(&LOCK_status)

代码位置 sql_show.cc:8019 

int fill_status(THD *thd, TABLE_LIST *tables, Item *cond)
{
if (thd->fill_status_recursion_level++ == 0)
mysql_mutex_lock(&LOCK_status);
if (thd->fill_status_recursion_level-- == 1)
mysql_mutex_unlock(&LOCK_status);
}

通 执行show global status的线程阻塞在申请mutex锁 mysql_mutex_lock(&LOCK_global_system_variables) 的地方。

关键函数:show_status_array: (代码片段) 每次获取一个系统状态,就请求一次mysql_mutex_lock(&LOCK_global_system_variables)锁。

 if (!(wild && wild[0] && wild_case_compare(system_charset_info,
name_buffer, wild)) &&
(!partial_cond || partial_cond->val_int()))
{
const char *pos;
size_t length;

mysql_mutex_lock(&LOCK_global_system_variables);
pos= get_one_variable(thd, var, value_type, show_type, status_var,
&charset, buff, &length);
table->field[1]->store(pos, (uint32) length, charset);
thd->count_cuted_fields= CHECK_FIELD_IGNORE;
table->field[1]->set_notnull();
mysql_mutex_unlock(&LOCK_global_system_variables);

而 LOCK_global_system_variables 则是 mysql_mutex_t 结构。代码位置 mysqld.cc:686

mysql_mutex_t
LOCK_status, LOCK_uuid_generator,
LOCK_crypt,
LOCK_global_system_variables,
LOCK_user_conn, LOCK_slave_list,
LOCK_error_messages;

3.4 修改gtid_mode的线程调用栈

gdb) thread 2
[Switching to thread 2 (Thread 0x7f8675ea9700 (LWP 27921))]#0 0x00007f86758ef334 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f86758ef334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f86758ea60e in _L_lock_995 () from /lib64/libpthread.so.0
#2 0x00007f86758ea576 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00000000019219b9 in native_mutex_lock (mutex=0x2e20cc8) at /root/percona-server-5.7.22-22/include/thr_mutex.h:84
#4 0x0000000001921c21 in safe_mutex_lock (mp=0x2e20ca0, try_lock=0 '\000', file=0x22c7b38 "/root/percona-server-5.7.22-22/sql/binlog.cc", line=10145)
at /root/percona-server-5.7.22-22/mysys/thr_mutex.c:109
#5 0x0000000001873fa9 in my_mutex_lock (mp=0x2e20ca0, file=0x22c7b38 "/root/percona-server-5.7.22-22/sql/binlog.cc", line=10145)
at /root/percona-server-5.7.22-22/include/thr_mutex.h:180
#6 0x000000000187427e in inline_mysql_mutex_lock (that=0x2e20ca0, src_file=0x22c7b38 "/root/percona-server-5.7.22-22/sql/binlog.cc", src_line=10145)
at /root/percona-server-5.7.22-22/include/mysql/psi/mysql_thread.h:730
#7 0x000000000188a903 in MYSQL_BIN_LOG::publish_coordinates_for_global_status (this=0x2e68720) at /root/percona-server-5.7.22-22/sql/binlog.cc:10145
#8 0x0000000001885115 in MYSQL_BIN_LOG::rotate (this=0x2e68720, force_rotate=true, check_purge=0x7f8675ea6aff) at /root/percona-server-5.7.22-22/sql/binlog.cc:7777
#9 0x00000000016ba82c in Sys_var_gtid_mode::global_update (this=0x2e52820, thd=0x7f83a8419000, var=0x7f83a8427ce8) at /root/percona-server-5.7.22-22/sql/sys_vars.h:2951
#10 0x00000000014f848d in sys_var::update (this=0x2e52820, thd=0x7f83a8419000, var=0x7f83a8427ce8) at /root/percona-server-5.7.22-22/sql/set_var.cc:184
#11 0x00000000014f9d0c in set_var::update (this=0x7f83a8427ce8, thd=0x7f83a8419000) at /root/percona-server-5.7.22-22/sql/set_var.cc:836
#12 0x00000000014f94d7 in sql_set_variables (thd=0x7f83a8419000, var_list=0x7f83a841ba08, free_joins=true) at /root/percona-server-5.7.22-22/sql/set_var.cc:672
#13 0x00000000015cb410 in mysql_execute_command (thd=0x7f83a8419000, first_level=true) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:3858
#14 0x00000000015d14b6 in mysql_parse (thd=0x7f83a8419000, parser_state=0x7f8675ea85c0) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:5901
#15 0x00000000015c504a in dispatch_command (thd=0x7f83a8419000, com_data=0x7f8675ea8d30, command=COM_QUERY) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:1490
#16 0x00000000015c3dd7 in do_command (thd=0x7f83a8419000) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:1021
#17 0x000000000170c998 in handle_connection (arg=0x7f835efff220) at /root/percona-server-5.7.22-22/sql/conn_handler/connection_handler_per_thread.cc:312
#18 0x0000000001943acc in pfs_spawn_thread (arg=0x7f8673d87320) at /root/percona-server-5.7.22-22/storage/perfschema/pfs.cc:2190
#19 0x00007f86758e8aa1 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f8674852bcd in clone () from /lib64/libc.so.6
(gdb) f 7
#7 0x000000000188a903 in MYSQL_BIN_LOG::publish_coordinates_for_global_status (this=0x2e68720) at /root/percona-server-5.7.22-22/sql/binlog.cc:10145
10145 mysql_mutex_lock(&LOCK_status);
(gdb) f 10
#10 0x00000000014f848d in sys_var::update (this=0x2e52820, thd=0x7fb63c019000, var=0x7fb63c027ce8) at /root/percona-server-5.7.22-22/sql/set_var.cc:184
184 (on_update && on_update(this, thd, OPT_GLOBAL));
(gdb) l
179 to make it safe we'll need value_ptr_unlock().
180 */
181 AutoWLock lock1(&PLock_global_system_variables);
182 AutoWLock lock2(guard);
183 return global_update(thd, var) ||
184 (on_update && on_update(this, thd, OPT_GLOBAL));
185 }
186 else
187 {
188 bool locked= false;

分析

根据线程执行的顺序来分析: sys_var::update 函数执行的时候需要申请 AutoWLock lock1(&PLock_global_system_variables);

代码位置 set_var.cc :32行

static PolyLock_mutex PLock_global_system_variables(&LOCK_global_system_variables);

其实PLock_global_system_variables 也是需要申请mutex锁 &LOCK_global_system_variables。

代码位置 binlog.cc:10145

/** Copy the current binlog coordinates to the variables used for the
not-in-consistent-snapshot case of SHOW STATUS */

void MYSQL_BIN_LOG::publish_coordinates_for_global_status(void) const
{
mysql_mutex_assert_owner(&LOCK_log);

mysql_mutex_lock(&LOCK_status);
strcpy(binlog_global_snapshot_file, log_file_name);
binlog_global_snapshot_position=
my_b_inited(&log_file) ? my_b_tell(&log_file) : 0;
mysql_mutex_unlock(&LOCK_status);
}

设置gtid_mode的线程等待在第7个步骤,进行binlog切换时,此时需要申请mysql_mutex_lock(&LOCK_status)。然而&LOCK_status已经被show globa status的 fill_status 函数先申请获得。show global status需要申请mutex锁&LOCK_global_system_variables 被set gtid_mode的sys_var::update函数持有。于是相互等待死锁。

一图胜千言:

f63d4bfd6a6595b2edd768e2d6baa711.png

3.4 解释为什么会hang住,其他新的请求无法建立连接。

连接的初始化需要LOCK_global_system_variables, 此mutex被占用,导致新连接无法初始化

代码位置 sql_class.cc:1629

void THD::init(void)
{
mysql_mutex_lock(&LOCK_global_system_variables);
...
mysql_mutex_unlock(&LOCK_global_system_variables);

3.5 SQL查询时也需要LOCK_global_system_variables, 此mutex被占用,导致查询被阻塞.

代码位置 item_func.cc:7275

void Item_func_get_system_var::fix_length_and_dec()
{
...
mysql_mutex_lock(&LOCK_global_system_variables);
}

四 总结

首先 再次感谢远在重庆的高鹏不辞辛苦的帮忙分析定位以及给出各种解释。通过这次分析我们对gdb和MySQL 源码又有进一步了解。

其次如何规避,我们目前的规避方式是减少show 命令的执行,以后做gtid的变更的时候 会关闭监控以及有赞专门开发的数据库性能快照系统以便和 set gtid_mode 命令冲突。

参考文章

1 深度分析 | GDB定位MySQL5.7特定版本hang死的故障分析 

2 使用gdb调试程序完全教程

https://blog.csdn.net/gatieme/article/details/51671430

3 一个 Linux 上分析死锁的简单方法

https://www.ibm.com/developerworks/cn/linux/l-cn-deadlock/index.html

相关文章:

  • 夯大力_【技术交流】通定高速路基边坡夯拍及三背回填振动液压夯实技术
  • vscode中安装webpack_GitHub - lpreterite/vscode-mocha-webpack-example: 用webpack写个现代的JavaScript包...
  • c语言课设代写一般多少钱_cs代写 math代写
  • vs2019 值可能为0_【完整版】针对零基础小白的VS2019安装攻略
  • bin和cue怎么合并_用foobar进行APE分割及合并,cue的制作
  • vsftp如何确定地址_西门子S71500数字量输入模块如何编址?
  • adsense 注册_三天google AdSense帐户申请经验分享
  • plc和pc串口通讯接线_基础教程--RS232通讯用到串口调试软件的应用
  • host ntrip 千寻rtk_合众思壮新品RTK UFO U5连接千寻cors账号的方法教程
  • ipad鼠标圆圈变成箭头_iPad当作Mac mini 的显示器的方法
  • mc有什么红石机器人_Minecraft游戏理论:远古“巨石阵”之谜,探索mc史前高科技文明...
  • 整活插件 炉石传说_炉石传说:整活被盯上了,卡德加可能被移除,拔网线也被点名!...
  • android中断言_PageObject设计模式在 UI 自动化中的实践(QQ 邮箱登陆为例)
  • 广西中专机器人应用与维护_工业机器人工程师|记工信部考证班结课啦!
  • spring项目中target项目是什么‘’_手把手教你搭建一个SSM项目
  • 【RocksDB】TransactionDB源码分析
  • 30秒的PHP代码片段(1)数组 - Array
  • 4个实用的微服务测试策略
  • codis proxy处理流程
  • ERLANG 网工修炼笔记 ---- UDP
  • extract-text-webpack-plugin用法
  • Java 最常见的 200+ 面试题:面试必备
  • Redash本地开发环境搭建
  • windows-nginx-https-本地配置
  • 从0搭建SpringBoot的HelloWorld -- Java版本
  • 给第三方使用接口的 URL 签名实现
  • 力扣(LeetCode)21
  • 云大使推广中的常见热门问题
  • 回归生活:清理微信公众号
  • ​ssh-keyscan命令--Linux命令应用大词典729个命令解读
  • #include<初见C语言之指针(5)>
  • #大学#套接字
  • (02)Hive SQL编译成MapReduce任务的过程
  • (TipsTricks)用客户端模板精简JavaScript代码
  • (十一)手动添加用户和文件的特殊权限
  • (转)fock函数详解
  • (转)JVM内存分配 -Xms128m -Xmx512m -XX:PermSize=128m -XX:MaxPermSize=512m
  • (转)甲方乙方——赵民谈找工作
  • *Algs4-1.5.25随机网格的倍率测试-(未读懂题)
  • .bat批处理(十):从路径字符串中截取盘符、文件名、后缀名等信息
  • .NET 8.0 发布到 IIS
  • .NET CORE 第一节 创建基本的 asp.net core
  • .NET 命令行参数包含应用程序路径吗?
  • .NET 指南:抽象化实现的基类
  • .NET国产化改造探索(一)、VMware安装银河麒麟
  • .NET简谈设计模式之(单件模式)
  • .NET框架类在ASP.NET中的使用(2) ——QA
  • .net使用excel的cells对象没有value方法——学习.net的Excel工作表问题
  • .stream().map与.stream().flatMap的使用
  • .vimrc php,修改home目录下的.vimrc文件,vim配置php高亮显示
  • @entity 不限字节长度的类型_一文读懂Redis常见对象类型的底层数据结构
  • @zabbix数据库历史与趋势数据占用优化(mysql存储查询)
  • [【JSON2WEB】 13 基于REST2SQL 和 Amis 的 SQL 查询分析器
  • [2023-年度总结]凡是过往,皆为序章
  • [Android] Android ActivityManager