当前位置：首页 > news >正文

mysql自动备份快结束很卡_mysql备份慢以及自动重启问题——调整NUMA

news 来源：原创 2024/5/3 6:35:42

问题

数据文件大约200GB，平时备份(mysqldump)压缩后的大小大约20GB，正常备份时间在2个小时内。一台服务器上，安装了多个mysql实例，这个数据库是其中一个。从库用mysqldump 备份时，error log 中出现很多InnoDB: Warning: a long semaphore wait信息(详见：附1)

备份是每天定时执行的，自从error log 出现InnoDB: Warning: a long semaphore wait后，备份时间一天比一天慢，本来正常2个小时内备份完成的，到后面需要5、6小时，而且有时会导致mysql重启。

环境：IBM X3650M3、redhat5.4、mysql 5.1.50多实例部署

分析

这台服务器上，安装了多个mysql实例。其他实例也是从库，mysqldump备份的时候也正常，未出现任何warning 和重启mysql的情况，cpu、io 基本上处于比较空闲的状态，swap 使用较少。从系统资源上看好像一切正常，只跟这个数据库有关？参数不对？

再看看mysql日志，有很多InnoDB: Warning: a long semaphore wait，从日志上看好像是加锁导致innodb存储引擎自杀。但mysqldump的时候，已停止同步的线程，备份数据的时候，只有mysqldump进程，应该不至于加锁等待这么久。

在备份的时候，show processlist 中 state 处于 Sending data ，此时的磁盘IO并不高。

假设一：服务器硬件有问题？

服务器有问题，不可能只有一个实例受影响。而且一直监控服务器的硬件相关信息，并无发现异常，dmesg、message等未发现异常。IBM的raid卡无电，使硬盘读写变慢？

通过查看raid卡信息，发现Current Cache Policy: WriteBack，并非这个原因。

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL |grep -i "Current Cache Policy"

Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU

假设二：受mysqldump备份参数影响？

尝试调整参数：mysqldump 时加上single-transaction、quick。现象依旧

假设三：受其他实例影响？

尝试将mysql锁到内存，调整参数 memlock=1这个参数会强迫mysqld进程的地址空间一直被锁定在物理内存上。现象依旧

尝试在备份的时候，重启其他实例，备份的速度恢复正常，但等下次备份时，又变得很慢，同时一直再报InnoDB: Warning: a long semaphore wait，隔了好几天，又出现一次mysql重启。

正当束手无策之时，在查找mysql多实例是否会互相影响，看到网上几篇博文是关于NUMA(Non-Uniform Memory Access)非一致内存访问。难道mysql备份变慢以及导致的重启跟numa有关？

先来看看NUMA的内存分配策略

有四种：

1.缺省(default)：总是在本地节点分配(分配在当前进程运行的节点上)；

2.绑定(bind)：强制分配到指定节点上；

3.交叉(interleave)：在所有节点或者指定的节点上交织分配；

4.优先(preferred)：在指定节点上分配，失败则在其他节点上分配。

因为NUMA默认的内存分配策略是优先在进程所在CPU的本地内存中分配，会导致CPU节点之间内存分配不均衡，当某个CPU节点的内存不足时，会导致swap产生，而不是从远程节点分配内存。这就是所谓的swap insanity 现象。

MySQL采用了线程模式，对于NUMA特性的支持并不好，如果单机只运行一个MySQL实例，我们可以选择关闭NUMA，关闭的方法有三种：

1.硬件层，在BIOS中设置关闭

2.OS内核，启动时设置numa=off；

3.可以用numactl命令将内存分配策略修改为interleave(交叉)。

如果单机运行多个MySQL实例，我们可以将MySQL绑定在不同的CPU节点上，并且采用绑定的内存分配策略，强制在本节点内分配内存，这样既可以充分利用硬件的NUMA特性，又避免了单实例MySQL对多核CPU利用率不高的问题。

再看看服务器上的numa信息

做一个简单收集numastat信息，没隔10分钟收集一次：

$ more /home/mysql/scripts/numastat_out.sh

#!/bin/sh

[ -f ~/.bash_profile ] && . ~/.bash_profile || export PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/mysql/bin

umask 027

logfile=$(cd $(dirname $0);pwd)/log/numastat_out.log

NS=`which numastat`

NC=`which numactl`

echo "############ `date` ############" >> $logfile

$NS >> $logfile

$NC --hardware >> $logfile

echo "######################################################" >> $logfile

结果如下：

############ Mon Mar 26 11:50:01 CST 2012 ############

node0 node1

numa_hit 8363816504 5592890156

numa_miss 1980772 878592

numa_foreign 878592 1980772

interleave_hit 2294388 2354739

local_node 8361757791 5573044252

other_node 4039485 20724496

available: 2 nodes (0-1)

node 0 size: 32275 MB

node 0 free: 151 MB

node 1 size: 32320 MB

node 1 free: 20 MB

node distances:

node 0 1

0: 10 21

1: 21 10

######################################################注：参数说明

Numa policy hit/miss statistics/sys/devices/system/node/node*/numastatAll units are pages. Hugepages have separate counters.numa_hit A process wanted to allocate memory from this node, and succeeded.numa_miss A process wanted to allocate memory from another node, but ended up with memory from this node.numa_foreign A process wanted to allocate on this node, but ended up with memory from another one.local_node A process ran on this node and got memory from it.other_node A process ran on this node and got memory from another node.interleave_hit Interleaving wanted to allocate from this node and succeeded.

通过收集2天的数据做对比，interleave_hit 所对应的次数变化很小。说明numa的分配策略是缺省(default)：总是在本地节点分配。在服务器上查询numa的 policy 是 default：

$ numactl --show

policy: default

preferred node: current

尝试更改numa策略

尝试将3306 备份慢的实例，启动时前面加上numactl --interleave=all 如下：

numactl --interleave=all /etc/init.d/mysqld_multi start 3306

效果

通过上面的脚本，收集到的 interleave_hit 的次数在备份时段增加得特别快：

############ Wed Mar 28 07:20:01 CST 2012 ############

interleave_hit 5558259 3760066

############ Wed Mar 28 07:30:01 CST 2012 ############

interleave_hit 7661687 5322428

############ Wed Mar 28 07:40:01 CST 2012 ############

interleave_hit 9686299 6776337

############ Wed Mar 28 07:50:01 CST 2012 ############

interleave_hit 11697365 8311399

############ Wed Mar 28 08:00:01 CST 2012 ############

interleave_hit 12499526 8949833

############ Wed Mar 28 08:10:01 CST 2012 ############

interleave_hit 14707608 10587130

############ Wed Mar 28 08:20:01 CST 2012 ############

interleave_hit 16442433 11951893

############ Wed Mar 28 08:30:01 CST 2012 ############

interleave_hit 17824729 12954106

并且，此时error log无任何Warning。此后几天，再未出现warning信息。

(可以考虑，将实例分别绑定到固定的节点，比如：在NUMA处理器绑定多实例到固定核心)

附1

InnoDB: Warning: a long semaphore wait:--Thread 1183631680 has waited at ../../storage/innobase/include/btr0btr.ic line 28 for 241.00 seconds the semaphore:S-lock on RW-latch at 0x2aacda28d898 created in file buf/buf0buf.c line 550a writer (thread id 1183631680) has reserved it in mode exclusivenumber of readers 0, waiters flag 1Last time read locked in file btr/btr0pcur.c line 246Last time write locked in file buf/buf0buf.c line 1823InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:InnoDB: Pending preads 1, pwrites 0=====================================120217 10:40:47 INNODB MONITOR OUTPUT=====================================Per second averages calculated from the last 27 seconds----------SEMAPHORES----------OS WAIT ARRAY INFO: reservation count 62196, signal count 62194--Thread 1183631680 has waited at ../../storage/innobase/include/btr0btr.ic line 28 for 256.00 seconds the semaphore:S-lock on RW-latch at 0x2aacda28d898 created in file buf/buf0buf.c line 550a writer (thread id 1183631680) has reserved it in mode exclusivenumber of readers 0, waiters flag 1Last time read locked in file btr/btr0pcur.c line 246Last time write locked in file buf/buf0buf.c line 1823Mutex spin waits 0, rounds 98353, OS waits 1500RW-shared spins 118450, OS waits 59162; RW-excl spins 1996, OS waits 1404------------TRANSACTIONS------------Trx id counter 0 91046918Purge done for trx's n History list length 21LIST OF TRANSACTIONS FOR EACH SESSION:---TRANSACTION 0 91046818, not started, process no 4553, OS thread id 1183832384MySQL thread id 154, query id 294386 localhost root---TRANSACTION 0 91046875, ACTIVE 1300 sec, process no 4553, OS thread id 1183631680 fetching rows, thread declared inside InnoDB 500mysql tables in use 1, locked 0MySQL thread id 134, query id 294200 localhost backup Sending dataSELECT /*!40001 SQL_NO_CACHE */ * FROM `table1`Trx read view will not see trx with id >= 0 91046876, sees --------FILE I/O--------I/O thread 0 state: waiting for i/o request (insert buffer thread)I/O thread 1 state: waiting for i/o request (log thread)I/O thread 2 state: doing file i/o (read thread) ev setI/O thread 3 state: waiting for i/o request (write thread)Pending normal aio reads: 64, aio writes: 0,ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0Pending flushes (fsync) log: 0; buffer pool: 0192399 OS file reads, 256617 OS file writes, 7538 OS fsyncs1 pending preads, 0 pending pwrites0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s-------------------------------------INSERT BUFFER AND ADAPTIVE HASH INDEX-------------------------------------Ibuf: size 1, free list len 20875, seg size 20877,53381 inserts, 53381 merged recs, 31096 mergesHash table size 18563003, node heap has 1 buffer(s)0.00 hash searches/s, 0.00 non-hash searches/s---LOG---Log sequence number 43 3634229620Log flushed up to 43 3634229620Last checkpoint at 43 36342296200 pending log writes, 0 pending chkp writes125902 log i/o's done, 0.00 log i/o's/second----------------------BUFFER POOL AND MEMORY----------------------Total memory allocated 9945977032; in additional pool allocated 13424640Dictionary memory allocated 648760Buffer pool size 549824Free buffers 0Database pages 549823Modified db pages 0Pending reads 64Pending writes: LRU 0, flush list 0, single page 0Pages read 3798473, created 3084, written 1421240.00 reads/s, 0.00 creates/s, 0.00 writes/sNo buffer pool page gets since the last printout--------------ROW OPERATIONS--------------1 queries inside InnoDB, 0 queries in queue2 read views open inside InnoDBMain thread process no. 4553, id 1182828864, state: waiting for server activityNumber of rows inserted 90188, updated 33099, deleted 0, read 2499110550.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s----------------------------END OF INNODB MONITOR OUTPUT============================InnoDB: ###### Diagnostic info printed to the standard error stream其中还出现过重启：InnoDB: Error: semaphore wait has lasted > 600 secondsInnoDB: We intentionally crash the server, because it appears to be hung.120217 8:45:16 InnoDB: Assertion failure in thread 1162561856 in file srv/srv0srv.c line 2236InnoDB: We intentionally generate a memory trap.InnoDB: Submit a detailed bug report to http://bugs.mysql.com.InnoDB: If you get repeated assertion failures or crashes, evenInnoDB: immediately after the mysqld startup, there may beInnoDB: corruption in the InnoDB tablespace. Please refer toInnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.htmlInnoDB: about forcing recovery.120217 8:45:16 - mysqld got signal 11 ;This could be because you hit a bug. It is also possible that this binaryor one of the libraries it was linked against is corrupt, improperly built,or misconfigured. This error can also be caused by malfunctioning hardware.We will try our best to scrape up some info that will hopefully help diagnosethe problem, but since we have already crashed, something is definitely wrongand this may fail.key_buffer_size=267108864read_buffer_size=3145728max_used_connections=3max_threads=300threads_connected=1It is possible that mysqld could use up tokey_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2110308 Kbytes of memoryHope that's ok; if not, decrease some variables in the equation.thd: 0x0Attempting backtrace. You can use the following information to find outwhere mysqld died. If you see no messages after this, something wentterribly wrong...stack_bottom = (nil) thread_stack 0x30000/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x20) [0xa16470]/usr/local/mysql/bin/mysqld(handle_segfault+0x368) [0x609c34]/lib64/libpthread.so.0 [0x3d5d60e7c0]/usr/local/mysql/bin/mysqld(srv_error_monitor_thread+0x2f3) [0x94ca0d]/lib64/libpthread.so.0 [0x3d5d6064a7]/lib64/libc.so.6(clone+0x6d) [0x3d5cad3c2d]The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html containsinformation that should help you find out what is causing the crash.120217 08:45:17 mysqld_safe Number of processes running now: 0120217 08:45:17 mysqld_safe mysqld restarted120217 8:45:17 [Note] Plugin 'FEDERATED' is disabled.InnoDB: The log sequence number in ibdata files does not matchInnoDB: the log sequence number in the ib_logfiles!120217 8:45:22 InnoDB: Database was not shut down normally!InnoDB: Starting crash recovery.InnoDB: Reading tablespace information from the .ibd files...InnoDB: Restoring possible half-written data pages from the doublewriteInnoDB: buffer...InnoDB: Last MySQL binlog file position 0 547, file name /usr/local/mysql/log/mysql-bin.000035120217 8:45:27 InnoDB: Started; log sequence number 43 3565657939120217 8:45:27 [Note] Recovering after a crash using /usr/local/mysql/log/mysql-bin120217 8:45:27 [Note] Starting crash recovery...120217 8:45:27 [Note] Crash recovery finished.120217 8:45:42 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000037' at position 89665957, relay log '/usr/local/mysql/log/relay-bin.000238' position: 89666102120217 8:45:42 [Note] Slave I/O thread: connected to master 'user@host:3306',replication started in log 'mysql-bin.000037' at position 131046151120217 8:45:42 [Note] Event Scheduler: Loaded 0 events120217 8:45:42 [Note] /usr/local/mysql/bin/mysqld: ready for connections.Version: '5.1.50-log' socket: '/usr/local/mysql/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)