MYSQL高可用集群MHA架构
一、MHA概述
1、MHA简介
MHA是当 MASTER 出现故障,挑选一个 SLAVE 作为新的 MASTER 并构建成新的主从架构的管理工具。从 MASTER 出现故障到构建成新的主从架构时间是 10-30秒。在 MASTER 出现故障时可能会出现 SLAVE 同步的数据不一致的现象,此工具可以自动应用差异的中继日志到其他 SLAVE 上保证数据的一致性。
2、MHA优点
1. Master crash 时可以快速的进行故障切换
MHA 是当 MASTER 出现故障,挑选一个 SLAVE 作为新的 MASTER 并构建成新的主从架构的管理工具。从 MASTER 出现故障到构建成新的主从架构时间是 10-30秒。在 MASTER 出现故障时可能会出现 SLAVE 同步的数据不一致的现象,此工具可以自动应用差异的中继日志到其他 SLAVE 上保证数据的一致性。
2. Master crash 时可以最大化的减少数据丢失
当 master crash 时 MHA 自动检测选择数据同步最全的 slave,并把差异日志应用到其他 slave 上, 以保障数据的一致性。
3. Semi-Synchronous Replication 可以最大化的减少数据的丢失
MHA 的更改升级配置等不影响线上正在运行的数据库,使用 mha 不需要增加太多的服务器。MHA 由 MHA Manager 和 MHA Node 组成。 MHA Node 运行在 MYSQL 服务器上,所以不会因为 MHA node 增加新的服务器。
MHA Manager 通常需要独立运行在一台服务器上,所以你需要增加一台服务器用于监控管理运行 MHA Manager,但是一台服务器上的 MHA Manager 可以同时监控管理多达百台 master,所以总的来说服务器增加不会太多。MHA Manger 也可以运行在一台 slave 上,这样总的服务器数也不会增加。
4. 原来应用系统整体性能不会降低太多
MHA 工作在异步或半同步的主从架构上。当监控 master 时,MHA 会每隔几Mha 部署测试文档5秒 (默认 3 秒) 向 master 发出 ping 包并且不需要大的 sql 语句用于监控 master的健康状。Slave 需要开启 binlog,整体性能不会有太大的降低。
5. MHA 适合任何存储引擎
只要能主从复制的存储引擎它都支持,不限于支持事物的innodb引擎。
二、安装基础环境
1、软件下载
系统镜像盘链接:百度网盘 请输入提取码 密码:7vmr
软件包链接:
方式一:百度云:链接:https://pan.baidu.com/s/1mhSojWW 密码:bpky
方式二:官方下载https://code.google.com/p/mysql-master-ha/(需要自备梯子)
添加一个yum源:
wget http://dl.fedoraproject.org/pub/ ... ease-5-4.noarch.rpm
rpm -ivh epel-release-5-4.noarch.rpm(亲测,可用)
Mha分manager节点和node节点。可以下载源码包,或者rpm包,随个人喜好,注意,版本与系统内核要匹配。
2、角色清单
IP地址 | 主机名 | 角色 | 软件 |
172.28.7.202 | rhel6-64-2 | manager | mha4mysql-manager、mha4mysql-node |
172.28.7.203 | rhel6-64-3 | master | mha4mysql-node |
172.28.7.204 | rhel6-64-4 | Slave1 | mha4mysql-node |
172.28.7.205 | rhel6-64-5 | Slave2 | mha4mysql-node |
3、系统配置
配置如下:
系统 | 内核 | Mysql版本 | 内存 |
Red Hat release 6.4 | Linux version 2.6.32-358.el6.x86_64 | mysql Ver 14.14 Distrib 5.1.66 | 1G |
查看命令方式:
[root@rhel6-64-2 ~]# cat /proc/version
Linux version 2.6.32-358.el6.x86_64 (mockbuild@x86-022.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Tue Jan 29 11:47:41 EST 2013
[root@rhel6-64-2 ~]# mysql -V
mysql Ver 14.14 Distrib 5.1.66, for redhat-linux-gnu (x86_64) using readline 5.1
4、架构图
5、主机环境配置(IP/DNS/HOSTS/免密码)
主机名称:
[root@rhel6-64-2 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.28.7.202 manager
172.28.7.203 master
172.28.7.204 slave1
172.28.7.205 slave2
我这里已经配置好了免密码,正常情况下会要求输入密码:
[root@rhel6-64-2 ~]# scp /etc/hosts master:/etc/
hosts 100% 239 0.2KB/s 00:00
[root@rhel6-64-2 ~]# scp /etc/hosts slave1:/etc/
hosts 100% 239 0.2KB/s 00:00
[root@rhel6-64-2 ~]# scp /etc/hosts slave2:/etc/
hosts 100% 239 0.2KB/s 00:00
IP/路由/dns地址:
要保证可以访问外网,因为装软件包需要多种依赖包,默认光盘自带的软件包不够,需要用wget命令去公网下载。
SSH免密码登录:
[root@rhel6-64-2 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 回车
Enter passphrase (empty for no passphrase): 回车
Enter same passphrase again: 回车
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1c:e6:0e:0f:85:5b:34:97:35:20:7d:77:c5:bd:a2:97 root@rhel6-64-1
The key's randomart image is:
+--[ RSA 2048]----+
| +.o+o .+|
| o +o ... +|
| . = . . ..|
| B . . . |
| + S . o |
| = . E |
| o . |
| |
| |
+-----------------+
[root@rhel6-64-5 ~]# ssh-copy-id 172.28.7.202
The authenticity of host '172.28.7.202 (172.28.7.202)' can't be established.
RSA key fingerprint is 64:32:50:c4:06:be:b6:e7:95:13:76:48:3b:e7:8b:aa.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.28.7.202' (RSA) to the list of known hosts.
root@172.28.7.202's password: 输入7.202的秘钥
Now try logging into the machine, with "ssh '172.28.7.202'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@rhel6-64-5 ~]# ssh 172.28.7.202(再进行登录测试,以不输入密码直接登录为准)
Last login: Thu Sep 8 15:19:27 2016 from 192.168.255.251
其他台配置方法与以上一致 保证每台登录互相无交互,配置完成后每一个都登录一次 过程略配置后查看。
最后查看授权的秘钥:
[root@rhel6-64-2 ~]# cat .ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA9OJqUQpcmfym4xVhMhf4chNyNgzUzT0y+89A5KBXwIv1nN3X9bMoXQhVzDWB2MlzqbZdVw4TgYyINfNpQtuFEa7nEUen8KDKACTtTU99xf8Wrzhsvm6hREEfKml1zrg46TFk3Yjil+3QDq7wefk2g12HYbJpE946Ffr49NilcgUrh6HimEW4eGgTQSpyLyfrd2VS9bUcJGxoEbpoeniZgpbHqT3hidhrCLTYzu2GBaNunZTPqbfljPp0SpCUVCBouAzjSj0j7xT6wer7X1BkHQU4ipqlGu/t3EEFhkrhKqh94Az5oM+K6g7zXbL5qhG+4JVhu/zY6ZmaOaDZ16kgJQ== root@rhel6-64-5
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAuuBt+8vSrz+4PTPcBE5ButQmgBOX93opESAxAqU1q/9fQA0Ytj+m80l4GgaxJNxJxYfaMNBo3eDPMzzLqVMHtG8j6IhKo3BBxeT9vbtHvOrgnxNy2MOJ9D+EDqMQ0r8IKQBwJbPCEDGGoBelJYnBDNO17doJ0UM1i5n4tLIeFwczIDbR7TNB/tym26Bv+yOmyrVsG7gOQT3mCoK6RSBC6fv0OAse00cjzYpfZnRAfldWTfqBsaibHqnoMWJpiqZH/gH3LoeFW3d7zwaU6kY0Oqux49sXGmI2UEgNfJhLaJbyXkc1JCD3UlISVWxNGi7kUCKz/bOjDxt5D15NdO/e0w== root@rhel6-64-2
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzumZFvTs4bai/qOMTkjW5w+dL+hvyhnYsRbANVnmTbCTyB2sGW2VkY46r/FDcAtXYULdzwOpN9hE+rRb8HYT3YCjx5RdvY+wJzpd239lzwAY691AD7PBUOMbZUoyV/Bim5wFrAjLXUJ+ER4Cahcn/n5ZbkVmBH4E3X/xdHXQj6weAPawjLSUryRfFKPUg6G/8yjLPPR61WbznmAuOi0Lbw/U5L5Hf0SZ1CcT7R+GLyea6WxQ2H9MCdTCV5i+mrC8LqGaDBbuVXq5ZUeb7WjbkfpBkep1g+yFnSzTBXRSKXRyi9vK2CAD+zLhdrGKrmvtng0QeWxo77VkgQkD0CYzxQ== root@rhel6-64-3
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAxsAASxEYfJbn6/PCh3lXRwxAifrigpQsExpP/4WPRFa1aCw53zABEfbq8T0rjDS7oZXjbMZ/hnMA6okdG9hLypplW+UcMIfKiyaTAOhs2CDKvopsfATRZbALaewP7+LLf4haYUoNOK0DI3Y7YRzgXG0pEUgWMfeBlLldNIDv3b7+UzBIPSe1WComqqPlLav/PyS7VJxfNxDMZ5slkkmmi6UJYep5KvsRkF+Ci/cyaywJmzllH6OLDuI+cEJBH3KDxnui9td3/Llr3+LfLoxcyRZIXJD5NMQrtKYzW/OzAfeWsT4iMSK4d1Gqg+SXdNh+c3cqPyVh8qEvnCeIszBIoQ== root@rhel6-64-4
6、安装MYSQL
服务器主从配置:
在master/slave1/slave2安装MySQL数据库,简化实验过程使用yum 安装即可。
[root@rhel6-64-3 ~]# yum install -y mysql-server
……
Installed:
mysql-server.x86_64 0:5.1.66-2.el6_3
Dependency Installed:
mysql.x86_64 0:5.1.66-2.el6_3 mysql-libs.x86_64 0:5.1.66-2.el6_3
Complete!
[root@rhel6-64-3 ~]# service mysqld start
Starting mysqld: [ OK ]
yum安装过程manager/slave1/slave2完全相同,步骤略。
7、配置MYSQL主从
在master主机上修改my.cnf文件:
[root@rhel6-64-3 ~]# cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log-bin=mysql-bin
server-id = 1
#红色标记部分为新增,注意不要添加到【mysql_safe】里,开始做实验室的时候盲目添加,导致主从配置失败。
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
在slave1上修改my.cnf文件,添加红色自己内容:
[root@rhel6-64-4 ~]# cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log-bin=mysql-bin
server-id = 2
relay_log = /var/lib/mysql/mysql-relay-bin
log_slave_updates = 1
read_only = 1
#注意每台机器的server-id 一定互不相同
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
修改slave2中my.cnf文件:
[root@rhel6-64-5 ~]# cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log-bin=mysql-bin
server-id = 3
relay_log = /var/lib/mysql/mysql-relay-bin
log_slave_updates = 1
read_only = 1 #表示从服务器是只读的!
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
重启所有mysql服务:
[root@rhel6-64-3 ~]# service mysqld restart
Stopping mysqld: [ OK ]
Starting mysqld: [ OK ]
进入master服务器中,授权qianmen用户给从服务器:
[root@rhel6-64-3 ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.1.66-log Source distribution
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> grant replication slave, replication client on *.* to qianmen@'%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
如果不放心可以slave主机进行登录测试,命令如下:
[root@rhel6-64-4 ~]# mysql -uqianmen -p123456 -hmaster
Welcome to the MySQL monitor.
出现welcome表明成功。
查看主机状态,要求记录file文件名称和position:
mysql> show master status;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000010 | 788 | | |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
在2台从服务器上和主服务器进行同步:
[root@rhel6-64-4 ~]# mysql
Welcome to the MySQL monitor.
。。。。。。
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)
将圈中的信息填写正确。
mysql> change master to master_host='master',master_port=3306,master_user='qianmen',master_password='123456',master_log_file='mysql-bin.000003',master_log_pos=344;
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: master
Master_User: qianmen
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000010
Read_Master_Log_Pos: 788
Relay_Log_File: mysql-relay-bin.000002
Relay_Log_Pos: 251
Relay_Master_Log_File: mysql-bin.000010
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
……
1 row in set (0.00 sec)
Slave2中进行相同操作,过程略!
之前操作一直失败,这里想说明2点:
1)授权用户不要给all,要有replication,MHA会在配置文件里要求能远程登录到数据库,所以要进行必要的赋权。
2)一定看准了主服务器的file和position,填错了 slave I/O装填会变成No。
然后再就是赋权了,之前的一步赋权操作是权限是只有replication,MHA会在配置文件里要求能远程登录到数据库,所以要进行必要的赋权。
在三台机器中都做如下操作:
mysql> grant all privileges on *.* to 'gongzi'@'%' identified by '123456';
mysql> flush privileges;
mysql相关用户,及权限的命令,以下命令无需敲打,仅供参考和验证!
mysql> select user,host from mysql.user; 查看用户表
mysql> show grants for 'user2'@'%'; 查看授权用户
mysql> use mysql; 应用mysql数据库
mysql> drop user user4@172.28.7.204; 删除用户
三、安装配置MHA
1、安装准备工作
配置YUM源和依赖关系包。
接下来就是开始正式安装MHA了,先安装节点包开始 四台机器都要安装!每台机器上安装yum的一个软件源EPEL,里面包含了许多基本源里没有的软件。
[root@rhel6-64-5 ~]# rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
Retrieving http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
warning: /var/tmp/rpm-tmp.vKNo5c: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Preparing... ########################################### [100%]
package epel-release-6-8.noarch is already installed
安装完成后,会发现原来的yum仓库中多处2个文件:
打开epel.repo:
[root@rhel6-64-5 ~]# cat /etc/yum.repos.d/epel.repo
[epel]
name=Extra Packages for Enterprise Linux 6 - $basearch
baseurl=http://download.fedoraproject.org/pub/epel/6/$basearch #将注释的#去掉
#mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-6&arch=$basearch#前面加上#
failovermethod=priority
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
什么是epel?
如果既想获得 RHEL 的高质量、高性能、高可靠性,又需要方便易用(关键是免费)的软件包更新功能,那么 Fedora Project 推出的 EPEL(Extra Packages for Enterprise Linux)正好适合你。EPEL(http://fedoraproject.org/wiki/EPEL) 是由 Fedora 社区打造,为 RHEL 及衍生发行版如 CentOS、Scientific Linux 等提供高质量软件包的项目。
另外要把https改成http,如果不改会报错,报错原因写在最后解决中!
[root@rhel6-64-5 ~]# yum clean all
[root@rhel6-64-5 ~]# yum list
[root@rhel6-64-5 ~]# yum install perl-DBD-MySQL
MHA是perl编写的软件需要perl支持 之前yum安装mysql已经安装过了 如果没安装过需要安装这个依赖:
[root@rhel6-64-5 ~]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager --skip-broken
如果基础软件包不安装全,后续软件无法安装:
以上操作每台都需要做,以下操作管理节点(manger)需要两个都安装,在3台数据库节点只要安装MHA的node节点!
2、装Node
安装node节点!软件包可以本地rz上传,也可以使用:
[root@rhel6-64-5~]#rpm -Uvh http://mysql-master-ha.googlecode.com/files/mha4mysql-node-0.52-0.noarch.rpm
去下载,但是需要梯子,我这里已经上传完成!
[root@rhel6-64-5 ~]# rpm -ivh mha4mysql-node-0.53-0.el6.noarch.rpm
Preparing... ########################################### [100%]
1:mha4mysql-node ########################################### [100%]
如果安装不成功,请检查是否安装了,该包不需要epel!
[root@rhel6-64-5 ~]# yum install -y perl-DBD-MySQL
3、安装manager
安装之前需要安装一些依赖包:
#yum install -y perl-DBD-MySQL
#yum install -y perl-Config-Tiny
#yum install -y perl-Log-Dispatch
#yum install -y perl-Parallel-ForkManager
#yum install -y perl-Config-IniFiles
也可以直接下载rpm包进行包装:
[root@rhel6-64-2 download]# wget -P /root/download/ http://ftp.muug.mb.ca/mirror/centos/7/os/x86_64/Packages/perl-5.16.3-286.el7.x86_64.rpm
--2016-09-12 22:02:36-- http://ftp.muug.mb.ca/mirror/centos/7/os/x86_64/Packages/perl-5.16.3-286.el7.x86_64.rpm
Resolving ftp.muug.mb.ca... 208.81.1.244, 2605:e200:3:4::244
Connecting to ftp.muug.mb.ca|208.81.1.244|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8352488 (8.0M) [application/x-redhat-package-manager]
Saving to: ?.root/download/perl-5.16.3-286.el7.x86_64.rpm?
100%[================>] 8,352,488 558K/s in 30s
2016-09-12 22:03:07 (268 KB/s) - ?.root/download/perl-5.16.3-286.el7.x86_64.rpm?.saved
[8352488/8352488]
[root@rhel6-64-2 download]# wget -P /root/download http://dl.fedoraproject.org/pub/epel/6/i386/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
--2016-09-12 22:05:06-- http://dl.fedoraproject.org/pub/epel/6/i386/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
Resolving dl.fedoraproject.org... 209.132.181.26, 209.132.181.23, 209.132.181.24, ...
Connecting to dl.fedoraproject.org|209.132.181.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21740 (21K) [application/x-rpm]
Saving to: ?.root/download/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm?
100%[================>] 21,740 4.13K/s in 5.1s
2016-09-12 22:05:15 (4.13 KB/s) - ?.root/download/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm?.saved [21740/21740]
[root@rhel6-64-2 download]# wget -p /root/download/ http://downloads.naulinux.ru/pub/NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm
/root/download/: Scheme missing.
--2016-09-12 22:29:49-- http://downloads.naulinux.ru/pub/NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm
Resolving downloads.naulinux.ru... 91.151.181.167
Connecting to downloads.naulinux.ru|91.151.181.167|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 71564 (70K) [application/x-rpm]
Saving to: ?.ownloads.naulinux.ru/pub/NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm?
100%[================>] 71,564 119K/s in 0.6s
2016-09-12 22:29:50 (119 KB/s) -
?.ownloads.naulinux.ru/pub/NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm?.saved [71564/71564]
FINISHED --2016-09-12 22:29:50--
Downloaded: 1 files, 70K in 0.6s (119 KB/s)
[root@rhel6-64-2 download]# ls
downloads.naulinux.ru
perl-5.16.3-286.el7.x86_64.rpm
perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
[root@rhel6-64-2 download]# cat downloads.naulinux.ru/
cat: downloads.naulinux.ru/: Is a directory
[root@rhel6-64-2 download]# cd downloads.naulinux.ru/
[root@rhel6-64-2 downloads.naulinux.ru]# ls
pub
[root@rhel6-64-2 downloads.naulinux.ru]# cd pub/
[root@rhel6-64-2 pub]# ls
NauLinux
[root@rhel6-64-2 pub]# mv NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm /root/download/
[root@rhel6-64-2 pub]# cd /root/download/
[root@rhel6-64-2 download]# ls
downloads.naulinux.ru #这是个路径,要把该路径最深处的rpm包安装,你也可以复制出来!
perl-5.16.3-286.el7.x86_64.rpm
perl-Log-Dispatch-2.27-1.el6.noarch.rpm
perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
[root@rhel6-64-2 download]# wget
[root@rhel6-64-2 download]# wget -P /root/download http://dl.fedoraproject.org/pub/epel/6/i386/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
--2016-09-12 22:05:06-- http://dl.fedoraproject.org/pub/epel/6/i386/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
Resolving dl.fedoraproject.org... 209.132.181.26, 209.132.181.23, 209.132.181.24, ...
Connecting to dl.fedoraproject.org|209.132.181.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21740 (21K) [application/x-rpm]
Saving to: ?.root/download/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm?
100%[================>] 21,740 4.13K/s in 5.1s
2016-09-12 22:05:15 (4.13 KB/s) - ?.root/download/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm?.saved [21740/21740]
[root@rhel6-64-2download]#http://dl.fedoraproject.org/pub/epel/6/i386/perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
--2016-09-12 22:34:57-- http://dl.fedoraproject.org/pub/epel/6/i386/perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
Resolving dl.fedoraproject.org... 209.132.181.26, 209.132.181.27, 209.132.181.23, ...
Connecting to dl.fedoraproject.org|209.132.181.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 55224 (54K) [application/x-rpm]
Saving to: ?.erl-Mail-Sender-0.8.16-3.el6.noarch.rpm?
100%[================>] 55,224 9.12K/s in 11s
2016-09-12 22:35:09 (4.87 KB/s) - ?.erl-Mail-Sender-0.8.16-3.el6.noarch.rpm?.saved [55224/55224]
[root@rhel6-64-2 download]# wget http://dl.fedoraproject.org/pub/epel/6/i386/perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
--2016-09-12 22:38:53-- http://dl.fedoraproject.org/pub/epel/6/i386/perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
Resolving dl.fedoraproject.org... 209.132.181.24, 209.132.181.25, 209.132.181.26, ...
Connecting to dl.fedoraproject.org|209.132.181.24|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28448 (28K) [application/x-rpm]
Saving to: ?.erl-Mail-Sendmail-0.79-12.el6.noarch.rpm?
100%[================>] 28,448 6.61K/s in 4.2s
2016-09-12 22:39:03 (6.61 KB/s) - ?.erl-Mail-Sendmail-0.79-12.el6.noarch.rpm?.saved [28448/28448]
[root@rhel6-64-2 download]# wget http://mirror.centos.org/centos/6/os/x86_64/Packages/perl-Time-HiRes-1.9721-141.el6_7.1.x86_64.rpm
--2016-09-12 22:41:37-- http://mirror.centos.org/centos/6/os/x86_64/Packages/perl-Time-HiRes-1.9721-141.el6_7.1.x86_64.rpm
Resolving mirror.centos.org... 103.27.60.52, 2605:9000:401:102::2
Connecting to mirror.centos.org|103.27.60.52|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50352 (49K) [application/x-rpm]
Saving to: ?.erl-Time-HiRes-1.9721-141.el6_7.1.x86_64.rpm?
100%[================>] 50,352 216K/s in 0.2s
2016-09-12 22:41:43 (216 KB/s) - ?.erl-Time-HiRes-1.9721-141.el6_7.1.x86_64.rpm?.saved [50352/50352]
下载完毕,对比下下载的包:
全部安装比较不容易:
总结下需要依赖安装4个包:
perl-4:5.16.3-286.el7.x86_64
perl-Log-Dispatch-2.27-1.el6.noarch
perl-Time-HiRes-4:1.9721-141.el6_7.1.x86_64
这里我用了挺暴力的方式同时进行的安装,不是特别推荐以后有好的方法,我再回来修改!
即便是manager的包,也同样用—nodeps的参数:
[root@rhel6-64-2 ~]# rpm -qa|grep mha4mysql-
mha4mysql-manager-0.53-0.el6.noarch
mha4mysql-node-0.53-0.el6.noarch
安装完成。
这里没有使用tar包安装,因为之前用tar包进行编译安装,发现测试命令无法执行,不知道哪里出了问题,有机会请教下大神,昨天这里耽误了2个多小时!
4、配置
配置mha配置文件:
[root@rhel6-64-2 ~]# mkdir -p /etc/masterha/app1
[root@rhel6-64-2 ~]# mkdir -p /var/lib/mysql/
[root@rhel6-64-2 ~]# touch /etc/masterha/app1/manager.log
[root@rhel6-64-2 ~]# vi /etc/masterha/app1.cnf
[root@rhel6-64-2 ~]# cat /etc/masterha/app1.cnf
[server default]
manager_workdir=/etc/masterha/app1
manager_log=/etc/masterha/app1/manager.log
#日志文件及其路径,需要手动创建
user=gongzi
password=123456
#之前数据库远程登录的
ssh_user=root
#用于测试ssh登录的
repl_user=qianmen
repl_password=123456
#千门做主从的用户 这个也是每一台都要授权的
[server1]
hostname=172.28.7.203
port=3306
master_binlog_dir=/var/lib/mysql
candidate_master=1
check_repl_delay=0
#/var/lib/mysql这里路径也需要手工创建
[server2]
hostname=172.28.7.204
port=3306
master_binlog_dir=/var/lib/mysql
candidate_master=1
check_repl_delay=0
[server3]
hostname=172.28.7.205
port=3306
master_binlog_dir=/var/lib/mysql
ignore_fail=1
no_master=1
四、MHA测试
1、测试SSH
[root@rhel6-64-2 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Fri Sep 9 23:05:39 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep 9 23:05:39 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Sep 9 23:05:39 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Sep 9 23:05:39 2016 - [info] Starting SSH connection tests..
Fri Sep 9 23:05:41 2016 - [debug]
Fri Sep 9 23:05:39 2016 - [debug] Connecting via SSH from root@172.28.7.203(172.28.7.203:22) to root@172.28.7.204(172.28.7.204:22)..
Fri Sep 9 23:05:40 2016 - [debug] ok.
Fri Sep 9 23:05:40 2016 - [debug] Connecting via SSH from root@172.28.7.203(172.28.7.203:22) to root@172.28.7.205(172.28.7.205:22)..
Fri Sep 9 23:05:41 2016 - [debug] ok.
Fri Sep 9 23:05:42 2016 - [debug]
Fri Sep 9 23:05:39 2016 - [debug] Connecting via SSH from root@172.28.7.204(172.28.7.204:22) to root@172.28.7.203(172.28.7.203:22)..
Fri Sep 9 23:05:40 2016 - [debug] ok.
Fri Sep 9 23:05:40 2016 - [debug] Connecting via SSH from root@172.28.7.204(172.28.7.204:22) to root@172.28.7.205(172.28.7.205:22)..
Fri Sep 9 23:05:42 2016 - [debug] ok.
Fri Sep 9 23:05:42 2016 - [debug]
Fri Sep 9 23:05:40 2016 - [debug] Connecting via SSH from root@172.28.7.205(172.28.7.205:22) to root@172.28.7.203(172.28.7.203:22)..
Fri Sep 9 23:05:41 2016 - [debug] ok.
Fri Sep 9 23:05:41 2016 - [debug] Connecting via SSH from root@172.28.7.205(172.28.7.205:22) to root@172.28.7.204(172.28.7.204:22)..
Fri Sep 9 23:05:42 2016 - [debug] ok.
Fri Sep 9 23:05:42 2016 - [info] All SSH connection tests passed successfully.
这里如果出现问题,一定需要反复检查4台主机之间是否都具备免密码登录的条件。
2、测试repl
[root@rhel6-64-2 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Fri Sep 9 22:39:56 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep 9 22:39:56 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Sep 9 22:39:56 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Sep 9 22:39:56 2016 - [info] MHA::MasterMonitor version 0.53.
Fri Sep 9 22:39:57 2016 - [info] Dead Servers:
Fri Sep 9 22:39:57 2016 - [info] Alive Servers:
Fri Sep 9 22:39:57 2016 - [info] 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 22:39:57 2016 - [info] 172.28.7.204(172.28.7.204:3306)
Fri Sep 9 22:39:57 2016 - [info] 172.28.7.205(172.28.7.205:3306)
Fri Sep 9 22:39:57 2016 - [info] Alive Slaves:
Fri Sep 9 22:39:57 2016 - [info] 172.28.7.204(172.28.7.204:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 22:39:57 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 22:39:57 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 9 22:39:57 2016 - [info] 172.28.7.205(172.28.7.205:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 22:39:57 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 22:39:57 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Sep 9 22:39:57 2016 - [info] Current Alive Master: 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 22:39:57 2016 - [info] Checking slave configurations..
Fri Sep 9 22:39:57 2016 - [info] read_only=1 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep 9 22:39:57 2016 - [warning] relay_log_purge=0 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep 9 22:39:57 2016 - [warning] relay_log_purge=0 is not set on slave 172.28.7.205(172.28.7.205:3306).
Fri Sep 9 22:39:57 2016 - [info] Checking replication filtering settings..
Fri Sep 9 22:39:57 2016 - [info] binlog_do_db= , binlog_ignore_db=
Fri Sep 9 22:39:57 2016 - [info] Replication filtering check ok.
Fri Sep 9 22:39:57 2016 - [info] Starting SSH connection tests..
Fri Sep 9 22:40:01 2016 - [info] All SSH connection tests passed successfully.
Fri Sep 9 22:40:01 2016 - [info] Checking MHA Node version..
Fri Sep 9 22:40:02 2016 - [info] Version check ok.
Fri Sep 9 22:40:02 2016 - [info] Checking SSH publickey authentication settings on the current master..
Fri Sep 9 22:40:03 2016 - [info] HealthCheck: SSH to 172.28.7.203 is reachable.
Fri Sep 9 22:40:03 2016 - [info] Master MHA Node version is 0.53.
Fri Sep 9 22:40:03 2016 - [info] Checking recovery script configurations on the current master..
Fri Sep 9 22:40:03 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000013
Fri Sep 9 22:40:03 2016 - [info] Connecting to root@172.28.7.203(172.28.7.203)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000013
Fri Sep 9 22:40:04 2016 - [info] Master setting check done.
Fri Sep 9 22:40:04 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Sep 9 22:40:04 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.204 --slave_ip=172.28.7.204 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Sep 9 22:40:04 2016 - [info] Connecting to root@172.28.7.204(172.28.7.204:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Sep 9 22:40:05 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.205 --slave_ip=172.28.7.205 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Sep 9 22:40:05 2016 - [info] Connecting to root@172.28.7.205(172.28.7.205:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Sep 9 22:40:05 2016 - [info] Slaves settings check done.
Fri Sep 9 22:40:05 2016 - [info]
172.28.7.203 (current master)
+--172.28.7.204
+--172.28.7.205
Fri Sep 9 22:40:05 2016 - [info] Checking replication health on 172.28.7.204..
Fri Sep 9 22:40:05 2016 - [info] ok.
Fri Sep 9 22:40:05 2016 - [info] Checking replication health on 172.28.7.205..
Fri Sep 9 22:40:05 2016 - [info] ok.
Fri Sep 9 22:40:05 2016 - [warning] master_ip_failover_script is not defined.
Fri Sep 9 22:40:05 2016 - [warning] shutdown_script is not defined.
Fri Sep 9 22:40:05 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@rhel6-64-2 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Fri Sep 9 23:05:39 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep 9 23:05:39 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Sep 9 23:05:39 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Sep 9 23:05:39 2016 - [info] Starting SSH connection tests..
Fri Sep 9 23:05:41 2016 - [debug]
Fri Sep 9 23:05:39 2016 - [debug] Connecting via SSH from root@172.28.7.203(172.28.7.203:22) to root@172.28.7.204(172.28.7.204:22)..
Fri Sep 9 23:05:40 2016 - [debug] ok.
Fri Sep 9 23:05:40 2016 - [debug] Connecting via SSH from root@172.28.7.203(172.28.7.203:22) to root@172.28.7.205(172.28.7.205:22)..
Fri Sep 9 23:05:41 2016 - [debug] ok.
Fri Sep 9 23:05:42 2016 - [debug]
Fri Sep 9 23:05:39 2016 - [debug] Connecting via SSH from root@172.28.7.204(172.28.7.204:22) to root@172.28.7.203(172.28.7.203:22)..
Fri Sep 9 23:05:40 2016 - [debug] ok.
Fri Sep 9 23:05:40 2016 - [debug] Connecting via SSH from root@172.28.7.204(172.28.7.204:22) to root@172.28.7.205(172.28.7.205:22)..
Fri Sep 9 23:05:42 2016 - [debug] ok.
Fri Sep 9 23:05:42 2016 - [debug]
Fri Sep 9 23:05:40 2016 - [debug] Connecting via SSH from root@172.28.7.205(172.28.7.205:22) to root@172.28.7.203(172.28.7.203:22)..
Fri Sep 9 23:05:41 2016 - [debug] ok.
Fri Sep 9 23:05:41 2016 - [debug] Connecting via SSH from root@172.28.7.205(172.28.7.205:22) to root@172.28.7.204(172.28.7.204:22)..
Fri Sep 9 23:05:42 2016 - [debug] ok.
Fri Sep 9 23:05:42 2016 - [info] All SSH connection tests passed successfully.
[root@rhel6-64-2 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Fri Sep 9 23:07:54 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep 9 23:07:54 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Sep 9 23:07:54 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Sep 9 23:07:54 2016 - [info] MHA::MasterMonitor version 0.53.
Fri Sep 9 23:07:55 2016 - [info] Dead Servers:
Fri Sep 9 23:07:55 2016 - [info] Alive Servers:
Fri Sep 9 23:07:55 2016 - [info] 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:07:55 2016 - [info] 172.28.7.204(172.28.7.204:3306)
Fri Sep 9 23:07:55 2016 - [info] 172.28.7.205(172.28.7.205:3306)
Fri Sep 9 23:07:55 2016 - [info] Alive Slaves:
Fri Sep 9 23:07:55 2016 - [info] 172.28.7.204(172.28.7.204:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:07:55 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:07:55 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 9 23:07:55 2016 - [info] 172.28.7.205(172.28.7.205:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:07:55 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:07:55 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Sep 9 23:07:55 2016 - [info] Current Alive Master: 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:07:55 2016 - [info] Checking slave configurations..
Fri Sep 9 23:07:55 2016 - [info] read_only=1 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep 9 23:07:55 2016 - [warning] relay_log_purge=0 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep 9 23:07:55 2016 - [warning] relay_log_purge=0 is not set on slave 172.28.7.205(172.28.7.205:3306).
Fri Sep 9 23:07:55 2016 - [info] Checking replication filtering settings..
Fri Sep 9 23:07:55 2016 - [info] binlog_do_db= , binlog_ignore_db=
Fri Sep 9 23:07:55 2016 - [info] Replication filtering check ok.
Fri Sep 9 23:07:55 2016 - [info] Starting SSH connection tests..
Fri Sep 9 23:07:59 2016 - [info] All SSH connection tests passed successfully.
Fri Sep 9 23:07:59 2016 - [info] Checking MHA Node version..
Fri Sep 9 23:08:00 2016 - [info] Version check ok.
Fri Sep 9 23:08:00 2016 - [info] Checking SSH publickey authentication settings on the current master..
Fri Sep 9 23:08:01 2016 - [info] HealthCheck: SSH to 172.28.7.203 is reachable.
Fri Sep 9 23:08:01 2016 - [info] Master MHA Node version is 0.53.
Fri Sep 9 23:08:01 2016 - [info] Checking recovery script configurations on the current master..
Fri Sep 9 23:08:01 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000013
Fri Sep 9 23:08:01 2016 - [info] Connecting to root@172.28.7.203(172.28.7.203)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000013
Fri Sep 9 23:08:02 2016 - [info] Master setting check done.
Fri Sep 9 23:08:02 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Sep 9 23:08:02 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.204 --slave_ip=172.28.7.204 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Sep 9 23:08:02 2016 - [info] Connecting to root@172.28.7.204(172.28.7.204:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Sep 9 23:08:03 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.205 --slave_ip=172.28.7.205 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Sep 9 23:08:03 2016 - [info] Connecting to root@172.28.7.205(172.28.7.205:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Sep 9 23:08:03 2016 - [info] Slaves settings check done.
Fri Sep 9 23:08:03 2016 - [info]
172.28.7.203 (current master)
+--172.28.7.204
+--172.28.7.205
Fri Sep 9 23:08:03 2016 - [info] Checking replication health on 172.28.7.204..
Fri Sep 9 23:08:03 2016 - [info] ok.
Fri Sep 9 23:08:03 2016 - [info] Checking replication health on 172.28.7.205..
Fri Sep 9 23:08:03 2016 - [info] ok.
Fri Sep 9 23:08:03 2016 - [warning] master_ip_failover_script is not defined.
Fri Sep 9 23:08:03 2016 - [warning] shutdown_script is not defined.
Fri Sep 9 23:08:03 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
这里第二次、第三次、第四次…..实验的时候一直在报错,耽误了差不多20多个小时,百度关键字前3页的网站基本都看了,也没能解决,后来翻墙去谷歌查官方文档才解决!
原来数据库主从有记录,需要reset slave一下,因为报错信息一直给的主服务器有问题,就没关注从服务器!报错信息就不在这里写了!回头写在总结中!再更新几个常见的问题,希望大家能够避免!
3、数据库宕机测试
1)启动MHA
重新开一个窗口,用于监控日志,因为这个是打开前台的进程,运行的时候光标会卡住:
[root@rhel6-64-2 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1&
2)监控日志
打开监控日志的窗口,查看:
[root@rhel6-64-2 app1]# tail -f /etc/masterha/app1/manager.log
Fri Sep 9 23:23:52 2016 - [info] MHA::MasterMonitor version 0.53.
Fri Sep 9 23:23:53 2016 - [info] Dead Servers:
Fri Sep 9 23:23:53 2016 - [info] Alive Servers:
Fri Sep 9 23:23:53 2016 - [info] 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:23:53 2016 - [info] 172.28.7.204(172.28.7.204:3306)
Fri Sep 9 23:23:53 2016 - [info] 172.28.7.205(172.28.7.205:3306)
Fri Sep 9 23:23:53 2016 - [info] Alive Slaves:
Fri Sep 9 23:23:53 2016 - [info] 172.28.7.204(172.28.7.204:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:23:53 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:23:53 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 9 23:23:53 2016 - [info] 172.28.7.205(172.28.7.205:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:23:53 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:23:53 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Sep 9 23:23:53 2016 - [info] Current Alive Master: 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:23:53 2016 - [info] Checking slave configurations..
Fri Sep 9 23:23:53 2016 - [info] read_only=1 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep 9 23:23:53 2016 - [warning] relay_log_purge=0 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep 9 23:23:53 2016 - [warning] relay_log_purge=0 is not set on slave 172.28.7.205(172.28.7.205:3306).
Fri Sep 9 23:23:53 2016 - [info] Checking replication filtering settings..
Fri Sep 9 23:23:53 2016 - [info] binlog_do_db= , binlog_ignore_db=
Fri Sep 9 23:23:53 2016 - [info] Replication filtering check ok.
Fri Sep 9 23:23:53 2016 - [info] Starting SSH connection tests..
Fri Sep 9 23:23:57 2016 - [info] All SSH connection tests passed successfully.
Fri Sep 9 23:23:57 2016 - [info] Checking MHA Node version..
Fri Sep 9 23:23:58 2016 - [info] Version check ok.
Fri Sep 9 23:23:58 2016 - [info] Checking SSH publickey authentication settings on the current master..
Fri Sep 9 23:23:58 2016 - [info] HealthCheck: SSH to 172.28.7.203 is reachable.
Fri Sep 9 23:23:59 2016 - [info] Master MHA Node version is 0.53.
Fri Sep 9 23:23:59 2016 - [info] Checking recovery script configurations on the current master..
Fri Sep 9 23:23:59 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000013
Fri Sep 9 23:23:59 2016 - [info] Connecting to root@172.28.7.203(172.28.7.203)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000013
Fri Sep 9 23:24:00 2016 - [info] Master setting check done.
Fri Sep 9 23:24:00 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Sep 9 23:24:00 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.204 --slave_ip=172.28.7.204 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Sep 9 23:24:00 2016 - [info] Connecting to root@172.28.7.204(172.28.7.204:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Sep 9 23:24:00 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.205 --slave_ip=172.28.7.205 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Sep 9 23:24:00 2016 - [info] Connecting to root@172.28.7.205(172.28.7.205:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Sep 9 23:24:01 2016 - [info] Slaves settings check done.
Fri Sep 9 23:24:01 2016 - [info]
172.28.7.203 (current master)
+--172.28.7.204
+--172.28.7.205
Fri Sep 9 23:24:01 2016 - [warning] master_ip_failover_script is not defined.
Fri Sep 9 23:24:01 2016 - [warning] shutdown_script is not defined.
Fri Sep 9 23:24:01 2016 - [info] Set master ping interval 3 seconds.
Fri Sep 9 23:24:01 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Sep 9 23:24:01 2016 - [info] Starting ping health check on 172.28.7.203(172.28.7.203:3306)..
Fri Sep 9 23:24:01 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
一切准备就绪,就等着MYSQL无响应呢。
3)主服务器宕机
手动暂停主数据库服务器:
[root@rhel6-64-3 ~]# service mysqld stop
Stopping mysqld: [ OK ]
4)查看日志
Fri Sep 9 23:24:01 2016 - [warning] master_ip_failover_script is not defined.
Fri Sep 9 23:24:01 2016 - [warning] shutdown_script is not defined.
Fri Sep 9 23:24:01 2016 - [info] Set master ping interval 3 seconds.
Fri Sep 9 23:24:01 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Sep 9 23:24:01 2016 - [info] Starting ping health check on 172.28.7.203(172.28.7.203:3306)..
Fri Sep 9 23:24:01 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Sep 9 23:37:37 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Fri Sep 9 23:37:37 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --binlog_prefix=mysql-bin
Fri Sep 9 23:37:38 2016 - [info] HealthCheck: SSH to 172.28.7.203 is reachable.
Fri Sep 9 23:37:40 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Sep 9 23:37:40 2016 - [warning] Connection failed 1 time(s)..
Fri Sep 9 23:37:43 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Sep 9 23:37:43 2016 - [warning] Connection failed 2 time(s)..
Fri Sep 9 23:37:46 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Sep 9 23:37:46 2016 - [warning] Connection failed 3 time(s)..
Fri Sep 9 23:37:46 2016 - [warning] Master is not reachable from health checker!
Fri Sep 9 23:37:46 2016 - [warning] Master 172.28.7.203(172.28.7.203:3306) is not reachable!
Fri Sep 9 23:37:46 2016 - [warning] SSH is reachable.
Fri Sep 9 23:37:46 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Fri Sep 9 23:37:46 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep 9 23:37:46 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Sep 9 23:37:46 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Sep 9 23:37:47 2016 - [info] Dead Servers:
Fri Sep 9 23:37:47 2016 - [info] 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:47 2016 - [info] Alive Servers:
Fri Sep 9 23:37:47 2016 - [info] 172.28.7.204(172.28.7.204:3306)
Fri Sep 9 23:37:47 2016 - [info] 172.28.7.205(172.28.7.205:3306)
Fri Sep 9 23:37:47 2016 - [info] Alive Slaves:
Fri Sep 9 23:37:47 2016 - [info] 172.28.7.204(172.28.7.204:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:37:47 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:47 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 9 23:37:47 2016 - [info] 172.28.7.205(172.28.7.205:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:37:47 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:47 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Sep 9 23:37:47 2016 - [info] Checking slave configurations..
Fri Sep 9 23:37:47 2016 - [info] read_only=1 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep 9 23:37:47 2016 - [warning] relay_log_purge=0 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep 9 23:37:47 2016 - [warning] relay_log_purge=0 is not set on slave 172.28.7.205(172.28.7.205:3306).
Fri Sep 9 23:37:47 2016 - [info] Checking replication filtering settings..
Fri Sep 9 23:37:47 2016 - [info] Replication filtering check ok.
Fri Sep 9 23:37:47 2016 - [info] Master is down!
Fri Sep 9 23:37:47 2016 - [info] Terminating monitoring script.
Fri Sep 9 23:37:47 2016 - [info] Got exit code 20 (Master dead).
Fri Sep 9 23:37:47 2016 - [info] MHA::MasterFailover version 0.53.
Fri Sep 9 23:37:47 2016 - [info] Starting master failover.
Fri Sep 9 23:37:47 2016 - [info]
Fri Sep 9 23:37:47 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Sep 9 23:37:47 2016 - [info]
Fri Sep 9 23:37:48 2016 - [info] Dead Servers:
Fri Sep 9 23:37:48 2016 - [info] 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:48 2016 - [info] Checking master reachability via mysql(double check)..
Fri Sep 9 23:37:48 2016 - [info] ok.
Fri Sep 9 23:37:48 2016 - [info] Alive Servers:
Fri Sep 9 23:37:48 2016 - [info] 172.28.7.204(172.28.7.204:3306)
Fri Sep 9 23:37:48 2016 - [info] 172.28.7.205(172.28.7.205:3306)
Fri Sep 9 23:37:48 2016 - [info] Alive Slaves:
Fri Sep 9 23:37:48 2016 - [info] 172.28.7.204(172.28.7.204:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:37:48 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:48 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 9 23:37:48 2016 - [info] 172.28.7.205(172.28.7.205:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:37:48 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:48 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Sep 9 23:37:48 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Sep 9 23:37:48 2016 - [info]
Fri Sep 9 23:37:48 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Sep 9 23:37:48 2016 - [info]
Fri Sep 9 23:37:49 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Sep 9 23:37:49 2016 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Fri Sep 9 23:37:49 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Sep 9 23:37:50 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Sep 9 23:37:50 2016 - [info]
Fri Sep 9 23:37:50 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Sep 9 23:37:50 2016 - [info]
Fri Sep 9 23:37:50 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Sep 9 23:37:50 2016 - [info]
Fri Sep 9 23:37:50 2016 - [info] The latest binary log file/position on all slaves is mysql-bin.000013:106
Fri Sep 9 23:37:50 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Sep 9 23:37:50 2016 - [info] 172.28.7.204(172.28.7.204:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:37:50 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:50 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 9 23:37:50 2016 - [info] 172.28.7.205(172.28.7.205:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:37:50 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:50 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Sep 9 23:37:50 2016 - [info] The oldest binary log file/position on all slaves is mysql-bin.000013:106
Fri Sep 9 23:37:50 2016 - [info] Oldest slaves:
Fri Sep 9 23:37:50 2016 - [info] 172.28.7.204(172.28.7.204:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:37:50 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:50 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 9 23:37:50 2016 - [info] 172.28.7.205(172.28.7.205:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:37:50 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:50 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Sep 9 23:37:50 2016 - [info]
Fri Sep 9 23:37:50 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Sep 9 23:37:50 2016 - [info]
Fri Sep 9 23:37:50 2016 - [info] Fetching dead master's binary logs..
Fri Sep 9 23:37:50 2016 - [info] Executing command on the dead master 172.28.7.203(172.28.7.203:3306): save_binary_logs --command=save --start_file=mysql-bin.000013 --start_pos=106 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
Creating /var/tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin.000013 pos 106 to mysql-bin.000013 EOF into /var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog ..
Dumping binlog format description event, from position 0 to 106.. ok.
Dumping effective binlog data from /var/lib/mysql/mysql-bin.000013 position 106 to tail(125).. ok.
Concat succeeded.
Fri Sep 9 23:37:52 2016 - [info] scp from root@172.28.7.203:/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog to local:/etc/masterha/app1/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog succeeded.
Fri Sep 9 23:37:52 2016 - [info] HealthCheck: SSH to 172.28.7.204 is reachable.
Fri Sep 9 23:37:53 2016 - [info] HealthCheck: SSH to 172.28.7.205 is reachable.
Fri Sep 9 23:37:54 2016 - [info]
Fri Sep 9 23:37:54 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Sep 9 23:37:54 2016 - [info]
Fri Sep 9 23:37:54 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Sep 9 23:37:54 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Fri Sep 9 23:37:54 2016 - [info] Searching new master from slaves..
Fri Sep 9 23:37:54 2016 - [info] Candidate masters from the configuration file:
Fri Sep 9 23:37:54 2016 - [info] 172.28.7.204(172.28.7.204:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:37:54 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:54 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 9 23:37:54 2016 - [info] Non-candidate masters:
Fri Sep 9 23:37:54 2016 - [info] 172.28.7.205(172.28.7.205:3306) Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep 9 23:37:54 2016 - [info] Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep 9 23:37:54 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Sep 9 23:37:54 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Sep 9 23:37:54 2016 - [info] New master is 172.28.7.204(172.28.7.204:3306)
Fri Sep 9 23:37:54 2016 - [info] Starting master failover..
Fri Sep 9 23:37:54 2016 - [info]
From:
172.28.7.203 (current master)
+--172.28.7.204
+--172.28.7.205
To:
172.28.7.204 (new master)
+--172.28.7.205
Fri Sep 9 23:37:54 2016 - [info]
Fri Sep 9 23:37:54 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Sep 9 23:37:54 2016 - [info]
Fri Sep 9 23:37:54 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Sep 9 23:37:54 2016 - [info] Sending binlog..
Fri Sep 9 23:37:55 2016 - [info] scp from local:/etc/masterha/app1/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog to root@172.28.7.204:/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog succeeded.
Fri Sep 9 23:37:55 2016 - [info]
Fri Sep 9 23:37:55 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Sep 9 23:37:55 2016 - [info]
Fri Sep 9 23:37:55 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Sep 9 23:37:55 2016 - [info] Starting recovery on 172.28.7.204(172.28.7.204:3306)..
Fri Sep 9 23:37:55 2016 - [info] Generating diffs succeeded.
Fri Sep 9 23:37:55 2016 - [info] Waiting until all relay logs are applied.
Fri Sep 9 23:37:55 2016 - [info] done.
Fri Sep 9 23:37:55 2016 - [info] Getting slave status..
Fri Sep 9 23:37:55 2016 - [info] This slave(172.28.7.204)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000013:106). No need to recover from Exec_Master_Log_Pos.
Fri Sep 9 23:37:55 2016 - [info] Connecting to the target slave host 172.28.7.204, running recover script..
Fri Sep 9 23:37:55 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=gongzi --slave_host=172.28.7.204 --slave_ip=172.28.7.204 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog --workdir=/var/tmp --target_version=5.1.66-log --timestamp=20160909233747 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Fri Sep 9 23:37:55 2016 - [info]
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog on 172.28.7.204:3306. This may take long time...
Applying log files succeeded.
Fri Sep 9 23:37:55 2016 - [info] All relay logs were successfully applied.
Fri Sep 9 23:37:55 2016 - [info] Getting new master's binlog name and position..
Fri Sep 9 23:37:55 2016 - [info] binlog.000007:106
Fri Sep 9 23:37:55 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.28.7.204', MASTER_PORT=3306, MASTER_LOG_FILE='binlog.000007', MASTER_LOG_POS=106, MASTER_USER='qianmen', MASTER_PASSWORD='xxx';
Fri Sep 9 23:37:55 2016 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Fri Sep 9 23:37:55 2016 - [info] ** Finished master recovery successfully.
Fri Sep 9 23:37:55 2016 - [info] * Phase 3: Master Recovery Phase completed.
Fri Sep 9 23:37:55 2016 - [info]
Fri Sep 9 23:37:55 2016 - [info] * Phase 4: Slaves Recovery Phase..
Fri Sep 9 23:37:55 2016 - [info]
Fri Sep 9 23:37:55 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Sep 9 23:37:55 2016 - [info]
Fri Sep 9 23:37:56 2016 - [info] -- Slave diff file generation on host 172.28.7.205(172.28.7.205:3306) started, pid: 50129. Check tmp log /etc/masterha/app1/172.28.7.205_3306_20160909233747.log if it takes time..
Fri Sep 9 23:37:57 2016 - [info]
Fri Sep 9 23:37:57 2016 - [info] Log messages from 172.28.7.205 ...
Fri Sep 9 23:37:57 2016 - [info]
Fri Sep 9 23:37:56 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Sep 9 23:37:57 2016 - [info] End of log messages from 172.28.7.205.
Fri Sep 9 23:37:57 2016 - [info] -- 172.28.7.205(172.28.7.205:3306) has the latest relay log events.
Fri Sep 9 23:37:57 2016 - [info] Generating relay diff files from the latest slave succeeded.
Fri Sep 9 23:37:57 2016 - [info]
Fri Sep 9 23:37:57 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Sep 9 23:37:57 2016 - [info]
Fri Sep 9 23:37:57 2016 - [info] -- Slave recovery on host 172.28.7.205(172.28.7.205:3306) started, pid: 50131. Check tmp log /etc/masterha/app1/172.28.7.205_3306_20160909233747.log if it takes time..
Fri Sep 9 23:37:59 2016 - [info]
Fri Sep 9 23:37:59 2016 - [info] Log messages from 172.28.7.205 ...
Fri Sep 9 23:37:59 2016 - [info]
Fri Sep 9 23:37:57 2016 - [info] Sending binlog..
Fri Sep 9 23:37:58 2016 - [info] scp from local:/etc/masterha/app1/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog to root@172.28.7.205:/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog succeeded.
Fri Sep 9 23:37:58 2016 - [info] Starting recovery on 172.28.7.205(172.28.7.205:3306)..
Fri Sep 9 23:37:58 2016 - [info] Generating diffs succeeded.
Fri Sep 9 23:37:58 2016 - [info] Waiting until all relay logs are applied.
Fri Sep 9 23:37:58 2016 - [info] done.
Fri Sep 9 23:37:58 2016 - [info] Getting slave status..
Fri Sep 9 23:37:58 2016 - [info] This slave(172.28.7.205)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000013:106). No need to recover from Exec_Master_Log_Pos.
Fri Sep 9 23:37:58 2016 - [info] Connecting to the target slave host 172.28.7.205, running recover script..
Fri Sep 9 23:37:58 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=gongzi --slave_host=172.28.7.205 --slave_ip=172.28.7.205 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog --workdir=/var/tmp --target_version=5.1.66-log --timestamp=20160909233747 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Fri Sep 9 23:37:58 2016 - [info]
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog on 172.28.7.205:3306. This may take long time...
Applying log files succeeded.
Fri Sep 9 23:37:58 2016 - [info] All relay logs were successfully applied.
Fri Sep 9 23:37:58 2016 - [info] Resetting slave 172.28.7.205(172.28.7.205:3306) and starting replication from the new master 172.28.7.204(172.28.7.204:3306)..
Fri Sep 9 23:37:58 2016 - [info] Executed CHANGE MASTER.
Fri Sep 9 23:37:59 2016 - [info] Slave started.
Fri Sep 9 23:37:59 2016 - [info] End of log messages from 172.28.7.205.
Fri Sep 9 23:37:59 2016 - [info] -- Slave recovery on host 172.28.7.205(172.28.7.205:3306) succeeded.
Fri Sep 9 23:37:59 2016 - [info] All new slave servers recovered successfully.
Fri Sep 9 23:37:59 2016 - [info]
Fri Sep 9 23:37:59 2016 - [info] * Phase 5: New master cleanup phease..
Fri Sep 9 23:37:59 2016 - [info]
Fri Sep 9 23:37:59 2016 - [info] Resetting slave info on the new master..
Fri Sep 9 23:37:59 2016 - [info] 172.28.7.204: Resetting slave info succeeded.
Fri Sep 9 23:37:59 2016 - [info] Master failover to 172.28.7.204(172.28.7.204:3306) completed successfully.
Fri Sep 9 23:37:59 2016 - [info]
----- Failover Report -----
app1: MySQL Master failover 172.28.7.203 to 172.28.7.204 succeeded
Master 172.28.7.203 is down!
Check MHA Manager logs at rhel6-64-2:/etc/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 172.28.7.204(172.28.7.204:3306) has all relay logs for recovery.
Selected 172.28.7.204 as a new master.
172.28.7.204: OK: Applying all logs succeeded.
172.28.7.205: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.28.7.205: OK: Applying all logs succeeded. Slave started, replicating from 172.28.7.204.
172.28.7.204: Resetting slave info succeeded.
Master failover to 172.28.7.204(172.28.7.204:3306) completed successfully.
再登录slave2服务器,看slave 的状态:
[root@rhel6-64-5 ~]# mysql
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.28.7.204
Master_User: qianmen
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000007
Read_Master_Log_Pos: 106
Relay_Log_File: mysql-relay-bin.000002
Relay_Log_Pos: 248
Relay_Master_Log_File: binlog.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 106
Relay_Log_Space: 403
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File: 60
Master_SSL_CA_Path: 0
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)
测试成功,如黄底红字标示所示,原203主机MySQL连接失败,204主机当选新的master,205主机记录最近的日志,大功告成,打完收工。
五、MHA维护与问题汇总
1、MHA日常维护命令
1)查看ssh登陆是否成功
masterha_check_ssh --conf=/etc/masterha/app1.cnf
2)查看复制是否建立好
masterha_check_repl --conf=/etc/masterha/app1.cnf
3)启动mha
nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log < /dev/null 2>&1 &
当有slave节点宕掉的情况是启动不了的,加上--ignore_fail_on_start即使有节点宕掉也能启动mha
nohup masterha_manager --conf=/etc/masterha/app1.cnf --ignore_fail_on_start > /tmp/mha_manager.log < /dev/null 2>&1 &
4)检查启动的状态
masterha_check_status --conf=/etc/masterha/app1.cnf
5)停止mha
masterha_stop --conf=/etc/masterha/app1.cnf
6)failover后下次重启
每次failover切换后会在管理目录生成文件app1.failover.complete ,下次在切换的时候会发现有这个文件导致切换不成功,需要手动清理掉。
rm -rf /masterha/app1/app1.failover.complete
也可以加上参数--ignore_last_failover。
7)手工failover
手工failover场景,master死掉,但是masterha_manager没有开启,可以通过手工failover:
masterha_master_switch --conf=/etc/masterha/app1.cnf --dead_master_host=10.50.2.10 --master_state=dead --new_master_host=10.50.2.12 --ignore_last_failover
8)masterha_manager是一种监视和故障转移的程序。
另一方面,masterha_master_switch程序不监控主库。 masterha_master_switch可以用于主库故障转移,也可用于在线总开关。
9)手动在线切换
masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=192.168.119.74 --orig_master_is_new_slave
或者:
masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=192.168.119.74 --orig_master_is_new_slave --running_updates_limit=10000
--orig_master_is_new_slave切换时加上此参数是将原master变为slave节点,如果不加此参数,原来的master将不启动。
--running_updates_limit=10000 切换时候选master如果有延迟的话,mha切换不能成功,加上此参数表示延迟在此时间范围内都可切换(单位为s),但是切换的时间长短是由recover时relay日志的大小决定。
手动在线切换mha,切换时需要将在运行的mha停掉后才能切换。
在备库先执行DDL,一般先stop slave,一般不记录mysql日志,可以通过set SQL_LOG_BIN = 0实现。然后进行一次主备切换操作,再在原来的主库上执行DDL。这种方法适用于增减索引,如果是增加字段就需要额外注意。
可以通过如下命令停止mha:
masterha_stop --conf=/etc/app1.cnf
2、遇到的问题
1)yum search perl-Log-Dispatch 找不到
解决:
yum -y install perl-Log*
2)yum search parallel 找不到
解决:网上下载了Log-Dispatchouli-master.zip 安装后再yum安装parallel。
3)健康检查Not OK
解决: 因为172.28.7.203/204/205曾设置过主从,master为因为172.28.7.203。
在slave1/2上show slave status一直有信息,需要reset slave。
最后如果有问题,还得访问官网!https://code.google.com/p/mysql-master-ha/issues/list(需要梯子)。
4) 安装epel失败
Error: Cannot retrieve repository metadata (repomd.xml) for repository: epel. Please verify its path and try again
I tried installing EPEL on a fresh install of RHEL6, and after adding the repo, yum fails with the above error. I have RHEL6.1 (Santiago) and get the above error.
This happens because the RHEL/CentOS installation doesn’t trust the HTTPS certificate used by mirrors.fedoraproject.org, that is issued by “GeoTrust SSL CA“.
In my case the package ca-certificates was not installed and the/etc/pki/tls/certs/ folder didn’t contain any ca-bundle.crt or ca-bundle.trust.crt !
Solution: yum install ca-certificates
(I had to temporarily rpm –erase epel-release first, to get yum working again)
I once got the same error message eventhout ca-certificates was installed and up to date. Then it was a firewall blocking https (port 443) traffic.
I worked around that by changing from https to http in/etc/yum.repos.d/epel.repo
简单点说 就是https是443端口,你并没有把该网站的CA证书添加进入,所以要改成80端口,虽然不那么安全。