当前位置: 首页 > news >正文

MYSQL高可用集群MHA架构

一、MHA概述

1、MHA简介

MHA是当 MASTER 出现故障,挑选一个 SLAVE 作为新的 MASTER 并构建成新的主从架构的管理工具。从 MASTER 出现故障到构建成新的主从架构时间是 10-30秒。在 MASTER 出现故障时可能会出现 SLAVE 同步的数据不一致的现象,此工具可以自动应用差异的中继日志到其他 SLAVE 上保证数据的一致性。

2、MHA优点

1. Master crash 时可以快速的进行故障切换

MHA 是当 MASTER 出现故障,挑选一个 SLAVE 作为新的 MASTER 并构建成新的主从架构的管理工具。从 MASTER 出现故障到构建成新的主从架构时间是 10-30秒。在 MASTER 出现故障时可能会出现 SLAVE 同步的数据不一致的现象,此工具可以自动应用差异的中继日志到其他 SLAVE 上保证数据的一致性。    

2. Master crash 时可以最大化的减少数据丢失

当 master crash 时 MHA 自动检测选择数据同步最全的 slave,并把差异日志应用到其他 slave 上, 以保障数据的一致性。

3. Semi-Synchronous Replication 可以最大化的减少数据的丢失

MHA 的更改升级配置等不影响线上正在运行的数据库,使用 mha 不需要增加太多的服务器。MHA 由 MHA Manager 和 MHA Node 组成。 MHA Node 运行在 MYSQL 服务器上,所以不会因为 MHA node 增加新的服务器。

MHA Manager 通常需要独立运行在一台服务器上,所以你需要增加一台服务器用于监控管理运行 MHA Manager,但是一台服务器上的 MHA Manager 可以同时监控管理多达百台 master,所以总的来说服务器增加不会太多。MHA Manger 也可以运行在一台 slave 上,这样总的服务器数也不会增加。

4. 原来应用系统整体性能不会降低太多    

MHA 工作在异步或半同步的主从架构上。当监控 master 时,MHA 会每隔几Mha 部署测试文档5秒 (默认 3 秒) 向 master 发出 ping 包并且不需要大的 sql 语句用于监控 master的健康状。Slave 需要开启 binlog,整体性能不会有太大的降低。

5. MHA 适合任何存储引擎    

只要能主从复制的存储引擎它都支持,不限于支持事物的innodb引擎。

二、安装基础环境

1、软件下载

系统镜像盘链接:百度网盘 请输入提取码 密码:7vmr

软件包链接:

方式一:百度云:链接:https://pan.baidu.com/s/1mhSojWW 密码:bpky 

方式二:官方下载https://code.google.com/p/mysql-master-ha/(需要自备梯子)

添加一个yum源:

wget http://dl.fedoraproject.org/pub/ ... ease-5-4.noarch.rpm
rpm -ivh epel-release-5-4.noarch.rpm(亲测,可用)

Mha分manager节点和node节点。可以下载源码包,或者rpm包,随个人喜好,注意,版本与系统内核要匹配。 

2、角色清单

IP地址

主机名

角色

软件

172.28.7.202

rhel6-64-2

manager

mha4mysql-managermha4mysql-node

172.28.7.203

rhel6-64-3

master

mha4mysql-node

172.28.7.204

rhel6-64-4

Slave1

mha4mysql-node

172.28.7.205

rhel6-64-5

Slave2

mha4mysql-node

3、系统配置

配置如下:

系统

内核

Mysql版本

内存

Red Hat release 6.4

Linux version 2.6.32-358.el6.x86_64

mysql  Ver 14.14 Distrib 5.1.66

1G

查看命令方式:

[root@rhel6-64-2 ~]# cat /proc/version 
Linux version 2.6.32-358.el6.x86_64 (mockbuild@x86-022.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Tue Jan 29 11:47:41 EST 2013
[root@rhel6-64-2 ~]# mysql -V
mysql  Ver 14.14 Distrib 5.1.66, for redhat-linux-gnu (x86_64) using readline 5.1

4、架构图

5、主机环境配置(IP/DNS/HOSTS/免密码)

主机名称:

[root@rhel6-64-2 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.28.7.202 manager
172.28.7.203 master
172.28.7.204 slave1
172.28.7.205 slave2

我这里已经配置好了免密码,正常情况下会要求输入密码: 


[root@rhel6-64-2 ~]# scp /etc/hosts master:/etc/ 
hosts                                100%  239     0.2KB/s   00:00  
[root@rhel6-64-2 ~]# scp /etc/hosts slave1:/etc/
hosts                                100%  239     0.2KB/s   00:00  
[root@rhel6-64-2 ~]# scp /etc/hosts slave2:/etc/
hosts                                100%  239     0.2KB/s   00:00  

IP/路由/dns地址:

要保证可以访问外网,因为装软件包需要多种依赖包,默认光盘自带的软件包不够,需要用wget命令去公网下载。

SSH免密码登录:

[root@rhel6-64-2 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 回车
Enter passphrase (empty for no passphrase): 回车
Enter same passphrase again: 回车
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1c:e6:0e:0f:85:5b:34:97:35:20:7d:77:c5:bd:a2:97 root@rhel6-64-1
The key's randomart image is:
+--[ RSA 2048]----+
|        +.o+o  .+|
|       o +o ... +|
|      . =  . . ..|
|       B .   . . |
|      + S   . o  |
|       =   . E   |
|        o   .    |
|                 |
|                 |
+-----------------+
[root@rhel6-64-5 ~]# ssh-copy-id 172.28.7.202
The authenticity of host '172.28.7.202 (172.28.7.202)' can't be established.
RSA key fingerprint is 64:32:50:c4:06:be:b6:e7:95:13:76:48:3b:e7:8b:aa.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.28.7.202' (RSA) to the list of known hosts.
root@172.28.7.202's password: 输入7.202的秘钥
Now try logging into the machine, with "ssh '172.28.7.202'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
[root@rhel6-64-5 ~]# ssh 172.28.7.202(再进行登录测试,以不输入密码直接登录为准)
Last login: Thu Sep  8 15:19:27 2016 from 192.168.255.251

其他台配置方法与以上一致 保证每台登录互相无交互,配置完成后每一个都登录一次 过程略配置后查看。

最后查看授权的秘钥:

[root@rhel6-64-2 ~]# cat .ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA9OJqUQpcmfym4xVhMhf4chNyNgzUzT0y+89A5KBXwIv1nN3X9bMoXQhVzDWB2MlzqbZdVw4TgYyINfNpQtuFEa7nEUen8KDKACTtTU99xf8Wrzhsvm6hREEfKml1zrg46TFk3Yjil+3QDq7wefk2g12HYbJpE946Ffr49NilcgUrh6HimEW4eGgTQSpyLyfrd2VS9bUcJGxoEbpoeniZgpbHqT3hidhrCLTYzu2GBaNunZTPqbfljPp0SpCUVCBouAzjSj0j7xT6wer7X1BkHQU4ipqlGu/t3EEFhkrhKqh94Az5oM+K6g7zXbL5qhG+4JVhu/zY6ZmaOaDZ16kgJQ== root@rhel6-64-5
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAuuBt+8vSrz+4PTPcBE5ButQmgBOX93opESAxAqU1q/9fQA0Ytj+m80l4GgaxJNxJxYfaMNBo3eDPMzzLqVMHtG8j6IhKo3BBxeT9vbtHvOrgnxNy2MOJ9D+EDqMQ0r8IKQBwJbPCEDGGoBelJYnBDNO17doJ0UM1i5n4tLIeFwczIDbR7TNB/tym26Bv+yOmyrVsG7gOQT3mCoK6RSBC6fv0OAse00cjzYpfZnRAfldWTfqBsaibHqnoMWJpiqZH/gH3LoeFW3d7zwaU6kY0Oqux49sXGmI2UEgNfJhLaJbyXkc1JCD3UlISVWxNGi7kUCKz/bOjDxt5D15NdO/e0w== root@rhel6-64-2
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzumZFvTs4bai/qOMTkjW5w+dL+hvyhnYsRbANVnmTbCTyB2sGW2VkY46r/FDcAtXYULdzwOpN9hE+rRb8HYT3YCjx5RdvY+wJzpd239lzwAY691AD7PBUOMbZUoyV/Bim5wFrAjLXUJ+ER4Cahcn/n5ZbkVmBH4E3X/xdHXQj6weAPawjLSUryRfFKPUg6G/8yjLPPR61WbznmAuOi0Lbw/U5L5Hf0SZ1CcT7R+GLyea6WxQ2H9MCdTCV5i+mrC8LqGaDBbuVXq5ZUeb7WjbkfpBkep1g+yFnSzTBXRSKXRyi9vK2CAD+zLhdrGKrmvtng0QeWxo77VkgQkD0CYzxQ== root@rhel6-64-3
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAxsAASxEYfJbn6/PCh3lXRwxAifrigpQsExpP/4WPRFa1aCw53zABEfbq8T0rjDS7oZXjbMZ/hnMA6okdG9hLypplW+UcMIfKiyaTAOhs2CDKvopsfATRZbALaewP7+LLf4haYUoNOK0DI3Y7YRzgXG0pEUgWMfeBlLldNIDv3b7+UzBIPSe1WComqqPlLav/PyS7VJxfNxDMZ5slkkmmi6UJYep5KvsRkF+Ci/cyaywJmzllH6OLDuI+cEJBH3KDxnui9td3/Llr3+LfLoxcyRZIXJD5NMQrtKYzW/OzAfeWsT4iMSK4d1Gqg+SXdNh+c3cqPyVh8qEvnCeIszBIoQ== root@rhel6-64-4

6、安装MYSQL

服务器主从配置:

在master/slave1/slave2安装MySQL数据库,简化实验过程使用yum 安装即可。

[root@rhel6-64-3 ~]# yum install -y mysql-server
	……

Installed:
  mysql-server.x86_64 0:5.1.66-2.el6_3                                 

Dependency Installed:
  mysql.x86_64 0:5.1.66-2.el6_3   mysql-libs.x86_64 0:5.1.66-2.el6_3  
Complete!
[root@rhel6-64-3 ~]# service mysqld start 
Starting mysqld:                                           [  OK  ]
yum安装过程manager/slave1/slave2完全相同,步骤略。

7、配置MYSQL主从

在master主机上修改my.cnf文件:

[root@rhel6-64-3 ~]# cat /etc/my.cnf 
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

log-bin=mysql-bin 
server-id = 1
#红色标记部分为新增,注意不要添加到【mysql_safe】里,开始做实验室的时候盲目添加,导致主从配置失败。
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

在slave1上修改my.cnf文件,添加红色自己内容:

[root@rhel6-64-4 ~]# cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

log-bin=mysql-bin
server-id = 2  
relay_log = /var/lib/mysql/mysql-relay-bin
log_slave_updates = 1
read_only = 1
#注意每台机器的server-id 一定互不相同

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

修改slave2中my.cnf文件:

[root@rhel6-64-5 ~]# cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

log-bin=mysql-bin
server-id = 3
relay_log = /var/lib/mysql/mysql-relay-bin
log_slave_updates = 1
read_only = 1  #表示从服务器是只读的!

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

重启所有mysql服务:

[root@rhel6-64-3 ~]# service mysqld restart
Stopping mysqld:                                           [  OK  ]
Starting mysqld:                                           [  OK  ]

进入master服务器中,授权qianmen用户给从服务器:

[root@rhel6-64-3 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.1.66-log Source distribution

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant replication slave, replication client on *.* to qianmen@'%' identified by '123456';

Query OK, 0 rows affected (0.00 sec)

如果不放心可以slave主机进行登录测试,命令如下:

[root@rhel6-64-4 ~]# mysql -uqianmen -p123456 -hmaster
Welcome to the MySQL monitor.  

出现welcome表明成功。

查看主机状态,要求记录file文件名称和position:

mysql> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000010 |      788 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

在2台从服务器上和主服务器进行同步:

[root@rhel6-64-4 ~]# mysql
Welcome to the MySQL monitor. 
。。。。。。
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

将圈中的信息填写正确。

mysql> change master to master_host='master',master_port=3306,master_user='qianmen',master_password='123456',master_log_file='mysql-bin.000003',master_log_pos=344;

mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master
                  Master_User: qianmen
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000010
          Read_Master_Log_Pos: 788
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 251
        Relay_Master_Log_File: mysql-bin.000010
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
            ……
1 row in set (0.00 sec)

 Slave2中进行相同操作,过程略!

之前操作一直失败,这里想说明2点:

1)授权用户不要给all,要有replication,MHA会在配置文件里要求能远程登录到数据库,所以要进行必要的赋权。

2)一定看准了主服务器的file和position,填错了 slave I/O装填会变成No。

然后再就是赋权了,之前的一步赋权操作是权限是只有replication,MHA会在配置文件里要求能远程登录到数据库,所以要进行必要的赋权。

在三台机器中都做如下操作:

mysql> grant all privileges on *.* to 'gongzi'@'%' identified by '123456';
mysql> flush privileges;

mysql相关用户,及权限的命令,以下命令无需敲打,仅供参考和验证!

mysql> select user,host from mysql.user; 		查看用户表
mysql> show grants for 'user2'@'%';    		查看授权用户
mysql> use mysql;						应用mysql数据库
mysql> drop user user4@172.28.7.204;		删除用户

三、安装配置MHA

1、安装准备工作

配置YUM源和依赖关系包。

接下来就是开始正式安装MHA了,先安装节点包开始 四台机器都要安装!每台机器上安装yum的一个软件源EPEL,里面包含了许多基本源里没有的软件。

[root@rhel6-64-5 ~]# rpm -ivh							 http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
Retrieving http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
warning: /var/tmp/rpm-tmp.vKNo5c: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Preparing...                ########################################### [100%]
package epel-release-6-8.noarch is already installed

安装完成后,会发现原来的yum仓库中多处2个文件:

打开epel.repo:

[root@rhel6-64-5 ~]# cat /etc/yum.repos.d/epel.repo 
[epel]
name=Extra Packages for Enterprise Linux 6 - $basearch
baseurl=http://download.fedoraproject.org/pub/epel/6/$basearch #将注释的#去掉
#mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-6&arch=$basearch#前面加上#
failovermethod=priority
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6

什么是epel?

如果既想获得 RHEL 的高质量、高性能、高可靠性,又需要方便易用(关键是免费)的软件包更新功能,那么 Fedora Project 推出的 EPEL(Extra Packages for Enterprise Linux)正好适合你。EPEL(http://fedoraproject.org/wiki/EPEL) 是由 Fedora 社区打造,为 RHEL 及衍生发行版如 CentOS、Scientific Linux 等提供高质量软件包的项目。

另外要把https改成http,如果不改会报错,报错原因写在最后解决中!

[root@rhel6-64-5 ~]# yum clean all
[root@rhel6-64-5 ~]# yum list
[root@rhel6-64-5 ~]# yum install perl-DBD-MySQL

MHA是perl编写的软件需要perl支持 之前yum安装mysql已经安装过了 如果没安装过需要安装这个依赖:

[root@rhel6-64-5 ~]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager --skip-broken

如果基础软件包不安装全,后续软件无法安装:

以上操作每台都需要做,以下操作管理节点(manger)需要两个都安装,在3台数据库节点只要安装MHA的node节点!

2、装Node

安装node节点!软件包可以本地rz上传,也可以使用:

[root@rhel6-64-5~]#rpm	-Uvh http://mysql-master-ha.googlecode.com/files/mha4mysql-node-0.52-0.noarch.rpm

去下载,但是需要梯子,我这里已经上传完成!

[root@rhel6-64-5 ~]# rpm -ivh mha4mysql-node-0.53-0.el6.noarch.rpm 
Preparing...                ########################################### [100%]
   1:mha4mysql-node         ########################################### [100%]

如果安装不成功,请检查是否安装了,该包不需要epel!

[root@rhel6-64-5 ~]# yum install -y perl-DBD-MySQL

3、安装manager

安装之前需要安装一些依赖包:

#yum install -y perl-DBD-MySQL
#yum install -y perl-Config-Tiny
#yum install -y perl-Log-Dispatch
#yum install -y perl-Parallel-ForkManager
#yum install -y perl-Config-IniFiles

也可以直接下载rpm包进行包装:

[root@rhel6-64-2 download]# wget -P /root/download/ http://ftp.muug.mb.ca/mirror/centos/7/os/x86_64/Packages/perl-5.16.3-286.el7.x86_64.rpm
--2016-09-12 22:02:36--  http://ftp.muug.mb.ca/mirror/centos/7/os/x86_64/Packages/perl-5.16.3-286.el7.x86_64.rpm
Resolving ftp.muug.mb.ca... 208.81.1.244, 2605:e200:3:4::244
Connecting to ftp.muug.mb.ca|208.81.1.244|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8352488 (8.0M) [application/x-redhat-package-manager]
Saving to: ?.root/download/perl-5.16.3-286.el7.x86_64.rpm?
100%[================>] 8,352,488    558K/s   in 30s     
2016-09-12 22:03:07 (268 KB/s) - ?.root/download/perl-5.16.3-286.el7.x86_64.rpm?.saved 
[8352488/8352488] 

[root@rhel6-64-2 download]# wget -P /root/download http://dl.fedoraproject.org/pub/epel/6/i386/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
--2016-09-12 22:05:06--  http://dl.fedoraproject.org/pub/epel/6/i386/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
Resolving dl.fedoraproject.org... 209.132.181.26, 209.132.181.23, 209.132.181.24, ...
Connecting to dl.fedoraproject.org|209.132.181.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21740 (21K) [application/x-rpm]
Saving to: ?.root/download/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm?

100%[================>] 21,740      4.13K/s   in 5.1s    

2016-09-12 22:05:15 (4.13 KB/s) - ?.root/download/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm?.saved [21740/21740]

[root@rhel6-64-2 download]# wget -p /root/download/ http://downloads.naulinux.ru/pub/NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm
/root/download/: Scheme missing.
--2016-09-12 22:29:49--  http://downloads.naulinux.ru/pub/NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm
Resolving downloads.naulinux.ru... 91.151.181.167
Connecting to downloads.naulinux.ru|91.151.181.167|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 71564 (70K) [application/x-rpm]
Saving to: ?.ownloads.naulinux.ru/pub/NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm?

100%[================>] 71,564       119K/s   in 0.6s    

2016-09-12 22:29:50 (119 KB/s) -
?.ownloads.naulinux.ru/pub/NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm?.saved [71564/71564]

FINISHED --2016-09-12 22:29:50--
Downloaded: 1 files, 70K in 0.6s (119 KB/s)
[root@rhel6-64-2 download]# ls
downloads.naulinux.ru
perl-5.16.3-286.el7.x86_64.rpm
perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
[root@rhel6-64-2 download]# cat downloads.naulinux.ru/
cat: downloads.naulinux.ru/: Is a directory
[root@rhel6-64-2 download]# cd downloads.naulinux.ru/
[root@rhel6-64-2 downloads.naulinux.ru]# ls
pub
[root@rhel6-64-2 downloads.naulinux.ru]# cd pub/
[root@rhel6-64-2 pub]# ls
NauLinux
[root@rhel6-64-2 pub]# mv NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm /root/download/
[root@rhel6-64-2 pub]# cd /root/download/
[root@rhel6-64-2 download]# ls 
downloads.naulinux.ru #这是个路径,要把该路径最深处的rpm包安装,你也可以复制出来!
perl-5.16.3-286.el7.x86_64.rpm
perl-Log-Dispatch-2.27-1.el6.noarch.rpm
perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
[root@rhel6-64-2 download]#  wget 
[root@rhel6-64-2 download]# wget -P /root/download http://dl.fedoraproject.org/pub/epel/6/i386/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
--2016-09-12 22:05:06--  http://dl.fedoraproject.org/pub/epel/6/i386/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
Resolving dl.fedoraproject.org... 209.132.181.26, 209.132.181.23, 209.132.181.24, ...
Connecting to dl.fedoraproject.org|209.132.181.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21740 (21K) [application/x-rpm]
Saving to: ?.root/download/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm?

100%[================>] 21,740      4.13K/s   in 5.1s    

2016-09-12 22:05:15 (4.13 KB/s) - ?.root/download/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm?.saved [21740/21740]

[root@rhel6-64-2download]#http://dl.fedoraproject.org/pub/epel/6/i386/perl-Mail-Sender-0.8.16-3.el6.noarch.rpm 
--2016-09-12 22:34:57--  http://dl.fedoraproject.org/pub/epel/6/i386/perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
Resolving dl.fedoraproject.org... 209.132.181.26, 209.132.181.27, 209.132.181.23, ...
Connecting to dl.fedoraproject.org|209.132.181.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 55224 (54K) [application/x-rpm]
Saving to: ?.erl-Mail-Sender-0.8.16-3.el6.noarch.rpm?

100%[================>] 55,224      9.12K/s   in 11s     

2016-09-12 22:35:09 (4.87 KB/s) - ?.erl-Mail-Sender-0.8.16-3.el6.noarch.rpm?.saved [55224/55224] 
[root@rhel6-64-2 download]# wget http://dl.fedoraproject.org/pub/epel/6/i386/perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
--2016-09-12 22:38:53--  http://dl.fedoraproject.org/pub/epel/6/i386/perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
Resolving dl.fedoraproject.org... 209.132.181.24, 209.132.181.25, 209.132.181.26, ...
Connecting to dl.fedoraproject.org|209.132.181.24|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28448 (28K) [application/x-rpm]
Saving to: ?.erl-Mail-Sendmail-0.79-12.el6.noarch.rpm?

100%[================>] 28,448      6.61K/s   in 4.2s    

2016-09-12 22:39:03 (6.61 KB/s) - ?.erl-Mail-Sendmail-0.79-12.el6.noarch.rpm?.saved [28448/28448]

[root@rhel6-64-2 download]# wget http://mirror.centos.org/centos/6/os/x86_64/Packages/perl-Time-HiRes-1.9721-141.el6_7.1.x86_64.rpm
--2016-09-12 22:41:37--  http://mirror.centos.org/centos/6/os/x86_64/Packages/perl-Time-HiRes-1.9721-141.el6_7.1.x86_64.rpm
Resolving mirror.centos.org... 103.27.60.52, 2605:9000:401:102::2
Connecting to mirror.centos.org|103.27.60.52|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50352 (49K) [application/x-rpm]
Saving to: ?.erl-Time-HiRes-1.9721-141.el6_7.1.x86_64.rpm?

100%[================>] 50,352       216K/s   in 0.2s    

2016-09-12 22:41:43 (216 KB/s) - ?.erl-Time-HiRes-1.9721-141.el6_7.1.x86_64.rpm?.saved [50352/50352]

下载完毕,对比下下载的包:

全部安装比较不容易:

 

总结下需要依赖安装4个包:

perl-4:5.16.3-286.el7.x86_64
perl-Log-Dispatch-2.27-1.el6.noarch
perl-Time-HiRes-4:1.9721-141.el6_7.1.x86_64

这里我用了挺暴力的方式同时进行的安装,不是特别推荐以后有好的方法,我再回来修改!

即便是manager的包,也同样用—nodeps的参数:

[root@rhel6-64-2 ~]# rpm -qa|grep mha4mysql-
mha4mysql-manager-0.53-0.el6.noarch
mha4mysql-node-0.53-0.el6.noarch

安装完成。

这里没有使用tar包安装,因为之前用tar包进行编译安装,发现测试命令无法执行,不知道哪里出了问题,有机会请教下大神,昨天这里耽误了2个多小时!

4、配置

配置mha配置文件:

[root@rhel6-64-2 ~]# mkdir -p /etc/masterha/app1 
[root@rhel6-64-2 ~]# mkdir -p /var/lib/mysql/
[root@rhel6-64-2 ~]# touch /etc/masterha/app1/manager.log
[root@rhel6-64-2 ~]# vi /etc/masterha/app1.cnf
[root@rhel6-64-2 ~]# cat /etc/masterha/app1.cnf
[server default]
manager_workdir=/etc/masterha/app1
manager_log=/etc/masterha/app1/manager.log
 #日志文件及其路径,需要手动创建
user=gongzi
password=123456
#之前数据库远程登录的

ssh_user=root
#用于测试ssh登录的

repl_user=qianmen
repl_password=123456
 #千门做主从的用户  这个也是每一台都要授权的
 
[server1]
hostname=172.28.7.203
port=3306
master_binlog_dir=/var/lib/mysql
candidate_master=1
check_repl_delay=0
 #/var/lib/mysql这里路径也需要手工创建
[server2]
hostname=172.28.7.204
port=3306
master_binlog_dir=/var/lib/mysql
candidate_master=1  
check_repl_delay=0
 
[server3]
hostname=172.28.7.205
port=3306
master_binlog_dir=/var/lib/mysql
ignore_fail=1  
no_master=1  

四、MHA测试

1、测试SSH

[root@rhel6-64-2 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf 
Fri Sep  9 23:05:39 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep  9 23:05:39 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Sep  9 23:05:39 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Sep  9 23:05:39 2016 - [info] Starting SSH connection tests..
Fri Sep  9 23:05:41 2016 - [debug] 
Fri Sep  9 23:05:39 2016 - [debug]  Connecting via SSH from root@172.28.7.203(172.28.7.203:22) to root@172.28.7.204(172.28.7.204:22)..
Fri Sep  9 23:05:40 2016 - [debug]   ok.
Fri Sep  9 23:05:40 2016 - [debug]  Connecting via SSH from root@172.28.7.203(172.28.7.203:22) to root@172.28.7.205(172.28.7.205:22)..
Fri Sep  9 23:05:41 2016 - [debug]   ok.
Fri Sep  9 23:05:42 2016 - [debug] 
Fri Sep  9 23:05:39 2016 - [debug]  Connecting via SSH from root@172.28.7.204(172.28.7.204:22) to root@172.28.7.203(172.28.7.203:22)..
Fri Sep  9 23:05:40 2016 - [debug]   ok.
Fri Sep  9 23:05:40 2016 - [debug]  Connecting via SSH from root@172.28.7.204(172.28.7.204:22) to root@172.28.7.205(172.28.7.205:22)..
Fri Sep  9 23:05:42 2016 - [debug]   ok.
Fri Sep  9 23:05:42 2016 - [debug] 
Fri Sep  9 23:05:40 2016 - [debug]  Connecting via SSH from root@172.28.7.205(172.28.7.205:22) to root@172.28.7.203(172.28.7.203:22)..
Fri Sep  9 23:05:41 2016 - [debug]   ok.
Fri Sep  9 23:05:41 2016 - [debug]  Connecting via SSH from root@172.28.7.205(172.28.7.205:22) to root@172.28.7.204(172.28.7.204:22)..
Fri Sep  9 23:05:42 2016 - [debug]   ok.
Fri Sep  9 23:05:42 2016 - [info] All SSH connection tests passed successfully.
这里如果出现问题,一定需要反复检查4台主机之间是否都具备免密码登录的条件。

2、测试repl

[root@rhel6-64-2 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
Fri Sep  9 22:39:56 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep  9 22:39:56 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Sep  9 22:39:56 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Sep  9 22:39:56 2016 - [info] MHA::MasterMonitor version 0.53.
Fri Sep  9 22:39:57 2016 - [info] Dead Servers:
Fri Sep  9 22:39:57 2016 - [info] Alive Servers:
Fri Sep  9 22:39:57 2016 - [info]   172.28.7.203(172.28.7.203:3306)
Fri Sep  9 22:39:57 2016 - [info]   172.28.7.204(172.28.7.204:3306)
Fri Sep  9 22:39:57 2016 - [info]   172.28.7.205(172.28.7.205:3306)
Fri Sep  9 22:39:57 2016 - [info] Alive Slaves:
Fri Sep  9 22:39:57 2016 - [info]   172.28.7.204(172.28.7.204:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 22:39:57 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 22:39:57 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Sep  9 22:39:57 2016 - [info]   172.28.7.205(172.28.7.205:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 22:39:57 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 22:39:57 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Sep  9 22:39:57 2016 - [info] Current Alive Master: 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 22:39:57 2016 - [info] Checking slave configurations..
Fri Sep  9 22:39:57 2016 - [info]  read_only=1 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep  9 22:39:57 2016 - [warning]  relay_log_purge=0 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep  9 22:39:57 2016 - [warning]  relay_log_purge=0 is not set on slave 172.28.7.205(172.28.7.205:3306).
Fri Sep  9 22:39:57 2016 - [info] Checking replication filtering settings..
Fri Sep  9 22:39:57 2016 - [info]  binlog_do_db= , binlog_ignore_db= 
Fri Sep  9 22:39:57 2016 - [info]  Replication filtering check ok.
Fri Sep  9 22:39:57 2016 - [info] Starting SSH connection tests..
Fri Sep  9 22:40:01 2016 - [info] All SSH connection tests passed successfully.
Fri Sep  9 22:40:01 2016 - [info] Checking MHA Node version..
Fri Sep  9 22:40:02 2016 - [info]  Version check ok.
Fri Sep  9 22:40:02 2016 - [info] Checking SSH publickey authentication settings on the current master..
Fri Sep  9 22:40:03 2016 - [info] HealthCheck: SSH to 172.28.7.203 is reachable.
Fri Sep  9 22:40:03 2016 - [info] Master MHA Node version is 0.53.
Fri Sep  9 22:40:03 2016 - [info] Checking recovery script configurations on the current master..
Fri Sep  9 22:40:03 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000013 
Fri Sep  9 22:40:03 2016 - [info]   Connecting to root@172.28.7.203(172.28.7.203).. 
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000013
Fri Sep  9 22:40:04 2016 - [info] Master setting check done.
Fri Sep  9 22:40:04 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Sep  9 22:40:04 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.204 --slave_ip=172.28.7.204 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Sep  9 22:40:04 2016 - [info]   Connecting to root@172.28.7.204(172.28.7.204:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Sep  9 22:40:05 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.205 --slave_ip=172.28.7.205 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Sep  9 22:40:05 2016 - [info]   Connecting to root@172.28.7.205(172.28.7.205:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Sep  9 22:40:05 2016 - [info] Slaves settings check done.
Fri Sep  9 22:40:05 2016 - [info] 
172.28.7.203 (current master)
 +--172.28.7.204
 +--172.28.7.205

Fri Sep  9 22:40:05 2016 - [info] Checking replication health on 172.28.7.204..
Fri Sep  9 22:40:05 2016 - [info]  ok.
Fri Sep  9 22:40:05 2016 - [info] Checking replication health on 172.28.7.205..
Fri Sep  9 22:40:05 2016 - [info]  ok.
Fri Sep  9 22:40:05 2016 - [warning] master_ip_failover_script is not defined.
Fri Sep  9 22:40:05 2016 - [warning] shutdown_script is not defined.
Fri Sep  9 22:40:05 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
[root@rhel6-64-2 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf 
Fri Sep  9 23:05:39 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep  9 23:05:39 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Sep  9 23:05:39 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Sep  9 23:05:39 2016 - [info] Starting SSH connection tests..
Fri Sep  9 23:05:41 2016 - [debug] 
Fri Sep  9 23:05:39 2016 - [debug]  Connecting via SSH from root@172.28.7.203(172.28.7.203:22) to root@172.28.7.204(172.28.7.204:22)..
Fri Sep  9 23:05:40 2016 - [debug]   ok.
Fri Sep  9 23:05:40 2016 - [debug]  Connecting via SSH from root@172.28.7.203(172.28.7.203:22) to root@172.28.7.205(172.28.7.205:22)..
Fri Sep  9 23:05:41 2016 - [debug]   ok.
Fri Sep  9 23:05:42 2016 - [debug] 
Fri Sep  9 23:05:39 2016 - [debug]  Connecting via SSH from root@172.28.7.204(172.28.7.204:22) to root@172.28.7.203(172.28.7.203:22)..
Fri Sep  9 23:05:40 2016 - [debug]   ok.
Fri Sep  9 23:05:40 2016 - [debug]  Connecting via SSH from root@172.28.7.204(172.28.7.204:22) to root@172.28.7.205(172.28.7.205:22)..
Fri Sep  9 23:05:42 2016 - [debug]   ok.
Fri Sep  9 23:05:42 2016 - [debug] 
Fri Sep  9 23:05:40 2016 - [debug]  Connecting via SSH from root@172.28.7.205(172.28.7.205:22) to root@172.28.7.203(172.28.7.203:22)..
Fri Sep  9 23:05:41 2016 - [debug]   ok.
Fri Sep  9 23:05:41 2016 - [debug]  Connecting via SSH from root@172.28.7.205(172.28.7.205:22) to root@172.28.7.204(172.28.7.204:22)..
Fri Sep  9 23:05:42 2016 - [debug]   ok.
Fri Sep  9 23:05:42 2016 - [info] All SSH connection tests passed successfully.
[root@rhel6-64-2 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
Fri Sep  9 23:07:54 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep  9 23:07:54 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Sep  9 23:07:54 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Sep  9 23:07:54 2016 - [info] MHA::MasterMonitor version 0.53.
Fri Sep  9 23:07:55 2016 - [info] Dead Servers:
Fri Sep  9 23:07:55 2016 - [info] Alive Servers:
Fri Sep  9 23:07:55 2016 - [info]   172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:07:55 2016 - [info]   172.28.7.204(172.28.7.204:3306)
Fri Sep  9 23:07:55 2016 - [info]   172.28.7.205(172.28.7.205:3306)
Fri Sep  9 23:07:55 2016 - [info] Alive Slaves:
Fri Sep  9 23:07:55 2016 - [info]   172.28.7.204(172.28.7.204:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:07:55 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:07:55 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Sep  9 23:07:55 2016 - [info]   172.28.7.205(172.28.7.205:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:07:55 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:07:55 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Sep  9 23:07:55 2016 - [info] Current Alive Master: 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:07:55 2016 - [info] Checking slave configurations..
Fri Sep  9 23:07:55 2016 - [info]  read_only=1 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep  9 23:07:55 2016 - [warning]  relay_log_purge=0 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep  9 23:07:55 2016 - [warning]  relay_log_purge=0 is not set on slave 172.28.7.205(172.28.7.205:3306).
Fri Sep  9 23:07:55 2016 - [info] Checking replication filtering settings..
Fri Sep  9 23:07:55 2016 - [info]  binlog_do_db= , binlog_ignore_db= 
Fri Sep  9 23:07:55 2016 - [info]  Replication filtering check ok.
Fri Sep  9 23:07:55 2016 - [info] Starting SSH connection tests..
Fri Sep  9 23:07:59 2016 - [info] All SSH connection tests passed successfully.
Fri Sep  9 23:07:59 2016 - [info] Checking MHA Node version..
Fri Sep  9 23:08:00 2016 - [info]  Version check ok.
Fri Sep  9 23:08:00 2016 - [info] Checking SSH publickey authentication settings on the current master..
Fri Sep  9 23:08:01 2016 - [info] HealthCheck: SSH to 172.28.7.203 is reachable.
Fri Sep  9 23:08:01 2016 - [info] Master MHA Node version is 0.53.
Fri Sep  9 23:08:01 2016 - [info] Checking recovery script configurations on the current master..
Fri Sep  9 23:08:01 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000013 
Fri Sep  9 23:08:01 2016 - [info]   Connecting to root@172.28.7.203(172.28.7.203).. 
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000013
Fri Sep  9 23:08:02 2016 - [info] Master setting check done.
Fri Sep  9 23:08:02 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Sep  9 23:08:02 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.204 --slave_ip=172.28.7.204 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Sep  9 23:08:02 2016 - [info]   Connecting to root@172.28.7.204(172.28.7.204:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Sep  9 23:08:03 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.205 --slave_ip=172.28.7.205 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Sep  9 23:08:03 2016 - [info]   Connecting to root@172.28.7.205(172.28.7.205:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Sep  9 23:08:03 2016 - [info] Slaves settings check done.
Fri Sep  9 23:08:03 2016 - [info] 
172.28.7.203 (current master)
 +--172.28.7.204
 +--172.28.7.205

Fri Sep  9 23:08:03 2016 - [info] Checking replication health on 172.28.7.204..
Fri Sep  9 23:08:03 2016 - [info]  ok.
Fri Sep  9 23:08:03 2016 - [info] Checking replication health on 172.28.7.205..
Fri Sep  9 23:08:03 2016 - [info]  ok.
Fri Sep  9 23:08:03 2016 - [warning] master_ip_failover_script is not defined.
Fri Sep  9 23:08:03 2016 - [warning] shutdown_script is not defined.
Fri Sep  9 23:08:03 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

这里第二次、第三次、第四次…..实验的时候一直在报错,耽误了差不多20多个小时,百度关键字前3页的网站基本都看了,也没能解决,后来翻墙去谷歌查官方文档才解决!

原来数据库主从有记录,需要reset slave一下,因为报错信息一直给的主服务器有问题,就没关注从服务器!报错信息就不在这里写了!回头写在总结中!再更新几个常见的问题,希望大家能够避免!

3、数据库宕机测试

1)启动MHA

重新开一个窗口,用于监控日志,因为这个是打开前台的进程,运行的时候光标会卡住:

[root@rhel6-64-2 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1&

2)监控日志

打开监控日志的窗口,查看:

[root@rhel6-64-2 app1]# tail -f /etc/masterha/app1/manager.log
	
Fri Sep  9 23:23:52 2016 - [info] MHA::MasterMonitor version 0.53.
Fri Sep  9 23:23:53 2016 - [info] Dead Servers:
Fri Sep  9 23:23:53 2016 - [info] Alive Servers:
Fri Sep  9 23:23:53 2016 - [info]   172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:23:53 2016 - [info]   172.28.7.204(172.28.7.204:3306)
Fri Sep  9 23:23:53 2016 - [info]   172.28.7.205(172.28.7.205:3306)
Fri Sep  9 23:23:53 2016 - [info] Alive Slaves:
Fri Sep  9 23:23:53 2016 - [info]   172.28.7.204(172.28.7.204:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:23:53 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:23:53 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Sep  9 23:23:53 2016 - [info]   172.28.7.205(172.28.7.205:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:23:53 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:23:53 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Sep  9 23:23:53 2016 - [info] Current Alive Master: 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:23:53 2016 - [info] Checking slave configurations..
Fri Sep  9 23:23:53 2016 - [info]  read_only=1 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep  9 23:23:53 2016 - [warning]  relay_log_purge=0 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep  9 23:23:53 2016 - [warning]  relay_log_purge=0 is not set on slave 172.28.7.205(172.28.7.205:3306).
Fri Sep  9 23:23:53 2016 - [info] Checking replication filtering settings..
Fri Sep  9 23:23:53 2016 - [info]  binlog_do_db= , binlog_ignore_db= 
Fri Sep  9 23:23:53 2016 - [info]  Replication filtering check ok.
Fri Sep  9 23:23:53 2016 - [info] Starting SSH connection tests..
Fri Sep  9 23:23:57 2016 - [info] All SSH connection tests passed successfully.
Fri Sep  9 23:23:57 2016 - [info] Checking MHA Node version..
Fri Sep  9 23:23:58 2016 - [info]  Version check ok.
Fri Sep  9 23:23:58 2016 - [info] Checking SSH publickey authentication settings on the current master..
Fri Sep  9 23:23:58 2016 - [info] HealthCheck: SSH to 172.28.7.203 is reachable.
Fri Sep  9 23:23:59 2016 - [info] Master MHA Node version is 0.53.
Fri Sep  9 23:23:59 2016 - [info] Checking recovery script configurations on the current master..
Fri Sep  9 23:23:59 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000013 
Fri Sep  9 23:23:59 2016 - [info]   Connecting to root@172.28.7.203(172.28.7.203).. 
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000013
Fri Sep  9 23:24:00 2016 - [info] Master setting check done.
Fri Sep  9 23:24:00 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Sep  9 23:24:00 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.204 --slave_ip=172.28.7.204 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Sep  9 23:24:00 2016 - [info]   Connecting to root@172.28.7.204(172.28.7.204:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Sep  9 23:24:00 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=gongzi --slave_host=172.28.7.205 --slave_ip=172.28.7.205 --slave_port=3306 --workdir=/var/tmp --target_version=5.1.66-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Sep  9 23:24:00 2016 - [info]   Connecting to root@172.28.7.205(172.28.7.205:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Sep  9 23:24:01 2016 - [info] Slaves settings check done.
Fri Sep  9 23:24:01 2016 - [info] 
172.28.7.203 (current master)
 +--172.28.7.204
 +--172.28.7.205

Fri Sep  9 23:24:01 2016 - [warning] master_ip_failover_script is not defined.
Fri Sep  9 23:24:01 2016 - [warning] shutdown_script is not defined.
Fri Sep  9 23:24:01 2016 - [info] Set master ping interval 3 seconds.
Fri Sep  9 23:24:01 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Sep  9 23:24:01 2016 - [info] Starting ping health check on 172.28.7.203(172.28.7.203:3306)..
Fri Sep  9 23:24:01 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

 一切准备就绪,就等着MYSQL无响应呢。

3)主服务器宕机

手动暂停主数据库服务器:

[root@rhel6-64-3 ~]# service mysqld stop
Stopping mysqld:                                           [  OK  ]

4)查看日志

Fri Sep  9 23:24:01 2016 - [warning] master_ip_failover_script is not defined.
Fri Sep  9 23:24:01 2016 - [warning] shutdown_script is not defined.
Fri Sep  9 23:24:01 2016 - [info] Set master ping interval 3 seconds.
Fri Sep  9 23:24:01 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Sep  9 23:24:01 2016 - [info] Starting ping health check on 172.28.7.203(172.28.7.203:3306)..
Fri Sep  9 23:24:01 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Sep  9 23:37:37 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Fri Sep  9 23:37:37 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --binlog_prefix=mysql-bin
Fri Sep  9 23:37:38 2016 - [info] HealthCheck: SSH to 172.28.7.203 is reachable.
Fri Sep  9 23:37:40 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Sep  9 23:37:40 2016 - [warning] Connection failed 1 time(s)..
Fri Sep  9 23:37:43 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Sep  9 23:37:43 2016 - [warning] Connection failed 2 time(s)..
Fri Sep  9 23:37:46 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Sep  9 23:37:46 2016 - [warning] Connection failed 3 time(s)..
Fri Sep  9 23:37:46 2016 - [warning] Master is not reachable from health checker!
Fri Sep  9 23:37:46 2016 - [warning] Master 172.28.7.203(172.28.7.203:3306) is not reachable!
Fri Sep  9 23:37:46 2016 - [warning] SSH is reachable.
Fri Sep  9 23:37:46 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Fri Sep  9 23:37:46 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep  9 23:37:46 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Sep  9 23:37:46 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Sep  9 23:37:47 2016 - [info] Dead Servers:
Fri Sep  9 23:37:47 2016 - [info]   172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:47 2016 - [info] Alive Servers:
Fri Sep  9 23:37:47 2016 - [info]   172.28.7.204(172.28.7.204:3306)
Fri Sep  9 23:37:47 2016 - [info]   172.28.7.205(172.28.7.205:3306)
Fri Sep  9 23:37:47 2016 - [info] Alive Slaves:
Fri Sep  9 23:37:47 2016 - [info]   172.28.7.204(172.28.7.204:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:37:47 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:47 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Sep  9 23:37:47 2016 - [info]   172.28.7.205(172.28.7.205:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:37:47 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:47 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Sep  9 23:37:47 2016 - [info] Checking slave configurations..
Fri Sep  9 23:37:47 2016 - [info]  read_only=1 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep  9 23:37:47 2016 - [warning]  relay_log_purge=0 is not set on slave 172.28.7.204(172.28.7.204:3306).
Fri Sep  9 23:37:47 2016 - [warning]  relay_log_purge=0 is not set on slave 172.28.7.205(172.28.7.205:3306).
Fri Sep  9 23:37:47 2016 - [info] Checking replication filtering settings..
Fri Sep  9 23:37:47 2016 - [info]  Replication filtering check ok.
Fri Sep  9 23:37:47 2016 - [info] Master is down!
Fri Sep  9 23:37:47 2016 - [info] Terminating monitoring script.
Fri Sep  9 23:37:47 2016 - [info] Got exit code 20 (Master dead).
Fri Sep  9 23:37:47 2016 - [info] MHA::MasterFailover version 0.53.
Fri Sep  9 23:37:47 2016 - [info] Starting master failover.
Fri Sep  9 23:37:47 2016 - [info] 
Fri Sep  9 23:37:47 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Sep  9 23:37:47 2016 - [info] 
Fri Sep  9 23:37:48 2016 - [info] Dead Servers:
Fri Sep  9 23:37:48 2016 - [info]   172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:48 2016 - [info] Checking master reachability via mysql(double check)..
Fri Sep  9 23:37:48 2016 - [info]  ok.
Fri Sep  9 23:37:48 2016 - [info] Alive Servers:
Fri Sep  9 23:37:48 2016 - [info]   172.28.7.204(172.28.7.204:3306)
Fri Sep  9 23:37:48 2016 - [info]   172.28.7.205(172.28.7.205:3306)
Fri Sep  9 23:37:48 2016 - [info] Alive Slaves:
Fri Sep  9 23:37:48 2016 - [info]   172.28.7.204(172.28.7.204:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:37:48 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:48 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Sep  9 23:37:48 2016 - [info]   172.28.7.205(172.28.7.205:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:37:48 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:48 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Sep  9 23:37:48 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Sep  9 23:37:48 2016 - [info] 
Fri Sep  9 23:37:48 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Sep  9 23:37:48 2016 - [info] 
Fri Sep  9 23:37:49 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Sep  9 23:37:49 2016 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Fri Sep  9 23:37:49 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Sep  9 23:37:50 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Sep  9 23:37:50 2016 - [info] 
Fri Sep  9 23:37:50 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Sep  9 23:37:50 2016 - [info] 
Fri Sep  9 23:37:50 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Sep  9 23:37:50 2016 - [info] 
Fri Sep  9 23:37:50 2016 - [info] The latest binary log file/position on all slaves is mysql-bin.000013:106
Fri Sep  9 23:37:50 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Sep  9 23:37:50 2016 - [info]   172.28.7.204(172.28.7.204:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:37:50 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:50 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Sep  9 23:37:50 2016 - [info]   172.28.7.205(172.28.7.205:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:37:50 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:50 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Sep  9 23:37:50 2016 - [info] The oldest binary log file/position on all slaves is mysql-bin.000013:106
Fri Sep  9 23:37:50 2016 - [info] Oldest slaves:
Fri Sep  9 23:37:50 2016 - [info]   172.28.7.204(172.28.7.204:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:37:50 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:50 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Sep  9 23:37:50 2016 - [info]   172.28.7.205(172.28.7.205:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:37:50 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:50 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Sep  9 23:37:50 2016 - [info] 
Fri Sep  9 23:37:50 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Sep  9 23:37:50 2016 - [info] 
Fri Sep  9 23:37:50 2016 - [info] Fetching dead master's binary logs..
Fri Sep  9 23:37:50 2016 - [info] Executing command on the dead master 172.28.7.203(172.28.7.203:3306): save_binary_logs --command=save --start_file=mysql-bin.000013  --start_pos=106 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
  Creating /var/tmp if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000013 pos 106 to mysql-bin.000013 EOF into /var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog ..
  Dumping binlog format description event, from position 0 to 106.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysql-bin.000013 position 106 to tail(125).. ok.
 Concat succeeded.
Fri Sep  9 23:37:52 2016 - [info] scp from root@172.28.7.203:/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog to local:/etc/masterha/app1/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog succeeded.
Fri Sep  9 23:37:52 2016 - [info] HealthCheck: SSH to 172.28.7.204 is reachable.
Fri Sep  9 23:37:53 2016 - [info] HealthCheck: SSH to 172.28.7.205 is reachable.
Fri Sep  9 23:37:54 2016 - [info] 
Fri Sep  9 23:37:54 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Sep  9 23:37:54 2016 - [info] 
Fri Sep  9 23:37:54 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Sep  9 23:37:54 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Fri Sep  9 23:37:54 2016 - [info] Searching new master from slaves..
Fri Sep  9 23:37:54 2016 - [info]  Candidate masters from the configuration file:
Fri Sep  9 23:37:54 2016 - [info]   172.28.7.204(172.28.7.204:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:37:54 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:54 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Sep  9 23:37:54 2016 - [info]  Non-candidate masters:
Fri Sep  9 23:37:54 2016 - [info]   172.28.7.205(172.28.7.205:3306)  Version=5.1.66-log (oldest major version between slaves) log-bin:enabled
Fri Sep  9 23:37:54 2016 - [info]     Replicating from 172.28.7.203(172.28.7.203:3306)
Fri Sep  9 23:37:54 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Sep  9 23:37:54 2016 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Fri Sep  9 23:37:54 2016 - [info] New master is 172.28.7.204(172.28.7.204:3306)
Fri Sep  9 23:37:54 2016 - [info] Starting master failover..
Fri Sep  9 23:37:54 2016 - [info] 
From:
172.28.7.203 (current master)
 +--172.28.7.204
 +--172.28.7.205

To:
172.28.7.204 (new master)
 +--172.28.7.205
Fri Sep  9 23:37:54 2016 - [info] 
Fri Sep  9 23:37:54 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Sep  9 23:37:54 2016 - [info] 
Fri Sep  9 23:37:54 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Sep  9 23:37:54 2016 - [info] Sending binlog..
Fri Sep  9 23:37:55 2016 - [info] scp from local:/etc/masterha/app1/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog to root@172.28.7.204:/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog succeeded.
Fri Sep  9 23:37:55 2016 - [info] 
Fri Sep  9 23:37:55 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Sep  9 23:37:55 2016 - [info] 
Fri Sep  9 23:37:55 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Sep  9 23:37:55 2016 - [info] Starting recovery on 172.28.7.204(172.28.7.204:3306)..
Fri Sep  9 23:37:55 2016 - [info]  Generating diffs succeeded.
Fri Sep  9 23:37:55 2016 - [info] Waiting until all relay logs are applied.
Fri Sep  9 23:37:55 2016 - [info]  done.
Fri Sep  9 23:37:55 2016 - [info] Getting slave status..
Fri Sep  9 23:37:55 2016 - [info] This slave(172.28.7.204)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000013:106). No need to recover from Exec_Master_Log_Pos.
Fri Sep  9 23:37:55 2016 - [info] Connecting to the target slave host 172.28.7.204, running recover script..
Fri Sep  9 23:37:55 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=gongzi --slave_host=172.28.7.204 --slave_ip=172.28.7.204  --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog --workdir=/var/tmp --target_version=5.1.66-log --timestamp=20160909233747 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Fri Sep  9 23:37:55 2016 - [info] 
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog on 172.28.7.204:3306. This may take long time...
Applying log files succeeded.
Fri Sep  9 23:37:55 2016 - [info]  All relay logs were successfully applied.
Fri Sep  9 23:37:55 2016 - [info] Getting new master's binlog name and position..
Fri Sep  9 23:37:55 2016 - [info]  binlog.000007:106
Fri Sep  9 23:37:55 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.28.7.204', MASTER_PORT=3306, MASTER_LOG_FILE='binlog.000007', MASTER_LOG_POS=106, MASTER_USER='qianmen', MASTER_PASSWORD='xxx';
Fri Sep  9 23:37:55 2016 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Fri Sep  9 23:37:55 2016 - [info] ** Finished master recovery successfully.
Fri Sep  9 23:37:55 2016 - [info] * Phase 3: Master Recovery Phase completed.
Fri Sep  9 23:37:55 2016 - [info] 
Fri Sep  9 23:37:55 2016 - [info] * Phase 4: Slaves Recovery Phase..
Fri Sep  9 23:37:55 2016 - [info] 
Fri Sep  9 23:37:55 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Sep  9 23:37:55 2016 - [info] 
Fri Sep  9 23:37:56 2016 - [info] -- Slave diff file generation on host 172.28.7.205(172.28.7.205:3306) started, pid: 50129. Check tmp log /etc/masterha/app1/172.28.7.205_3306_20160909233747.log if it takes time..
Fri Sep  9 23:37:57 2016 - [info] 
Fri Sep  9 23:37:57 2016 - [info] Log messages from 172.28.7.205 ...
Fri Sep  9 23:37:57 2016 - [info] 
Fri Sep  9 23:37:56 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Sep  9 23:37:57 2016 - [info] End of log messages from 172.28.7.205.
Fri Sep  9 23:37:57 2016 - [info] -- 172.28.7.205(172.28.7.205:3306) has the latest relay log events.
Fri Sep  9 23:37:57 2016 - [info] Generating relay diff files from the latest slave succeeded.
Fri Sep  9 23:37:57 2016 - [info] 
Fri Sep  9 23:37:57 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Sep  9 23:37:57 2016 - [info] 
Fri Sep  9 23:37:57 2016 - [info] -- Slave recovery on host 172.28.7.205(172.28.7.205:3306) started, pid: 50131. Check tmp log /etc/masterha/app1/172.28.7.205_3306_20160909233747.log if it takes time..
Fri Sep  9 23:37:59 2016 - [info] 
Fri Sep  9 23:37:59 2016 - [info] Log messages from 172.28.7.205 ...
Fri Sep  9 23:37:59 2016 - [info] 
Fri Sep  9 23:37:57 2016 - [info] Sending binlog..
Fri Sep  9 23:37:58 2016 - [info] scp from local:/etc/masterha/app1/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog to root@172.28.7.205:/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog succeeded.
Fri Sep  9 23:37:58 2016 - [info] Starting recovery on 172.28.7.205(172.28.7.205:3306)..
Fri Sep  9 23:37:58 2016 - [info]  Generating diffs succeeded.
Fri Sep  9 23:37:58 2016 - [info] Waiting until all relay logs are applied.
Fri Sep  9 23:37:58 2016 - [info]  done.
Fri Sep  9 23:37:58 2016 - [info] Getting slave status..
Fri Sep  9 23:37:58 2016 - [info] This slave(172.28.7.205)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000013:106). No need to recover from Exec_Master_Log_Pos.
Fri Sep  9 23:37:58 2016 - [info] Connecting to the target slave host 172.28.7.205, running recover script..
Fri Sep  9 23:37:58 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=gongzi --slave_host=172.28.7.205 --slave_ip=172.28.7.205  --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog --workdir=/var/tmp --target_version=5.1.66-log --timestamp=20160909233747 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Fri Sep  9 23:37:58 2016 - [info] 
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_172.28.7.203_3306_20160909233747.binlog on 172.28.7.205:3306. This may take long time...
Applying log files succeeded.
Fri Sep  9 23:37:58 2016 - [info]  All relay logs were successfully applied.
Fri Sep  9 23:37:58 2016 - [info]  Resetting slave 172.28.7.205(172.28.7.205:3306) and starting replication from the new master 172.28.7.204(172.28.7.204:3306)..
Fri Sep  9 23:37:58 2016 - [info]  Executed CHANGE MASTER.
Fri Sep  9 23:37:59 2016 - [info]  Slave started.
Fri Sep  9 23:37:59 2016 - [info] End of log messages from 172.28.7.205.
Fri Sep  9 23:37:59 2016 - [info] -- Slave recovery on host 172.28.7.205(172.28.7.205:3306) succeeded.
Fri Sep  9 23:37:59 2016 - [info] All new slave servers recovered successfully.
Fri Sep  9 23:37:59 2016 - [info] 
Fri Sep  9 23:37:59 2016 - [info] * Phase 5: New master cleanup phease..
Fri Sep  9 23:37:59 2016 - [info] 
Fri Sep  9 23:37:59 2016 - [info] Resetting slave info on the new master..
Fri Sep  9 23:37:59 2016 - [info]  172.28.7.204: Resetting slave info succeeded.
Fri Sep  9 23:37:59 2016 - [info] Master failover to 172.28.7.204(172.28.7.204:3306) completed successfully.
Fri Sep  9 23:37:59 2016 - [info] 

----- Failover Report -----

app1: MySQL Master failover 172.28.7.203 to 172.28.7.204 succeeded

Master 172.28.7.203 is down!

Check MHA Manager logs at rhel6-64-2:/etc/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 172.28.7.204(172.28.7.204:3306) has all relay logs for recovery.
Selected 172.28.7.204 as a new master.
172.28.7.204: OK: Applying all logs succeeded.
172.28.7.205: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.28.7.205: OK: Applying all logs succeeded. Slave started, replicating from 172.28.7.204.
172.28.7.204: Resetting slave info succeeded.
Master failover to 172.28.7.204(172.28.7.204:3306) completed successfully.

再登录slave2服务器,看slave 的状态:

[root@rhel6-64-5 ~]# mysql
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.28.7.204
                  Master_User: qianmen
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000007
          Read_Master_Log_Pos: 106
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 248
        Relay_Master_Log_File: binlog.000007
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 106
              Relay_Log_Space: 403
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 60
           Master_SSL_CA_Path: 0
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
1 row in set (0.00 sec)

测试成功,如黄底红字标示所示,原203主机MySQL连接失败,204主机当选新的master,205主机记录最近的日志,大功告成,打完收工。

五、MHA维护与问题汇总

1、MHA日常维护命令

1)查看ssh登陆是否成功

masterha_check_ssh --conf=/etc/masterha/app1.cnf

2)查看复制是否建立好

masterha_check_repl --conf=/etc/masterha/app1.cnf

3)启动mha

nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log  < /dev/null 2>&1 & 

当有slave节点宕掉的情况是启动不了的,加上--ignore_fail_on_start即使有节点宕掉也能启动mha

nohup masterha_manager --conf=/etc/masterha/app1.cnf  --ignore_fail_on_start > /tmp/mha_manager.log  < /dev/null 2>&1 & 

4)检查启动的状态

masterha_check_status --conf=/etc/masterha/app1.cnf

5)停止mha

masterha_stop  --conf=/etc/masterha/app1.cnf

6)failover后下次重启

每次failover切换后会在管理目录生成文件app1.failover.complete ,下次在切换的时候会发现有这个文件导致切换不成功,需要手动清理掉。

rm -rf /masterha/app1/app1.failover.complete

也可以加上参数--ignore_last_failover。

7)手工failover
手工failover场景,master死掉,但是masterha_manager没有开启,可以通过手工failover:

masterha_master_switch --conf=/etc/masterha/app1.cnf --dead_master_host=10.50.2.10 --master_state=dead --new_master_host=10.50.2.12 --ignore_last_failover

8)masterha_manager是一种监视和故障转移的程序。

另一方面,masterha_master_switch程序不监控主库。 masterha_master_switch可以用于主库故障转移,也可用于在线总开关。

9)手动在线切换

masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=192.168.119.74 --orig_master_is_new_slave

或者:

masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=192.168.119.74 --orig_master_is_new_slave --running_updates_limit=10000

--orig_master_is_new_slave切换时加上此参数是将原master变为slave节点,如果不加此参数,原来的master将不启动。

--running_updates_limit=10000 切换时候选master如果有延迟的话,mha切换不能成功,加上此参数表示延迟在此时间范围内都可切换(单位为s),但是切换的时间长短是由recover时relay日志的大小决定。

手动在线切换mha,切换时需要将在运行的mha停掉后才能切换。

在备库先执行DDL,一般先stop slave,一般不记录mysql日志,可以通过set SQL_LOG_BIN = 0实现。然后进行一次主备切换操作,再在原来的主库上执行DDL。这种方法适用于增减索引,如果是增加字段就需要额外注意。

可以通过如下命令停止mha:

masterha_stop --conf=/etc/app1.cnf

2、遇到的问题

1)yum search perl-Log-Dispatch 找不到

解决:

yum -y install perl-Log*

2)yum search parallel 找不到

解决:网上下载了Log-Dispatchouli-master.zip 安装后再yum安装parallel。

3)健康检查Not OK

解决: 因为172.28.7.203/204/205曾设置过主从,master为因为172.28.7.203。 

在slave1/2上show slave status一直有信息,需要reset slave。

最后如果有问题,还得访问官网!https://code.google.com/p/mysql-master-ha/issues/list(需要梯子)。

4) 安装epel失败

Error: Cannot retrieve repository metadata (repomd.xml) for repository: epel. Please verify its path and try again
I tried installing EPEL on a fresh install of RHEL6, and after adding the repo, yum fails with the above error. I have RHEL6.1 (Santiago) and get the above error.
This happens because the RHEL/CentOS installation doesn’t trust the HTTPS certificate used by mirrors.fedoraproject.org, that is issued by “GeoTrust SSL CA“.
In my case the package ca-certificates was not installed and the/etc/pki/tls/certs/ folder didn’t contain any ca-bundle.crt or ca-bundle.trust.crt !
Solution: yum install ca-certificates
(I had to temporarily rpm –erase epel-release first, to get yum working again)
I once got the same error message eventhout ca-certificates was installed and up to date. Then it was a firewall blocking https (port 443) traffic.
I worked around that by changing from https to http in/etc/yum.repos.d/epel.repo

简单点说 就是https是443端口,你并没有把该网站的CA证书添加进入,所以要改成80端口,虽然不那么安全。

相关文章:

  • Python进阶(三)-图形界面编程Tkinter(3)
  • Rethinking Image Aesthetics Assessment:Models,Datasets and Benchmarks
  • 人工智能时代的离散数学教学研究
  • frp内网穿透教程2022最新(含内网ssh配置与msf联动配置)
  • TS装饰器
  • PAT 1007 Maximum Subsequence Sum
  • go中的slice
  • 什么是完全的静态分析?
  • 如何在ios手机上使用动态代理?
  • 搭建zookeeper集群
  • React生命周期详解
  • 大数据项目中数据倾斜
  • Kafka Consumer源码讲解
  • svg中 path标签的d属性
  • 什么样的数字IC后端工程师能拿到高薪Offer?
  • 10个确保微服务与容器安全的最佳实践
  • 4. 路由到控制器 - Laravel从零开始教程
  • 8年软件测试工程师感悟——写给还在迷茫中的朋友
  • C++11: atomic 头文件
  • C++回声服务器_9-epoll边缘触发模式版本服务器
  • C语言笔记(第一章:C语言编程)
  • JavaScript实现分页效果
  • LeetCode算法系列_0891_子序列宽度之和
  • Linux快速配置 VIM 实现语法高亮 补全 缩进等功能
  • react-native 安卓真机环境搭建
  • UMLCHINA 首席专家潘加宇鼎力推荐
  • 删除表内多余的重复数据
  • 思维导图—你不知道的JavaScript中卷
  • 通过来模仿稀土掘金个人页面的布局来学习使用CoordinatorLayout
  • 我是如何设计 Upload 上传组件的
  • 用mpvue开发微信小程序
  • ​LeetCode解法汇总1276. 不浪费原料的汉堡制作方案
  • ​一帧图像的Android之旅 :应用的首个绘制请求
  • #DBA杂记1
  • #每日一题合集#牛客JZ23-JZ33
  • $.ajax中的eval及dataType
  • (NO.00004)iOS实现打砖块游戏(十二):伸缩自如,我是如意金箍棒(上)!
  • (附源码)ssm码农论坛 毕业设计 231126
  • (一)ClickHouse 中的 `MaterializedMySQL` 数据库引擎的使用方法、设置、特性和限制。
  • (转) RFS+AutoItLibrary测试web对话框
  • *1 计算机基础和操作系统基础及几大协议
  • .net分布式压力测试工具(Beetle.DT)
  • .NET和.COM和.CN域名区别
  • [20161101]rman备份与数据文件变化7.txt
  • [AIGC] Java 和 Kotlin 的区别
  • [AIGC] SQL中的数据添加和操作:数据类型介绍
  • [AX]AX2012 AIF(四):文档服务应用实例
  • [C++]——带你学习类和对象
  • [CISCN2019 华东南赛区]Web11
  • [DM复习]Apriori算法-国会投票记录关联规则挖掘(上)
  • [Excel VBA]单元格区域引用方式的小结
  • [LeetCode]—Roman to Integer 罗马数字转阿拉伯数字
  • [Linux]知识整理(持续更新)
  • [luoguP2401] 不等数列
  • [SpringBoot系列]缓存解决方案