查看mysql主从复制延迟和数据中断

        查看mySQL延迟的方法,查看了多个案例,大家众说纷纭,意见差不多一致。如下也是我参考别人经验做的一些测试,希望能检测到mysql复制延迟、数据中断。

方法一:查看Seconds_Behind_Master

        该参数有如下值:

  • NULL 表示io_thread或sql_thread有一个发生故障,就是说该线程的Running状态时No,而非Yes
  • 0 表示主从复制良好,没有lag存在
  • 正值 表示主从已出现延时,数字越大表示从库落后主库越多
  • 负值 很罕见,是一个BUG,按理说不应该出现

        该方法是使用命令show slave status,通过比较SQL THREAD接受events时间的时间戳与IO THREAD执行事件events时间戳的差值–秒数,来确定slave落后于master多少,如主从时间不同,改时间的计算不受影响。

        众所周知备库relay-log和主库的bin-log里的内容一样,真正和主库有关两的是io_thread,当主库I/O负载很大或网络阻塞时,io_thread不能及时复制binlog,而sql_thread一直能跟上io_thread的脚步,这时seconds_behind_master的值是0,实际上却不是,这时用该值作为延迟参考则不准。

1
change master to master_host='192.168.2.7',master_user='tongbu',master_password='123456',master_log_file='mysql-bin.000008',master_log_pos=291263843;

方法二:使用pt-heartbeat工具

        该工具可以计算出MySQL复制或者是PostgreSQL,它可以更新master或者监控复制。它还可以从f 读取配置。它借助timestmp的比较实现的,首先需要保证主从服务器时间必须要保持一致,通过与相同的一个NTP server同步时钟。它需要在主库上创建一个heartbeat的表,里面的时间戳ts就是当前的时间戳 now(),该结构也会被复制到从库上。表建好以后,会在主库上以后台进程的模式去执行一行更新操作的命令,定期去向表中的插入数据,这 个周期默认为1 秒,同时从库也会在后台执行一个监控命令,与主库保持一致的周期+0.5S(默认0.5S延迟检查)去比较,复制过来记录的ts值与主库上的同一条ts值,差值为0表示无延时,差值越大表示 延时的秒数越多。

使用工具前提:

  1. 在主库上建立heartbeat表
1
pt-heartbeat -h localhost -D test --create-table --update
  1. 更新主库上的heartbeat
1
pt-heartbeat -D test --master-server-id=1 --update
  1. 在从库上监控复制延迟
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
pt-heartbeat --user=tongbu --password='123456' -D test --monitor -h 192.168.2.9 --print-master-server-id
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.01s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1
0.00s [ 0.00s, 0.00s, 0.00s ] 1

        当然还有其他一些操作命令:

        将主库上的update使用守护进程方式调度

1
pt-heartbeat -D test --master-server-id=1 --update --daemonize

        修改主库上的更新时间间隔为2s

1
pt-heartbeat -D test --update --daemonize --interval=2

        修改主库上的pt-heartbeat守护进程

1
2
3
4
pt-heartbeat --stop
Successfully created file /tmp/pt-heartbeat-sentinel
rm -rf /tmp/pt-heartbeat-sentinel

        单词查看从库上的延迟情况

1
2
pt-heartbeat --user=tongbu --password='123456' -D test -h 192.168.2.9 --check
0.00

        使用守护进程监控从库并输出日志

1
pt-heartbeat --user=tongbu --password='123456' -h 192.168.2.9 -D test --master-server-id=1 --monitor --print-master-server-id --daemonize --log=/var/log/pt_slave_lag.log

        如下是脚本的方式,只不过使用脚本的方式实现的。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
#!/bin/sh
#description:check slave replication delay
PT=`which pt-heartbeat`
ADMIN=`which mysqladmin`
MYSQL=`which mysql`
WARN=10
CRITIC=20
COMMON=3
MASTERID=1
###############
watch_update(){
RE=`ps -ef|grep $PT|grep -v grep|grep "\--update"`
if [ ! -n "$RE" ];then
$PT -D test --master-server-id=$MASTERID --update --daemonize
fi
}
################
watch_mysql(){
SAFE_STATUS=`ps -ef|grep -w mysqld_safe|grep -v grep`
MYSQLD_STATUS=`ps -ef|grep -w mysqld|grep -v grep`
if [ ! -n "$SAFE_STATUS" ];then
echo "Mysqld_safe doesn't running,Please check your mysqld_safe status"
exit 1
fi
if [ ! -n "$MYSQLD_STATUS" ];then
echo "Mysqld program status --stop,Please check your mysqld status "
exit 1
fi
}
####################
watch_mysql_slave(){
REP_STATUS=`$ADMIN processlist|grep "Binlog Dump"`
if [ ! -n "$REP_STATUS" ];then
echo "slave process doesn't running,Please check your replication"
exit 1
fi
}
#################
enter_slave_info(){
echo "please enter your slave username:"
read NAME
if [ -n "$NAME" ];then
echo " you enter username is: $NAME"
fi
echo "please enter your slave user password:"
read PASS
if [ -n "$PASS" ];then
echo "you enter user password is: $PASS"
fi
echo "please enter your slave hostname or address:"
read ADDR
if [ -n "$ADDR" ];then
echo "you enter slave hostname or address is: $ADDR"
fi
}
##################
watch_slave_delay(){
# echo "please enter your watch options(1.check 2.monitor)"
# read OPTION
# if [ $OPTION -eq 1 ];then
SLAVE_DELAY=`$PT --user=$NAME --password="$PASS" -h $ADDR -D test --master-server-id=$MASTERID --check --print-master-server-id`
# elif [ $OPTION -eq 2 ];then
# `$PT --user=$NAME --password="PASS" -h $ADDR -D test --master-server-id=$MASTERID --monitor --print-master-server-id`
# else
# echo "your enter are error,now EXIT"
# exit 1
# fi
if [ ! -n "$SLAVE_DELAY" ];then
echo "Your pt-heartbeat tool must haven't install or you username,password,slave hostname ERROR"
exit 1
else
echo "DELAY TIME: $SLAVE_DELAY"
fi
}
##################################
watch_slave_interrupt(){
INT_RE=`$MYSQL -s -u $NAME -p"$PASS" -h $ADDR -e 'show slave status\G'|grep "Last_Error:"`
if [ -n "$INT_RE" ];then
echo "$INT_RE"
else
echo "INTERUUPT Info: "
echo "$INT_RE"
fi
}
########################
WATCH(){
watch_slave_delay
watch_slave_interrupt
}
#########################
START_WATCH(){
watch_mysql
watch_mysql_slave
watch_update
enter_slave_info
}
##########################
LOOP_WATCH(){
START_WATCH
echo "+++++++++++++ DELAY INFO ++++++++++++++++"
while true;do
WATCH
sleep 30
done
}
##########################
LOOP_WATCH

        脚本运行结果如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
[root@mjx mjx]# ./delay.sh
please enter your slave username:
tongbu
you enter username is: tongbu
please enter your slave user password:
123456
you enter user password is: 123456
please enter your slave hostname or address:
192.168.2.9
you enter slave hostname or address is: 192.168.2.9
+++++++++++++ DELAY INFO ++++++++++++++++
DELAY TIME: 0.36 1
Warning: Using a password on the command line interface can be insecure.
Last_Error:
DELAY TIME: 0.96 1
Warning: Using a password on the command line interface can be insecure.
Last_Error:
DELAY TIME: 0.12 1
Warning: Using a password on the command line interface can be insecure.
Last_Error:
DELAY TIME: 1.91 1
Warning: Using a password on the command line interface can be insecure.
Last_Error:

        但是输出结果没法把报错屏蔽掉。