Of course we cannot always share details about our work with customers, but nevertheless it is nice to show our technical achievements and share some of our implemented solutions.
A new release of check_mysql_slavestatus, a monitoring plugin to monitor MySQL and MariaDB master-slave replications, is available!
The new version (2020093000) adds support to monitor multi-replications on MySQL using channels by adding a new parameter -C. Monitoring multi-replications on MariaDB was already possible using the -s parameter.
This article is not only used to introduce the new version of the monitoring plugin, but also explain what MySQL multi replication is, what the difference between MariaDB's and MySQL's multi-source replication is and how they can be monitored with check_mysql_slavestatus.
Note: To keep a sane writing, the term "MySQL" here applies to both MySQL and MariaDB, unless otherwise explained.
It is called multi source replication, when the MySQL slave server is a slave of multiple master (source) servers. Imagine you run several applications, each with its own database. For performance or security reasons the databases are running on dedicated MySQL servers. To use a centralized location for backups/dumps, a MySQL slave is configured to receive replications from each master (for the relevant application database). That's just one scenario why such a multi replication would be used.
Multi source replication was added in MariaDB 10.0.1. It uses a "connection_name" as the identifier of the master server.
To prepare the multi replication, all MariaDB servers (masters and slave) need to set a unique server_id in the [mysqld] configuration (on Debian and Ubuntu this is in /etc/mysql/mariadb.conf.d/50-server.cnf). Furthermore all master servers need to enable binary logs by enabling the log_bin parameter. Optionally it is possible to define the database(s) to replicate using the replicate-do-db option. Otherwise all databases will be subject for the replication (watch out for the mysql database though!). Obviously all the servers must be able to communicate with each other on the MariaDB port (default tcp/3306). Make sure MariaDB is listening on the public interface, not just on localhost (127.0.0.1). And last but not least, the a replication user with the necessary privileges must be created on the master servers:
MariaDB [(none)]> CREATE USER 'repl'@'192.168.15.189' IDENTIFIED BY 'repl';
MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.15.189';
To create a new replication connection on the slave server, the command is almost the same as in "classic master-slave replications", however the connection names are added (here app1 and app2):
MariaDB [(none)]> CHANGE MASTER 'app1' TO MASTER_HOST='192.168.15.187', MASTER_USER='repl', MASTER_PASSWORD='repl';
Query OK, 0 rows affected (0.780 sec)
MariaDB [(none)]> CHANGE MASTER 'app2' TO MASTER_HOST='192.168.15.188', MASTER_USER='repl', MASTER_PASSWORD='repl';
Query OK, 0 rows affected (0.626 sec)
Afterwards a single slave or all slaves can be started:
MariaDB [(none)]> START ALL SLAVES;
Query OK, 0 rows affected, 2 warnings (0.083 sec)
The command to see the status of all replications on the slave is:
MariaDB [(none)]> SHOW ALL SLAVES STATUS\G;
By selecting specific fields from the output (here running mysql from Bash), the two replications are listed:
root@slave:~# mysql -e "SHOW ALL SLAVES STATUS\G" | egrep "(Connection_name|Master_Host)"
Connection_name: app1
Master_Host: 192.168.15.187
Connection_name: app2
Master_Host: 192.168.15.188
Now that two replications should be running on the slave server, check_mysql_slavestatus can be used to monitor the replication status. For this, a dedicated MariaDB user should be created with the necessary privileges:
MariaDB [(none)]> CREATE USER 'nagios'@'%' IDENTIFIED BY 'secret';
MariaDB [(none)]> GRANT REPLICATION CLIENT on *.* TO 'nagios'@'%' IDENTIFIED BY 'secret';
Now the monitoring plugin can be run with the -s connection_name parameter:
$ ./check_mysql_slavestatus.sh -H 192.168.15.189 -u nagios -p secret -s app1
CRITICAL: -h 192.168.15.189 Slave_IO_Running: Connecting
$ ./check_mysql_slavestatus.sh -H 192.168.15.189 -u nagios -p secret -s app2
OK: Slave SQL running: Yes Slave IO running: Yes / master: 192.168.15.188 / slave is 0 seconds behind master | delay=0s
Thanks to the monitoring plugin we now know there is a problem in the "app1" replication, so this should be investigated.
Multi-source replication is available since MySQL 5.7.6. The implementation is however different than on MariaDB. Instead of using a connection_name, a communication channel is used. From the 5.7.6 release notes:
MySQL Multi-Source Replication adds the ability to replicate from multiple masters to a slave. MySQL Multi-Source Replication topologies can be used to back up multiple servers to a single server, to merge table shards, and consolidate data from multiple servers to a single server. See MySQL Multi-Source Replication.
As part of MySQL Multi-Source Replication, replication channels have been added. Replication channels enable a slave to open multiple connections to replicate from, with each channel being a connection to a master. To enable selection of particular channels, replication-related SQL statements now support an optional FOR CHANNEL channel clause. See Replication Channels.
According to the description, a communication channel is basically the same as the connection name in MariaDB - but the syntax and therefore the way to manage multi-source replication is different.
To prepare the multi replication, all MySQL servers (masters and slave) need to set a unique server_id in the [mysqld] configuration (on Ubuntu 20.04 with MySQL 8.0 this is in /etc/mysql/mysql.conf.d/mysqld.cnf). Furthermore all master servers need to enable binary logs by enabling the log_bin parameter. Optionally it is possible to define the database(s) to replicate using the replicate-do-db option. Otherwise all databases will be subject for the replication (watch out for the mysql database though!). Obviously all the servers must be able to communicate with each other on the MariaDB port (default tcp/3306). Make sure MariaDB is listening on the public interface, not just on localhost (127.0.0.1). And last but not least, the a replication user with the necessary privileges must be created on the master servers:
mysql> CREATE USER 'repl'@'192.168.15.189' IDENTIFIED WITH mysql_native_password BY 'repl';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.15.189';
Note: Since MySQL 8.0.11 the default authentication plugin changed from mysql_native_password to caching_sha2_password. For easier understanding and better comparison with MariaDB's multi-source replication, I created the replication user with the older authentication method. Otherwise the CHANGE MASTER command further down would require additional arguments using TLS certificates.
Additionally, multi-source replication in MySQL requires TABLE repositories for the connection metadata. The following options should be added in the [mysqld] section on the slave server:
server-id = 3
master_info_repository=TABLE
relay_log_info_repository=TABLE
To set them during runtime and verification, use:
mysql> SET GLOBAL master_info_repository='TABLE';
mysql> SET GLOBAL relay_log_info_repository='TABLE';
mysql> show global variables where variable_name like '%repository%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| master_info_repository | TABLE |
| relay_log_info_repository | TABLE |
+---------------------------+-------+
2 rows in set (0.00 sec)
MySQL's multi-source replication allows two ways of replications: "Classic" binary log replication and GTID based replication. In the next step, we'll go ahead with the classic binary log replication.
Before the master connection can be configured on the slave server, the master(s)'s current log position must be known. To see this, use show master status on the master servers:
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 | 1109 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
To create a new replication channel on the slave server, the command is almost the same as in "classic master-slave replications", however the channel names are added (here app1 and app2):
mysql> CHANGE MASTER TO MASTER_HOST="192.168.15.187", MASTER_USER="repl", MASTER_PASSWORD="repl", MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=1109 FOR CHANNEL "app1";
Query OK, 0 rows affected, 2 warnings (1.98 sec)
mysql> CHANGE MASTER TO MASTER_HOST="192.168.15.188", MASTER_USER="repl", MASTER_PASSWORD="repl", MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=716 FOR CHANNEL "app2";
Query OK, 0 rows affected, 2 warnings (1.64 sec)
Afterwards all replications need to be started (one by one):
mysql> START SLAVE FOR CHANNEL 'app1';
mysql> START SLAVE FOR CHANNEL 'app2';
The command to see the status of all replications on the slave is:
mysql> SHOW SLAVE STATUS\G;
By selecting specific fields from the output (here running mysql from Bash), the two replications are listed:
root@slave:~# mysql -e "SHOW SLAVE STATUS\G" | egrep "(Master_Host|Channel_Name)"
Master_Host: 192.168.15.187
Channel_Name: app1
Master_Host: 192.168.15.188
Channel_Name: app2
Now that two replications should be running on the slave server, check_mysql_slavestatus can be used to monitor the replication status. For this, a dedicated MariaDB user should be created with the necessary privileges:
mysql> CREATE USER 'nagios'@'%' IDENTIFIED BY 'secret';
mysql> GRANT REPLICATION CLIENT ON *.* TO 'nagios'@'%';
The monitoring plugin should automatically detect, that a multi-source replication is configured on this slave server:
$ ./check_mysql_slavestatus.sh -H 192.168.15.189 -u nagios -p secret
CRITICAL: Multiple master detected, please use the connection or channel parameter.
The monitoring plugin can be run with the -C channel parameter in this case:
$ ./check_mysql_slavestatus.sh -H 192.168.15.189 -u nagios -p secret -C app1
OK: Slave SQL running: Yes Slave IO running: Yes / master: 192.168.15.187 / slave is 0 seconds behind master | delay=0s
$ ./check_mysql_slavestatus.sh -H 192.168.15.189 -u nagios -p secret -C app2
CRITICAL: -h 192.168.15.189 Slave_IO_Running: Connecting
Thanks to the monitoring plugin we now know there is a problem in the "app2" replication, so this should be investigated.