This morning, my business partner noticed that some of our sites were not updating from our main database. One of our database servers was rebooted last night, so I suspected that the slave was not running. So I logged in to the server and started up mysql client. Then I issued this command:
show slave status\G
The slave was in fact running, but the relay log was corrupted
Part of the output shows this:
Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
And it also says seconds behind master is NULL, which always signals a problem in replication. I ran the command that it told me to after I looked in the output table to see what the relay log was named (mysqld-relay.000037):
This proceeded to run through the relay log, which took a minute or 2 and then at the end it showed me that there was an error on the last line. I assume this was caused by an incomplete write in the relay log as a result of the server restarting. OK, that makes sense and it kinda sucks that this is possible on a restart, but you can’t have everything right? On to the solution…
In order to get your slave back to where it should be, you need to look at the results of this command again:
show slave status\G
Copy down these values as you will need them in the next step:
Now you will need to stop your slave, issue a command with the values in the previous step and then start your slave again like so:
stop slave; CHANGE master TO master_log_file='mysql-bin.000041',master_log_pos=661877428; slave start;
Then you can issue the show slave status command and Seconds_Behind_Master should not be NULL now. If its higher than zero, you can keep issuing the command to get an idea of how long it will take to catch up to the master. It is usually fairly quick unless you have many GB of data.