Database Replication Trouble Shooting

From idenprotect Knowledge Base
Revision as of 11:21, 27 November 2020 by Aisteshah (talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search


Introduction

Database replication is a robust solution that should handle issues such as loss of network connection between the two servers. However certain chains of events can cause replication to fail. These are where transactions created on one server (whilst there is no connection between the two servers) can then not be implemented on the other server.

Examples

  • Sync job runs on both servers whilst connection is lost leading to trying to delete the same account twice
  • Sync jobs running at exactly the same time on both servers possibly leading to trying to create the same account twice.

An issue like these will stop database changes from replicating and the databases will become out of sync.

Checking Replication Status

To check the replication status of the two databases log onto both servers and access the database console

mysql

Then on each server type the command

show slave status\G;

You should see on both servers

Slave_IO_Running: Yes
Slave_SQL_Running: Yes


DatabaseSlaveStatus.png

You can also run

show master status;

and you would expect to see the value in the position column (626 in this example) to match the value for Read_Master_Log_Pos from the show slave status command on the*other* sever.

DatabaseMasterStatus.png

If the slave has stopped running you can try and restart them by running the start slave command from within the MySQL console on both servers.

start slave;

if after this command show slave status still shows that they are not both running that the replication needs to be repaired.

Repairing Replication

Scripts are supplied to help repair replication. These scripts are under /opt/idenprotect/scripts. If you do not have these on your installation email support@idenprotect.com

The scripts take a back-up of each database, however, it is highly recommended that a manual back up of both databases is taken before attempting to repair replication

The replication repair process will copy across the "correct database" to the "out of date" database. Before beginning this process it has to be decided which database is going to be preserved.

  1. Stop the idenprotect service from running on both servers
  2. Run the script resetReplication.sh on both servers this will stop replication, and take a back up of the database
  3. Check that the back up has been created, it will be under /etc/idenprotect and be called database<date>.sql where <date> is today's date
  4. Run the script restartReplication.sh, this will recreate the database and then restart replication.
  5. Only on the server that has the database to be preserved run the script restoreDatabase.sh. This will restore the database locally and also recreate an identical copy on the other server
  6. Start the idenprotect service on both servers

Once this process has been completed you can check the replication status as described earlier.