A node in a MySQL 5.7 InnoDB Cluster crashed and cannot rejoin the cluster

I did the following to restore the failed node from a backup and recover the cluster state.
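
Throughout these steps, "c" refers to the InnoDB Cluster object obtained in MySQL Shell. A minimal sketch of getting it (assuming a root account; adjust credentials to your environment):

 MySQL  JS > \connect root@node1:3306
 MySQL  node1:3306 ssl  JS > var c = dba.getCluster('myCluster')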

1) Below is the status of the cluster after one of the nodes (node3:3306) failed.

MySQL  node3:3306 ssl  JS > c.status()
{
    "clusterName": "myCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "node1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active", 
        "topology": {
            "node3:3306": {
                "address": "node3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "Missing"
            }, 
            "node2:3306": {
                "address": "node2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "node1:3306": {
                "address": "node1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "node1:3306"
}

2) Take a mysqldump from the primary (healthy) node using the following command.

[root@node1 db_backup]# mysqldump --all-databases --add-drop-database --single-transaction --triggers --routines --port=mysql_port --user=root -p > /db_backup/mysql_dump_03062019.sql
Enter password:
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.
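
The warning above is expected here: the dump intentionally carries the primary's GTID set. Near the top of the dump file you should see a statement like the following (the GTID values shown are taken from this example's gtid_executed):

SET @@GLOBAL.gtid_purged='01f27b9c-182a-11e9-a199-00505692188c:1-14134172,
e2aa897d-1828-11e9-85b3-00505692188c:1-12';

This is what tells the restored node which transactions it already has when it rejoins the group, and it is why gtid_executed must be reset on the failed node before restoring (step 5).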

3) Execute the step below to remove the failed node from the cluster metadata.

MySQL  node1:3306 ssl  JS > c.rescan()
Rescanning the cluster...

Result of the rescanning operation for the 'default' ReplicaSet:
{
    "name": "default", 
    "newTopologyMode": null, 
    "newlyDiscoveredInstances": [], 
    "unavailableInstances": [
        {
            "host": "node3:3306", 
            "label": "node3:3306", 
            "member_id": "78ce3f2d-b970-11e9-87d9-06925c715652"
        }
    ]
}

The instance 'node3:3306' is no longer part of the ReplicaSet.
The instance is either offline or left the HA group. You can try to add it to the cluster again with the cluster.rejoinInstance('node3:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]: y
Removing instance from the cluster metadata...
The instance 'node3:3306' was successfully removed from the cluster metadata.


 MySQL  node1:3306 ssl  JS > c.status()
{
    "clusterName": "myCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "node1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures.", 
        "topology": {
            "node2:3306": {
                "address": "node2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "node1:3306": {
                "address": "node1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "node1:3306"
}
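
If you prefer not to rely on the interactive rescan prompt (for example, in a scripted recovery), the unreachable member can also be dropped from the metadata explicitly. A sketch using the AdminAPI; the force option is needed because node3 is unreachable:

 MySQL  node1:3306 ssl  JS > c.removeInstance('node3:3306', {force: true})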

4) Stop group replication on the failed node if it is still running.

mysql> STOP GROUP_REPLICATION;
Query OK, 0 rows affected (1.01 sec)
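
You can confirm a node's group membership state at any point from performance_schema; for example:

mysql> SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE FROM performance_schema.replication_group_members;

On the healthy nodes this should now list only node1 and node2 as ONLINE.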

5) Reset “gtid_executed” on the failed node. RESET MASTER clears the executed GTID set; this is required because the SET @@GLOBAL.gtid_purged statement embedded in the dump can only be applied when gtid_executed is empty.

mysql> show global variables like 'GTID_EXECUTED';
+---------------+--------------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                      |
+---------------+--------------------------------------------------------------------------------------------+
| gtid_executed | 01f27b9c-182a-11e9-a199-00505692188c:1-14134172,
e2aa897d-1828-11e9-85b3-00505692188c:1-12 |
+---------------+--------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> reset master;
Query OK, 0 rows affected (0.02 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.02 sec)

mysql> show global variables like 'GTID_EXECUTED';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_executed |       |
+---------------+-------+
1 row in set (0.00 sec)

6) Disable “super_read_only” on the failed node. (read_only can stay at 1, since it only blocks accounts without the SUPER privilege and the restore runs as root; super_read_only blocks even SUPER accounts, so it must be turned off before restoring.)

mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        1 |
+--------------------+--------------------------+
1 row in set (0.00 sec)

mysql> SET GLOBAL super_read_only = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        0 |
+--------------------+--------------------------+
1 row in set (0.00 sec)

7) Restore the mysqldump taken from the primary onto the failed node.

[root@node3]# mysql -uroot -p < mysql_dump_03062019.sql
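
After the restore completes, gtid_executed on the failed node should match the gtid_purged value embedded in the dump, since setting gtid_purged updates gtid_executed. A quick sanity check:

mysql> show global variables like 'GTID_EXECUTED';

If it is still empty, the dump likely did not include the GTID set (for example, it was taken with --set-gtid-purged=OFF), and the node would have to recover every transaction through replication instead.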

8) Once the restore is complete, re-enable “super_read_only” on the failed node.

mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        0 |
+--------------------+--------------------------+
1 row in set (0.00 sec)

mysql> SET GLOBAL super_read_only = 1;
Query OK, 0 rows affected (0.00 sec)


mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        1 |
+--------------------+--------------------------+
1 row in set (0.00 sec)

9) Finally, add the failed node back to the InnoDB Cluster.

 MySQL  node1:3306 ssl  JS > c.addInstance("cluster_user@node3")
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster ...

Validating instance at node3:3306...

This instance reports its own address as node3

Instance configuration is suitable.
WARNING: On instance 'node3:3306' membership change cannot be persisted since MySQL version 5.7.27 does not support the SET PERSIST command (MySQL version >= 8.0.11 required). Please use the .configureLocalInstance() command locally to persist the changes.
WARNING: On instance 'node1:3306' membership change cannot be persisted since MySQL version 5.7.27 does not support the SET PERSIST command (MySQL version >= 8.0.11 required). Please use the .configureLocalInstance() command locally to persist the changes.
WARNING: On instance 'node2:3306' membership change cannot be persisted since MySQL version 5.7.27 does not support the SET PERSIST command (MySQL version >= 8.0.11 required). Please use the .configureLocalInstance() command locally to persist the changes.
The instance 'cluster_user@node3' was successfully added to the cluster.


 MySQL  node1:3306 ssl  JS > c.status()
{
    "clusterName": "myCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "node1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "node3:3306": {
                "address": "node3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "node2:3306": {
                "address": "node2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "node1:3306": {
                "address": "node1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "node1:3306"
}
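
As the warnings in step 9 note, MySQL 5.7 cannot persist the group replication settings with SET PERSIST, so run configureLocalInstance from a MySQL Shell session on each node to write them to the option file; a sketch, assuming the same cluster_user account:

 MySQL  JS > dba.configureLocalInstance('cluster_user@node1:3306')

Repeat on node2 and node3 so the membership change survives a restart.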

Source: https://stackoverflow.com/questions/55036255/a-node-in-mysql-5-7-innodb-cluster-is-crashed-and-unable-to-re-join-the-crashed (special thanks to @sqlcheckpoint)

 
