Tuesday 4 October 2011

drbd split brain recovery




I was trying to reboot the Red Hat cluster for backups and GFS performance testing, but while restarting I got DRBD split-brain messages in /var/log/messages. Below is the output of /proc/drbd on each node (we have a primary/primary DRBD config):




Primary:
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by
argus@docidtxt03, 2009-03-09 18:04:20
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r---
    ns:0 nr:0 dw:94815205 dr:14861358 al:19160 bm:18136 lo:0 pe:0 ua:0
ap:0 ep:1 wo:b oos:18911216


Secondary:
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by
argus@docidtxt04, 2009-03-04 16:23:58
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown A r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:55544 
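The fields to look at above are cs: (connection state) and ro: (roles). A quick sketch of pulling them out with awk, using a sample line copied from the primary's output so it runs without a DRBD node; on a real node you would read /proc/drbd directly:

```shell
# Sample resource line from /proc/drbd (copied from the output above).
sample=' 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r---'

# Walk the fields and strip the cs:/ro: prefixes.
state=$(echo "$sample" | awk '{for (i=1;i<=NF;i++) if ($i ~ /^cs:/) {sub(/^cs:/,"",$i); print $i}}')
roles=$(echo "$sample" | awk '{for (i=1;i<=NF;i++) if ($i ~ /^ro:/) {sub(/^ro:/,"",$i); print $i}}')

echo "state=$state roles=$roles"
```

Here cs:StandAlone means the node has given up reconnecting after detecting the split brain, while cs:WFConnection on the other node means it is still waiting for its peer.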




The following commands were used to recover from the situation:


drbdadm -- --discard-my-data connect all (on node with "bad" data)
drbdadm connect all (on node with "good" data)
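In a primary/primary setup, DRBD will normally only accept --discard-my-data on a node that is in the secondary role, so the "bad" node usually has to be disconnected and demoted first. A sketch of the full sequence under that assumption, using the resource name r0 from later in this post (substitute all to act on every resource):

```shell
## On the node with the "bad" data -- any writes made here since the
## split brain will be thrown away and resynced from the peer:
drbdadm disconnect r0                      # leave StandAlone/WFConnection
drbdadm secondary r0                       # --discard-my-data needs secondary
drbdadm -- --discard-my-data connect r0    # reconnect, discarding local changes

## On the node with the "good" data (only needed if this side also
## dropped to StandAlone rather than waiting in WFConnection):
drbdadm connect r0
```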




And now the resources are connected. 


Then verify the status in /proc/drbd. If the resources aren't connecting, check iptables: either stop the firewall or allow the ports used by DRBD, then run the commands above again. Otherwise, demote one of the nodes to secondary:
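A sketch of the firewall check mentioned above. Note that 7788 is only the conventional DRBD default; the actual port is whatever is configured on the address lines of your /etc/drbd.conf, so check there first:

```shell
## Look for REJECT/DROP rules that could block the replication port.
iptables -L -n | grep 7788

## Either allow the DRBD port on both nodes...
iptables -I INPUT -p tcp --dport 7788 -j ACCEPT

## ...or (quick and dirty, for testing only) stop the firewall entirely:
service iptables stop
```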


drbdadm secondary r0 (on the bad-data node), and after the successful sync do




drbdadm primary r0 (on both nodes)
