Pacemaker/DRBD: Auto-failback kills active DRBD Sync Primary to Secondary. How to prevent this?

Posted by Ushan_Destiny@reddit | linuxadmin | 11 comments

Hi everyone,

I am testing a 2-node Pacemaker/Corosync + DRBD cluster (Active/Passive). Node 1 is Primary; Node 2 is Secondary. Node 1 has a location preference score of 50.

**The Scenario:**

1. I simulated a failure on Node 1. Resources successfully failed over to Node 2.
2. While running on Node 2, I started a large file transfer (SCP) to the DRBD mount point.
3. While the transfer was running, I brought Node 1 back online.
4. Pacemaker immediately moved the resources back to Node 1.

**The Result:** The SCP transfer on Node 2 was killed instantly, resulting in a partial/corrupted file on the disk.

**My Question:** I assumed Pacemaker or DRBD would wait for active write operations or data sync to complete before switching back, but it seems to have just killed the processes on Node 2 to satisfy the location constraint on Node 1.

1. Is this expected behavior? (Does Pacemaker not care about active user sessions/jobs?)
2. How do I configure the cluster to stay on Node 2 until the sync is complete? My requirement is to always keep Node 1 as the master.
3. Is there a risk of filesystem corruption doing this, or just interrupted transactions?

**My Config:**

* stonith-enabled=false (I know this is bad, just testing for now)
* default-resource-stickiness=0
* Location Constraint: Resource prefers node1=50

Thanks for the help!

*(used Gemini to enhance the grammar and readability)*
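
To make the config above concrete, here is a rough Python sketch of how I understand the two scores to interact. It is a deliberately simplified model, not Pacemaker's actual placement algorithm: the function name `preferred_node` is hypothetical, and it assumes that the only inputs are the location score per node plus stickiness added for the node currently running the resource. It ignores DRBD promotion scores, colocation and ordering constraints, and says nothing about whether a sync is in progress.

```python
def preferred_node(current_node, location_scores, stickiness):
    """Return the node with the highest total score (simplified model)."""
    totals = {}
    for node, score in location_scores.items():
        # Stickiness counts only for the node where the resource runs right now.
        bonus = stickiness if node == current_node else 0
        totals[node] = score + bonus
    return max(totals, key=totals.get)


# Location constraint from my config: resource prefers node1=50.
location_scores = {"node1": 50, "node2": 0}

# My test setup: stickiness 0, resource on node2 after the failover.
print(preferred_node("node2", location_scores, stickiness=0))    # node1

# Hypothetical alternative: stickiness larger than the location preference.
print(preferred_node("node2", location_scores, stickiness=100))  # node2
```

Under this model, a stickiness of 0 gives Node 2 nothing to outweigh Node 1's score of 50, which would match the immediate failback I saw. The value 100 is only an illustrative number, not something I have tested on the cluster.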