Is S2D supposed to survive a crash of the cluster disk owner node?

Posted by TechGoat@reddit | sysadmin

I'm testing out a 3-node S2D cluster with a 3-way-mirror CSV on SAS SSDs (didn't have the budget for NVMe, unfortunately).

Enabling S2D was easy, and it's performant enough to consider putting it into production. One thing concerns me, though: whichever node owns the cluster disk seems to be a single point of failure. The test VMs stored on the CSV, on all 3 nodes, don't seem to wait long enough when I simulate a crash of the S2D owner node (i.e. just hard powering it off).

If I do a proper, graceful shutdown/restart of that node, everything is fine; the ownership gets migrated smoothly and there's no problem. I'm only talking about crash/outage scenarios.

As for the other two nodes, the ones that don't own the S2D disk role: if one of them crashes, it's fine (if annoying) that the VMs on that specific node crash too. I'll only have 3 VMs per node anyway; losing 3 VMs and annoying their users sucks, but it's better than losing all of them. My eventual goal, though, is to have 12 hosts sharing the CSV. If a crash of the S2D disk role owner kills all 36 VMs, that is what's keeping me up at night, wondering whether this is stable enough to go to prod.

I am having difficulty finding explicit documentation on this. We already run S2D with a private VLAN all its own for "Cluster Communications" and a different one for "Client Communications". Should that be low-latency enough that, in the case of a hard crash, ownership of the S2D role moves to another node instantly (within milliseconds) and the other VMs stay up?

It seems to me that when you're hyperconverged, you would want and expect a single node failure in a 3+ node cluster, even if it is the S2D owner node, to keep the cluster running. But maybe this is a single point of failure?

We're using the default settings for Server 2019 for thresholds and heartbeat delays:

CrossSubnetDelay          : 1000
CrossSubnetThreshold      : 20
PlumbAllCrossSubnetRoutes : 0
SameSubnetDelay           : 1000
SameSubnetThreshold       : 10
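
For a rough sense of what those defaults imply, here is a small sketch in Python. It assumes the usual reading of these settings: a heartbeat is sent every Delay milliseconds, and a node is only declared down after Threshold consecutive heartbeats are missed. The function name is made up for illustration, and the result is only the detection window, not the total time for ownership to move.

```python
# Heartbeat settings copied from the cluster output above (delays in milliseconds).
settings = {
    "SameSubnetDelay": 1000,
    "SameSubnetThreshold": 10,
    "CrossSubnetDelay": 1000,
    "CrossSubnetThreshold": 20,
}


def detection_window_seconds(delay_ms: int, threshold: int) -> float:
    """Approximate time before a silent node is declared down.

    Assumption: the window is simply delay * threshold, i.e. the node must
    miss `threshold` heartbeats that are sent every `delay_ms` milliseconds.
    """
    return delay_ms * threshold / 1000


same_subnet = detection_window_seconds(
    settings["SameSubnetDelay"], settings["SameSubnetThreshold"]
)
cross_subnet = detection_window_seconds(
    settings["CrossSubnetDelay"], settings["CrossSubnetThreshold"]
)

print(f"Same subnet:  ~{same_subnet:.0f} s before a dead node is detected")
print(f"Cross subnet: ~{cross_subnet:.0f} s before a dead node is detected")
```

If that reading is right, just detecting a hard crash takes on the order of 10 seconds on the same subnet with these defaults, so a move "within milliseconds" would not be expected even on a dedicated low-latency VLAN.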