Hyper-V, VMware, or other: which would you choose?

Posted by jedimaster4007@reddit | sysadmin | 128 comments

I'm curious what y'all would choose to do in my situation.

We're a small org and currently have a 4-node VxRail VMware cluster running about 50 VMs. The cluster's been running since 2020, but support just ran out in December. For the vast majority of the cluster's life it has been rock solid, but with no support and aging hardware it feels risky to keep using it.

My predecessor wanted to transition to Hyper-V, so they bought three Windows Server 2022 Datacenter nodes and two Dell PowerStore appliances, and that's the new cluster I inherited. For some reason they only included a 2-port NIC on each host, so each host has only one path for management and one path for iSCSI. Because of that, we've lost the cluster twice to unannounced switch firmware upgrades that took down too many nodes at once. Both times, even when I took all but one node offline and tried to force quorum, I could never restore the cluster, so I had to destroy it and build a new one. It wasn't too devastating because we had only migrated a couple of non-critical VMs to test performance, and I just had to restore those from backups after building the new cluster.
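To make the single-path problem concrete, here's a rough sketch of how I think about it: map each host's networks to the switches they depend on, then ask which hosts lose a network when a given switch reboots. The host names (`hv1` to `hv3`) and switch names (`sw-a`, `sw-b`) are made up for illustration and are not our actual topology.

```python
def hosts_losing_network(paths, failed_switches):
    """Return {host: [networks that are down]} for a set of failed switches.

    paths maps host -> {network name -> set of switches carrying that network}.
    A network is down on a host when every switch it uses has failed.
    """
    failed = set(failed_switches)
    lost = {}
    for host, networks in paths.items():
        down = sorted(
            net for net, switches in networks.items() if switches <= failed
        )
        if down:
            lost[host] = down
    return lost


# Hypothetical layout: one path per network, as with a single 2-port NIC.
single_path = {
    host: {"mgmt": {"sw-a"}, "iscsi": {"sw-b"}}
    for host in ("hv1", "hv2", "hv3")
}

# Hypothetical layout after adding NICs: every network spans both switches.
dual_path = {
    host: {"mgmt": {"sw-a", "sw-b"}, "iscsi": {"sw-a", "sw-b"}}
    for host in ("hv1", "hv2", "hv3")
}

print(hosts_losing_network(single_path, {"sw-b"}))
print(hosts_losing_network(dual_path, {"sw-b"}))
```

With one path per network, a single switch reboot takes iSCSI away from every host at once. With two paths, the same reboot affects nobody.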

The redundancy issues are easily fixed, but I'm more concerned about the cluster's resiliency. I've spent almost six months now trying to figure out why the cluster can't be restored after quorum loss. It's too complicated to get into all the details, but even with expert consultation it's still a mystery. Having to build a new cluster isn't so bad when it's just a couple of non-critical VMs that go down, but the idea of having to build a new cluster with all of production completely down is nightmare fuel. So that leads us to a difficult choice.
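For anyone less familiar with quorum, here's the basic majority arithmetic as a minimal sketch. It assumes a plain static majority vote, with an optional witness counted as one extra vote. It deliberately ignores dynamic quorum and forced quorum, so it is not a model of how Windows failover clustering actually decides, and it doesn't explain our restore problem. The function names are my own.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return total_votes // 2 + 1


def has_quorum(nodes_up: int, total_nodes: int,
               has_witness: bool = False, witness_up: bool = False) -> bool:
    """Static majority check: nodes (plus an optional witness) each hold one vote."""
    total = total_nodes + (1 if has_witness else 0)
    present = nodes_up + (1 if has_witness and witness_up else 0)
    return present >= votes_needed(total)


# Three nodes, no witness: losing two nodes at once loses quorum.
print(has_quorum(nodes_up=2, total_nodes=3))
print(has_quorum(nodes_up=1, total_nodes=3))
```

Under this simple model, a 3-node cluster survives one node going away but not two, which matches what happens when a switch upgrade cuts off too many nodes at the same time.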

Do we just add extra NICs to fix the redundancy issues and continue with the existing Hyper-V cluster, hoping for the best? Or do we take advantage of an optional one-time fund (up to $500k) to buy a replacement VxRail VMware stack? Or go with a third option like Nutanix or Proxmox? Fixing the redundancy issues makes it less likely that the cluster would ever go down, and we have really nice backup UPS and generator power as well, but I want to plan for the worst-case scenario. We can always repurpose the PowerStores as file share servers, but I'm not sure what we would use the existing Hyper-V host servers for if we pivot away from Hyper-V. I suppose we could try to convert the existing hosts to ESXi, assuming that's possible, but since these hosts were intended for iSCSI storage they don't have enough local storage for VxRail HCI. Then again, purchasing more storage for the existing hosts might be cheaper than buying brand-new hosts, especially with the cost of memory right now.