Took the plunge and switched to enterprise NVMe - now wondering what I'm doing wrong, as performance is awful.
Posted by ADynes@reddit | sysadmin | View on Reddit | 76 comments
So it was time for a server change-out, replacing a Dell PowerEdge R650 that had 6x 1.92TB 12Gbps SAS SSDs in a RAID 10 array on a PERC H755 card. Had no issues with the server; we proactively replace at 2.75 years and have the new one up and running when the old one hits 3 years, at which point it gets moved to our warm backup site to serve out the next three years sitting mostly idle, accepting Veeam backups and hosting a single DC. Looking at all the flashy Dell literature promoting NVMe drives, it seemed I would be dumb not to switch! So I got hold of my sales rep and asked to talk to a storage specialist to see how close the pricing would be.
Long story short, with some end-of-quarter promos the pricing was in line with what the last server cost me. Got a shiny new dual Xeon Gold 6442Y with 256GB RAM and all the bells and whistles. But the main thing is the 8x 1.6TB E3.S data-center-grade NVMe drives, rated at 11GB/s sequential read, 3.3GB/s sequential write, and 1,610k random (4K) read IOPS, 310k random (4K) write IOPS each. Pretty respectable numbers, far outpacing my old drives' specs. They are configured in one large software RAID 10 array through a Dell PERC S160.
And here is the issue. Fresh install of Windows Server 2025, only role installed is Hyper-V. All drivers freshly installed from Dell. All firmware up to date. Checked and rechecked every setting I thought could possibly matter. Go to create a single 200GB VM hard drive and the operation takes 5 minutes and 12 seconds. I watch Task Manager and the disk activity stays pegged at 50%, hovering between 550MB/s and 900MB/s, nowhere near where it should be.
Now, on my current/old server the same operation takes 108 seconds. The old drives are rated for 840MB/s sequential read and 650MB/s sequential write. In that server's 6-drive RAID 10, that works out to 650 × 3 = 1,950MB/s for a sequential write operation. So a 200GB file = 200 / 1.95 = 102.5 seconds (theoretical max), and the math works out per the drive specs. But on the new server the sequential write is 3.3GB/s per drive, which × 4 stripes is a ridiculous 13.2GB/s. I should be writing the hard drive in 200 / 13.2 ≈ 15 seconds, yet it's taking almost 20 times that.
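The back-of-the-envelope numbers above can be double-checked with a quick script (a rough sketch using the drive specs from the post, and assuming RAID 10 sequential-write bandwidth scales with the number of mirror pairs):

```python
def raid10_write_time(file_gb, drives, per_drive_write_gbs):
    """Theoretical best-case seconds to sequentially write file_gb
    to a RAID 10 array (writes are striped across the mirror pairs)."""
    pairs = drives // 2                       # each pair stores one copy
    array_gbs = pairs * per_drive_write_gbs   # aggregate write bandwidth
    return file_gb / array_gbs

# Old server: 6x SAS SSD at ~0.65 GB/s write each
print(f"old: {raid10_write_time(200, 6, 0.65):.1f} s")   # ~102.6 s

# New server: 8x NVMe at 3.3 GB/s write each
print(f"new: {raid10_write_time(200, 8, 3.3):.1f} s")    # ~15.2 s
```

Anything an order of magnitude slower than the theoretical figure (like the 5:12 result) points at the path to the drives rather than the drives themselves.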
Is my bottleneck the controller? And if so, do I yell at the storage specialist who approved the quote, or myself, or both? Anyone have any experience with this who can tell me what to do next?
lordcochise@reddit
Thanks for the insight here - we recently got a new PE 760XS (we're mostly a Dell shop) and I basically ran into the same issues you did. Originally it was going to be configured with 2x 965i, but we had a reconfig a few weeks after the initial spec and, due to budget concerns, it ended up with just the one. We wanted hardware RAID SAS for the hypervisor / NVMe for the VMs, but in reality, in the 16 + 8 backplane config with a single controller, it's really just one or the other, with 'other' using the S160 (a detail that was missed).
Hadn't tried this sort of setup with Storage Spaces before; figured it was worth a shot. The performance difference vs the S160 is pretty close to your results (as expected). Don't have enough VMs on this yet to really saturate the bandwidth, but it looks pretty promising (though the hypervisor is on the 965i SAS array; not sure I want to fly quite as close to the sun as you are just yet).
Would feel more comfortable with a 2nd 965i, but that ain't happening anytime soon. But we also use Hyper-V / veeam / UPS etc. so reliability isn't really an issue.
ADynes@reddit (OP)
Yeah, the OS and Hyper-V are running off the BOSS card (C:), which is just a glorified plug-in hardware RAID controller with drives attached, while the VM disks and configs themselves are on the software RAID (D:). So far so good.
lordcochise@reddit
Also, on a side note, just FYI: in the past we've mostly used Hyper-V with the CPU compatibility box checked, and I've never had issues live-migrating VMs between hosts, but it looks like there IS a threshold between far-enough-apart generations of CPUs.
Until recently we had R720/R730 machines, and migrating from either of those to an R760XS was no issue. HOWEVER:
(1) Server 2025 enables credential guard by default, so if you were using CredSSP before, you'll have to set up Kerberos constrained delegation to go the other way or 2025 -> 2025. Not that tough though. Moving from Server 2022 or older TO Server 2025 with CredSSP seems fine.
(2) The CPU difference between a 760XS and a 720XD (Xeon Gold 6526Y vs Xeon E5-2697 v2) is ~12 years, but even with compatibility checked, the VM still needed to be off in order to be moved from new back to old (the differences are apparently too great).
(2b) It ALSO doesn't like it if you have the # of cores for a VM set too high, such that it would be out of range for the older host's CPU (and this can't be adjusted unless the VM is off), so this also needs to be adjusted back if you increased it after migrating to the newer host.
We'll mostly be upgrading to 2025 eventually, but I was testing just in case I needed to move back if something went wrong. So if your older platform/CPU is old ENOUGH (and you're keeping at least one of them), live migration will no longer be possible (at least not as far back as my equipment).
ADynes@reddit (OP)
Thanks for the info. We proactively upgrade every 3 years on schedule; the current server gets moved to another location to act as a warm backup, as it hosts my Veeam replicas and a domain controller for that branch. So luckily I'm not moving much: I'm going from an R650 to an R660, and the last upgrade was from an R640 to the R650, so hopefully no issues with CPU versions. We're also at a slight advantage in that I can take down a server for an hour or two on a weekend, so I usually remove its static IP address information, delete the network controller, shut it down, physically move the files over to the new server, boot it back up, and then reassign the network settings.
Every server I'm moving is 2022, except for a 2019 Exchange server, which I both can't wait for and dread changing out. But for now I'm just interested in getting everything moved over and then upgrading throughout the year.
decipher_xb@reddit
They should never have sold you a new server with E3.S drives on software RAID.
decipher_xb@reddit
@op curious if you heard back from your sales team?
anxiousinfotech@reddit
This. Those emulated controllers can barely handle spinning rust. They don't stand a chance with NVMe.
You either need a hardware RAID controller actually designed to handle NVMe (which will likely still end up being a notable bottleneck), or pass through the NVMe disks directly to the OS and use a software solution. Since Windows Server is in use the most likely candidate is Storage Spaces. As much as Storage Spaces makes me cringe, I've been running it on enterprise NVMe drives connected through an NVMe enablement card for 3 years now with no issues.
ADynes@reddit (OP)
That's kinda how my email is going to be worded tomorrow......
pastelfemby@reddit
Stuff like this is why I'm glad to no longer have to touch any Windows boxes. Manufacturer-advertised (software) RAID is universally garbage; software RAID at the filesystem level like ZFS, while not perfect, at least has some major quality-of-life features.
Speaking of which, loving bcachefs at home, hope one day it's a bit more ready for prime time. Been great at least for build servers with a ramdisk holding cache and metadata.
teardropsc@reddit
It's most likely the controller. Just pass through the drives and do a software RAID; you will notice the difference.
girlwithabluebox@reddit
It's 100% the controller. He went from hardware raid on the old server to a software raid solution on the new server. Should have spent some money on a proper controller.
miredalto@reddit
Thing is, proper hardware NVMe RAID controllers don't exist (I would love for someone to show me otherwise, but the few I've seen on the market have looked like snake oil).
On Linux you just go for software RAID, and the cost on modern CPUs is negligible. Pure write performance will not quite match a RAID controller with a battery backed cache, but NVMe will trounce that on any mixed load.
On Windows you have the problem that the software RAID is garbage, so you do that and suffer, or you just rely on HA over multiple hosts. Microsoft doesn't care, because they never made real money selling server OSs anyway.
mnvoronin@reddit
HPE SR416 and SR932 are proper hardware tri-mode (SATA/SAS/NVMe) controllers. I'm sure Dell has something similar in the lineup.
Ikinoki@reddit
HighPoint SSD7580B
ADynes@reddit (OP)
You were correct. The software RAID was the limiting factor. Passing through the drives allowed them to perform fully; the 200GB sequential write was almost instantaneous. The problem now is that I'm slightly screwed on redundancy for my boot drive, as I don't want to waste two 1.6TB drives for that. And once Windows is installed, I can't use the drive it's installed on.
So now I either have to get a proper hardware RAID controller so I can RAID 10 all 8 drives, software RAID 1 two drives for the boot and software RAID 10 the other 6 for data, or buy two more drives for a RAID 1 boot and software RAID 10 all 8 existing drives.
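For what it's worth, the usable capacity of each option shakes out like this (a rough sketch with the 1.6TB drives, assuming the boot pair ends up mirrored for redundancy rather than striped):

```python
DRIVE_TB = 1.6

# Option 1: hardware RAID controller, RAID 10 across all 8 drives
opt1_total = 8 * DRIVE_TB / 2        # 6.4 TB (boot + data share one array)

# Option 2: mirrored boot pair + software RAID 10 over the remaining 6
opt2_boot = 2 * DRIVE_TB / 2         # 1.6 TB boot
opt2_data = 6 * DRIVE_TB / 2         # 4.8 TB data

# Option 3: buy 2 more drives for boot, RAID 10 the 8 existing for data
opt3_data = 8 * DRIVE_TB / 2         # 6.4 TB data, boot on the new pair

print(opt1_total, opt2_boot, opt2_data, opt3_data)
```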
Unable-Entrance3110@reddit
Maybe spring for a RAID-1 BOSS card?
ADynes@reddit (OP)
Lol. It's really funny you posted this 4 minutes ago, because for the last 20 minutes this is what I've been researching and more than likely what I'm going to go with. Sucks that it's going to cost an extra $1,700, but I feel it's probably the correct solution.
trail-g62Bim@reddit
So, my understanding of ReFS is that it needs to be on a battery-backed RAID controller to ensure it doesn't corrupt. If you're running without a hardware controller, doesn't that negate that?
ADynes@reddit (OP)
So I did two searches: "ReFS or NTFS for software RAID?" and "ReFS or NTFS for VM storage?" and almost every result said ReFS for both. The server has redundant power supplies, each plugged into its own UPS, each plugged into a dedicated power circuit. And the building has a generator that comes on after a 15-second power outage and stays on for 15 minutes after power has been restored. Don't think I can get much more fault tolerant.
trail-g62Bim@reddit
I would probably feel comfortable in that scenario too. I have always used NTFS but started looking at ReFS for Veeam. Every source I could find said to use a hardware RAID with a battery to ensure data isn't lost. In my scenario, power outages are way more likely, so it's a pretty important detail. But it is also possible this data is out of date.
bcredeur97@reddit
NVMe drives are essentially designed to be DIRECTLY ATTACHED TO THE CPU
Any middle man is going to reduce your IOPS for sure
ADynes@reddit (OP)
Apparently, once I switched the controller from RAID mode to non-RAID, the drives act as if they are directly attached. Access is extremely fast, and CPU load is non-existent even with large sequential reads and writes.
No_Wear295@reddit
Not an expert, but I'd put decent odds on your theory that the software "controller" is the issue. As far as assigning blame.... I'd never consider a software-based storage solution for enterprise but that's just me
ADynes@reddit (OP)
My last two servers had hardware RAID controllers, but with these NVMe drives I wasn't sure what was needed, which is why I asked them to look it over and make sure it was correct. Apparently that didn't happen.
Sirelewop14@reddit
Relying on Dell to help you configure your hardware is a crapshoot. You might get someone who knows what they are talking about, and you might not.
You may have had someone who does know their stuff, but they made a mistake.
It's best to do some digging on your own and cross-reference with a VAR if you have the option. Especially for an expensive purchase like this.
Sirelewop14@reddit
Many business store PBs of data in Ceph clusters, all SDS.
Hardware raid and software raid solutions have their places.
What about a Nimble/Alletra or Pure SAN? Sure they have hardware controllers, but they also run software on the array to manage the storage, perform dedupe and compression, and monitor and alert.
It's just usually not as simple as "this is best, that sucks"
Sufficient-West-5456@reddit
Is veem considered software based? Asking for a friend
Sirelewop14@reddit
Veeam is backup software, not storage software.
tidderwork@reddit
ZFS, Ceph, and just about every parallel file system would like to have a word.
Hardware raid is boomer raid. It works in small scale, but it's just so old school.
lost_signal@reddit
The S160 is a garbage-tier software fake-RAID thing.
You should use VROC or a proper PERC with a MegaRAID chip. You'll still bottleneck on the single PCIe card using an H7xx.
Now I’m a VMware storage guy, but in my world creating a VMDK is always instant (thin VMDK or VAAI assisted EZT).
VMware doesn't support the garbage S controllers for a reason.
brianinca@reddit
Trash controller for sure; geez, it can't even make RAID 10 work? BUT - he's using NTFS, not ReFS, for his storage partition. That's not the Microsoft way, for a number of reasons. Some folks have a hard time leaving NTFS behind.
Dell screws people every way they can with storage, it's baffling.
ADynes@reddit (OP)
See Re-Re-Edit... direct-attached using ReFS is almost instantaneous, to the point I'm having trouble believing what I'm seeing.
ADynes@reddit (OP)
I was debating turning off RAID in the controller, direct-connecting the drives, and letting Windows Server use Storage Spaces and ReFS to do everything, but I'm afraid I'm still going to hit the same 6Gb/s limitation on the controller and be in the same position I'm in now, since I don't think it's the RAID implementation (which "works") but the connection to the system.
lost_signal@reddit
HPE also sells 1x-PCIe-lane U.3 backplane drive cages with Smart Arrays to morons who don't pay attention (yes, it's hilariously ugly on performance; the customer had to downgrade to SAS).
Tzctredd@reddit
As soon as I read "software RAID" I knew what was coming. ☹️
cosmos7@reddit
You went from a hardware PERC H755 to a software PERC S160... and you're surprised performance sucks?
AlexisFR@reddit
I thought hardware RAID died 10 years ago?
Hefty_Weird_5906@reddit
OP, as per the comments in this thread, switching to a more capable RAID controller will definitely help. It's worth noting that, in my experience, enterprise-class NVMe drives will typically still bottleneck a dedicated RAID controller doing HW RAID 1 or RAID 10.
My own testing of SW RAID vs HW RAID (via the same dedicated RAID controller card) showed consistently slower results for HW RAID in certain tests, e.g. random, 32 queues, 16 threads (the NVMe profile in CrystalDiskMark). However, the trade-off is that SW RAID consumes significant CPU time.
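For a quick cross-check outside of a full benchmark suite, a crude sequential-write probe is only a few lines (a hedged sketch, not a CrystalDiskMark replacement; the fsync makes it conservative, and the target path is just an example):

```python
import os
import time

def seq_write_mbs(path, total_mb=1024, block_mb=4):
    """Write total_mb of random data in block_mb chunks, fsync,
    and report the effective sequential-write rate in MB/s."""
    block = os.urandom(block_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(total_mb // block_mb):
            f.write(block)
        os.fsync(f.fileno())   # make sure the data actually hit the disk
    elapsed = time.perf_counter() - start
    os.remove(path)
    return total_mb / elapsed

# e.g. on the suspect volume:
# print(f"{seq_write_mbs('D:/bench.tmp', total_mb=4096):.0f} MB/s")
```

If this lands anywhere near the 550-900MB/s OP saw instead of multiple GB/s, the bottleneck is in the controller path, not the filesystem.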
InleBent@reddit
GRAID has entered the chat.
BaztronZ@reddit
Make sure you're not using the perc controller write buffer. The array should be set to read ahead / write through
Hefty_Weird_5906@reddit
If OP ends up upgrading to a RAID controller and it has a battery/energy pack, then the optimal mode would be 'No Read Ahead' and 'Write Back'. On HPE MR controllers, 'Write Back' mode will fall back to the safer 'Write Through' mode if/when the battery backup is lost/discharged.
MBILC@reddit
That RAID card is a software RAID card, so it is basically a backplane for the drives and offloads all the work to the CPU.
No_Resolution_9252@reddit
That is SATA speed. Are you sure it's plugged into the correct controller?
BobRepairSvc1945@reddit
The problem is trusting the Dell sales rep. Most of them know less about the hardware than you do. Heck most of them have never seen a server (other than the pictures on the Dell website).
Pork_Bastard@reddit
This is the answer. Source: wife's cousin and countless "experts" I've dealt with. Said cousin went from sneaker sales to Dell enterprise SAN sales. Six months in, he was fascinated that we had a SAN and asked what it was used for. Did not know what a VM was. That was 3 years ago. He lasted 2 years!
adoodle83@reddit
For max performance I would see if you can do multiple controllers and separate the drives across them, which should resolve the single-PCIe limits.
The downside is the wasted space of the multiple arrays.
Pork_Bastard@reddit
Sales "experts" at HP or Dell or name-it are often underpaid and undertrained folks who got a sales job, don't have ANY IT background or training, and bullshit it like wild. I'll never forget the call about the 2930F and 2930M and major performance differences, and the only thing I could get out of them was that the 2930M was focused on heavy Wi-Fi environments. Wtf.
Sinister_Crayon@reddit
To your edit: yup, the S160 is complete dogshit that's got no business running anything more complex than a boot drive.
The PERC H965i is a much better card, but you're honestly far better off software-RAIDing those bad boys. The controller will still be a bottleneck, so what you really need is a card that passes through the NVMe drives as raw devices.
ADynes@reddit (OP)
From what I can tell the H965i is their top card and should be capable of 22GB/s with up to 8 NVMe drives, plus it has an 8GB cache and battery backup. I mean, I could try switching them off of RAID and just direct-connecting them to see what performance is like; I feel that's a better idea.
Sinister_Crayon@reddit
I mean you do what works for your workloads... I'm just some rando on the Internet LOL. But seriously, I became allergic to hardware RAID controllers of any kind mostly while working for Dell. Nothing like seeing how the sausage is made to make you eat more bacon.
It's not that hardware RAID is inherently bad... it's not... but you are always at the mercy of the vendor if something goes wrong. Especially out of warranty it can get expensive and sometimes impossible to recover data from a hardware RAID because said hardware RAID won't import to a new controller because of some bug in the firmware. Software RAID can be portable across controllers, even operating systems. As a result, recovery from a failure state can be much simpler. Software updates also can be rolled back much easier than firmware updates as a general rule.
Finally, while the H965i is a really solid card, your max performance is still going to be limited by the CPU and memory on the card... what if your application performs best with more than 8GB of cache? Software RAID will use as much memory as your machine has for cache which is much easier to expand.
Again though, it depends a lot on your application and operating system. Some apps just don't like software RAID of any kind, though I personally think those application suites deserve to die in a fire :)
R2-Scotia@reddit
Dell ... expert 🤣
It's rare to find a Dell SE that knows as much as customers
When I studied performance in college there was a case study of exactly this mistake being made by IBM with a big mainframe client in the late '60s. Plus ça change, etc.
Sinister_Crayon@reddit
That's because the good SEs left. Back in 2017 or so they started pushing the SEs to be salespeople... to the extent that technical training took a back seat to sales training. By 2020 (the year I left Dell) there weren't really many actually competent SEs left, because they all either got let go or quit because they didn't sign up to be salesdroids.
Modern Dell "teams" are two salespeople and no technical people.
And let me finish with my traditional "Fuck Jeff Clarke"
ADynes@reddit (OP)
Honestly my sales guy seems pretty sharp, but ironically the storage expert who led the conversation sent me scanned-in PDFs with stuff circled. Lol. That should have been my first red flag.
Leucippus1@reddit
I am surprised they still sell the S160. Honestly, that RAID card is why I stopped buying Dells and went HP on my last order; the HP storage cards are a night-and-day difference. Even in hardware RAID I noticed much faster performance on HP.
HJForsythe@reddit
What's the CPU usage like when benchmarking? Software RAID crushes the CPU with fast drives. We use Dell H755Ns with NVMe drives, and the overall throughput was about 10x the best SATA SSD we could find. You do need an H755N for each set of 8 drives, and even then the PCIe lanes aren't being fully utilized.
Also I have been yelling at Dell for 5 years about supporting VROC but they refuse.
ADynes@reddit (OP)
CPU usage was barely noticeable but then again nothing else was running on the server and there are 96 threads sitting mostly idle...
HJForsythe@reddit
Weird.
I was getting 1GB/sec+ on the H755N.
If you have a spare drive and drive bay you could try setting up a new drive as direct-attached, or just reinstall without the S160 if it isn't in production.
ADynes@reddit (OP)
Yeah, pretty confident the "RAID" controller is the issue. I'm sure direct-attached would be much better, but at this point it's not worth even testing. I'd rather just get the proper hardware controller.
HJForsythe@reddit
Yeah, I don't know a ton about the S controllers... never use them. I would just use OS RAID in that case, or in your case Storage Spaces.
Imobia@reddit
Hmm, software RAID, not hardware? So it's JBOD to Windows, but a Windows driver then creates a RAID 10?
ADynes@reddit (OP)
No, it actually gets configured as part of the BIOS. Once you start to install Windows, it just sees the disk like a normal drive. It's some weird in-between.
Zenkin@reddit
Doesn't the "S" in the RAID card signify it's a software version instead of a hardware version? So operations that were previously handled by a dedicated piece of hardware are now getting offloaded to the rest of the system.
I've got zero experience with software RAID, but that's where I would be focusing my attention. Don't yell at the Dell guy, but show him what you're seeing and ask for clarification since you were (reasonably) expecting a performance boost, but you're seeing the opposite. Maybe he has an explanation which is better than my guesstimation.
ADynes@reddit (OP)
I'm going to guess the S does stand for software, although you do configure it as part of the BIOS of the machine. And I've made a pretty big edit; it looks like that is definitely the bottleneck.
SAL10000@reddit
Yes, get an actual hardware RAID card with cache.
Software RAID relies on the CPU for help.
The_Great_Sephiroth@reddit
3.3Gbps write seems LOW. Like, SATA low. Are you sure it wasn't 33Gbps? I have four NVMe 4.0 PCIe drives in my gaming rig. They're performing above 30Gbps.
Another thought. Are those drives somehow optimized for sequential reads/writes? Random would be slow on those. I'd ask my Dell rep to see what he/she thinks. Something is wrong somewhere.
MBILC@reddit
They have a software RAID card; 99% chance that is the issue. They never perform well and never have, especially once SSDs/NVMe drives came onto the scene.
MBILC@reddit
You spent good money on all that hardware but got a software RAID controller; this is why.
Secret_Account07@reddit
Nothing to contribute but curious on the reasoning. Stumps me.
HJForsythe@reddit
If the server isnt in production yet you could always try direct attached but you would likely need to reinstall
rcade2@reddit
Open a ticket with Dell/the storage specialist. It should be much faster, as you have noticed. This has happened to me before and it was tuning; plus, when you build a new array (on HPE servers) it has to go through and "optimize" it for a couple of days. Before that, the speed is much lower.
hihcadore@reddit
I’d call Dell and ask them
cetrius_hibernia@reddit
Well, you just bought it... So speak to Dell..