Took the plunge and switched to enterprise NVMe - Now wondering what I'm doing wrong as performance is awful.
Posted by ADynes@reddit | sysadmin | View on Reddit | 58 comments
So it was time for a server change-out, replacing a Dell PowerEdge R650 that had 6x 1.92TB 12Gbps SAS SSDs in a RAID 10 array on a PERC H755 card. Had no issues with the server; we proactively replace at 2.75 years and have the new one up and running when the old one hits 3 years, at which point it gets moved to our warm backup site to serve out the next three years sitting mostly idle, accepting Veeam backups and hosting a single DC. Looking at all the flashy Dell literature promoting NVMe drives, it seemed I would be dumb not to switch! So I got a hold of my sales rep and asked to talk to a storage specialist to see how close the pricing would be.
Long story short, with some end-of-quarter promos the pricing was in line with what the last server cost me. Got a new shiny dual Xeon Gold 6442Y with 256GB RAM and all the bells and whistles. But the main thing is the 8x 1.6TB E3.S data-center-grade NVMe drives, each rated at 11GB/s sequential read, 3.3GB/s sequential write, 1,610k random read (4K) IOPS, and 310k random write (4K) IOPS. Pretty respectable numbers, far outpacing my old drives' specs by a large magnitude. They are configured in one large software RAID 10 array through a Dell PERC S160.
And here is the issue. Fresh install of Windows Server 2025, only role installed is Hyper-V. All drivers fresh installed from Dell. All firmware up to date. Checked and rechecked any setting I thought could possibly matter. Go to create a single 200GB VM hard drive and the operation takes 5 minutes and 12 seconds. I watch Task Manager and the disk activity stays pegged at 50%, hovering between 550MB/s and 900MB/s, nowhere near where it should be.
Now on my current/old server the same operation takes 108 seconds. The old drives are rated for 840MB/s sequential read and 650MB/s sequential write. In that server's 6-drive RAID 10 that would be 650 × 3 = 1,950MB/s for a sequential write operation. So a 200GB file = 200/1.95 = 102.5 seconds (theoretical max), so the math works out per the drive specs. But on the new server the sequential write is 3.3GB/s, which × 4 drives is a ridiculous 13.2GB/s. I should be writing the hard drive in 200/13.2 ≈ 15 seconds, yet it's taking almost 20 times that.
Is my bottleneck the controller? And if so, do I yell at the storage specialist who approved the quote, at myself, or both? Anyone have any experience with this who can tell me what to do next?
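The back-of-envelope math in the post can be sketched as a quick check (a minimal sketch using the drive specs quoted above; real arrays lose some of this theoretical maximum to controller and filesystem overhead):

```python
# RAID 10 sequential write throughput is roughly
# (number of drives / 2) * per-drive write speed,
# since every block is written to both halves of a mirror pair.

def raid10_write_time(file_gb, drives, per_drive_write_gbs):
    """Theoretical best-case seconds to write file_gb gigabytes."""
    mirror_pairs = drives / 2                  # pairs writing in parallel
    throughput = mirror_pairs * per_drive_write_gbs
    return file_gb / throughput

# Old server: 6x SAS SSD at 0.65 GB/s write -> ~102.5 s for a 200 GB file
old = raid10_write_time(200, 6, 0.65)

# New server: 8x NVMe at 3.3 GB/s write -> ~15 s theoretical
new = raid10_write_time(200, 8, 3.3)

print(round(old, 1), round(new, 1))  # prints 102.6 15.2
```

The observed 5 minutes 12 seconds (312 s) is roughly 20× the theoretical 15 s, which is what points the finger at the controller rather than the drives.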
Tzctredd@reddit
As soon as I read "software RAID" I knew what was coming. ☹️
No_Wear295@reddit
Not an expert, but I'd put decent odds on your theory that the software "controller" is the issue. As far as assigning blame.... I'd never consider a software-based storage solution for enterprise but that's just me
tidderwork@reddit
ZFS, Ceph, and just about every parallel file system would like to have a word.
Hardware raid is boomer raid. It works in small scale, but it's just so old school.
Sufficient-West-5456@reddit
Is Veeam considered software based? Asking for a friend
ADynes@reddit (OP)
My last two servers had hardware-based RAID controllers, but with these NVMe drives I wasn't sure what was needed, which is why I asked them to look it over and make sure it was correct. Apparently that didn't happen
lost_signal@reddit
S160 Is a garbage tier software fake raid thing.
You should use VROC or a proper Perc with a mega raid chip. You’ll still bottleneck on the single pci card using a H7xx.
Now I’m a VMware storage guy, but in my world creating a VMDK is always instant (thin VMDK or VAAI assisted EZT).
VMware doesn’t support the garbage S controllers for a reason.
brianinca@reddit
Trash controller for sure, geeze can't even make RAID10 work? BUT - he's using NTFS, not ReFS, for his storage partition. That's not the Microsoft way, for a number of reasons. Some folks have a hard time leaving NTFS behind.
Dell screws people every way they can with storage, it's baffling.
ADynes@reddit (OP)
I was debating turning off RAID in the controller, direct connecting the drives, and letting Windows Server use Storage Spaces and ReFS to do everything. But I'm afraid I'll still hit the same 6Gb/s limitation on the controller and be in the same position I'm in now, since I don't think it's the RAID implementation (which "works") but the connection to the system.
lost_signal@reddit
HPE also sells 1x PCIe-lane U.3 backplane drive cages with Smart Arrays to morons who don’t pay attention (yes, it’s hilariously ugly on performance; the customer had to downgrade to SAS).
cosmos7@reddit
You went from a hardware PERC H755 to a software PERC S160... and you're surprised performance sucks?
AlexisFR@reddit
I thought hardware RAID died 10 years ago?
Hefty_Weird_5906@reddit
OP, as per the comments in this thread, switching to a more capable RAID controller will definitely help. It's worth noting that in my experience Enterprise class NVMe's will typically still bottleneck a dedicated RAID controller doing HW-RAID1, HW-RAID10.
My own testing of SW RAID vs HW RAID (via the same dedicated RAID controller card) showed consistently slower results in certain tests for HW RAID, e.g. random, 32 queues, 16 threads (the NVMe profile of CrystalDiskMark). However, the trade-off is that SW RAID consumes significant CPU time.
InleBent@reddit
GRAID has entered the chat.
BaztronZ@reddit
Make sure you're not using the perc controller write buffer. The array should be set to read ahead / write through
Hefty_Weird_5906@reddit
If OP ends up upgrading to a RAID controller and it has a battery/energy pack, then the optimal mode would be 'No Read Ahead' and 'Write Back'. On HPE MR controllers the 'Write Back' mode will fall back to the safer 'Write Through' mode if/when the battery backup is lost/discharged.
MBILC@reddit
That RAID card is a software RAID card, so it is basically a backplane for the drives and offloads all the work to the CPU.
teardropsc@reddit
It's most likely the controller. Just pass through the drives and do a software RAID; you will notice the difference
girlwithabluebox@reddit
It's 100% the controller. He went from hardware raid on the old server to a software raid solution on the new server. Should have spent some money on a proper controller.
miredalto@reddit
Thing is, proper hardware NVMe RAID controllers don't exist (I would love for someone to show me otherwise, but the few I've seen on the market have looked like snake oil).
On Linux you just go for software RAID, and the cost on modern CPUs is negligible. Pure write performance will not quite match a RAID controller with a battery backed cache, but NVMe will trounce that on any mixed load.
On Windows you have the problem that the software RAID is garbage, so you do that and suffer, or you just rely on HA over multiple hosts. Microsoft doesn't care, because they never made real money selling server OSs anyway.
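The Linux software-RAID route mentioned above is usually mdadm. A minimal sketch, assuming eight NVMe namespaces with the device names shown (verify yours with `lsblk` first; array name and filesystem are illustrative choices):

```shell
# Create an 8-drive RAID 10 array from the NVMe namespaces.
mdadm --create /dev/md0 --level=10 --raid-devices=8 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 \
    /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1

# Put a filesystem on it (XFS is a common choice for VM storage).
mkfs.xfs /dev/md0

# Persist the array definition so it assembles at boot.
mdadm --detail --scan >> /etc/mdadm.conf
```

Because md runs in the kernel and talks to the drives directly over their PCIe lanes, there is no controller in the data path to bottleneck on.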
mnvoronin@reddit
HPE SR416 and SR932 are proper hardware tri-mode (SATA/SAS/NVMe) controllers. I'm sure Dell has something similar in the lineup.
No_Resolution_9252@reddit
That is SATA speed. Are you sure it's plugged into the correct controller?
bcredeur97@reddit
NVMe drives are essentially designed to be DIRECTLY ATTACHED TO THE CPU
Any middle man is going to reduce your IOPS for sure
BobRepairSvc1945@reddit
The problem is trusting the Dell sales rep. Most of them know less about the hardware than you do. Heck most of them have never seen a server (other than the pictures on the Dell website).
Pork_Bastard@reddit
This is the answer. Source: wife's cousin and countless "experts" I've dealt with. Said cousin went from sneaker sales to Dell enterprise SAN sales. Six months in he was fascinated that we had a SAN and asked what it was used for. Did not know what a VM was. Three years ago. Lasted two years!
adoodle83@reddit
For max performance I would see if you can do multiple controllers and split the drives across them, which should resolve the single-PCIe-card limits.
The downside is the wasted space of the multiple arrays.
Pork_Bastard@reddit
Sales "experts" at HP or Dell or name-it are often underpaid and undertrained folks who got a sales job, don't have ANY IT background or training, and bullshit it like wild. I'll never forget the call about the 2930F and 2930M and their major performance differences, where the only thing I could get out of them was that the 2930M was "focused on heavy wifi environments." Wtf
Sinister_Crayon@reddit
To your edit: yup; the S160 is complete dogshit that's got no business running anything more complex than a boot drive.
The PERC H965i is a much better card, but you're honestly far better off software RAIDing those bad boys. The controller will still be a bottleneck so what you really need is a card to pass through the NVMe drives as raw devices.
ADynes@reddit (OP)
From what I can tell the H965i is their top card and should be capable of 22Gb/s with up to 8 NVMe drives, plus it has an 8GB cache and battery backup. I mean, I could try switching them off of RAID and just direct connecting them to see what performance is like, but I feel that's a better idea
Sinister_Crayon@reddit
I mean you do what works for your workloads... I'm just some rando on the Internet LOL. But seriously, I became allergic to hardware RAID controllers of any kind mostly while working for Dell. Nothing like seeing how the sausage is made to make you eat more bacon.
It's not that hardware RAID is inherently bad... it's not... but you are always at the mercy of the vendor if something goes wrong. Especially out of warranty it can get expensive and sometimes impossible to recover data from a hardware RAID because said hardware RAID won't import to a new controller because of some bug in the firmware. Software RAID can be portable across controllers, even operating systems. As a result, recovery from a failure state can be much simpler. Software updates also can be rolled back much easier than firmware updates as a general rule.
Finally, while the H965i is a really solid card, your max performance is still going to be limited by the CPU and memory on the card... what if your application performs best with more than 8GB of cache? Software RAID will use as much memory as your machine has for cache which is much easier to expand.
Again though, it depends a lot on your application and operating system. Some apps just don't like software RAID of any kind, though I personally think those application suites deserve to die in a fire :)
R2-Scotia@reddit
Dell ... expert 🤣
It's rare to find a Dell SE that knows as much as customers
When I studied performance in college there was a case study of exactly this mistake being made by IBM with a big mainframe client in the late 60s. Plus ça change etc.
Sinister_Crayon@reddit
That's because good SE's left. Back in 2017 or so they started pushing the SE's to be salespeople... to the extent that technical training took a back seat to sales training. By 2020 (the year I left Dell) there weren't really many actually competent SE's left because they all either got let go or quit because they didn't sign up to be salesdroids.
Modern Dell "teams" are two salespeople and no technical people.
And let me finish with my traditional "Fuck Jeff Clarke"
ADynes@reddit (OP)
Honestly my sales guy seems pretty sharp, but ironically the storage expert who led the conversation sent me scanned-in PDFs with stuff circled. Lol. That should have been my first red flag
decipher_xb@reddit
They should have never sold you a new server with E3.S drives on software RAID.
anxiousinfotech@reddit
This. Those emulated controllers can barely handle spinning rust. They don't stand a chance with NVMe.
You either need a hardware RAID controller actually designed to handle NVMe (which will likely still end up being a notable bottleneck), or pass through the NVMe disks directly to the OS and use a software solution. Since Windows Server is in use the most likely candidate is Storage Spaces. As much as Storage Spaces makes me cringe, I've been running it on enterprise NVMe drives connected through an NVMe enablement card for 3 years now with no issues.
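The Storage Spaces route described above boils down to a few PowerShell cmdlets. A minimal sketch, assuming the NVMe drives are passed through to the OS and poolable; the pool and volume names are placeholders, and Mirror resiliency across 8 disks is the Storage Spaces analogue of RAID 10:

```powershell
# Gather the poolable (non-boot, pass-through) disks.
$disks = Get-PhysicalDisk -CanPool $true

# Create a pool from them.
New-StoragePool -FriendlyName "NVMePool" `
    -StorageSubSystemFriendlyName (Get-StorageSubSystem).FriendlyName `
    -PhysicalDisks $disks

# Carve a mirrored virtual disk out of the whole pool.
New-VirtualDisk -StoragePoolFriendlyName "NVMePool" -FriendlyName "VMStore" `
    -ResiliencySettingName Mirror -UseMaximumSize

# Initialize, partition, and format it as ReFS for Hyper-V use.
Get-VirtualDisk -FriendlyName "VMStore" | Get-Disk |
    Initialize-Disk -PassThru |
    New-Partition -UseMaximumSize -AssignDriveLetter |
    Format-Volume -FileSystem ReFS
```

With the S160 out of the picture, the disks talk to the CPU over their own PCIe lanes and the mirroring cost lands on the host CPU, which in a 96-thread box is negligible.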
ADynes@reddit (OP)
That's kinda how my email is going to be worded tomorrow......
Leucippus1@reddit
I am surprised they still sell the S160. Honestly, that RAID card is why I stopped buying dells and went HP on my last order, the HP storage cards are a night and day difference. Even in hardware raid I noticed much faster performance on HP.
HJForsythe@reddit
What's the CPU usage like when benchmarking? Software RAID crushes the CPU with fast drives. We use Dell's H755N with NVMe drives and the overall throughput was about 10x the best SATA SSD we could find. You do need an H755N for each set of 8 drives, and even then the PCIe lanes aren't being fully utilized.
Also I have been yelling at Dell for 5 years about supporting VROC but they refuse.
ADynes@reddit (OP)
CPU usage was barely noticeable but then again nothing else was running on the server and there are 96 threads sitting mostly idle...
HJForsythe@reddit
Weird.
I was getting 1GB/s+ on the H755N.
If you have a spare drive and drive bay you could try setting up a new drive as direct attached, or just reinstall without the S160 if it isn't in production.
ADynes@reddit (OP)
Yeah, pretty confident the "RAID" controller is the issue. I'm sure direct attached would be much better, but at this point it's not worth even testing. Rather just get the proper hardware controller
HJForsythe@reddit
Yeah, I don't know a ton about the S controllers... never used them. I would just use OS RAID in that case, or in your case Storage Spaces.
Imobia@reddit
Hmm, software RAID, not hardware? So it’s JBOD to Windows, but a Windows driver then creates a RAID 10?
ADynes@reddit (OP)
No, it actually gets configured as part of the BIOS. Once you start to install Windows it just sees the disk like a normal drive. It's some weird in-between.
Zenkin@reddit
Doesn't the "S" in the RAID card signify it's a software version instead of a hardware version? So operations that were previously handled by a dedicated piece of hardware are now getting offloaded to the rest of the system.
I've got zero experience with software RAID, but that's where I would be focusing my attention. Don't yell at the Dell guy, but show him what you're seeing and ask for clarification since you were (reasonably) expecting a performance boost, but you're seeing the opposite. Maybe he has an explanation which is better than my guesstimation.
ADynes@reddit (OP)
I'm going to guess the S does stand for software although you do configure it as part of the BIOS of the machine. And I've made a pretty big edit, looks like that is definitely the bottleneck
SAL10000@reddit
Yes, get an actual hardware RAID card with cache.
Software RAID relies on the CPU for help.
The_Great_Sephiroth@reddit
3.3Gbps write seems LOW. Like, SATA low. Are you sure it wasn't 33Gbps? I have four NVMe 4.0 PCIe drives in my gaming rig. They're performing above 30Gbps.
Another thought. Are those drives somehow optimized for sequential reads/writes? Random would be slow on those. I'd ask my Dell rep to see what he/she thinks. Something is wrong somewhere.
MBILC@reddit
They have a software RAID card; 99% chance that is the issue. They never perform well and never have, especially once SSDs and NVMe came on the scene.
MBILC@reddit
You spent good money on all that hardware but got a software RAID controller. This is why.
Secret_Account07@reddit
Nothing to contribute but curious on the reasoning. Stumps me.
HJForsythe@reddit
If the server isn't in production yet you could always try direct attached, but you would likely need to reinstall
rcade2@reddit
Open a ticket with Dell/the storage specialist. It should be much faster, as you have noticed. This has happened to me before and it was tuning; plus, when you build a new array (on HPE servers) it has to go through and "optimize" it for a couple of days, and before that the speed is much lower.
hihcadore@reddit
I’d call Dell and ask them
cetrius_hibernia@reddit
Well, you just bought it... so speak to Dell.