Contingencies for garbage workstations?
Posted by BalderVerdandi@reddit | sysadmin | 38 comments
What is everyone doing for workstations you know are going to fail?
We've been "force-fed" a bunch of 13th and 14th gen Intel micro form factor Dells. The current batch has about a 40% failure rate (7090's) and we've just had a bunch of the 7010's (14th gen) delivered - and the kicker is we're going to Windows 11 over the next 90 days.
Both models get hot enough that you can use them for coffee warmers, and we've had enough of the 7090's fail that I just don't trust the 7010's as they get even hotter.
I've already told our local leadership that we're literally going to need replacements for the replacements due to heat failure, but it's fallen on deaf ears.
How are you all handling it?
StoneyCalzoney@reddit
Warranty?
Also do these mini PCs have the desktop version of the chips in them, or is it the laptop version? The desktop chips had a flaw in 13th and 14th gen where they would be slightly overvolted, causing instability and failure. Intel was able to fix it by updating the microcode but any damage done to units before that patch is permanent.
BalderVerdandi@reddit (OP)
Wish we could....
No one wants to absorb the shipping costs since we're overseas and it requires special handling. We would get charged to have them shipped back and forth, so shipping/handling costs would be multiplied three times (initial shipping, warranty return, repair return).
Plus, we don't have a depot to ship them back to for repairs as they're not set up to handle that.
a60v@reddit
Can't you just ship the CPUs and not the entire machines? That shouldn't be terribly expensive.
pisandwich@reddit
Updating UEFI/BIOS also pushes microcode updates to the CPU.
BalderVerdandi@reddit (OP)
We're updating the BIOS and running solely UEFI for Win10/11.
ForOursAndYours2137@reddit
Can you not throttle them down to 80%?
1a2b3c4d_1a2b3c4d@reddit
OK, so what? They will fail. Users will not be able to work. The users' manager can complain to their director, or escalate to your manager.
In either case, you are not accountable or responsible for this mess. You just need to clean it up.
So have some spares around. And if that is not possible, then be clear when you set the expectations of how long the user will be without a PC. You are just the messenger and fixer.
BalderVerdandi@reddit (OP)
That's exactly what I dropped into today's meeting.
And you're right - while I get paid enough to note the issue and what the fix-action is, I don't get paid enough to care.
1a2b3c4d_1a2b3c4d@reddit
To be fair, you can care. You can care about the users, their distress, the impact it will have. Empathy is not a bad thing. You just don't want to worry about it, since there is not much you can do to change the situation. You don't want to stress over it, to the point that you get burnt out.
It's a money thing. The company doesn't want to spend the money... yet. Depending on where you are in the world, just wait until summertime, those PCs will overheat in no time.
And you can say "I told you so", but no one will care.
BalderVerdandi@reddit (OP)
That's the issue I have - no one seems to care about the customer base except a small number of us that are bringing it up, and I'm in awe over the lack of empathy.
I'm at the point where I'm seriously considering having a Yu-Gi-Oh style card made using the old "Forum Thread Necromancer" meme card with an "I told you so" line put on it.
thatsnotamachinegun@reddit
We had a similar issue back in 2008 with the Optiplex 280 SFF. They were dropping like DMX albums in the late 90s, and Dell didn’t have enough stock for warranty replacements. It was a combination of bad capacitors and lack of airflow.
We had to use old stock 260s and larger form factor 280 towers for replacements, sometimes until the next hardware refresh. We didn’t really have much of an option. People have already covered some options (disable power heavy features, lower clock speed, etc.) but this is really a business decision that needs to be made. You’ve informed the powers that be. Best of luck.
BalderVerdandi@reddit (OP)
Oh, I remember those days and their use of FoxConn motherboards. We had one we kept as a trophy when a leaky capacitor decided it wanted to try an explosive exit from the case.
wutthedblhockeystick@reddit
Use them as thin clients and run VDI through a data center provider. I might know one (me).
ddaw735@reddit
I'd move on. If they are cheap here they may be cheap with me.
cbiggers@reddit
The 7020 Micros and SFF (14th gen) have been fine for us, even in "harsh" environments like housekeeping offices. We do keep the BIOS up to date, so perhaps there is that.
DestinyForNone@reddit
Huh, we use the same models but haven't had any issues.
Of course, we did have to do the BIOS updates due to the flaw in the Intel chips.
InvisibleTextArea@reddit
Run crypto miners on them until they melt.
No_Wear295@reddit
Has anything significant changed with the Micros? They used to be pretty solid for everyday users
Disturbed_Bard@reddit
Yeah the latest Intels run super hot and these things have shit cooling so they die
No_Wear295@reddit
Well that sucks.
disposeable1200@reddit
Send them back to Dell?
All my contracts have something like a 10% failure rate before I can just reject the entire model and return them. If you've not got this sort of thing in your agreements - get it added.
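If it helps to make that clause concrete, here is a rough Python sketch of the check. The function name is made up for illustration, the 10% default is just the figure from my own agreements, and whether a clause triggers at "more than" or "at least" the threshold is an assumption; your contract wording decides that.

```python
def exceeds_rejection_threshold(failed: int, delivered: int, threshold: float = 0.10) -> bool:
    """Return True when the observed failure rate is above the contract threshold.

    Hypothetical helper: 'threshold' is the contractual failure rate
    (0.10 = 10%), and the comparison is strictly greater-than by assumption.
    """
    if delivered <= 0:
        raise ValueError("delivered must be a positive unit count")
    if not 0 <= failed <= delivered:
        raise ValueError("failed must be between 0 and delivered")
    return failed / delivered > threshold
```

With the roughly 40% failure rate mentioned in the post, `exceeds_rejection_threshold(40, 100)` comes back `True`, which is the point where I would be rejecting the whole model.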
BalderVerdandi@reddit (OP)
No one wants to eat the shipping costs since we're overseas and require special handling, and no one wants to take responsibility back in the States to deal with warranties.
thatrandomauschain@reddit
... Get contracts in place fast.
disposeable1200@reddit
Can't you send it to Dell in your region?
Being a global company they usually have warehouses everywhere
BalderVerdandi@reddit (OP)
Nope - work space is considered "controlled access" and if we ship them out, there has to be chain of custody.
Moist_Lawyer1645@reddit
Can you redact them in any way? Remove memory and storage? (Yeah ik ram is volatile but when I worked in controlled access they considered it the same as storage)
ZAFJB@reddit
Always buy kit from local suppliers for this reason.
SevaraB@reddit
That's always the tradeoff with micro PCs - no room for real cooling, so no matter what the spec is inside, you don't want to push it to any kind of limit.
If there's enough reason to be concerned about thermal performance, there's enough reason to find more space for a workstation that can at least use ATX case fans.
Moist_Lawyer1645@reddit
I thought a while ago when improving MacBook cooling they'd found that a smaller area makes it easier to cool?
thatrandomauschain@reddit
Depends on the hardware and software and how much aluminium is used for passive cooling
stephendt@reddit
Disable turbo boost and see if there is anything that will reduce power limits. This will make them a bit slower but it should help with failure rates. You might be able to do this via power profile configs so there isn't as much manual intervention required
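A minimal sketch of the power profile approach, assuming Windows and the built-in `powercfg` tool. It caps the "maximum processor state" setting (the `PROCTHROTTLEMAX` alias under `SUB_PROCESSOR`), which is a common way to keep the CPU out of its boost range when set below 100%. The function names and the dry-run default are my own illustration, not anything from Dell or Intel, so test on one machine before pushing it out.

```python
import subprocess
import sys


def build_powercfg_commands(max_percent: int) -> list[list[str]]:
    """Build powercfg calls that cap the maximum processor state.

    Hypothetical helper: returns the commands instead of running them
    so they can be reviewed or pushed through a management tool.
    """
    if not 5 <= max_percent <= 100:
        raise ValueError("max_percent must be between 5 and 100")
    value = str(max_percent)
    return [
        # Cap the maximum processor state on AC power for the active plan.
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMAX", value],
        # Same cap on DC power; harmless on desktops without a battery.
        ["powercfg", "/setdcvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMAX", value],
        # Re-apply the active plan so the new values take effect.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


def apply_cap(max_percent: int, dry_run: bool = True) -> None:
    """Print each command; only execute when dry_run is False on Windows."""
    for cmd in build_powercfg_commands(max_percent):
        print(" ".join(cmd))
        if not dry_run and sys.platform == "win32":
            subprocess.run(cmd, check=True)
```

Calling `apply_cap(99)` only prints the three commands. `apply_cap(80, dry_run=False)` from an elevated prompt would apply an 80% cap like the one suggested earlier in the thread, at the cost of some performance.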
BOOZy1@reddit
Go with mobile series CPUs if you're using small form factor PCs. Both Intel and AMD are rock solid when it comes to their mobile lineup.
15th gen Intel on-die GPUs (Arc) do have some odd driver issues though, not the crashing or overheating kind, but hardware accelerated video streaming often breaks and RDP bitmap caching isn't working properly (black bars).
Next_Information_933@reddit
Warranty? Cover desktops for 3 years and then replace them when they die
PoolMotosBowling@reddit
Make sure nothing is getting saved on the hard drives. All file servers or SharePoint/OneDrive.
Always keep some imaged with all the standard stuff and ready to go.
Swap when dead, all you do is install that department's software.
Order more.
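For sizing the pile of pre-imaged spares, a back-of-the-envelope sketch in Python. Everything here is an assumption for illustration: the function name, treating the failure rate as an annual figure, and spreading failures evenly over the year (heat failures will likely bunch up in summer, so round up).

```python
import math


def spares_needed(fleet_size: int, annual_failure_rate: float, replacement_lead_days: int) -> int:
    """Estimate imaged spares needed to cover failures during one reorder lead time.

    Hypothetical model: failures are assumed to be spread evenly across the
    year, so the expected failures in the lead time are
    fleet_size * annual_failure_rate * (replacement_lead_days / 365).
    """
    if fleet_size < 0 or replacement_lead_days < 0:
        raise ValueError("fleet_size and replacement_lead_days must be non-negative")
    if not 0.0 <= annual_failure_rate <= 1.0:
        raise ValueError("annual_failure_rate must be between 0 and 1")
    expected_failures = fleet_size * annual_failure_rate * replacement_lead_days / 365
    # Round up: a fraction of a failed PC still leaves a user without a machine.
    return math.ceil(expected_failures)
```

As an example with made-up numbers, 100 machines at a 40% yearly failure rate and a 60-day wait for replacements works out to `spares_needed(100, 0.40, 60)`, which is 7 spares on the shelf.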
Sweet-Sale-7303@reddit
You sure it's not the microcode issue?
g225@reddit
Likely this is what it is, we have had issues with the 13th and 14th gen Intels on HP Z2 Mini. Previous 12th gen i9s are running fine.
jaskij@reddit
13th and 14th gen also had issues with the BIOS feeding the cores too high a voltage and there were high failure rates from that alone, regardless of anything being wrong with the machines. Updating the BIOS fixes that, but the CPU could be physically damaged already.
_UberGuber@reddit
I handle it by replacing micro form factors with small form factors. But we don't have that problem because leadership usually talks to IT before ordering something stupid.