99.99% purity replacement part
Posted by pie_-_-_-_-_-_-_-_@reddit | talesfromtechsupport | 52 comments
I'm an IT intern at a small clinic and radiology imaging center in the US.
Monday night, a pretty bad storm rolled through that caused power surges and outages all over town, including at our clinic. The day after, I came into work and my boss told me that our file server (hosts radiology images, administrative documents, and many other things) had been down since the power outage last night, and asked if I could take a look at it.
I walk into the server room just expecting to have to manually start some service or another. I hop on the jumpbox, and "no boot device found". Oh no. This is on a Dell PowerEdge, so I spend 20 minutes trying to find a laptop with an RJ45 port so I can plug into the confusing iDRAC interface and see what's up. Eventually someone finds me a USB-to-RJ45 adapter; I start looking in the web interface and discover that it doesn't see the storage controller at all.
I open up the server to start re-seating things and looking for damage, and I happen to notice that one of the pins on the RAID controller card has pretty bad electrical pitting, probably storm-related damage. I re-seat a bunch of stuff and try powering it on anyway: no dice, it still doesn't even see the RAID card. Yep, that card is very likely the problem.
I'm about to go tell my boss that we need to set up another computer to be the jumpbox for now and that that'll take a few more hours. But then I consider that the RAID card looks remarkably intact except for a single pin, and if I could get it to work, the jumpbox would come back up.
Searching for materials to MacGyver this with, I find the perfect thing: some medical gold foil in a supply closet that we use for some other machine or medical procedure or something. I take out a bit of gold foil, press it into the pits on the RAID card, and for lack of luck in finding such perfect material again, electrical tape it in place at the top.
I plug the RAID card back into the jumpbox, power it on, and it miraculously boots straight into Windows, just like it did yesterday. The RAID config and everything survived.
I think I used up my luck for the quarter.
(Of course, I told my boss that this was a very temporary fix, it might not survive a second boot, and that we really should have a PDU or something behind the servers anyway, and was promptly ignored.)
____-is-crying@reddit
If this was true, please never do that again. They need to learn several lessons: get a backup solution and a UPS, not to mention have someone on staff who's higher than a damn intern dealing with the servers.
kandoras@reddit
"I kept your business from dissolving by having to repeatedly explain to patients why you no longer had their records" is well beyond the "I should be getting paid to do this" and into the "I damned well better have a matching 401K and dental."
pie_-_-_-_-_-_-_-_@reddit (OP)
I agree!
The_Mammoth_Hunter@reddit
Re: boss ignoring your input... Nothing as permanent as a temporary solution
pie_-_-_-_-_-_-_-_@reddit (OP)
I can't wait for the next guy in 5 years to get told "the servers are down" and have to, mystified, pull electrical tape and gold foil out of the PCIe slot
kandoras@reddit
That next guy in 5 years could very well be you, not remembering what you did.
Might want to get a sticky note and put up a warning on that thing that the electrical tape is load bearing. If you're feeling really cautious, you can cover the sticky note in scotch tape.
SpongeJake@reddit
Dude, did you at least document it all in an email to your boss? In my experience, a word-of-mouth warning followed by management not doing anything about it can come back on you unless all of it was documented.
Send an email to your boss outlining everything: that this is only a temporary fix and that the next time they won’t be so lucky.
That way when the next storm hits no one will be pointing fingers at you.
pie_-_-_-_-_-_-_-_@reddit (OP)
Yeah
LupercaniusAB@reddit
Yes, especially as you’re an intern, and therefore the most perfect scapegoat around. Definitely cover your ass.
gonzalbo87@reddit
The way I see it, he will go in for something unrelated and remove the tape/foil, only to panic after. He will not understand why it works with it and doesn't work without it. He will label it "magic" and the legend of the magic tape will be born.
pie_-_-_-_-_-_-_-_@reddit (OP)
It occurs to me that something like this is probably exactly how Magic / More Magic happened
Rough-Patience-2435@reddit
As someone who likes a good field fix, I have to remind myself "Functional is NOT fixed".
meitemark@reddit
If it works, it is fixed. Now forget all about it and DO NOT write anything down.
nicolasknight@reddit
Temporary: the most permanent form of Orary!
djdaedalus42@reddit
Intern in a small radiology lab? Ooh, talk about an outfit working on the cheap. I hope OP sees this only as a springboard for a real job somewhere else (yeah right, in this economy) because something is bound to happen, and it won't be good. Best not to be around when it does.
pie_-_-_-_-_-_-_-_@reddit (OP)
Yeah, I'm still in school and hopefully bailing from this place at the end of my internship
Wells1632@reddit
Then when the SHTF at some later point, you can pull out said documentation, including your suggestions and the inaction on management's part, and be absolved of any wrongdoing, with the added bonus of the holier-than-thou look you can give when you say "I told you so".
fatmanwithabeard@reddit
How the hell does a radiology shop not have clean power and back up power?
I'm just utterly flabbergasted.
(also, I just realized I couldn't remember how to manage an iDRAC off the top of my head, and that makes me very happy)
You've got raw street power to your
Wells1632@reddit
How the hell does a radiology shop not have an on-call IT person who is trained in this kind of stuff instead of relying on an intern? Sounds like a liability suit just waiting to happen.
probablythewind@reddit
At best maybe the clean power/backup infrastructure is for the medical machines only and they don't want to tie those into a bunch of computers?
fatmanwithabeard@reddit
One would expect that if they have the expertise and in house understanding of power infrastructure to keep the imagers safely powered they'd apply that to all of their clinical tools (of which data storage of images is certainly one).
(my experience with imaging is more MRI based, which has some intense power conditioning requirements. Nothing in those labs is on street power.)
probablythewind@reddit
Losing an image is nothing compared to accidentally overloading an imager. One means a redo, the other means potentially irradiating someone. I can see the logic in not trusting the computers not to cause issues with the other stuff. Cannot see the logic in not at least tying it into a UPS, having a backup and all that other normal stuff.
fatmanwithabeard@reddit
Yeah, sure.
Losing a whole pile of images could be devastating nonetheless. People who can't take another imaging session, managing retakes along with new appointments, and just losing history.
You don't put them on the same circuit. But if you can manage to reliably power the imagers (and for certain imagers, that's a serious amount of work), you can manage to power the entire lab.
There's very little reason not to have the whole lab on conditioned power. You don't need to have everything on backups (though the generators should be outside the conditioners), but no spikes should be getting through.
pie_-_-_-_-_-_-_-_@reddit (OP)
Yep, this is the situation
PSPHAXXOR@reddit
You guys don't just have the iDRAC connected to the network all the time? Would've saved you having to find an Ethernet adapter
pie_-_-_-_-_-_-_-_@reddit (OP)
There's a cable but our network/firewalling setup is so tangled that I couldn't figure out how to get in from the main LAN
BossStevedore@reddit
Photographs…
blixt141@reddit
You might want to put your suggestion to your boss in writing so at least you have cover when it happens again.
deeseearr@reddit
I am looking forward to the follow-up story next year, in which nobody can figure out why the RAID card mysteriously failed or why there are no backups, and nobody has any idea why that bit of electrical tape is there.
sherlockham@reddit
And the guy who put the weird tape there gets thrown under the bus for breaking the system.
TinyNiceWolf@reddit
"To whom it may concern: We have received the replacement RAID card we ordered, but we specifically requested an identical card for compatibility. I am returning the card you sent, as it is entirely lacking the gold foil on our current model. We demand that you supply us at once with your gold foil model, or we shall be obliged to take our business elsewhere."
Leonie-Lionheard@reddit
ROFL. Made my day
DuckDodgers22@reddit
Hope there was a “As we discussed” email after this meeting
SavvySillybug@reddit
Better save that on that server for safe keeping!
maroongrad@reddit
If not, there needs to be one ASAP!
OcotilloWells@reddit
"Some idiot just put gold foil on it instead of replacing it!"
Material-Echidna-465@reddit
Plus the ongoing discussion that it was working fine until that intern messed with it.
zeus204013@reddit
You need silver paint.
turunambartanen@reddit
For an event last year we filled balloons with 99.999% pure helium. It was at hand and with the volume we buy it's actually cheaper than buying consumer balloon gas, lol
Intelligent_Law_5614@reddit
Very nice improv! This appears to be a case in which the curse is foiled again.
MerionesofMolus@reddit
It was certainly a golden solution.
Purple-Lie-354@reddit
Take your upvote, and get out.
nymalous@reddit
Oof! 😄
wiredcrusader@reddit
JFC, dude...
Back that whole thing up to something that will allow you to virtually boot if needed.
Get a new RAID card.
Get a new server or get current with your warranty support if the thing isn't EOL.
Get a better boss. LOL
You shouldn't have to "MacGyver" anything. That's a sign of a terrible employer.
ChrisCopp@reddit
Can I get that in an email, sir? You don't want to do what?
ThunderDwn@reddit
Why does this bit not surprise me in the slightest?
Id10t_techsupport@reddit
Band-aid applied. Forgotten by the end of the internship. Not hired on... no doc(s)... boss finds tape, removes tape and gold... fire/smoke from where tape was... dead server. Next intern, please?
Impossible_IT@reddit
https://tenor.com/view/macgyver-approved-macgyver-approved-gif-18574744?utm_source=share-button&utm_medium=Social&utm_content=reddit
valarmorghulis@reddit
Get a new sysbd and perc for that server in addition to a 15 minute rated UPS.
Relatents@reddit
I suppose it would be wrong to sabotage the temporary repair to help him understand the necessity of fixing it correctly before there are disastrous consequences?
zsrh@reddit
You’ve officially earned your MacGyver IT badge! The fix is one of those that could either last for the next boot or for the rest of the server's life 🤪.
ThatUsrnameIsAlready@reddit
Possibly both.