Thanks A Bunch, Facilities!
Posted by TheYoungBung@reddit | sysadmin
So I'm at work today, sitting at my desk working on some stuff. I'm the sole IT guy in our organization so I've got plenty to do and pretty much everything we have is in my scope of work. A coworker walks by and casually mentions that the server room is kinda hot. It was cooler than usual today so the heat was on, so it wouldn't be surprising that it's a bit warmer in there.
I go check it out and the wall thermometer was reading 102F. That's ambient room temperature.
I start checking everything in a panic. Turns out when facilities turned on the heat, they switched off our climate control. The temperature warning is built into that system, so it was not receiving power and was not alerting anyone to how hot it was.
I called the desk and said we needed someone down there to fix this immediately. The dude, in a non-caring voice, said he'd "let someone know, no idea when anyone can make it down there."
I talked to my boss and shut down the servers. He wanted to argue about whether it was necessary; I told him we could go down for the day or be down for the month after we replace all the server racks.
Facilities never showed up to fix it, so now I get to come in tomorrow while everyone else teleworks and just wait for them to show up. Just had to get this garbage off my chest!
TequilaCamper@reddit
My boss would have been on the phone to someone above him in two seconds.
Did your chain of mgmt not get involved? That's the insane part.
Let that shit flow back down hill from whichever SVP is over facilities.
TheYoungBung@reddit (OP)
Oh this is great, thanks for bringing this point up!
He didn't call anyone because he was in a meeting! I had to knock on the door and do a little "Um, excuse me?" to tell him what I was about to do.
He then went back into his meeting
TequilaCamper@reddit
Wow. I'm pretty sure if I tell my boss I'm shutting down all the servers he's gonna leave a meeting with God.
TheYoungBung@reddit (OP)
Yeah after he got out, he proudly announced that he was going to look at that climate control unit himself! Got up on a ladder and did a "Hmmmmm.... Yes..... I see..... The air conditioner is made of air conditioner" Then gave up, as though there was going to be an on/off switch on it
xGarionx@reddit
your vp is an insane lunatic....
zakabog@reddit
I'm shocked that you don't have a separate HVAC system that you control for your server room, though I guess in a one man shop your server room is more of a server closet with one rack and comfort cooling.
Also, purchase a device that you control that connects to your network and monitors temperatures.
Also also, you should be monitoring metrics like temperature on your servers.
TheYoungBung@reddit (OP)
Dude there is a refrigerator in the server room lol.
And we used to, but it kept breaking so facilities installed a dedicated unit into the ceiling. Which they control and apparently will just turn off on accident
zakabog@reddit
And you suggested the heat pump be moved to the appropriate location, right? ...right?
So monitor the server temps and that'll at least provide you with data indicating there's a problem.
This environment sounds like a mess, I genuinely hope you've been looking around for a better gig and already have one foot out the door.
TheYoungBung@reddit (OP)
I have! "But there's nowhere else to put it"
And it's really hard to consider walking away from what they pay me
zakabog@reddit
I can't imagine it's much if you're a solo employee working in an environment with no HVAC in a mismanaged server room and can't find $200 in the budget for a decent network connected temperature monitoring device.
TheYoungBung@reddit (OP)
145K
DrAculaAlucardMD@reddit
Really? Wait, are you in a large city, or is this a tech startup?
TheYoungBung@reddit (OP)
Government contracting
DrAculaAlucardMD@reddit
Ah, yeah I understand that. I'll be making close to that at a non-profit but running the dept. Congrats!
zakabog@reddit
I make double that and don't have to deal with the absurdity you've described.
Plus we have temperature monitoring devices in all of our server rooms.
charleswj@reddit
You make 300k as a "sysadmin"? Something feels off. Are you sure that title is accurate, or is it a niche of some kind?
TecheunTatorTots@reddit
Lol. Me, sitting over here managing the entire IT for a non-profit making 47k 🤣🤣🤣
charleswj@reddit
That can't be in the US, right? That's exactly what I was making in 2007 which would be 75k today...and I wasn't doing everything. Time to start applying man, there's other jobs out there
ms6615@reddit
There are cities in the US other than NYC and SF. Some of them are very very cheap. People who live there don't get paid as much.
charleswj@reddit
What makes you think I'm unaware of that? There is nowhere in the US that the role they are describing isn't severely underpaid
ms6615@reddit
I didn't make any claim as to whether it was paid well or not, simply that there exist many places that can't pay anything more. There are tons of tiny rural places that are still required to run computer systems and they can't pay people much to do it. Just because it isn't a "good" wage doesn't mean it doesn't exist.
TecheunTatorTots@reddit
Oh I agree with you. I'm definitely underpaid. And so is the other user that posted saying they were making a similar amount as a junior sysadmin. It's just that, in my area, there are very few tech jobs. The ones that do exist either A) contract with the DoD and want a top secret clearance (they won't sponsor it) or B) expect 5+ years of experience, a bachelor's, all the certs and a portfolio consisting of big projects for Fortune 500 companies. And they want to pay peanuts, like, in the range of 50-70k for most of those positions. If you ask for the top of the range, that immediately disqualifies you. It's absolutely an employer's market right now and they can do whatever they want. I have hope that things will get better, so I'm hanging in there and still putting out apps. It just sucks out here, lol.
TecheunTatorTots@reddit
It's definitely in the U.S. I just live in a poor state. In the last month alone I've applied to about 300 or so jobs. It's been crickets. I think it's likely because I have relatively little on-the-job experience. I've been in this position for about 1.7 months, and I have an associate's; no bachelor's. Working on the bachelor's now.
thunderbird32@reddit
48k here as a junior admin. Some folks have a hard time even believing that, let alone someone in your position (running everything). It seems like a flight of fancy to even dream of making what those folks up-thread make, lol
TecheunTatorTots@reddit
I feel ya. I hope someday soon things will improve for both of us.
zakabog@reddit
I work in finance and one of the partners has an IT background so they understand the importance of having a knowledgeable competent team, especially given how much money downtime can cost the company. Which is why they have absolutely no problem throwing a small fortune at keeping things running smoothly.
charleswj@reddit
Makes sense
TheYoungBung@reddit (OP)
Well it's a good salary for my life, area I live, and skill level. I'd consider leaving for the right offer
zakabog@reddit
Just be on the lookout for better positions and apply to them on a whim, that's how I got my current position. Your job sounds absolutely awful. I wouldn't want to work for a company that's mismanaged by incompetent people; it's going to burn you out, or you're going to get laid off as the higher ups mismanage themselves into bankruptcy or learn that they could hire an outside vendor to replace their one IT person at a fraction of the price.
You already told us they don't make smart decisions in order to save money, what's one more bad decision on the pile?
TheYoungBung@reddit (OP)
Well, not exactly. I'm a government contractor, so my company is in Virginia where I'm in Ohio. The people I work for pay for my slot through their massive manning budget, where things I need come from our tiny annual IT requirements budget. My position is funded for years to come, so if they don't want to listen to my suggestions, so be it. I put everything in writing to prove I advised them of the situation.
zakabog@reddit
Massive manning budget but they can only afford a single employee on your team? Just keep an eye out, apply for better positions, you don't want to work under such incompetency for long, it never ends well.
TheYoungBung@reddit (OP)
I appreciate the concern! That's not sarcasm by the way. I'm pretty good at seeing the writing on the wall, I've been in contracting for awhile now and know when it's time to jump ship.
I'm always applying for new jobs, if nothing else I like interviewing for higher positions so I know where my weaknesses lie.
I wouldn't say that the people I work with are incompetent, just mostly veterans that are stuck in their ways. They really have no idea how to use a computer. But if I need to remap their printer 17 times a week, that keeps me employed
zakabog@reddit
Let me take a moment to remind you that you have a refrigerator in your makeshift "server room."
TheYoungBung@reddit (OP)
Lol that's a great point, they are absolutely cavemen when it comes to anything tech related, which is why they're quicker to buy better monitors than the titular visual fault finder. It's honestly just something they wouldn't consider as a problem, but these guys are no joke in their respective fields. That's why I'm there
Splask@reddit
No beverages in the server room!!
Model_M_Typist@reddit
An employee used ours as a breast pumping room. good times
anxiousinfotech@reddit
What about the emergency Jack Daniels??
Moontoya@reddit
That goes under the floor tile along with a body bag or three
Problably__Wrong@reddit
Approved!
TheYoungBung@reddit (OP)
There are many abandoned diet Cokes in the server room
what-the-puck@reddit
At a past facility I managed the server room and we didn't know that one of our dedicated server room ACs went offline (it was checked weekly), for unknown reason, and we also didn't know that our second AC rooftop condenser was shared between the server room and a random meeting room plenum AC on a different floor in the building.
While we only had a single active AC working, someone flipped the thermostat in that meeting room from Cool to Heat. Heat always wins, without exception, so the entire rooftop condenser shut down and would not start back up.
Fortunately this happened during business hours so we were right nearby to power cycle the second AC, which took over and got the room cool again.
Neon_Splatters@reddit
Why does a 1 IT guy department have more than 1 physical server with several VMs on it?
TheYoungBung@reddit (OP)
Well before me, the position sat unfunded and vacant for 5 years. So 0 IT guys and 0 tech refresh. By some miracle, things were still sorta working through that time. For the first 3 weeks I wasn't taking tickets, I was too busy doing chewing gum patches on what I could to get things running somewhere in the ballpark of acceptable.
way__north@reddit
posts like this really make me appreciate that we have good relations with our facilities guys, and that they're not totally clueless (with some exceptions of course ..)
But sounds very familiar that they're not the best at planning - or notifying affected parties. So I've established a direct link to some of the most used contractors so they notify me as soon as they have some work lined up that might involve IT just the slightest bit
way__north@reddit
fast forward to today: contractor calls me 2PM, needs network access at one of our locations for some cloud based monitoring tool that he was going to install during the weekend
Luckily for him, it was kinda slow today at the office, and there was spare capacity both in the patch panel and the switch
megasxl264@reddit
Why are you concerned if no one else is?
TheYoungBung@reddit (OP)
I'd rather reboot and reconfigure than replace entirely
megasxl264@reddit
Replace entirely doesn't change your hours or come out of your personal pockets.
TheYoungBung@reddit (OP)
I work for the government and would like to see our budgets utilized in a responsible and reasonable fashion.
Also don't want to have to explain why our servers melted while I was on shift, when the IT equipment is directly my job
Oso-Sic@reddit
"Because facilities turned the A/C off and we don't have active separate temp monitoring. Remember when I emailed you in October of 2024 stating this could someday be an issue? Here, I'll forward you the email."
That's how it works. If it doesn't work this way, you need to find somewhere else to work.
vrtigo1@reddit
As you've found, many facilities teams are inept and simply don't care.
Our facilities team would routinely schedule AC maintenance and we'd find out about it when we'd get overtemp alarms and find out they'd opened the server room and allowed the contractors to power off our AC systems without any notice to IT.
After the first time we gave them a pass because I guess if you don't tell them, they don't know that's not kosher. We told them we need to be in the loop in the future, and they need to do one system at a time instead of doing both at the same time. Or, if they insist they can't spare the time to do them back to back, then either they or the contractor need to arrange for a portable cooling unit that can handle the cooling load while the primary systems are switched off.
In one ear and out the other, they could not have given less fucks. They did it two more times over the next year.
The next time they did it, we simply turned everything off, so a building with a couple hundred people in it had no Internet, no WiFi, no phones. Essentially no reason to be at work, since they couldn't work.
CFO/COO blew a gasket. We told them facilities failed to coordinate maintenance with us and took the system down, and that it'd happened several times before and they ignored our quite reasonable requests to be kept in the loop. Hellfire rained down on the facilities team that day, now they've just stopped maintaining our AC units out of spite.
anxiousinfotech@reddit
We had a landlord that would do this. They would shut off the building HVAC for maintenance and we'd usually get a notice about it being off for maintenance an hour or two after the first temperature alerts started coming in. The server room had dedicated cooling, but it still relied on the building's chillers being operational.
It was a mad dash to shut down everything that could be off after hours when those temp alerts started coming in. In all 8 years of that lease we never once got through to them that we need to know ahead of time. The only saving grace was it was always on a weekend, so most things could be off without people noticing.
vrtigo1@reddit
In that case, I'd expect (hope) that sort of maintenance was probably addressed in the lease. Probably just an issue of whoever reviewed the lease not knowing that would be problematic for you.
anxiousinfotech@reddit
It was addressed, stating that the landlord could do what they wanted, how they wanted, when they wanted. This was challenged before signing, but they were told that any modifications to the HVAC or utilities clauses would cause us to lose the space.
We told management this wasn't acceptable, but they apparently just had to have an office at this particular address and signed the lease anyway.
ReputationNo8889@reddit
Well i would say thats a win at least. Now you can follow your required maintenance on your terms :D
MB-Z28@reddit
Over temp can happen real fast in a busy server room. We had an A/C unit fail and in 45 minutes the room went +40 F over the max temp of 80. Shortly after it was 130 in the room and a $100,000 router took a dump, hard. DRT. It was so fried that it required complete replacement, as the power supply regulator failed and zapped many chips... Not a good day. That's when management decided to upgrade the A/C. Expensive mistake; they were warned the A/C was critical and needed redundancy.
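For a rough sense of how little time that leaves, the figures above can be turned into a back-of-the-envelope rate. This is only a sketch: it assumes the room heats at a constant rate, which a real room will not do exactly, and the function names are made up for illustration.

```python
def rise_rate_f_per_min(delta_f: float, minutes: float) -> float:
    """Average heating rate in degrees F per minute."""
    return delta_f / minutes


def minutes_until(limit_f: float, current_f: float, rate_f_per_min: float) -> float:
    """Minutes until limit_f is reached at a constant rate (linear assumption)."""
    if rate_f_per_min <= 0:
        raise ValueError("rate must be positive")
    return max(0.0, (limit_f - current_f) / rate_f_per_min)


# Figures from the comment: +40 F in 45 minutes, on top of an 80 F max.
rate = rise_rate_f_per_min(40, 45)          # a little under 1 F per minute
to_130 = minutes_until(130, 80 + 40, rate)  # time to go from 120 F to 130 F
print(f"{rate:.2f} F/min, about {to_130:.0f} more minutes to reach 130 F")
```

Under that linear assumption, whatever threshold you alert on needs to leave enough minutes for someone to actually get to the room.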
steverikli@reddit
Once upon a time we got one of those "hey it's pretty hot in the lab" mentions. Come to find out the AC had failed over the weekend, and nobody knew.
By the time we got there, it was a furnace. Lots of storage gear, spinning disks and enclosures, servers and switches and such. It was hot enough that panduit fiber trays in the ladders had deformed and in some cases come loose from the supports, just dangling loose. Even with the doors propped open it was stifling in there, so we took turns shutting down anything which hadn't already done thermal shutdown on its own.
Arguably, facilities should have been alerted that their unit was offline and looped in IT, but I suspect they weren't monitoring their gear. To be fair, this was an engineering lab rather than datacenter, so R&D rather than production, but still.
We got our own standalone little temperature monitoring gadgets after that one.
itishowitisanditbad@reddit
Arguably you both should have things warning you about this issue from different perspectives.
I get you fixed it after the fact but blaming facilities for the knock on effect doesn't really pan out when you're admitting you had no monitoring whatsoever yourself.
I mean if I had this issue at my workplace there would be hundreds of warnings flying around.
steverikli@reddit
No real argument from me.
To be fair to the team, the company was still fairly young, recently moved into a new campus, growing rapidly during post-startup chaos, and a lot of infrastructure was either being built for the first time, or rebuilding something else that was put together "just to get going".
It's easy in retrospect to see all the things that could & should have been done back then, and a lot of it did get done over time. But at that point of the company's age and growth, I imagine "monitor Facilities' gear" wasn't top of IT management's list.
All that aside, "the room was so hot some of the cable management melted" still makes for a good story, over a decade later.
itishowitisanditbad@reddit
Your own.
Not monitoring theirs, monitoring your own to see the temps.
In no way am I saying you should monitor theirs.
Also...
It's actually also easy to see them while they happen, the 'just get it done' attitude destroys that though.
People act like startups don't have to bother trying because they operate under special rules... they don't. They can have issues just the same, and that's the reason most of them wind up bankrupt or absorbed/vanished.
But please don't misdirect what I was saying to something sillier than it was, that's gaslighting.
steverikli@reddit
How wonderful for you to have worked at places that have everything perfectly dialed in perfectly right from the start.
Please don't assume you know someone else's situation.
itishowitisanditbad@reddit
Never said it doesn't happen, again you're misdirecting what I said to something else entirely.
I'm saying its always obvious while its happening and not a 'we couldn't have known' thing like you're suggesting.
Again you've missed what I said, turned it into a more extreme version, and then responded against that...
Strawman gaslighting again dude, 101 stuff too.
steverikli@reddit
You're the one making assumptions about what IT was doing/thinking or prioritizing at the time. You simply can't know what else was on the team's plate.
Nor did I suggest "couldn't have known" -- that's your fabrication.
Again, please don't assume you understand someone else's situation.
itishowitisanditbad@reddit
Theres a reason one of us quotes and the other doesn't.
lul jesus i'm out
kuldan5853@reddit
I mean the good news is that most servers will survive being in 102F even for a day or two (I know from experience).
The bad thing is that your facilities team seems to be insane.
TheYoungBung@reddit (OP)
We just got off a 3 day weekend, I have no idea how long it was operating at that temperature
Problably__Wrong@reddit
No temp monitoring alerts from your UPSes?
TheYoungBung@reddit (OP)
Our switches run on desktop UPSs. All they do is beep when they're sourcing current
Zncon@reddit
Please!
https://avtech.com/Products/Environment_Monitors/Room_Alert_3S.htm
wezelboy@reddit
AVTech is a total ripoff. You are better off using an Arduino or your server temp monitoring.
Zncon@reddit
Unless it's for personal use, I don't mess around with custom projects. You have no idea who'll have to support that in the future.
TecheunTatorTots@reddit
If you wanna do this on the cheap, just a simple Raspberry Pi Pico W (or Arduino equivalent), some Python, and Adafruit's API (or even IFTTT) will get you most of the way there. You'll just need the appropriate sensors for heat and humidity. You can probably even set up push notifications with something like PushBullet. You could also go the MQTT route.
https://www.canakit.com/Common/System/Cart.aspx
https://www.adafruit.com/product/385
Really, you don't need that whole starter kit. Just the board, the headers, and a way to connect the sensor to it.
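The alerting half of a build like this is mostly a threshold check. Here is a minimal sketch in plain Python, assuming you already have a temperature reading in hand; the sensor read and the push notification are left out since they depend on the board and service you pick, and the class name and thresholds are made up for illustration.

```python
class TempAlarm:
    """Threshold alarm with hysteresis, so a reading hovering near the
    limit does not fire a notification on every sample."""

    def __init__(self, high_f: float, clear_f: float):
        if clear_f >= high_f:
            raise ValueError("clear_f must be below high_f")
        self.high_f = high_f
        self.clear_f = clear_f
        self.active = False

    def update(self, reading_f: float):
        """Return 'ALERT' or 'CLEAR' on a state change, else None."""
        if not self.active and reading_f >= self.high_f:
            self.active = True
            return "ALERT"
        if self.active and reading_f <= self.clear_f:
            self.active = False
            return "CLEAR"
        return None


# Example thresholds only, not a recommendation for any particular room.
alarm = TempAlarm(high_f=85.0, clear_f=80.0)
events = [alarm.update(t) for t in (72, 86, 88, 84, 79, 78)]
print(events)
```

The gap between the two thresholds is the design choice here: one ALERT when the room crosses 85 and one CLEAR once it is back under 80, instead of a message every polling interval.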
Zncon@reddit
I've got a few things like this running at home on ESP32 hardware, but for work I try and avoid things that would be complicated for someone else to support if they didn't share my hobbies.
TecheunTatorTots@reddit
That's a good idea. I've tried to keep it more or less the same where I'm at, but also I get no budget to do anything so often times my only options are "free or almost free." That's non-profits for ya.
rcp9ty@reddit
Meraki sells temperature monitoring sensors as well. Hey thanks for reminding me to use up some of our sensor licenses that we aren't using to monitor a server area that changed roles recently.
cosmos7@reddit
This the replacement now that ITWatchdogs/Geist/Vertiv see-you-next-tuesdays have ruined a good thing?
Zncon@reddit
I still have enough working Geist units that I haven't had to order any, but it's on my saved list for the next time I need to buy something.
FuckMississippi@reddit
It's been my go-to for almost 20 years now. Heat alerts, generator alerts, water alerts, it's a fine piece of equipment.
vlad_draculya@reddit
We personally rely on this little beauty:
https://tempstick.com/?utm_source=bing&utm_medium=cpc&utm_campaign=Use%20%3A%20Server&utm_term=server%20room%20environment%20monitoring&utm_content=Server
Otherwise_Time3371@reddit
Love my Room Alerts!
EastDallasMatt@reddit
We have other monitoring, but I use these as my sensor of last resort.
https://tempstick.com/
DrAculaAlucardMD@reddit
You should fix that.
TahinWorks@reddit
An APC1500 w/ NMC card only runs about $1500 and does everything you need it to in an environment like that. The NMC card comes with a temp probe, with a port for a second, ideally for measuring both inside and outside the rack. And the NMC card is set up for a slew of network alerting.
You mentioned it may have been hot all weekend. An emergency weekend fix as soon as it was observed would have saved 1+ days of telework; the production hours saved would easily pay for the unit. That argument to leadership could be the lever you need to make the purchase.
B4rberblacksheep@reddit
You can pull the ambient inlet temp from a server via SNMP and build that into whatever your monitoring tool is.
Used to use that for a budget thermometer monitoring for a few old customers who refused to install or maintain AC
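A rough sketch of what that polling can look like, assuming the net-snmp command line tools are installed and the server answers SNMP v2c. The OID below is a placeholder, not a real one: the inlet temperature OID is vendor specific and has to come from the vendor's MIB. The function names are also made up for illustration.

```python
import subprocess
from typing import List

# Placeholder only: look up the real inlet-temperature OID in your
# server vendor's MIB. This value is NOT a real sensor OID.
INLET_TEMP_OID = "1.3.6.1.4.1.99999.1.1"


def build_snmpget(host: str, community: str, oid: str) -> List[str]:
    """Argument list for net-snmp's snmpget; -Oqv prints only the value."""
    return ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid]


def parse_temp(raw: str, scale: float = 1.0) -> float:
    """Turn snmpget's value-only output into degrees, in whatever unit
    the agent reports. scale covers agents that report tenths (scale=0.1)."""
    return int(raw.strip()) * scale


def read_inlet_temp(host: str, community: str, scale: float = 1.0) -> float:
    """Poll one host; raises if snmpget fails or takes longer than 10 s."""
    cmd = build_snmpget(host, community, INLET_TEMP_OID)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=10)
    return parse_temp(out.stdout, scale)
```

Run `read_inlet_temp` from a scheduled job and compare the result to a threshold, or hand the same OID to whatever monitoring tool you already have.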
sauced@reddit
You can monitor inlet temperature on your servers
chillyhellion@reddit
Even just a cheap Temp Stick would be loads better than nothing.
Resident-Artichoke85@reddit
Servers should have temp stats. Setup something to poll and alert.
ImCaffeinated_Chris@reddit
Zabbix! Everything should be in zabbix!
thortgot@reddit
Modern servers aren't that delicate. Beyond potentially increased hard drive failure you aren't likely to see much of any issue.
It isn't ideal for the operating temps for CPUs but those have safety limiters already in place. Both to limit performance and hard lock machines before damage is done.
Power supply failure used to be an issue before the capacitor people got things sorted out but that's largely a non issue now.
cybot904@reddit
Room Alert hardware is a good solution for this.
210Matt@reddit
Sounds like you need some temp monitoring in that room
StimpsonEB@reddit
https://temperaturestick.com/ this is what I have in my server room. It will also alert you if they cut the power if you get the one that has an external power supply.
kuldan5853@reddit
Yeah, same back when it happened to me. But I think our room was more like 120F by the time I noticed.
All servers survived, but the UPS was cooked.
8BFF4fpThY@reddit
We made it up to 125 one time. Several of the servers did their own thermal shutdown, but once it was back to a reasonable temperature everything came back online with no issues (other than some crashed VMs due to a non-graceful shutdown).
Hardware lasted a few more years before it was replaced on normal schedule.
Darkk_Knight@reddit
I'd get a few spare hard drives on hand. Heat kills them easily. I've had like 3 or 4 hard drives die a few days later. Rest of the servers are fine.
RandomDamage@reddit
Power supplies, too. I've lost a few of those to heat and it always looked like the weirdest things
Darkk_Knight@reddit
Oh yeah. We did lose a couple of PSUs. Lucky I was able to reclaim them from unused servers.
floswamp@reddit
This is why thermal shutdown exists. Some people just care too much.
Alcobob@reddit
The AC in our server room went out Friday and the facility guys pressed the ignore warning button....
We noticed that there were issues when one mail server died on Sunday. Every single device had a warning logged that the temperature went above 70°C ( 158°F) and one router reached 140°C (284°F) according to its own sensors.
That said, nothing really broke. The router acted strangely according to the network guys and was later replaced as a caution, but that was it.
The 40 servers or so were totally fine.
craig_s_bell@reddit
Fine today; but look out for increased failure rates for various components, over the next few months. A temperature event is like sloshing the water up the far side of the bathtub curve... it may be a while before anyone notices that it's overflowed.
Alcobob@reddit
In that case it was fine until they were replaced due to old age. We are talking so old that nobody in the university even wanted them for free.
Oso-Sic@reddit
Look into these. Highly recommend.
https://avtech.com/Products/Environment_Monitors/
TheYoungBung@reddit (OP)
Brother they won't even buy me a visual fault finder. I trace fiber by telling someone to shine a flashlight into it
codename_1@reddit
use snmp to poll some devices for temp. switches at the top of the rack work good, gonna be a little hotter than ambient but close enough.
TheYoungBung@reddit (OP)
Great suggestion, though our equipment is so old I can't use it. My position was vacant for 5 years as they wouldn't fund it and we needed a tech refresh when my predecessor was in the seat. The network uses ST fiber at the endpoint and all the LC connections are keyed to a standard that the company doesn't make anymore. If we lose a cable, we lose a workstation. I really have to MacGyver so much crap to keep it working
codename_1@reddit
i kind of doubt you dont have any device in your rack with snmp support, its not like its new tech.
TheYoungBung@reddit (OP)
It's hard to go too deep into it because of the nature of where I work, so I'm sorry but I hope a good ol "trust me bro" will suffice
itishowitisanditbad@reddit
I work in a specifically restricted environment, ITAR mostly but some other caveats here and there depending which group.
Nothing you're saying is explained with what you're saying.
Or by being in some special industry.
Sure a couple methods are not possible but 40+ others are...
Sparkycivic@reddit
You would be shocked to learn just how well even old hardware can expose critical health monitoring like temp, CPU load, batt voltage, ethernet traffic volume via SNMP. It's free, and the monitoring software can be free too!
I can diagnose basically any problem in my broadcast plant just based on the emailed warnings and their attached graphs that my PRTG instance gives me during any incident.
If PRTG can't automatically discover the parameters of a device, loading-in a MIB file from the device manufacturer certainly fills-in the blanks. Working with MIBs in PRTG isn't very enjoyable, but the results are satisfying. Test it at home if work time is tight.
pdp10@reddit
This. Use your built-in hardware, you just have to collect the metrics with OpenMetrics, SNMP, etc.
iamLisppy@reddit
+1
Yes, you need something that is actively monitoring temps for you to send you alerts so you don't need to manually check.
You could go even further and set up Grafana + Prometheus.
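If you go that route, Prometheus scrapes a plain text format, so getting temperatures into it is mostly a matter of printing the right lines. A minimal sketch, assuming you already have readings per host; the metric name, hosts, and values are made up for illustration, and label values are not escaped here.

```python
from typing import Dict


def render_metric(name: str, help_text: str, samples: Dict[str, float]) -> str:
    """Render one gauge in the Prometheus text exposition format.

    samples maps a host label value to its latest temperature reading.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for host, value in sorted(samples.items()):
        lines.append(f'{name}{{host="{host}"}} {value}')
    return "\n".join(lines) + "\n"


# Hypothetical metric name and hosts, for illustration only.
text = render_metric(
    "server_inlet_temperature_celsius",
    "Inlet temperature reported by each server",
    {"srv01": 24.5, "srv02": 38.0},
)
print(text, end="")
```

One way to use it is to write that text to a `.prom` file for node_exporter's textfile collector, then put the alert threshold in Prometheus or Grafana rather than in the script.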
TheYoungBung@reddit (OP)
The place I work in doesn't allow any form of wireless devices or transmitters, I believe the system that is in place is truly as good as it gets
lovejw2@reddit
The room temp sensors posted above are wired; I know you said they won't buy them. SNMP is a very old protocol, and if they won't buy the sensors then you can poll your switches and get their temps with it, unless you are using unmanaged switches. There are a number of free applications that can check SNMP info and alert you to an issue.
iamLisppy@reddit
Try and make a case for it if you have no other means of knowing temperatures outside of being onsite and manually checking.
Cheers.
EastDallasMatt@reddit
When I was a low level tech, I turned off both of our ACs in the server room while doing a big cabling project. We kept the server room at 65F blowing forced air directly onto the racks, so it was freezing in there. I finished my project and went home for the day only to pop up out of bed at 11PM realizing that I never turned the ACs back on.
I raced back to the office to find the server room at a balmy 90F. I turned on both ACs, called to have the common area AC turned back on, put a fan in the doorway blowing the hot air out. I did a lot of googling to see how bad a situation I was in and felt much better afterwards. Once it got below 80, I closed everything back up, went home, and never told a soul.
digitalnative00@reddit
Protect the servers and go through your SOP for shutdown. Facilities is holding them hostage, not you. When the C-Level's start scowling, point them in the direction of Facilities.
Once the dust settles, wrest control of that thermostat from them.
Crafty_Train1956@reddit
Facilities people are usually the grumpiest, laziest people out there.
DrAculaAlucardMD@reddit
Facilities should be able to remote in and set the system remotely. Hence why they didn't need to show up in person.
RCTID1975@reddit
That seems a little short sighted. Why would you tie the monitoring system into what you're trying to monitor?
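The usual way around that is to treat silence as an alarm: something independent expects a reading every so often and complains when it stops hearing one. A minimal sketch of that check, assuming it runs on a box that does not share power with the thing being watched; the class name and the five-minute window are made up for illustration.

```python
import time
from typing import Optional


class Heartbeat:
    """Tracks when a sensor last reported, so silence itself is an alarm."""

    def __init__(self, max_silence_s: float):
        self.max_silence_s = max_silence_s
        self.last_seen: Optional[float] = None  # no report yet

    def record(self, now: Optional[float] = None) -> None:
        """Call whenever a reading arrives."""
        self.last_seen = time.monotonic() if now is None else now

    def is_stale(self, now: Optional[float] = None) -> bool:
        """True if nothing has been heard within max_silence_s."""
        now = time.monotonic() if now is None else now
        if self.last_seen is None:
            return True
        return (now - self.last_seen) > self.max_silence_s


hb = Heartbeat(max_silence_s=300)  # five minutes is an arbitrary example
hb.record(now=0.0)
print(hb.is_stale(now=200.0), hb.is_stale(now=301.0))
```

With a check like this, a sensor that loses power shows up as a stale heartbeat instead of as no news.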
TheYoungBung@reddit (OP)
Government
lovejw2@reddit
That isn't the issue. Whoever decided to do that was being short sighted. Government isn't the answer for not getting the right tool for the task at hand. If you give the right information you get the right tools. If this is a federal government facility then they will throw money at a problem. If you have a small budget then somebody somewhere isn't being given the right information the right way or is in the wrong position.
TheYoungBung@reddit (OP)
So I was semi-joking with that answer, but let me explain.
The facility I work in does not allow any sort of wireless device to be brought in. No cell phones, no smart watches, laptops need to have NICs and Bluetooth transmitters removed. Not disabled, removed.
There are elegant solutions out there and plenty of people have suggested them (which I appreciate greatly), however our facility doesn't have the luxury of the best option, we are stuck with the approved option, which isn't great
thortgot@reddit
Your servers certainly have CPU temperature sensors, likely case sensors (intake, exhaust) as well.
Set up your monitoring to measure what you care about (server performance, temperature) rather than a proxy for it (ambient room temperature).
PRTG can be set up for free to monitor this (500 sensors for free), all you need is a desktop.
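As a rough illustration of "measure what you care about", here is a minimal Python sketch that classifies per-sensor readings against warning and critical limits. The sensor names and every limit value are hypothetical placeholders, not vendor figures; the real numbers should come from the hardware's published specs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    warn_c: float  # temperature (Celsius) at which to raise a warning
    crit_c: float  # temperature (Celsius) at which to treat it as critical


# Hypothetical limits for illustration only; use the manufacturer's numbers.
LIMITS = {
    "intake": Limit(warn_c=30.0, crit_c=40.0),
    "exhaust": Limit(warn_c=50.0, crit_c=65.0),
    "cpu": Limit(warn_c=80.0, crit_c=95.0),
}

SEVERITY = ["ok", "warning", "critical"]


def classify(sensor, temp_c, limits=LIMITS):
    """Return 'ok', 'warning' or 'critical' for a single sensor reading."""
    limit = limits[sensor]
    if temp_c >= limit.crit_c:
        return "critical"
    if temp_c >= limit.warn_c:
        return "warning"
    return "ok"


def worst_status(readings, limits=LIMITS):
    """Return the most severe status across a dict of {sensor: temp_c}."""
    statuses = (classify(name, temp, limits) for name, temp in readings.items())
    return max(statuses, key=SEVERITY.index, default="ok")
```

With these placeholder limits, an intake reading of 39 C (roughly the 102F from the original post) would come back as a warning well before the CPUs start to throttle.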
monoman67@reddit
Now imagine Facilities raising the temp alarm setting because they are getting annoyed by the alarms. We discovered this when showing up for a weekend project and the server room was well over 100F.
Old_Ad_208@reddit
We have a datacenter in a very large office tower. They scheduled a partial power shutdown this past weekend. The datacenter did not go down, but the A/C quit working. Our A/C gets chilled water from the building. The power shutdown powered off the pump for the chilled water plus it shut off the controller for the pump. The building engineers already moved the pump and controller to a UPS connected to the building generator. It appears they had started the generator prior to the outage.
Our datacenter is well protected from a power failure. We have a UPS with dual power feeds plus a standby generator. There is a transfer switch that will switch to the other power feed automatically. If neither feed is working then the generator will start automatically. We lost half the power on all of our racks once because our management didn't replace the bad UPS batteries and the transfer switch killed all power output when incoming voltage dropped. The UPS didn't work with bad batteries. Luckily, only a few minor things don't have dual power supplies. (The UPS got new batteries about six months ago.)
Old_Ad_208@reddit
Yes, we have temperature monitoring which is how we knew there was an issue. We actually have multiple temperature monitors.
cybot904@reddit
I've always found it hard to get facilities managers and IT managers to work together on common things that affect buildings and power, etc. I'm sure there are great people out there, just not here. The facilities managers are just incapable of working with anyone else to alert about power / HVAC and other issues that will affect operations in any kind of controlled managed way. It's always last minute and after the fact. Fighting this battle for years.
NightMgr@reddit
The server to provide payroll for facilities has been lost…..
CulinaryComputerWiz@reddit
I have solved this problem by also being the Facilities Team. Nothing like wearing two hats for the price of 1 (or maybe 1.25)
__Arden__@reddit
Yep I am in the same boat. If it plugs into a wall, I am responsible for it. Good news I work for a Bank so I get all the toys including the room alert, dedicated AC for my server room etc.
TowerOfPimples@reddit
How does a BANK not have proper separated responsibilities?
__Arden__@reddit
Facilities and IT are not really in conflict when it comes to fraud prevention, that's how.
CulinaryComputerWiz@reddit
The nice part is that since no one here has a clue about what I do, if i say we need it the answer is pretty much "ok" unless it's crazy expensive.
WheredMyMindGo@reddit
If everyone can telework, why do you need the servers?
Imd1rtybutn0twr0ng@reddit
Trolling? Do you know how I.T. works?
DeadStockWalking@reddit
IT 101 says you should have UPSes with temperature probes and email alerts in your server room.
Never rely on another department and their equipment to monitor your equipment. Now you know.
rootofallworlds@reddit
For reasons I don't know, facilities is responsible for our server room UPSes. A facilities employee, trying to turn off an alarm after a power outage, shut down all power to the server room.
Silence in that room is something I never want to hear again.
steverikli@reddit
Amen to that -- it's *eerie*.
My first datacenter job had one of those. It was a corporate facility rather than colo, so not enormous, but still had UPS and diesel generator etc.
On the fateful day, the local electric provider had a failure so nothing was coming in from the street. It had happened previously but rarely; still, that's why you have backup power gear, right?
Except the backup gear ... didn't. The UPS took load as it was supposed to, but only briefly, and then the room went cold and dark. "Oh, sh...."
I was in the command center at the time, and the silence was sudden and oddly deafening. The only lights were the exit signs and a couple of battery powered wall safety lamps.
We were told later that facilities' generator inspector had left the unit in bypass (?) after the last regular test inspection, so it wasn't in the right state to catch load from the UPS. Since the UPS was sized only to weather the initial outage and then toss to the diesel, there was nothing to catch it and down we went.
Moontoya@reddit
Ah yes the good ol' "silence descended like a church bell falling from its tower"
Arudinne@reddit
We had an AC die in a server room at one of our sites back in November. It's still broken.
Facilities is useless and it's really hard for me to deal with this crap from 1000 miles away. Supposedly we might finally have someone who can fix/replace it.
robot_giny@reddit
I miss having a dedicated facilities team. The facilities team from my last job was great - the director had been there for 20 years, his lead was there for about 10 years, and we even poached one of their guys for IT. It was a non-profit healthcare facility and getting IT funding was as challenging as you might expect - IT was managed by the CFO. But we developed a great relationship with the facilities team and we could always count on them. One of the last things I did before I left was build a ticketing system for them in SharePoint.
bythepowerofboobs@reddit
Do you have idrac/ilo/etc. on your servers? If you do you can enable high temp alarms in there as well as an extra fail safe alert.
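For anyone who wants to check the BMC readings from a script rather than the iDRAC/iLO web UI, here is a minimal Python sketch that parses text in the layout `ipmitool sdr type Temperature` typically prints. The sample rows, sensor names, and the 35 C limit are made up for illustration, and the column layout is an assumption: output varies by vendor and firmware, so check it against your own hardware first.

```python
# Illustrative sample only; real sensor names and IDs differ per vendor.
SAMPLE_OUTPUT = """\
Inlet Temp       | 04h | ok  |  7.1 | 39 degrees C
Exhaust Temp     | 01h | ok  |  7.1 | 52 degrees C
Temp             | 0Eh | ns  |  3.1 | No Reading
"""


def parse_ipmi_temperatures(text):
    """Parse pipe-separated sensor rows into {sensor_name: temp_celsius}.

    Rows that do not have five columns, or whose last column is not of the
    form '<number> degrees C', are skipped rather than guessed at.
    """
    readings = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 5:
            continue
        name, value = fields[0], fields[4]
        parts = value.split()
        if len(parts) >= 3 and parts[1] == "degrees" and parts[2] == "C":
            try:
                readings[name] = float(parts[0])
            except ValueError:
                continue
    return readings


def over_threshold(readings, limit_c):
    """Return the sorted names of sensors at or above limit_c."""
    return sorted(name for name, temp in readings.items() if temp >= limit_c)
```

In practice the text would come from running the command (for example via `subprocess.run`) on a schedule, and anything returned by `over_threshold` would feed whatever alerting channel is approved at your site.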
technomancing_monkey@reddit
To: CEO@company.com; CIO@company.com; COO@company.com
CC: Facilities@company.com
"To ensure the safety of critical business infrastructure all servers were shutdown in response to a failure of the Datacenter Cooling Systems. Temperatures in the datacenter were reaching equipment critical levels as published by the manufacturer. Once facilities has confirmed the cooling systems have been repaired IT can restart the servers."
swimmingpoolstraw@reddit
Exactly this, put pressure on the idiots and shift blame
Individual-Teach7256@reddit
I still have IDFs that reach 100+ consistently all summer long. Surprised stuff is still working this long :)
lelio98@reddit
Work remote until the A/C is fixed. Why sit there waiting on them?
TheYoungBung@reddit (OP)
The place I work in has privileged access - someone needs to be there to let them in and escort them. Makes sense for it to be me so I can make sure the result is satisfactory and reboot everything as soon as possible
visibleunderwater_-1@reddit
You keep mentioning this government stuff, so...wouldn't that imply that (if you are in the US) you are supposed to be NIST 800-53 compliant? At a certain point, this stuff can become "gross negligence" as in "people have been told and just don't GAF", and certain legal remedies will be applied on you and them by external agencies outside your control. There are very specific controls around environmental systems that affect the availability of data and systems. AT-3(1) is one off the top of my head..."Environmental controls include, for example, fire suppression and detection devices/systems, sprinkler systems, handheld fire extinguishers, fixed fire hoses, smoke detectors, temperature/humidity, HVAC, and power within the facility. Organizations identify personnel with specific roles and responsibilities associated with environmental controls requiring specialized training." PE-14 is another that requires defining specific environmental settings, procedures to implement, AND ways to ensure compliance.
Seriously, you are one more bad move away from a disaster...these situations are why whistleblower protections exist. Someone at your org is RESPONSIBLE FOR ENFORCING this stuff, like legally responsible on a dotted line somewhere. You need to let your boss know that this is a "big deal", like potentially "getting called to testify in front of Congress" level if some national security related data gets fubared because your Facilities people lack the REQUIRED training to do their job concerning critical IT equipment. You should have a registered FSO (facility security officer) someplace, you need to contact them ASAP before you lose hundreds of thousands of dollars (tax-payer funded most likely) of equipment due to ineptness.
TheYoungBung@reddit (OP)
I do greatly appreciate your insight and concerns - I have a receipt of everything. You could likely track my entire workday down to the hour through the ticketing system I developed myself (They didn't want to buy me one)
I don't want to go too much deeper into anything, but this isn't news to me
TNT359@reddit
Had something similar happen to me 15ish years back. Air conditioning contractor (who were facilities' contractor, not IT's) informed us they were switching off the aircon to the server room now for 10-15 minutes for maintenance. So inevitably some other crisis intervenes and the IT team forget about the contractor.
2 hours later and we notice some sluggishness with responses from the servers and some not responding at all.
Yup. Server room is so hot I had to strip off layers of clothes (this is in Scotland) to go in. HP servers in reboot loops. Exchange 2006 (I think?) DB is fucked. SQL DBs take ages to restore and the SAN that most of our storage was on took 2 days to fully restore.
I (IT junior at the time) made a frustrated comment about the aircon contractor, who complained to facilities that I offended him. When I was asked to apologise my response was: does the business (public sector) get an apology for the couple of hundred thousand pounds of lost productivity, never mind the potential cost of dead IT kit? Neither apology was forthcoming.
Apprehensive_Bat_980@reddit
Had a case where the AC unit failed (I'd thought someone was turning it off). Took the supplier numerous call outs to attempt to fix, then eventually replace the whole unit. If the server room is big enough get x2 AC units (in case one fails). Check with facilities that the AC is being serviced. Previous companies I've worked for facilities teams didn't fix ACs themselves.
rcp9ty@reddit
For all of you running on a limited budget. Lux makes a thermostat controlled outlet. Designed for heaters and window mounted air conditioners. But you can easily plug in anything into it whether it's a light bulb, a siren, or strobe light. It's $25-40 at most home improvement stores.
lescompa@reddit
Get a portable AC unit you can roll into the room in case something like this happens. Home Depot sells them.
TheYoungBung@reddit (OP)
I've made the suggestion, hopefully now that they've seen the results, my words will carry a bit more weight since it's clear I don't ask for random garbage we won't need
KickedAbyss@reddit
I've had servers thermal throttle down to 0.9GHz all core. Boy did our DBA throw a fit.
Facilities didn't care, we had to jury rig a solution ourselves
Bont_Tarentaal@reddit
My server room also got a problematic aircon. I've put in an order for a new one, however, company have financial difficulties, so ETA is not known.
Sucks to be in that kind of situation.
phalangepatella@reddit
I am Facilities & IT Manager for a reason!
All the shit that could nuke our infrastructure is directly through me. House power, generator, cooling, roof access, electrical room, cooking comms, etc.
It's going to suck when we grow more and have to split those two roles up.
Capable_Tea_001@reddit
If I were Facilities, I would have just told you to raise a p1 ticket
daven1985@reddit
That sucks. Servers would have been fine though.
Sounds like you need to talk to your boss about getting their cooling on a separate unit.
fresh-dork@reddit
what's the margin on telling a VP that the facilities turned off the AC and that as a result you will shortly need to turn off all the servers (shuttering all work) so that you don't risk massive hardware failure?
i considered direct threats to facilities, but then decided that getting an executive to be angry on your behalf was the better play
TheYoungBung@reddit (OP)
Well we have a Colonel rather than a VP, and he's far more worried about how we fix it rather than who did it
fresh-dork@reddit
awesome. military guys love a clear objective and "get those fucks to turn the AC back on" is eminently actionable
TheYoungBung@reddit (OP)
Bruv they're like that until they hit Major, then they become political as hell
fresh-dork@reddit
show me how to play this :)
BoobsThatArePooping@reddit
I've used rolling portable AC units that can be bought at Home Depot or Lowes. You'd think justifying the small cost of something like that would be easy, but it isn't sometimes.
TheYoungBung@reddit (OP)
See and I'm not bad at what I do - I've made these suggestions. They get blown off until it can't be ignored any longer. Hopefully this is the time!
visibleunderwater_-1@reddit
This is all a huge NIST 800-53 violation; someone will lose their job if this goes on much longer. DOCUMENT everything that you have tried to do, emails, tickets, etc. If you're in the US in a federal org, you could be facing serious legal repercussions.
TheYoungBung@reddit (OP)
Also, is your username a reference to The Poop That Took A Pee, the highly acclaimed sequel to Scroty McBoogerballs?
Geminii27@reddit
Time to install a remotely-checkable thermometer with temp-check monitoring alarms.
TheYoungBung@reddit (OP)
I've explained in other comments why the someone solution isn't the viable one
Hot-Win2571@reddit
Good luck working from home while the servers are down.
Not that the people at home will complain much, while they catch up on their movie backlog.
TheYoungBung@reddit (OP)
They can, not full scope of work, but it's completely possible
whatsforsupa@reddit
There are pros and cons to being the sysadmin as well as the HVAC guy….
In addition to our climate control, we also run portable AC units in each of our 3 server rooms, that pipe air into the ceiling tile, which then vents out of the building. It's icy cold when it's working.
When someone walks in and nudges the unit, breaking the piping, it exhausts mega hot air into the server room. Relatively easy fix if someone says something… which they never do…. Luckily we run SimpliSafe temp sensors in the rooms
_DeathByMisadventure@reddit
Years ago I worked at a nuke plant, and one snowy day the heat kicked on hard in the server room, and could not be turned off.
Neither could the servers, nuke plant you know, things could be bad. It got over 125 at one point.
We had the doors open, fans running, but every 5-10 minutes you'd hear the spinning of a hard drive die a horrible heat death.
Finally a facilities guy was able to get up on the roof and turn the system off. We lost something like 23 drives that day, and 2 server power supplies.
TheYoungBung@reddit (OP)
You my friend may understand the limitations I am working within better than most around here. My aunt worked in nuclear waste disposal and had similar basic qualifications that are required in my role
h3dwig0wl1974@reddit
If everyone can telework without the servers being up, what do you need the server room for?
TheYoungBung@reddit (OP)
I promise it makes perfect sense - let's just say I work for the government and leave it at that
iwashere33@reddit
I know a place that was having continual problems, they got a Raspberry Pi and a small battery backup for it. Put it on the network with the temperature sensor, and then had that serve a webpage that was then displayed on a large screen in the IT office. The background of the whole webpage would go red when it hit a certain temperature.
After which someone had to walk down the hallway and go manually turn the Aircon back on. Happened a lot after power outages where the Aircon wouldn't come back on. It was weird because sometimes it did, and as everyone knows intermittent problems don't really exist so you can't get any spending approved to fix it
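A setup like that can be sketched in a few lines of Python using only the standard library. This is not what that site actually ran, just a minimal sketch under stated assumptions: `read_temperature_c` is a hypothetical placeholder for whatever sensor is attached, and the 30 C threshold, port 8080, and 30 second refresh are arbitrary choices.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

THRESHOLD_C = 30.0  # hypothetical alarm threshold; pick one that suits the room


def read_temperature_c():
    """Hypothetical placeholder: replace with a real read from your sensor."""
    return 24.0


def render_page(temp_c, threshold_c=THRESHOLD_C):
    """Build the status page; the background turns red at or above threshold."""
    color = "red" if temp_c >= threshold_c else "white"
    return (
        "<html><head><meta http-equiv='refresh' content='30'></head>"
        f"<body style='background:{color}'>"
        f"<h1>Server room: {temp_c:.1f} C</h1></body></html>"
    )


class TempHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the sensor on every request so the page is always current.
        body = render_page(read_temperature_c()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Serve on all interfaces so the office display can load the page.
    HTTPServer(("0.0.0.0", 8080), TempHandler).serve_forever()
```

The meta refresh tag makes the wall display reload itself, so nobody has to touch the screen for the background to change.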
Nik_Tesla@reddit
About a year ago, our facilities was installing new microwaves in the break room. Not just new, but additional ones, and surprise surprise, 4 microwaves on the same circuit, it blew the breaker. They go to reset the breaker, and fucking misread the very clear labels, and flip the breakers for the server room instead. We have a UPS on the servers, but not the core switch stack (because there's no data to corrupt and they're all PoE and the UPS would have to be massive to keep it powered for more than 1 min), so all the switches go off and every computer/phone/wifi disconnects, we're down while the switches boot back up.
We in IT have no fucking clue what's going on, scrambling to see if our switches are dying. About 30 min into troubleshooting, after it doesn't resolve the microwave issue, they flip the breaker again, and then proceed to disappear. Finally a conversation with warehouse guy who's asking what the deal is, leads to him asking "Does this have anything to do with the microwaves?" to which we respond "What microwaves?"
Anyways, that facilities guy isn't allowed to flip breakers any longer, and the server room breakers are painted bright red.
TheYoungBung@reddit (OP)
The classic struggles of white collar vs blue collar!
Nik_Tesla@reddit
This is more of an issue of Right side vs Left side of the breaker panel.
Sir_Vinci@reddit
Not that I recommend it, but this sort of thing is why I am now well versed in all the non-IT systems that support my datacenter. If the power or cooling goes out, I know how to get it running again.
You shouldn't have to do that sort of thing, but even if you didn't cause the problem, it's still your problem. No one fails to get to their application or times out loading a site and thinks "Facilities properly screwed this up!". Even if someone from Facilities gets chewed out for causing the problem, your customers still think it was you.
Look at it as an opportunity to expand your areas of expertise.
TheYoungBung@reddit (OP)
So luckily, I support a relatively small team and was able to go around and give a warning for what I was about to do before I did it, some users even went over to the server room to attest to the fact that it was indeed, very hot in there.
I also lucked out with the timing - nobody seemed upset that they got a half day right after a 3 day weekend followed by a telework day!
Sir_Vinci@reddit
Seems like that worked out about as well as a bad situation can.
mastachaos@reddit
One time they were replacing the ac in our server room with an in-wall unit for which they had to cut a new hole in the wall. To prevent dust from entering our servers, they wrapped them all in sheet plastic, while they were on. Several shut themselves down and started screaming (beeping). The door handle to the room was hot to the touch, that's how hot it got.
TheYoungBung@reddit (OP)
I've never once seen facilities show an ounce of regard for the mess their work might create, maybe that's for the best after reading this lol
Alert-Main7778@reddit
Been there. Sounds like you work at my company haha. Invest in some watchdog temperature sensors. That way when the temp starts creeping up, you get notified. Copy the maintenance director on it too.
Let them get spammed with alerts when it breaks again too.
slugshead@reddit
I had an electrician in a previous facilities team actually go out of his way to turn off the power to an AC unit, knowing full well what it was for.
I turned everything off, put a job into estates and watched chaos ensue.
Needless to say, the job was electrical so it went to him, he already knew what he had done (I did too, I checked the big red twisty switch and it was in the OFF position).
Took him three days to turn it back to ON after all sorts of excuses.
The exact same thing happened a week later.....
Dizzy_Bridge_794@reddit
I had a failure of an AC system in a single stack. Only took an hour for the servers to overheat as I raced in the car to shut everything down. Killed one of the DCs.
Few_World6254@reddit
Get some remote temp monitoring. We use avtech.com. Products for temperature monitoring that is independent of our HVAC systems.