Shutting down the oldest system in the data center
Posted by Automatic_Mulberry@reddit | talesfromtechsupport | 114 comments
Long ago, in about 2005, I was given the task of shutting down some old, very obsolete systems in the data center. I got through quite a few, migrating to newer systems with newer OSes, newer application software, and so on. But there was one that was a total thorn in my side - the oldest system in the building.
This was an old Compaq Proliant 2500, running Windows NT4 and SQL Server 6.5. The hardware, OS, and SQL Server were all well past end of life, but nobody had been able to pin down who owned it or was responsible for it, so it just kept going, waiting for the irreparable and inevitable crash. I was the FNG, so I got the task of figuring out what to do with it.
We did have some notes about who owned it, so I started down that path. I called the designated owner, and asked them about the machine. "Oh, no. I haven't owned that in years. Try this person." So I called that person, and they referred me to a third person, who referred me back to the first person again. I even went around the loop again, this time asking if there was anyone else they could suggest - no dice.
Meanwhile, I dug into user accounts on the system. At the OS level, only the admins had access, as one would hope. At the SQL Server level, there were no domain accounts, only SQL logins - "standard security," as Microsoft called it. I tried to match user logins to names, but they were all generic "appuser" type logins.
In an attempt to see who was actually using it, I monitored logins for a week, just to see if I could even capture any evidence that the thing was actually in use, rather than just turning electricity into heat. I didn't catch anything.
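That week of monitoring boils down to a simple pattern: sample the server's session list on an interval and record any activity you see. Here is a rough, hypothetical sketch of the idea — the `probe()` callable is a stand-in for whatever actually counts active logins (on SQL Server 6.5 that would have been something like polling `sp_who`), not anything from the original post:

```python
import time

def monitor_usage(probe, samples=5, interval=0.0):
    """Poll `probe()` (which returns the current count of active
    logins) and collect every nonzero observation. An empty result
    after a long enough window is weak evidence the box is unused."""
    seen = []
    for _ in range(samples):
        count = probe()
        if count > 0:
            seen.append(count)
        time.sleep(interval)
    return seen

# Stub probe simulating a server nobody ever logs into:
assert monitor_usage(lambda: 0) == []
```

The obvious caveat, and exactly the trap in this story: a quiet week doesn't clear a job that only runs monthly, so the sampling window has to be at least as long as the longest business cycle the box might serve.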
All of the above took a few weeks, leaving messages and missing return calls and such. Finally, I went to my manager. "I can't figure out who owns the machine, and I can't even prove it's in use at all. I want to shut the SQL Server services down for 30 days to see if anyone complains. If no one gripes, I'll power it down for 30 days. If still nobody gripes, I'll yank it out of the rack and send it for scrap. I should have it off our list in 60 days." With full blessing, I shut off the services and set a calendar reminder 30 days later.
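The staged plan above — services off on day 0, power off on day 30, scrap on day 60, with a complaint window between each step — can be sketched as a tiny scheduler. This is purely illustrative; the function name and structure are made up for the example:

```python
from datetime import date, timedelta

def scream_test_schedule(start, quiet_days=30):
    """Compute the milestone dates for a staged scream test,
    waiting `quiet_days` for complaints between each escalation."""
    return {
        "services_stopped": start,
        "power_off": start + timedelta(days=quiet_days),
        "scrap": start + timedelta(days=2 * quiet_days),
    }

# Example: a scream test started on 2005-03-01
plan = scream_test_schedule(date(2005, 3, 1))
for step, when in plan.items():
    print(step, when.isoformat())
```

As several commenters below point out, `quiet_days=30` is only safe for monthly workloads; quarterly or annual jobs need a much longer window.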
On day 30, I got a call from somebody I did not know - "Hey, our server is down, and I wonder if you can help us?"
It turned out that this was a database that only got used once a month, for some weird reporting thing that I didn't even try to understand. It wasn't even very important - they said they had noticed it was down, and just figured it would be up again later. After a week or so, they finally had to call someone.
Now that I had a contact, I was able to get in touch with the person who actually owned it. And the migration was quite simple. I moved their database to a shared utility server, and they were very happy with the improved performance. I even got the old machine out of the rack and sent it to scrap before the 60 days were up.
jeffbell@reddit
The good old “scream test”.
Turn it off and see who screams.
Loko8765@reddit
The problem is when you find a server that is used once a year for the annual reporting that has to be done between Jan 2 00:00 and Jan 5 23:59 because legal compliance.
I worked in a bank.
Strazdas1@reddit
we had a once-a-year-use server too, the timetable wasn't so strict though, basically the entire month of December.
Misa7_2006@reddit
Oof
KelemvorSparkyfox@reddit
Not just banks.
I used to work for a food manufacturer. Every few years, someone would get annoyed about the ever-expanding number of suppliers on the books*, and demand that it be reduced. Initially, the cut-off point was nine months - any supplier not used (ordered from or paid) for nine months or more was to be deactivated. Cool - guaranteed work for me and my colleagues.
About two months later, there was a panic. One business unit needed their fleet MOT'd. They had a dedicated supplier for that, and said supplier was only ordered from once a year - that being the MOT window. Cue lots of angry screeching and demands to reinstate said supplier NOW, if not sooner.
After the second time this happened, the head of Accounts Payable listened to me when I suggested that the cut-off point ought to be eighteen months.
*The number hovered around 22,000. There were historical (and hysterical) reasons for this, and the hoops that users had to jump through in order to get a new supplier set up were considerable. But still, there was an almost constant stream of new supplier requests landing on my colleagues' desks.
17HappyWombats@reddit
That's just a standard decay curve. The primary ingredients get ordered weekly or more often, replacement parts that last five years get ordered... wait for it... every ten years, because the minimum order is about twice what we actually need.
Have also worked in manufacturing, and the chosen solution there was to outsource maintenance to a couple of contractors and downsize the maintenance department to just one guy. He mostly just called contractors when there were problems and made sure they got paid.
Turns out that the "complete list of all required maintenance tasks" was missing a few things, and after about 10 years a very expensive machine stopped working because an oil filter hadn't been changed every two years as required. The obvious symptom was that a bunch of hydraulic things stopped working when "gunge" eventually started flowing through them. The cost of that fix more than made up for all the savings of not having three maintenance guys on staff. By a factor of about 10.
KelemvorSparkyfox@reddit
Penny-wise; pound-foolish.
Which ought to be the motto of the MBA.
WoodyTheWorker@reddit
MBA formula: How to make a cow eat less and give more milk? Feed it less and milk it more.
GuestStarr@reddit
My ex used to work for a liquid transport company. They had about 120 or so trucks roaming around, so fuel costs were of course high. Every September they started using arctic diesel, which wouldn't freeze until -37°C or so. The normal summer quality would be good till -25 or so. The arctic quality was of course more expensive. So, one summer they got a young and eager dude joining the management. He wanted to look good, and an easy way was to cut costs. He started by applying a new rule: no arctic diesel before November. That year we hit -30 in the beginning of October. They had a hundred frozen trucks with expensive and dangerous liquids awaiting speedy delivery all over the country. He was promptly told to never again touch anything he didn't understand.
KelemvorSparkyfox@reddit
Chesterton's Gate smacked him hard in the face!
centstwo@reddit
Had to look that up. Found Chesterton's Fence. Thank you.
ih-shah-may-ehl@reddit
We have something similar at work. Being in a large megacorp, getting suppliers onto the approved list is a horrendous task, and if they are not big enough in turnover, they get kicked and we have to use aggregate suppliers who take a cut of 30% to 3000% to order small quantities of something.
Every time they cull the list I keep saying that it's ridiculous, because it's a database. Whether there are 2,000 or 20,000 or 200,000 suppliers, that doesn't cost anyone anything.
KelemvorSparkyfox@reddit
Aggregate suppliers... Until a few moments ago, I was in blissful ignorance that such a thing existed. Yet more reasons to detest late stage capitalism.
ih-shah-may-ehl@reddit
Well, it's not really capitalism related. Just moronic company policy.
KelemvorSparkyfox@reddit
The capitalism part is that a new type of company sprang up to sell smaller quantities of other companies' stock at inflated prices.
SomeOtherPaul@reddit
New type of company? Isn't what you're describing basically the definition of a wholesaler?
KelemvorSparkyfox@reddit
It's not a wholesaler, though. It would appear to sit somewhere between a wholesaler and a retailer.
food52012@reddit
Worked at a small tech startup selling to megacorps. We had to go through the process of getting on the AVLs multiple times. Found it funny when the company buys 3x what they actually need because of the minimum spend requirements to add us as a vendor. I don't understand megacorp logic.
salttotart@reddit
These are the types of servers that need to be documented to high heaven and updated often, if they are that important. If it uses COBOL, then it needs to be migrated or emulated before the parts for the machine are impossible to find.
FnordMan@reddit
Fun fact: COBOL compilers for Linux and Windows exist.
Loko8765@reddit
Nah, the things using COBOL are not the problem. They are documented, and even if they are not they all run in the mainframe, so we know all about them, at least if they run automatically. (Then again, I was never responsible for that part.) The problem for me was all the PC servers brought in in the 90s and noughties by the young hotshots who thought they would show a thing or two to the old COBOL farts, and never got around to actually documenting what had failed and what had not.
salttotart@reddit
Yikes. Better practices prevailed, it seems.
Loko8765@reddit
Oh, it took a few years, but yes. Everything either in the mainframe, running in the private cloud, running the private cloud, or else very well documented.
Ranger7381@reddit
Yea, when I read that the plan was to send it to scrap at 60 days I was like “no, keep it for a year in case it has to do with a yearly report”.
prjktphoto@reddit
You’ve just discovered the time to schedule your next scream test
Loko8765@reddit
Well. Not that reliable… tax season… basically the scream test was to keep servers unpowered at least 13 months before removing them. Of course we tried a lot of other solutions also, basically what OP did.
Misa7_2006@reddit
Quickest way to resolve the issue for sure.
revchewie@reddit
Yeah. Every time I’ve run a scream test, nobody screamed.
WoodyTheWorker@reddit
They wanted to scream, but they had no mouth
jnmtx@reddit
Should have offered ice cream.
floutsch@reddit
Did that myself quite a few times (duh). It gets interesting when you yourself start screaming :D Only happened once, fortunately.
_Terryist@reddit
Do you have that story?
floutsch@reddit
tl;dr: boringly told story lacking details.
The details are hazy, I was in a bad spot back then. Almost 20 years ago. We were discontinuing products, some IT systems were consolidated, some truly redundant ones were scrapped, some were mysterious. I was a product manager back then but involved in this (being a PM, ad interim for the discontinued products as well, and my background in IT). For one system we just couldn't figure out what it still did. Switched it off and my own product went down. Really anticlimactic if I tell only the details I remember. It was a lot more chaotic, but I don't want to embellish it to compensate for my memory.
Sorry for wasting your time :(
rhandric@reddit
> Almost 20 years ago
So, in the 80s, right?
...
Right?
Sudden-Programmer-0@reddit
Yes, in the 80s. I refuse any other answer as that would be gaslighting or fake news.
ifixthingsllc@reddit
OUCH......NO. NO ITS NOT
Valheru78@reddit
Ouch, right in the grey hair....
floutsch@reddit
Yes! No further questions 😅
Dungeoneerious@reddit
Oof, that hit hard.
Refflet@reddit
Dude embellishing whether or not it compensates for lack of memory is what makes a good story.
Transmutagen@reddit
You scream tested yourself. That’s pretty damn funny.
Quango2009@reddit
Did that to myself a few months ago. Had shut down a server so I could reinstall. A few minutes after shutting down I get a call from the office. It was hosting a Redis instance I forgot about
Oops! Fortunately I had not started to reinstall
bschmidt25@reddit
I had to do this so many times when I started my last job. No one had anything documented or knew shit about what these old ass physical servers were doing. Given no other option, I just unplugged the NIC and let it sit for a few weeks. If no one called, it got put on the scrap heap.
Hebrewhammer8d8@reddit
Is there ice cream at the end?
jeffbell@reddit
IBM
UBM
We all BM
for IBM.
Alistche@reddit
Hello H.A.R.L.I.E !
DrHugh@reddit
Old? We're doing that right now at my company. There's a module that was tested but never implemented, but we still have servers dedicated to it. As far as we know, no one is using it, but since it is a module that's part of a bigger application, there's no user log to check and such. So, we're turning it off for a month to see if anyone complains.
thRealSammyG@reddit
Service Cutoff Response Evaluation And Monitoring
AdministrationRude85@reddit
I love this one. It has been copied into my professional vocabulary now.
noceboy@reddit
We call it the “knijp en piep” system (pinch and beep) in The Netherlands.
theoldman-1313@reddit
Works for a lot of systems, not just IT.
Jezbod@reddit
I did that with mobile phone contracts that were showing no activity; strange how many people were keeping the phone in their desk drawer "just in case".
Managed to get about 20 phones returned and the contracts removed. Not a small sum in the 90's.
peterdeg@reddit
Have had a contractor with us for a couple of years clearing out legacy systems. 200+ scream tests so far.
KelemvorSparkyfox@reddit
How many screams, though?
peterdeg@reddit
Total silence.
fresh-dork@reddit
and they weren't even mean about it. "oh yeah, that one's ours. you say you can relocate us to better hardware?"
Chocolate_Bourbon@reddit
At my last company we did this sort of thing from time to time. Often the only outcry came not from someone who actually used the relevant resource, but from someone who thought others did or should.
Typically the complainer was the person who created the resource and was quite proud of it. So often we would put it back in service and it would keep rolling along, used by nothing and no one.
My boss would take note of the complainer. If they left the company he would repeat the scream test. In some cases the person's replacement piped up, in some cases no one did.
I was convinced that a large percentage of the output of one group were reports that were produced, distributed, archived, and destroyed. All without anyone actually reading them.
KelemvorSparkyfox@reddit
Write-Only reports, as the Dis-Organiser put it.
Where I used to work I was convinced that a number of management positions only existed to justify the creation of financial reports that compared actuals to forecasts. There were so many of these that there was never a good time to make any requested changes, let alone improvements! The financial year started in April, so the last quarter of the year was always preparing the budget and the new standard costs for the next year. After April, and the problems with the Year End process had been sorted out, everything was subordinate to the Roll 3 report (comparing the first quarter's sales to the forecast). Then it was summer, and there was no-one around to test anything. After summer was the Half Year Figures, which required reports for the C-suite to present to the City, and then it was Christmas, which was a large proportion of sales volume AND value. After Christmas was the new budget season, and the wheel went around again.
At any point in the year, someone would be desperate for a given report that would show nothing of real significance, and yet apparently a Grade 4 (or sometimes 3) job depended on it. Said Grade 4 (or 3) manager could disappear for weeks at a time with no measurable impact on the business, but someone like me (a Grade 6 Analyst of various flavours) could paralyse parts of the business just by being on leave...
Techn0ght@reddit
I love it when the ones screaming are the ones who said they didn't have anything to do with it.
_haha_oh_wow_@reddit
It works!
Dr_Adequate@reddit
A while back I was involved with a radio upgrade project. My organization's radio was colocated in the same room as a dozen radios used for the countywide 911 system. I was afraid the techs may have misidentified my radio - they all look alike. When the day came to shut it off and swap in the new radio I was nervous as heck. But nobody else wanted to shut it off so I did, half expecting my phone to light up with calls from the 911 backhaul number.
But none did, my techs got the swap done and all was good.
Strazdas1@reddit
You got lucky. We had a 2003 Oracle database. We tracked it down to two users. One user would upload data to it, then a few months later another user would download data from it. Two uses per year total.
DigitalPlumberNZ@reddit
At a past job, I was determined to shut down the FTP* server that some clients used for data transfer. This was in 2014/5, so SFTP was well and truly established. Most clients were not a problem, but a large financial services client was very resistant. They also provided a mass of transaction data for roughly 1/4 of our country's population, and that data went into a lot of value-added reporting for other clients, so delivery was non-negotiable.
I thought I had finally got things across the line, SFTP account set up with a key that they had provided, firewall rule in place to allow their IP through, confirmation from the account manager, stopped the FTP daemon and... "WE DIDN'T GET $CLIENT'S DATA LAST NIGHT!!!!!!!" Three more months of fuckery before I was finally able to decommission that FTP server.
* FTP was always over VPN, before anyone has that particular freak-out.
See also https://www.reddit.com/r/talesfromtechsupport/comments/7tw518/you_fixed_it_therefore_you_broke_it_or_the_change/ for another story of woe with this client (occurring in the period between the aborted cessation of FTP and the successful migration of their data transfer to SFTP)
AlaskanDruid@reddit
wait.. wait! How did this story end? Or is it currently ongoing? Gotta create a whole new post just for this alone...
DigitalPlumberNZ@reddit
See the linked post. It was still FTP when I left, but eventually they did manage to get the connection across to SFTP (after that post was written). Between the entitled attitude of the data source and the arrogance of the relationship manager, it was never going to be straightforward.
AlaskanDruid@reddit
I love stories with a happy ending.
Thank you!
Astramancer_@reddit
I was part of a department that generated a number of weekly, monthly, quarterly and even annual reports. A lot of the reports were old and had survived a lot of re-orgs and we couldn't figure out if half of them even went anywhere anymore.
So we did a scream test. We did the reports but didn't send them. Ultimately about half the reports went unlamented so we stopped doing them.
stekkedecat@reddit
an ancient machine like that may be fit for a museum instead of scrap?
rezwrrd@reddit
Know any museums that need 90s machines? Asking for a friend.
stekkedecat@reddit
https://letmegooglethat.com/?q=museum+computer+history
rezwrrd@reddit
Let me rephrase that for you, smartass.
Do you know of any museums that are actively looking for/accepting/soliciting donations of 90s machines? Are they really of historical interest? Or are there still enough around that it's more of a storage/disposal problem at this point.
If anybody has computer historical preservation connections I'm interested to hear insights on this. I've already googled the fucking question.
stekkedecat@reddit
No, I don't, because I have no clue where you are located, you smartass. Therefore, any museums I know of that are looking are VERY VERY likely not interested in the machines at your location, what with the differences in electrical systems and all... Best is to look up the museums near you and ask them...
Codeman119@reddit
Wow, a Compaq ProLiant. That is old. That's like a dinosaur in IT time.
Harley11995599@reddit
I lived in Vancouver, BC. They have a Transit system "Sky Train" the nearest example I can use is a subway above ground. We are very close to sea level here. The first line opened in 1986. A lot of people will see where this is going.
About 10 years or so ago the whole system just stopped. The 486's network card gave up, and it took them around 2 days to find a replacement. Hopefully they have migrated the system by now. I can see that poor little 486 just chugging along, and a peripheral (?) is what took the system down.
Scheckenhere@reddit
Good thing to wait for 30 days. Could be that the person you were trying to catch using it (maybe even a daily user) was just on vacation during the week you monitored.
Ken-Kaniff_from-CT@reddit
This sounds like fun. You should try working where I work. I am 50% of the IT department and we have both been there for about a month. One guy left right before we started and the other guy left months ago. No documentation, aside from documents that haven't been touched in years and have almost no relevance to our current environment. We're just slowly starting to piece together this environment with multiple domains, almost 40 SQL databases, 30+ servers, mostly virtual, on top of doing a full range of IT roles. And we've been thrown right into projects that were started before us with virtually nothing for us to go on. And we're not a company. We're a municipal government agency that handles the type of thing I'd think most people care about most when it comes to the govt which makes it all so much crazier to me.
sirmarty777@reddit
Not quite NT4 levels of old, but we have a Windows 95 box running still. It interfaces with our parking gates. No movement from the manager to replace it, even after giving them a brand new box. Our solution? Take it off the network. The only reason it was on the network was so they could remote to the box to add/update parking cards. Now they have to walk down to the basement to the box and make changes. Maybe that inconvenience will finally get them to replace it!
shadowofthegrave@reddit
Win95 predates NT4 in terms of release and EoL, although they were contemporary systems for a while
Overall_Motor9918@reddit
Back in the mid 90s I worked a project at a big insurance company to remove their old VAX machines and servers that had 4 MB hard drives. They ran entire accounting systems on 4 MB. We used 3.25 disks to format the drives. It was quite fascinating.
horizonx2@reddit
Love a happy ending. I've had a similar experience with a DB but with many users -- is it safe to migrate? Yes. (Later: Oops this one is misconfigured it hadn't been used in 60 days...)
keithnab@reddit
I would disable the switch port, so I could reenable it remotely if someone screamed while I was off-site.
I agree that powering off an ancient system and believing it will power back up again requires a lot more faith in technology than I have.
asad137@reddit
OP disabled the server service first; someone reached out the day before they were planning to turn it off.
sneakattaxk@reddit
ballsy to actually power off something that old and fully expect that it will come back up without issues....sometimes drives go to sleep forever.....
would have just yanked the network cable instead
asad137@reddit
OP didn't power it down, they shut off the service. Someone reached out before it was powered down.
dustojnikhummer@reddit
Agreed. I would have kept the database off for at least half a year.
pt7thick@reddit
Knowing that this is an actual issue, I decided to test this years ago. I had a few servers that had been running for well over 12 years non stop. Old Storage that had been migrated to some new system. We pulled the servers out one by one and took the covers off before powering them down.
You could literally hear the components cooling down. Capacitors and solder traces cracking and breaking as they cooled.
Think of the Xbox 360 33% failure due to bad soldering. Everything runs fine when on, the system cools after a power down and all the solder starts cracking and lifting.
We all knew about that issue with old servers, but it was neat to hear it and see it every time we had to power some ancient system down, knowing it would never come back on.
Glasofruix@reddit
Yeah same. Powering down old rusty systems is a gamble, they might never power on again.
HKatzOnline@reddit
Some systems are only used at month end, quarter end, fiscal year end....
mafiaknight@reddit
"Long ago"? But...isn't it 2005 now!?
Automatic_Mulberry@reddit (OP)
I hope you're refreshed from your nap. I have some bad news for you.
reddit-doc@reddit
Uh... Remember the Rage Against the Machine video for Sleep Now in the Fire from a few years back, the one that showed a dude holding up a Trump for President 2000 sign? That no one took seriously? Well, stuff happened...
Puterman@reddit
2005... Idiocracy was still a year away, wow.
FluidGate9972@reddit
Now we're living in it
Jofarin@reddit
Can we please get an Idiocracy 2 that just reenacts the current timeline with the Idiocracy 1 characters?
FluidGate9972@reddit
Why not watch Fox News? :/
Jofarin@reddit
Too close to reality, I want one level of abstraction at least.
mafiaknight@reddit
😫
Successful_Ad8912@reddit
Playing Aquatic Ambience
Kelvin62@reddit
Back when my employer was performing Y2K migrations, they found an important server at the feet of a secretary at her cubicle.
meitemark@reddit
Dual use as server and foot warmer.
Dranask@reddit
Classic computing, turn it off, wait, turn it on.
Stryker_One@reddit
You forgot the step of "pray it all comes back up".
BoganInParasite@reddit
Had a similar dilemma at an airline around 2006 although it was a comms line into the mainframe that supported our passenger services system. No known owner and no traffic. The longest cycle in an airline business is generally the twice a year change of seasonal schedules so we monitored for traffic for six months, nothing. So we decommissioned it. Next month we got a call from one of our largest airports that their annual baggage system failover hadn’t worked. Fixed it quickly and lesson learnt.
Shufflen@reddit
Ever wonder why you see a lot more old Compaq servers than Dell servers
uncanneyvalley@reddit
Unlike their consumer product lines, Compaq servers were rock solid
TwoEightRight@reddit
If I were running one of those scream tests, I'd wait until quarter end, year end, or fiscal year end, whichever is later plus 30 days, before scrapping it. Just in case it's used for some obscure report that only happens once a year.
20InMyHead@reddit
Long, long, ago the company I worked for had a similar situation, but in this case the old server was only used for some tax-related job that was run once a year.
Apparently an absolutely mission-critical, legal-ramifications-if-not-done tax-related job.
You can see where this is going. After all the due diligence and waiting 30, 45, 90 days, what have you, finally the old machine was scrapped. Several months later, that once-a-year tax job needed to be run and shit hit the fan….
I don’t remember the details, but a lot of money was spent, and a new company decree was issued: no servers could be scrapped before being mothballed for at least two years.
insufficient_funds@reddit
Perfect scream test execution.
What I’ve done with super old hardware like this was just disable the NIC within the OS or unplug the cable. I find that much safer than powering it off or stopping services. Only have to worry about the AD object getting past its lifespan by doing that.
AmiDeplorabilis@reddit
That was actually beautifully done, and apparently handled very well.
Kudos!
ThunderDwn@reddit
A functional example of an occasion when the scream test worked as designed.
mindcontrol93@reddit
That reminds me, I need to tell someone in our Chicago office that they can retire my back up server and raid. That thing has been running for 10+ years.
Eraevn@reddit
Done a fair few scream tests on various servers/services that were no longer relevant, where most of the people who knew about them were no longer with the company. We opted for the scream test because the last time we asked, no one admitted to using it but some wanted to, and then never did. After that we just kept the decision out of the users' hands lol
r_keel_esq@reddit
I've decommissioned a fair few old servers in my work over the last couple of years - none as old as NT4, but a few on 2003.
While most were very straightforward, one became a scream test for my own team. I had failed to notice that an older Application Test box was also moonlighting as a node for the server&network monitoring tool (PRTG) until it looked like a quarter of our estate had failed.
Thankfully, this machine wasn't one of the ones so old that ILO could only be accessed in IE so I was able to get it back quite quickly.
Six months later and it's still powered up
keithww@reddit
I used to run a multi-site WAN for a local government. Had a port on a switch that wasn’t documented, and the prior admin didn’t lock anything down. I walked the building asking everyone and nobody fessed up. That port was responsible for 80% of the WAN traffic.
I went into the switch and disabled every open port, then disabled the port in question. Sheriffs office calls up freaking out that their intake system was down. Every other location would poll the state database when an arrest was made, then every morning at 0400, then again if the person was being bailed out. They were polling on a loop, and also polling for anyone with an outstanding warrant.
Corrections were made and I turned the port back up.
People may not fess up, but they will scream when you turn it off.