I swear this company has the stupidest and most hacked-together patching process I've ever dealt with.
Posted by Delicious-Wasabi-605@reddit | sysadmin | 52 comments
I work at a huge global company with layers and layers of management who love to make up overcomplicated processes, in no small part to justify their own jobs. For this rant I'm going to piss and moan about the silly server patching process they put together. Now, we have hundreds of thousands of physical servers and I can't even guess how many VMs, so yeah, I get that it's a huge task. And you would think something as mature as patching servers, a process that's been happening for decades across the industry, would be almost completely automatic and transparent to the application teams.

But no, far from it. Once every two months each application team, and there are 180 app teams, has to schedule a time with the Unix team or the Windows team (depending on your OS), plus the database team if your application uses a DB cluster, to patch the servers. And they will only patch by data center, so for several hours you are required to have half your processing capacity offline. And it gets better: the OS teams are so swamped with requests that half the time you miss the scheduled patch window, which gets logged as a security incident and requires the directors to explain it to executive leadership in their meetings. And yes, there is automation to deploy patches, but there are so many steps to set up the automation, and so many pull requests and change requests to take care of, that it would be faster to just download the patches and install them by hand.
But anyway, the one huge benefit that makes it all tolerable is that my group has three teams around the world running follow-the-sun coverage, so 4:00pm rolls around and I'm out. A 15-minute chat with the folks on the other side of the world at the end of the day and I'm done. No after-hours on-call. No late nights. No weekends. And cheap tacos (but dang good) when I do have to go into the office.
GeneMoody-Action1@reddit
How does an environment like this not have load balancing, clustering, etc. that would allow for patching 24/7 at functional capacity? Are you saying all services stay up and just run slow at half capacity, or are sites down?
I would expect the desktop and server teams there wouldn't even need to talk for the most part; there would be schedules and processes that just happen, no?
"so many steps to setup the automation and pull requests and change requests to be taken care of it would be faster just to download the stuff and install." this should be planned and repetitive procedure.
Don't get me wrong, I am not trying to trivialize any of that, the environment sounds huge and complicated, but some meetings, policy, and chain of command should boil it down to work, not chaos and frustration.
This sounds to me to not be a technical issue at all, it sounds more like a management issue.
hosalabad@reddit
Man I’d make a lot of people angry until they got on board.
TKInstinct@reddit
I feel like they'd push for you to get fired before any meaningful change gets made.
hosalabad@reddit
Probably.
DaNoahLP@reddit
So far
ProfessionalITShark@reddit
Doesn't that leave them a month behind on patches as well?
So like half the year they are extremely vulnerable.
....is this healthcare?
mahsab@reddit
A month behind on patches is not extremely vulnerable.
TKInstinct@reddit
Sounds like a good gig if you just want to coast. Things are so convoluted you could get lost in the shuffle and no one would notice. If you can tolerate the nonsense, then collect the paycheck and ride it out.
Naznac@reddit
...and it's not like setting up a sensible automated patching process is that complicated... for Windows at least... I don't know Linux all that much but I figure there must be some way... And I'm betting at least 20% of the servers are dead weight 🤣🤣🤣
admlshake@reddit
Not until management or some project manager gets a bug up their ass that they weren't consulted. The number of times I've had to sideline something because someone who should have zero input on a process like that somehow worms their way in and pours glue on the gears is mind-blowing to me.
patmorgan235@reddit
I mean if you want you can just put sudo apt-get update && apt-get upgrade in your crontab.... It's probably not going to break anything....
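If anyone actually does that, a rough sketch of what the root crontab entry could look like (schedule and log path are just placeholders, and it needs -y or it will sit at a prompt forever):

    # run weekly at 03:00 Sunday; hypothetical log path, pick your own
    0 3 * * 0  apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -y upgrade >> /var/log/auto-upgrade.log 2>&1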
gumbrilla@reddit
My man, unattended-upgrades is the package for this. It's installed by default, I would guess, in most distros.
Sintarsintar@reddit
It works great, just make sure to exclude things you know can break something. Nothing like finding out an auto-installed kernel crashed a critical server, or a PHP upgrade broke half a website.
gumbrilla@reddit
Yeah, I don't use it on production or pre-production, just developer instances... those I don't mind...
Ruben_NL@reddit
There's a difference: by default unattended-upgrades only does security updates. The comment above updates everything.
xXxLinuxUserxXx@reddit
FYI, unattended-upgrades is the Debian/Ubuntu tool; for RHEL it might be dnf-automatic (we are a full Debian/Ubuntu shop).
Anyway, I guess automatic updates should be no big deal on any major Linux distro.
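If it is dnf-automatic, the setup is roughly this (from memory, so double-check the config keys on an actual RHEL box):

    dnf install dnf-automatic
    # in /etc/dnf/automatic.conf set:
    #   apply_updates = yes
    #   upgrade_type  = security     # or "default" for everything
    systemctl enable --now dnf-automatic.timer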
saltysomadmin@reddit
Hmmm, need to look into this for my homelab because I'm also rolling dirty with crontab
gumbrilla@reddit
Cool, 20 minutes, super easy. It's just a service: edit the config, it's all explained, and choose when it runs, whether it should reboot, and whether to only install security updates, which are the three main things I care about.
I think it's even enabled by default in some distros, but probably with conservative settings.
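For reference, the knobs I mean live roughly here (paths and values from memory, so check the template your distro ships):

    # /etc/apt/apt.conf.d/20auto-upgrades  -- turn the periodic runs on
    APT::Periodic::Update-Package-Lists "1";
    APT::Periodic::Unattended-Upgrade "1";

    # /etc/apt/apt.conf.d/50unattended-upgrades  -- what to install and whether to reboot
    Unattended-Upgrade::Allowed-Origins {
        "${distro_id}:${distro_codename}-security";   // security updates only
    };
    Unattended-Upgrade::Package-Blacklist {
        "linux-image";        // example: skip kernels, per the breakage story above
    };
    Unattended-Upgrade::Automatic-Reboot "true";
    Unattended-Upgrade::Automatic-Reboot-Time "02:00";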
Naznac@reddit
Well, I'm more curious as to whether there are large-scale solutions, like SCCM or Azure Arc for Windows servers.
Then there's the reporting; you've got to have reports to know if they all got patched.
Advanced_Vehicle_636@reddit
In the Red Hat world, absolutely. Stand up a Sat(ellite) server for large-scale infrastructure management using RH products (RHEL, RHV, etc.). It can be used to define patch cycles (including standing up local versions of your repos to reduce internet-bound traffic), manage licensing, etc. I suspect it can also be used for non-RHEL servers, e.g. RHEL derivatives like CentOS, Alma, Rocky, etc.
I've seen it with larger organizations, and it looked cool. We'll never use it though.
AlexisFR@reddit
But then, who patches the patcher?
Exkudor@reddit
Cron, obviously :)
jesuiscanard@reddit
I'm wondering if this is done in the root crontab and whether it ever breaks anything. apt-get upgrade does hold packages back if they're questionable, to be dealt with later as more updates come in.
I would expect a dist-upgrade to be much more likely to break stuff. For small servers doing small amounts of automation (literally running a few Python scripts etc.), it's probably easier to keep a copy of user space and run updates like that.
Delicious-Wasabi-605@reddit (OP)
Exactly. Microsoft has a solid solution built for their ecosystem that is almost point-and-click. Linux is a little more complicated due to all the flavors, but it's still not a huge challenge for most experienced admins.
xXxLinuxUserxXx@reddit
Well, if you stay with one flavor, e.g. RHEL, there is Satellite, which bundles many things like config and patch management and should be closer to the Microsoft experience. The only difference is that with Microsoft you have to buy licenses anyway, while with Linux it's not unusual for the company or the people to choose the cheaper route rather than a full buy-in to the full suite.
sybrwookie@reddit
I'll say setting one up at my place was technically not all that difficult, but it required tons of teeth-pulling and dragging people kicking and screaming into trusting the process, and understanding that it's going to happen and no, they can't just say, "no, you can't patch at the scheduled time and no, we can't tell you when it's OK to patch, we'll let you know when it's OK to patch."
Naznac@reddit
And then you forget to cancel one deployment for 800 servers on a Friday night... and no one notices, not even the guy in the ops center with an 80-inch screen in his face that lights up like a Christmas tree when all those servers reboot... he must have been out having a smoke... 99.5% compliance on that deployment, never seen that before or since... The only person who noticed was my colleague, who got the SCOM email saying a DC rebooted.
sybrwookie@reddit
Heh, nah, I have that shit automated end to end at this point. The only thing we have to do is check on servers that didn't patch, usually because either a) the fucking SCCM client died AGAIN or b) someone decommed the server and didn't bother to tell anyone.
Naznac@reddit
My answer to that is simple: make them sign a paper saying that if your server isn't patched and there is a breach because of it, YOU are responsible. Suddenly most app owners would rather have the server patched.
RonJohnJr@reddit
We call those Security Exceptions, and store them in SharePoint.
Comfortable_Gap1656@reddit
With automation tools you can manage a lot of machines with a skeleton crew. Wait until you find out about Ansible and OpenTofu
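A basic patch run in Ansible is only about this much playbook (just a sketch; the file name, host pattern, batch size, and reboot policy are all placeholders you'd tune yourself):

    # patch.yml -- minimal fleet patching sketch
    - hosts: all
      become: true
      serial: "25%"                  # patch a quarter of the fleet at a time
      tasks:
        - name: Patch Debian/Ubuntu hosts
          ansible.builtin.apt:
            update_cache: true
            upgrade: dist
          when: ansible_os_family == "Debian"

        - name: Patch RHEL-family hosts
          ansible.builtin.dnf:
            name: "*"
            state: latest
          when: ansible_os_family == "RedHat"

        - name: Reboot (blunt version; gate this on whether a reboot is actually needed)
          ansible.builtin.reboot:
            reboot_timeout: 900

Run it with ansible-playbook -i your_inventory patch.yml and the per-host results in the run output double as your patch report.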
Cheveyboy@reddit
Sounds like a bank that ends in Fargo..
Dolapevich@reddit
You'll live to see man made horrors beyond your comprehension.
6-mana-6-6-trampler@reddit
Me reading up on these man made horrors so they are no longer beyond my comprehension.
1a2b3c4d_1a2b3c4d@reddit
You are an extreme edge case. Most peeps here have 1-10 servers. Some have 10-100. A few have 100-1000. Very few work with 1000-10000.
You claim to have 100,000 or more physical servers, plus all the VMs?
Nothing that works for small and medium server farms will apply to an environment as large as yours.
nocommentacct@reddit
Yeah, the fact that you're able to patch "hundreds of thousands" of physical servers running things so varied that you could "cut out 20% of them" is kind of impressive. If they were all part of a giant cluster or something it would be more understandable, but you're not making it sound that way.
btc909@reddit
Sounds like you need a DOCE.
Immediate-Serve-128@reddit
That grammar, though. You'd seem far more intelligent by not using any grammar at all
trail-g62Bim@reddit
When I first started at my current job, one of the first things I did was implement a patching process for servers. We already had SCCM. It just wasn't being used.
After I implemented it, one of my coworkers seemed to get really irritated with me. Turns out, he was logging into the servers after hours and manually updating them. Our boss at the time had a two-for-one policy -- you work one hour after hours, you get two hours off the next day. So, this guy was logging in, pressing the update button and then taking two hours off the next day. The patching schedule was whenever he needed a couple of hours off the next day.
He never stopped being pissed that I took that away from him (ofc that wasn't my goal anyway).
Redemptions@reddit
At least there is not only a patching process, but one that seems to have intentional thought and planning put into it. Obviously there are places for improvement, every org has that.
On the other side of the triangle you have Dweedle Dee with their zero-patch mentality, and Dweedle Dumb with their "yolo, double click that shizz and race to the parking lot," "did we schedule or notify the org of this? No, why?"
Or maybe I have Dee and Dumb flipped.
mrcluelessness@reddit
Everything must be patched or the hackers get in. We must run the latest patch to ensure security. We don't have time for testing or notifications. Sadly, work only lets me full-send updates at 3 AM when only 1/3 of the company is running. They think it would deter me, but no, it will be patched. An outage costs less than a breach!
Redemptions@reddit
Are you doing it at 4:50 pm on your way out the door? No, then I was talking about you. Calm down.
mrcluelessness@reddit
I used to, but then they put a timer on my admin rights and I can't make changes between 12 and 5 PM anymore. I don't see the point; I usually claim a migraine to be out by noon anyway, as part of my medical conditions they can't discriminate on. I only had an update fail 4 times around 2 PM when I was feeling better, which is an amazing success rate. The 3 AM rate is higher. At least at that hour I just go home if things go down, because there's no one there to complain or to help me fix it. Let the other teams handle it.
Redemptions@reddit
All good. Given that 33% of the posts are helpdesk or "why can't I get a remote job that pays $200k with my limited skill set and refusal to learn automation or cloud tools" type posts, it might as well be shittysysadmin...
mrcluelessness@reddit
I refuse to learn cloud and don't have time for automation. I make $200k. I don't understand why you guys shoot them down.
Just ignore the fact that I'm a network guy with over a decade of experience, 15 certs, blah blah blah. I feel like I only have helpdesk-level experience this week; everything's fucked.
Kahless_2K@reddit
You would have more time for automation if you did more automation.
RonJohnJr@reddit
Tweedle Dee and Tweedle Dumb.
Redemptions@reddit
I appreciate that. I'm not sure where the heck I pulled Dweedle from, I assume my brain insisted on alliteration. I'll leave it as is for the world to bask in my ignorance.
Squik67@reddit
Maybe negotiate a recurring maintenance window?
TheThirdHippo@reddit
Nice to see your company is helping to keep the unemployment stats down at least. I suspect that's a lot of man-hours.
cowdudesanta@reddit
Do we work for the same company lol
jrichey98@reddit
That looks like a dream compared to what we go through. Manual patching from WSUS & MECM for all servers, with updates more than 30 days old force-applied because some teams just can't be bothered. And they're re-imaging clients about every other month... I don't want to talk about it.
Had SharePoint go down for half a day last week because the SQL servers that hadn't been updated had patches force-applied around midnight and didn't come back up, and it took them half a day because they were troubleshooting connections to the wrong SQL servers.
I was burning down about two weeks of comp time I'd accumulated and was like, well, work on it today and if you can't figure it out I'll come in tomorrow. Luckily they figured it out on their own, because I was burnt out and needed the time off.
That's what happens when you put the department politician from 30 years ago in charge, and in spite of things growing they refuse to ever change anything.