How often do you experience outages?
Posted by electric_medicine@reddit | sysadmin | 5 comments
Company of 50. I manage the infrastructure and have one subordinate for L1 support and simpler tasks.
Usually, I'd say our infra is pretty resilient: 6 servers across multiple locations, 2 HA Proxmox clusters. It's been quite a while since the last company-wide network outage that stopped all work (no ERP system, no time tracking system, no e-mail). That was around half a year ago, when one of the core switches gave out and blasted so much broadcast traffic that it brought down all routing.
Thankfully my boss isn't one of the "what am I paying you for" people and will gladly pony up my monthly salary *because* most of the time there's no outage. Shows him work is being done.
However, the smaller outages, or rather inconveniences (sometimes just preventative stuff), have started to add up recently:
* SMART failure of an SSD in a Proxmox array requiring replacement
* Backup disks filling up, requiring more disks
* Licenses expiring, requiring renewal and me having to explain the cost
* A lot of me explaining why we need to spend money on IT, but it all gets approved in the end.
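The disk-related items above are the kind of thing a small scheduled check can flag before it turns into an emergency purchase. A minimal sketch, assuming Python is available on the host; the 85% threshold and the function names are made up for illustration, not anything from my actual setup:

```python
import shutil


def usage_percent(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(percent_used, threshold=85.0):
    """True when usage is at or above the (assumed) warning threshold."""
    return percent_used >= threshold


def check_mounts(paths, threshold=85.0):
    """Return {path: percent_used} for every path at or above the threshold."""
    report = {}
    for path in paths:
        pct = usage_percent(path)
        if over_threshold(pct, threshold):
            report[path] = round(pct, 1)
    return report
```

Calling `check_mounts` with the backup mount points from cron and mailing any non-empty result would be one way to use it. It only looks at free space, so it says nothing about SMART health.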
And the most recent (around 2 hours ago) was the primary on-prem DNS server going haywire and starting to return 0.0.0.0 for some of our frequently used services. Total time until fix was around 20 minutes, but for those 20 minutes the only thing that worked for everyone was e-mail, oddly enough.
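A check along these lines could catch that kind of failure earlier. This is only a sketch under some assumptions: it resolves names through whatever resolver the host is configured to use (it does not query one specific DNS server), and the function names are hypothetical:

```python
import ipaddress
import socket


def is_bogus(address):
    """True for answers like 0.0.0.0 or :: that no real service should resolve to."""
    # Drop an IPv6 scope id such as "%eth0" before parsing.
    bare = address.split("%")[0]
    return ipaddress.ip_address(bare).is_unspecified


def resolve(hostname):
    """Return the sorted set of addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def check_names(hostnames):
    """Return {hostname: problem description} for names that fail or look wrong."""
    problems = {}
    for name in hostnames:
        try:
            addresses = resolve(name)
        except socket.gaierror as exc:
            problems[name] = "lookup failed: {}".format(exc)
            continue
        bad = [a for a in addresses if is_bogus(a)]
        if bad:
            problems[name] = "bogus answer: {}".format(", ".join(bad))
    return problems
```

Running `check_names` every minute against the handful of internal hostnames people use most, and alerting on a non-empty result, would have flagged the 0.0.0.0 answers. To test the primary server specifically rather than the host's resolver, a DNS client library would be needed instead of `socket.getaddrinfo`.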
That had me wondering: How frequent are outages and inconveniences for you? How often is it directly affecting users? How often are you the "bad guy"?
I'm lucky: even when there is an outage, my boss would of course like me to resolve it in a timely manner, especially if it's network-wide, but I've never had anyone breathing down my neck about it.