20 plus years in IT and I will be getting my first write up today
Posted by Frankaintmyfriend@reddit | sysadmin | View on Reddit | 739 comments
Been in every aspect of IT over the years. I have always had great reviews and never been written up... until today.
Yesterday I was migrating VMs from one datastore to a new one in vSphere. It was during the day, but it was a simple vMotion migration, so no downtime. While I was migrating, I was cleaning up old datastores and getting rid of them. Not sure what happened, but I looked in one datastore that contains swapfiles and it showed no VMs, so I unmounted it (as I had done with other datastores earlier in the day). Unfortunately, I didn't see the files in the files section that contained the vswap files of the VMs I hadn't migrated yet. Unmounting the datastore caused a memory issue and sent the host cluster into HA recovery mode, rebooting nearly every VM! Total downtime was less than 10 minutes, but it took down the phone systems and other critical servers in the middle of the day.
Haven't gotten the write-up yet, but I am almost positive it's coming.
So, lessons learned and a warning to others, don't unmount swap file datastores during a migration.
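For anyone who wants a guard rail, something along these lines (datastore name made up; the idea is to check the files on the volume, not just the registered-VMs view) would refuse the unmount while swapfiles are still present:

```shell
# Hypothetical datastore name -- the point is to look for leftover files
# (e.g. .vswp) instead of trusting an empty "VMs" tab in the client.
DS="old-swap-datastore"
leftover=$(find "/vmfs/volumes/$DS" -name '*.vswp' 2>/dev/null | wc -l)
if [ "$leftover" -gt 0 ]; then
  echo "refusing to unmount: $leftover swapfile(s) still on $DS"
else
  echo "no swapfiles found on $DS, safe to unmount"
fi
```

Just a sketch, of course; the real check would run from the ESXi shell against the actual volume path.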
walkasme@reddit
During the heart of 2020 work-from-home, at an org I was doing contract work for, with 14k active users on a system in a cluster. I had written a script to bounce the cluster for use during patching and extreme situations - not something I would do on a good day. I was so paranoid the script was never saved on the machine; I copied and pasted it from notes... Anyway, I was working on the QA cluster doing some patch testing, time to bounce it. Got a call from a tier 2 engineer and logged into the prod server. Yes, I bounced prod, with 14k users on.
The group chat just lit up with moaning about how the VPN was overloaded with everyone working remotely and how it was affecting them. The NOC should have noticed the entire farm go down, but no one noticed and no one said anything, and I live with that knowledge. Thanks, overloaded VPN, for saving me from writing reports and a change non-compliance...
lost6monthstoskyrim@reddit
A company I was doing remote support for was closing down and had a couple thousand employees. We were closing them down in batches over a couple of days as they offboarded the staff and left the site. I was using PowerShell and a CSV to disable 100 accounts here and there as it was authorised. Came down to the last 50 or so people, and about 8 were the CEO etc. and supposed to stay active until the bitter end. The guy sending me the list of accounts to disable kept changing the colour highlighting in the spreadsheet columns and it threw me a bit. Fucked it up. Locked the CEO and his chums out, and kept the 40 people who had just left the building. They couldn't call or email me. I was in the UK, they were in Saudi Arabia. A guy had to contact me on WhatsApp. I'd gone off for a well-earned half-hour poop so didn't see it for a while. It was hilarious. No one died.
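A sketch of the guard that would have caught it (account names invented; the real run would call something like Disable-ADAccount instead of echoing): check every batch against an explicit keep-list before touching anything.

```shell
# Invented data: a keep-list of accounts that must stay active, and the
# batch that came in from the spreadsheet.
cat > keep.txt <<'EOF'
ceo
cfo
EOF
cat > batch.csv <<'EOF'
ceo
jsmith
adoe
EOF
# Refuse to touch anything on the keep-list, whatever colour the spreadsheet is.
while read -r user; do
  if grep -qx "$user" keep.txt; then
    echo "SKIP $user (on keep-list)"
  else
    echo "disable $user"   # the real run would disable the account here
  fi
done < batch.csv
```

The keep-list lives in a file under your control, so a reshuffled or re-highlighted spreadsheet can't lock out the CEO.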
BryceKatz@reddit
In IT for 20+ years & this is the first time you've taken down prod? And for all of 10 minutes?
Those are rookie numbers...
ISeeDeadPackets@reddit
I once inadvertently restored a 6 month old copy of "the big database of everything we care about" over top of the production one instead of to the test environment as intended. Luckily I had a 5 minute old snapshot because I've been around long enough to not trust that "me" guy very much.
cats_are_the_devil@reddit
That moment imposter syndrome saves your ass...
clownshoesrock@reddit
That "me" guy is a total dimwit. I have to watch over his shoulder the whole damn time and make sure he doesn't cause inconvenience to everyone repeatedly. I spend most of my time building safety rails for his attention-deficient stupidity, and building checklists to make sure that the normal stuff doesn't get screwed up by his half-assed recollection of how stuff works, as he can't remember minor changes we discuss in meetings.
I'd fire the moron if I could.
noitalever@reddit
Lets not even talk about the way he did things 5 years ago and how it was documented…
Powerful-Ad3374@reddit
Sorry can you explain this thing you mention “documented”. 20 years in the job and I don’t understand the meaning 😂
centstwo@reddit
Today me is much happier with 5-years-ago me, since 5-years-ago me decided to write lengthy comments about why things were done a certain way to help out future me(s).
Now when we look at 10 year old code we have to hate on the 10 year old me.
WendoNZ@reddit
Ha, it wasn't documented, so there is no proof of his stupidity!
battmain@reddit
But his 'good' scripts still run...
clownshoesrock@reddit
Could he not be Bothered to learn English?? You know the language he "speaks"!!
anotherhomeysan@reddit
“If you’re not embarrassed by the person you were five years ago, you’re not growing enough”
honpre@reddit
I bet you he doesn't even look at the checklists that you make for him. I hate that guy.
SoSmartish@reddit
CYA are the three most important letters in IT.
1sttimeverbaldiarrhe@reddit
It's like I've got two voices/people: the sales me and the support me.
The "sales me" is cocksure that anything is possible and anything can be delivered on time. He can sound pretty smooth at times.
The "support me" fucking hates "sales me".
muff_puffer@reddit
Honestly this is good advice lol
sliding_corners@reddit
I did this exact same thing a couple of months ago. First big mistake in 8 years. It happens.
JonesCat_55@reddit
This gives me nightmares, and can make it very difficult to press the button.
yes I have a backup, yes I tested the backup, yes I backed up the backup and took a manual backup of the database.
Still I have that little voice telling me I did it wrong, I’m going to fail, it’s all my fault! So take a big breath - one last cya google or image copy maybe and hit that button, otherwise nothing would ever get done.
ISeeDeadPackets@reddit
I literally just got distracted for a minute and forgot to check a single box. Scared the piss out of me, but a quick revert and a sync script I'd already had prepared in case we ever lost any time due to a rollback (though me causing it wasn't what I had in mind at the time) made it a near non-event. I tell it often to new team members to let them know mistakes happen; it's all about making sure you have a good plan in place to deal with the unexpected.
justcbf@reddit
The first thing I tell new helpdesk staff is they're going to fuck up, but not to worry as I haven't given them enough access to have a significant impact, and they need to learn from breaking shit.
The first thing I tell newly promoted junior SysAdmins is that no matter how safe they think a change is, have a backup plan, then a backup to the backup plan. If they can get out of the shit in minutes it's probably not going to cost them their job, nor me mine. Of course they don't really have enough access to create mayhem until they've proved themselves with non production systems.
Everyone makes mistakes. It's how they recover that proves their worth.
UltraEngine60@reddit
what are those?
merlyndavis@reddit
How long before you show them that secret stash of scripts you use to do your job so you don't screw it up yourself? (I had so many scripts to do all the dangerous, and not so dangerous, tasks. That way, all my lessons learned were contained in there, and that was backed up twelve ways from Sunday.)
True_Maintenance5846@reddit
Never work on vital systems when a user is talking to you; learned that lesson the hard way.
Turbulent-Falcon-918@reddit
Ah, the split second you find God and are born again, between being finished and seeing what happens. Then it's back to complaining about the coffee lol
OkFunction8532@reddit
The restore button terrifies me, lol. I triple check before I hit the button, especially if we're restoring from a very old image because I'm convinced I'm going to write somebody's single junk word doc over a prod db (even though I know it doesn't work like that). But we just got an entirely new system and I put in a ticket for a vendor tech to do a walkthrough of a mailbox restore. I ain't messing that up if I don't yet understand how it works and what options I need to mod to get it to work. But when it does work it's great 😊
calcium@reddit
Early in my career there was a dev that had written a DB query that was to search the entire DB (several billion rows) and remove certain rows if they matched a specific set of parameters. He launched the query against our production DB and expected the query to take several hours to run but was perplexed when it took only a few minutes. Took him about 15 minutes to realize that his query did the opposite of what he wanted to do and nuked the entire production DB for our active online store that served millions of customers.
Took our DBA's around 5 hours to get the store back online and another 12 to get everything working the way it was supposed to. Heard we lost around 15 minutes worth of sales for a fortune 100 company. I was certain the guy was going to get canned, but he's still there nearly 20 years later.
Upside of that is that a lot of controls were put in after that. Mandatory code reviews, sign-offs from managers for database commits, and changes to be looked over by DBA's before changes to the DB could be made.
lrdmelchett@reddit
I find SQL deletes work best without where clauses. ;)
ISeeDeadPackets@reddit
This job is all about figuring out how to get that right of boom stuff in on the left of boom side before the boom happens instead of after. Sadly, a lot of that is tied to purse strings that don't get loosened up until the boom happens and everyone wants to know why the changes weren't made sooner. That's when the special folder of "things I begged for but you said no to" gets to make an appearance.
notthetechdirector@reddit
That’s the same guy that I have to watch out for!
Surface13@reddit
In my younger years, past "me" was always a dick who did things that were future "me"'s problem.
Now in current times, I make sure past "me" doesn't fuck future "me" over. It's been a lot better for all of our mental health (past, present and future "me")
True_Maintenance5846@reddit
You never need a snapshot...until that devastating moment that you do
Delta31_Heavy@reddit
Not…gasp… the BIG database?!?
ISeeDeadPackets@reddit
Yeah, the one that tracked a few billion dollars. People get salty when a few billion dollars goes poof!
torbar203@reddit
I had the opposite happen
I was doing something with a VM, didn't realize it had a 6-month-old checkpoint on it, and somehow deleted the files related to that, so the whole thing rolled back 6 months.
Luckily I had a Veeam backup of it from like 10 minutes before so I was able to recover from that.
DarthTurnip@reddit
Hell, I have a gray beard, and I make backups of my backups
DeHub94@reddit
Yeah, op deserves a raise not a write-up.
vkay89@reddit
lol yep, try web-consoling onto what I thought was a Windows server and hitting ctrl-alt-del to get to the login, only to realise it was a company's FW, and inadvertently taking a nationwide company offline for 30 min. They invested in HA after that.
Appropriate_Unit3474@reddit
Shit I used to cause broadcast storms when I wanted to work Saturday.
illicITparameters@reddit
I refuse to believe someone is a seasoned IT pro if they've never accidentally restarted a server instead of their workstation...
jokebreath@reddit
There should be a Murphy's Law type of rule for when you have 20 SSH connections open and accidentally reboot the wrong server, it will always be the most critical of the 20.
newaccountzuerich@reddit
Mollyguard.
Shell command substitution wrapping server reboot/shutdown commands; makes you type the host-name to continue. Wakes you up when you fail with the server name you thought you were bouncing!
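For anyone who wants to roll their own, the idea is simple enough to sketch in a few lines of shell (a toy imitation of the concept, not the real molly-guard package):

```shell
# Refuse to reboot unless the operator types the current hostname back.
confirm_reboot() {
  host=$(hostname)
  printf 'Type the hostname (%s) to continue: ' "$host"
  read -r answer
  if [ "$answer" = "$host" ]; then
    echo "ok, would reboot $host here"   # swap in: shutdown -r now
  else
    echo "hostname mismatch, refusing to reboot" >&2
    return 1
  fi
}

# Demo: feed it the right name so the guard passes.
hostname | confirm_reboot
```

The moment of typing the name is exactly the moment you notice the prompt says `prod-db-01` and not the box you thought you were on.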
jokebreath@reddit
Holy shit, that's genius, thank you!
newaccountzuerich@reddit
Apologies, the actual package name is "molly-guard".
Originally written and maintained by Martin F. Krafft (madduck) back in the mists of time.
Debian package details at https://packages.debian.org/bookworm/molly-guard
RandomPhaseNoise@reddit
Molly trap saved my ass. Not once...
SwitchbackHiker@reddit
I was installing updates in test but was ssh'd into both prod and test. Accidentally updated prod, and it was the only time an update ever caused an issue on that system.
Grimsterr@reddit
Forgot I was still SSH'd to another system "shutdown -r now" and then my system doesn't start rebooting.
"Shit!"
dasunt@reddit
We call that an unplanned DR exercise.
If it fails, then either it wasn't important or it lacked the necessary HA/failover config. Use it as a learning lesson.
FKFnz@reddit
My favourite is to schedule a reboot for 11.45PM, when nobody is using the server. And then realise at 11.46AM that I forgot the difference between AM and PM. Again.
revellion@reddit
And this is why 24hr time should be universal. And AM/PM abolished to the history books of stupid things we used to do.
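For what it's worth, Linux `shutdown` only accepts 24-hour hh:mm anyway, and `date` can echo back what the wall-clock time will actually be before you commit (the time here is made up):

```shell
# Schedule-check sketch: confirm what "23:45" means before scheduling.
when="23:45"
msg=$(date -d "today $when" '+reboot scheduled for %Y-%m-%d at %H:%M')
echo "$msg"
# the real command would then be: shutdown -r "$when"
```

Assumes GNU `date` for the `-d` flag; BSD/macOS `date` spells this differently.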
RandomPhaseNoise@reddit
With other date formats which are not ISO.
And daylight saving with time zones.
We will not enter the real space age until we do that.
newaccountzuerich@reddit
Servers should always be 24hr UTC. Never any worries about morning or evening or what timezone is it or did DST kick in there yet...
Anyone that disagrees gets to enjoy the freedom of local conversion if needed.
Otto-Korrect@reddit
Or given a shutdown command instead of restart to a physical server they don't have access to.
illicITparameters@reddit
I’ve never done that. I don’t deploy shit without iDRAC, because I know if I did I would’ve done this.
anxiousinfotech@reddit
I had to learn the hard way that you always verify the iDRAC is functional before rebooting the physical server.
The joys of running ancient hardware...
RandomPhaseNoise@reddit
And you realize that the old dumb switch for all idracs broke in the afternoon.
MBILC@reddit
Sometimes you had those above you cheap out on the smallest things... until you explain to them the actual cost of being down, the time for you to get there, et cetera, and even then, they may still say no to an additional $200 or so on a $5k server...
illicITparameters@reddit
I’ve never run into that. Pretty sure if I did I would’ve started looking for new work.🤣
MBILC@reddit
Not everyone had/has the option to find a new job. For me, it was a very close-knit industry where I was living, all my experience was self-taught, and I was in my early 20s as a solo in-house IT person. But after that event, I learned how to quantify requirements for what I needed to buy and didn't have any problems after that.
jackmorganshots@reddit
I mean, how else are you supposed to learn what wake on lan is?
Skyyk9@reddit
Once, while working as a QA dude for a company that resells excavation equipment, I set everything to be 1 penny. I said "Oh no" (cue Tom Scott and the oh-no second).
My dearest developer fixed it in mere moments. But I was sufficiently panicking.
RandomPhaseNoise@reddit
Black Friday sales :)
joshuamarius@reddit
Or WOI for extreme cases :-)
trail-g62Bim@reddit
Issue restart command.
Close window.
Wait...did I issue restart or shutdown?
Ping the server.
Not answering.
Not answering.
PLEASE FOR THE LOVE OF GOD ANSWER IT'S 11PM AND I DON'T WANT TO DRIVE TO THE DATACE-
Answers ping.
Oh thank god
Otto-Korrect@reddit
SO many times. And we have some Lenovo servers that have incredibly long boot times, so it can regularly be well over 5 minutes. I've had my keys in my hand ready to head in a few times before the ping started replying.
RandomPhaseNoise@reddit
This must be some IBM legacy...
I met an old P3 Xeon server with multiple processors which needed about 15 minutes to get to LILO (the Linux bootloader before GRUB).
There was no hardware problem with the server; it was only one or two years old then. It just took ages for the BIOS and extensions to get themselves together.
FlickeringLCD@reddit
XCC is your friend! (or iDRAC, or iLO, pick your poison)
af_cheddarhead@reddit
Dell systems for me, those things take FOREVER to reboot. Even after 10 years it still amazes me how fast a VM can reboot.
catherder9000@reddit
Yes. A VM compared to something that depends on iDRAC to get booted first is like an eternity of suspenseful wondering.
trail-g62Bim@reddit
Same. It also makes you start to wonder if that particular server wasn't set up to answer ping.
_Dreamer_Deceiver_@reddit
Some of them take ages to get to the bios though and just sits there with "no signal" for 5 minutes
cluberti@reddit
Yyyyyyyyyyyup. If the machine wasn't in walking distance from wherever I might have been expected to be, it was attached to an IPKVM and had an ILO/DRAC card. Period, because I was an idiot. I still am, but I was, too.
Urgazhi@reddit
This is standard at my company and it drives me bonkers (programmer with too much access).
joshuamarius@reddit
This is where WOL has saved me a few times :-)
merlyndavis@reddit
Every server admin's nightmare. Every time, the praying that the server comes back up…
Status_Baseball_299@reddit
Did this when I was upgrading firmware at a site where the site manager lived two hours away. I waited for 20 minutes anxiously, but it came back online. It's always then that you start praying.
Sufficient-West-5456@reddit
Amen
mnvoronin@reddit
That's why we have iLO (or your preferred vendor's equivalent).
0o0o0o0o0o0z@reddit
HOLY SHIT, reading this gave me anxiety... been there and done that more than a few times.
_Dreamer_Deceiver_@reddit
"er hi, I just got a notification that a server is down, can you go check?"
MBILC@reddit
When you get to the point where you know almost down to the exact minute how long a server should take to reboot... and you start getting seconds over that... the panic kicks in........
dreamfin@reddit
Mostly when doing remote updates. Reboot the server, and the ping coming back takes foreveeeeeeer... On servers iDRAC/iLO saves the day, but switches make you sweat.
F3ndt@reddit
Can relate. Greetings, Sophos admin
thortgot@reddit
Fortigate upgrades also take exactly long enough to make you start to worry.
ScriptMonkey78@reddit
EVERY.
DAMN.
TIME.
Top_Outlandishness54@reddit
Done this so many times
PoniardBlade@reddit
Oh, network issues on a remote server, easy, I'll just go disable the NIC and enable it again...
battmain@reddit
And as you hit the enter key, you utter Fuuuuuuu, fuck fuck. As ping goes silent, lol
sprocket90@reddit
I spec HP servers and now always get iLO licensing for them because of this.
KarockGrok@reddit
Of all the things I can type, I can type
shutdown -a
faster than anything else.
j2thebees@reddit
Bahahaha! :D It's been probably 2 years, but it was an embarrassing text to a friend who is still on good enough terms to have a key (and an expert in the field, thus the embarrassing part).
MBILC@reddit
*raises hand* - Did this once on a VM host we had, which was a cheap server and had no iDRAC/iLO... my early days of learning virtualisation still.
Luckily it was done on a weekend, and I live a 30-min drive from the office... but still, it did impact some parts of the company as we were a 24/7 operation.
After that, I spent weeks making sure we were 110% redundant and redid most of the infra I had originally set up. It never happened again, and every server had proper remote management going forward.
If we are not failing, we are not learning, right?
af_cheddarhead@reddit
Make a typo on the new IP address of the remote system, then lose all remote access to that system when you confirm the change. "Whoops, I'll be right back."
blyatspinat@reddit
IPMI is your best friend then :)
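Short of IPMI, the other classic belt-and-braces for remote network changes is to arm an automatic rollback before touching anything, and disarm it only once you can still log in. A minimal sketch (the rollback body is just a placeholder echo, and the delay would really be minutes, not a demo value):

```shell
# Arm a delayed rollback in the background, make the risky change, then
# cancel the rollback once the new config is confirmed working.
(
  sleep 300
  echo "rolling back network config"   # restore the old IP here
) &
ROLLBACK_PID=$!
echo "risky change goes here; rollback armed as PID $ROLLBACK_PID"
# ...re-connect over the new address, then disarm:
kill "$ROLLBACK_PID" && echo "change confirmed, rollback cancelled"
```

If you fat-finger the new IP and lose the session, you just wait out the timer and the box comes back on the old config.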
spobodys_necial@reddit
Used to update physical network devices that always took a good 10+ minutes to reboot and I'd sweat every single one of those minutes until it came back.
Otto-Korrect@reddit
I rebooted a Comcast fiber router for a point-to-point last night. First time I've had to do it. All lights came on steady and stayed that way for FAR too long for comfort. 5+ minutes, felt like 30.
Clivna@reddit
or shutdown instead of logout after rep.
ReverendDS@reddit
Doing some overnight work on a server that was about 2 hours away.
Wasn't paying attention to which cli was which, and sent ipconfig /release to the Hyper-v host instead of the VM I was working on.
Fortunately, the warehouse had some 24/7 folks and after they started calling, I was able to walk one of them through accessing the host and renewing the IP lease.
Only took about 2.5 hours to get them back. And that's not the worst outage I've caused.
Otto-Korrect@reddit
You had a host on DHCP? I never put any server or infrastructure on DHCP. Just one more thing to break in the event of network issues.
ReverendDS@reddit
Completely agree, but wasn't my choice or setup.
It did spark a project to ensure that all infra was updated to static and servers to reservations.
limitedz@reddit
That little bit of anxiety when you're rebooting a server that's hundreds of miles away...
battmain@reddit
Or connected to the wrong server because of a one character difference...
illicITparameters@reddit
Done that as well.
srbmfodder@reddit
That feeling in the pit of your stomach as the disconnect message shows up
illicITparameters@reddit
Yup….
runasadministrador@reddit
…. Why hasn’t it come back up yet !!! (Glued to ping time outs) 💀
awkwardnetadmin@reddit
IDK, that's nothing. I once accidentally rebooted a router at an ISP. The senior engineer was like "what did you do" from across the room, because he'd told me to reboot the NID and I somehow mixed up the tabs in SecureCRT when I got back to my desk. Somehow the only thing my boss told me was don't do it again.
illicITparameters@reddit
That’s what your boss was supposed to do. That’s the appropriate response to a mistake. I’ve never gotten mad at a mistake. I’ve gotten mad at laziness. There’s a difference.
SilentLennie@reddit
I once had a remote desktop open on a physical Windows server in a datacenter, had the network properties window open, and my mouse twitched and clicked the disable button.
So yeah... I had to get to the datacenter.
Some of our first servers didn't have any remote KVM.
dhardyuk@reddit
I spent 45 minutes holding the reset button on my 486dx2 66 cad station.
I intended to poke the turbo button (yes we had a physical button to make it compute harder back then) but I missed and hit the reset button.
If I let go it was going to reboot the PC.
I had waited 6 hours for my AutoCAD r12 for DOS to finish regenerating 13 A0 drawings covering the length of the A228 Leybourne and West Malling bypass from Kings Hill to the M20 J4 (and the proposed but never built J4A)
I needed the plot files for all 13 drawings to be exported to a floppy disk so someone could copy a:*.plt LPT1 and get them printed for tomorrow.
If I let go I’d have had to start again.
I had a colleague that needed to remote onto a server and restart the Routing and Remote Access service in an office 200 miles away.
The muppet had the choice of Stop or Restart for the service. He hit Stop. But it was me that had to drive to Northampton to click start.
All of which reminds me about the stitched bridge deck.
If you are driving on the M20 in Kent and look up at the bridges as you go past junction 4 you will see that the bridge nearest London is actually two bridges next to each other. The junction originally had two concrete bridges forming the spans over the motorway. When J4A was ‘value engineered out’ the alternative was to increase the capacity of J4, taking the London side bridge to 4 lanes and leaving the Maidstone side bridge at the original 2 lanes.
The engineering challenge was to keep the existing concrete bridge deck and join it to a steel bridge whilst having a single carriageway contiguous road surface over both bridge decks. The stitch bridge solution was how the two radically different materials were spliced together.
You can’t tell from on top, but from underneath the different structures are really obvious.
sorderon@reddit
The old mstsc roulette! Thankfully shutdown/restart is removed, but you can still shutdown /s /f /t 0 last thing on a Friday.....
Sintarsintar@reddit
I'd like to add a few things: if you've never broken production, unplugged the wrong thing, typoed an IP, or shut down the wrong thing, then yeah, I highly doubt it.
Writing someone up for a one-off occurrence is stupid and counterproductive.
j2thebees@reddit
A bit off base, but when asked how to pick a beekeeping mentor I always tell new folks, "Find a local keeper. Ask them how many colonies they've killed. This should be an integer >20, with no back-pedaling excuses. That's your mentor."
Same goes for anything. I haven't done a royal botch in a while, but it's 1:28pm. Probably a good time to go home (contractor). You get much better at backing up your backy-uppy backups, when it comes to data loss, ... but no one is perfect.
dreamfin@reddit
No, but shutdown.... lol.
radiomix@reddit
Like an idiot I disabled the wrong NIC on a physical server at a data center, thus disconnected the only access I had. The second I hit Okay I screamed. My wife asked what was wrong and I just said, "I'm an idiot and have to go into work". Drove the 20 minutes to the data center and just finished the task while sitting in a folding chair at the DC to punish myself.
MrSanford@reddit
I'm 20 years in and I've never done that.
SmoothBrainedLizard@reddit
Not quite as severe, but I rebooted everyone's computer in my building last month. I meant to just do one specific one, but in my haste I forgot there were two tabs: one for the specific computer you have highlighted, and right on top of it the tab for them all. They look essentially identical. Hit restart, confirm, and I had 4 calls in about 30 seconds. Oopsie.
Credibull@reddit
Buddy had two windows open, one to a prod server and one to his workstation sitting behind him. Issued a reboot on one and closed both windows. "Wait.... which did I reboot????" He muttered. There was a lot of stress between both of us until we heard the hard drive on the workstation start churning after a minute or so.
NoURider@reddit
Just the other day! Built out a new SQL server, made a duplicate of an existing remote connection - modified the display name, but failed to change the Connection - oops. Rebooted production SQL. Glad it was virtualized... it came up fast enough that the only one who noticed was apparently me. ID-10-T error.
ElectricOne55@reddit
Damn, ya, I had this happen before; was sweating like crazy. Thought I was on my regular computer and hit restart on the other server.
kingcobra5352@reddit
My first big IT mistake was this. I got my first IT job at an MSP as just a bench tech. After about six months they had allowed me access to certain companies’ AD. I was logged into an AD server for a client and clicked shutdown instead of log off. Luckily, that client was across the street from our office, so I just walked over and turned it back on and had it up in five minutes. I never heard the end of that from my seasoned coworkers.
pertymoose@reddit
Or deleted a production database instead of a test database
(fortunately SQL server is an obstinate ass that just put it in single-user mode instead of actually deleting because damn that would have been a bothersome restore)
ScriptMonkey78@reddit
I may or may not have kicked off an in place upgrade on my laptop instead of the workstation I was remoted into...
Key-Calligrapher-209@reddit
Just the other day I went to reboot a VM that wouldn't impact production, and accidentally rebooted the whole hypervisor instead. I'm breaking shit like a pro over here.
renegadecanuck@reddit
Yeah, I accidentally clicked the wrong button on my RMM once and sent a "reboot now" command to every single server we manage.
The cherry on top of that was our PBX being one of the servers that restarted, so people called in and got a "Your call could not be completed" message.
Seth0x7DD@reddit
Huh, so that's how you get people to stop bugging you when stuff happens?
Grrl_geek@reddit
This tech knows how to REBOOT!
MBILC@reddit
OWN IT ya.. if you are going to do something, do it right, that includes breaking things, go big or go home.
Rick-powerfu@reddit
Bro might be Neo
Lu12k3r@reddit
“No change Friday” just a lil change right here… no one’s gonna notice…
Upper-Affect5971@reddit
Amen. Fucking kids.
leob0505@reddit
Seriously! I remember when I was migrating some users (500) from Exchange on-prem to M365. I don't know why, but in the CSV file comparing source with destination I skipped one row for everyone. Which means that instead of migrating user1 to user1, I started migrating user1 to user2, user2 to user3, and so on.
Imagine my face on the next Monday when I realized the mistake and suddenly one of our interns had the email messages of the CFO in their inbox.
Mattythrowaway85@reddit
Shit! How did that go down after? Did you keep your job and manage to resolve it rather quickly?
leob0505@reddit
Luckily the migration had the possibility to roll back (when we migrated the data, we still had the legacy data in the Exchange on-prem environment).
I made clear to my boss what happened, where it went wrong, and how I planned to fix it (delete all 500 accounts and redo the migration from zero); however, that meant we would have to postpone the go-live date of the migration by one week.
He saw how "scared and anxious" I was with the whole situation, and he said, "We are not doctors, we are not doing surgeries. Do your best to achieve what you said in one week, and if someone tries to escalate this, I'll handle any annoying stakeholders. Best thing is that now you'll never miss any data from any CSVs for the rest of your life because of this migration lol."
And then things sorted themselves out. I never forgot the example that manager gave me, hence why today, as an IT manager, I always try my best to protect my team from the office politics and escalation drama that always happen with sysadmins…
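That lesson generalizes to a cheap pre-flight check: before feeding any migration tool a CSV of source/destination pairs, assert the pairing actually holds. The data below is invented; in that migration, source and destination usernames were supposed to match, so any row where they differ is exactly the off-by-one described above.

```shell
# Invented mapping file: one "source,destination" pair per row.
cat > mapping.csv <<'EOF'
user1,user1
user2,user2
user3,user3
EOF
# Flag any row where source and destination disagree.
bad=$(awk -F, '$1 != $2 { print NR ": " $1 " -> " $2 }' mapping.csv)
if [ -n "$bad" ]; then
  echo "mismatched rows found:"
  echo "$bad"
else
  echo "all rows map each source to its own destination"
fi
```

A shifted spreadsheet turns every row into a mismatch, so the check fails loudly on the whole file rather than quietly moving the CFO's mailbox.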
aheartworthbreaking@reddit
That sounds unfortunate
RCG73@reddit
Did ya loose data? No. Then shit happens, please don’t do it again
Wise-Reputation-7135@reddit
Now an actually serious one I've experienced was a kid who physically hard-rebooted the NAS, and when it came up 4 HDDs were completely toast. It completely broke an entire RAID; with no spare drives on hand, it was busted for like 3 days while we waited on an expedited shipment and then recovery.
Valheru78@reddit
We recently had a trigger-happy fire suppression system in our brand new state-of-the-art (EU) university datacenter. Apparently they installed the wrong nozzles on the cylinders, so when the system kicked in a high-frequency wave went through the servers, causing a disconnect of all the backplanes to the RAID sets.
After this we had to manually go into the RAID controller BIOSes to restore and import the configuration. Managed to save 95% of the almost 5 petabytes of astronomical research data.
Some of these projects had been running for 11 years and were literally the life's work of the astronomers involved.
Yes, we needed backups, but did you ever try to back up this much data? There really is no funding for that.
Biggest oopsie I ever saw, luckily not my fault.
Sgt-Tau@reddit
Higher ed really has no excuse when it comes to backing up mission-critical data. Usually someone has experienced the trauma of no backups by the time they finish their bachelor's. When it comes to a "life's work", you would think they would have that triple backed up.
It was a culture shock when I worked for higher education after spending about 20 years in the corporate IT world. Penny wise and pound foolish describes most of it. I just about wept when we would send good equipment out to recycling because it couldn't or wouldn't be stored until it could be repurposed elsewhere. Then there was the way they abused "other duties as assigned."
AussieDaz@reddit
Same thing happened to me years ago: a routine fire system test didn't go as planned. The pressure wave caused head crashes on a SAN and we lost multiple TBs of data.
thestupidstillburns@reddit
Fire suppression related, though not data loss: a co-worker of mine dumped the halon system by accident.
Pazuuuzu@reddit
Is it still that expensive on tape?
YodasTinyLightsaber@reddit
Higher Ed is WEIRD. You still need licenses, iron to run the backups, staff to manage and remediate problems. I could see this being a thing.
merlyndavis@reddit
Damn….that sounds like the debacle I went through in ‘98…
Wise-Reputation-7135@reddit
Much more recent than that lol, not you I promise!
north7@reddit
Naw, my data be tight.
Cloud-Attached@reddit
😂😂🤣😂🤣
RCG73@reddit
I would try to claim bad spellcheck but nope that was all me. God I’m tired. Is it ok to add whiskey to your coffee on a workday? Asking for a friend
lpbale0@reddit
It's situational.
Just a standard run of the mill shitty day, then no.
The HMFIC, your boss's boss's boss with whom you have a close relationship both in and outside of work, getting shit canned for [literally] political reasons at a board meeting and having to go to him and start asking for his technology and onboard his replacement at midnight after countless hours of closed door meetings... Yea, I have a bottle hidden at my desk for such occasions.
Mammoth-Variation-76@reddit
Pro tip: if you add Rum at any point you become a pirate instead of an alcohol abuser.
sicklyboy@reddit
Depends on who's asking lol
Brufar_308@reddit
Only if the day of the week ends in Y, or so I’ve been told.
Notkeen5@reddit
Do you mean ‘lose’?
RCG73@reddit
As I mentioned in another comment, I can't even claim bad spell check; I'm just tired and misspelled it. It's been a long week. And now I think it's funny, so I'm gonna just leave it rather than fix it.
hardboiledhank@reddit
I mean, at a minimum he is 38…
TheSusWalrus@reddit
Hahaha!!! I worked for a now-large cellular corporation in the late 90s. And I took the billing system down for 3 days straight. Back when each text message was 25 cents apiece!!! Oops!!!
You're still talking rookie numbers…. Tell them Oops…
WesBur13@reddit
Pshh, I was involved in the recovery from backup when an entire host’s datastore was nuked. 8 hours of downtime and a hard lesson for a few techs.
DeadlySoren@reddit
Seconding this comment. I've only been in IT for 4 years and while I haven't taken down prod yet, I did work with a guy who accidentally restarted the developer VDI pool in his second week. There were a lot of angry devs who did not save their work to OneDrive lol
superiormirage@reddit
Right? I took down a retail store for half a day, costing the company around $50,000.
My punishment? I had to develop and write up new procedures to make sure what I did never, ever happened again.
Ttwister@reddit
Word
thursday51@reddit
LOL...dude, that's the first thing that crossed my mind too. 20+ years without tanking prod during a workday is either insanely good luck or far too risk averse.
gdwallasign@reddit
Running a pentest at a small regional bank: it had a SPARC core processor, and a light breeze took down banking.
YouCanDoItHot@reddit
I've taken down prod a few times in the 30 years I've been doing this, but only got written up once, for saying someone was a dumbass.
battmain@reddit
For me, my one time was 'Get the fuck away from me NOW!' as I was trying to deal with an incident, start the incident bridge, and send out IT notifications. Note that I had told him politely, multiple times in the minutes before, that his issue was a single-user problem, completely unrelated, and I would deal with it after, since a lot of people were currently affected. He kept touching me as I was trying to think and type.
I was unprofessional and he felt threatened because my fists were clenched. YES! Don't fucking touch me!
The most amusing part: the write up was a year later, by some fuckface of a manager retaliating because I refused to let him yell at me. Even the HR person that called me in said it's a formality, but "it will go in your file," with a wink. (Meaning it went into the bit bucket.) HR was also surprised by the timeline.
Trikecarface@reddit
Haha, I did 300 clients at 5:30 on a Friday; took all weekend to fix. I was not popular with the colleagues for a while.
True_Maintenance5846@reddit
I was about to say.....I took prod down my first week at my job. Gotta set the goddamn standard early.
pizat1@reddit
I took down the prod DB VLANs on our DC cores two weeks ago for an hour. Fixed it and life moved on. Filled out an RCA form after I brought them up, and life moved on lol. That wasn't the first time. Shit, I deleted all the VLANs on a very populated port channel two years ago. Took out our main DIA circuit and 10 other services, and we fixed it and life moved on 😂😂😂😂😂. They didn't get mad at all.
FailedCriticalSystem@reddit
CrowdStrike has entered the chat
battmain@reddit
Lol! Our phone queues were an hour plus wait times that day.
AegorBlake@reddit
Our helpdesk has taken AD down for longer.
PainOfClarity@reddit
lol sadly I fully support this comment from hard earned experience
nccon1@reddit
Amateurs. Over 25 years I’ve been chewed out and written up a “few” times.
fried_green_baloney@reddit
I was close to an outage of the "half an hour more and it makes the Wall Street Journal" and nobody got written up. Just a calm post-mortem and some guidelines on being sure emails and Slack messages were more clearly written.
Shujolnyc@reddit
This is not write up material. It’s “you fucked up Bob” material and drinking stories material.
battmain@reddit
LMAO! Right? Even doing things by the book during change window has caused production incidents.
nightwolf92@reddit
Took my entire business unit down by running a script that changed everyone's AD profiles to only be allowed to sign into a hostname that was the name of our town.
Did a Set-ADUser command with -l for location, but for some reason it turned on the hostname restriction.
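A plausible explanation for the "for some reason" (my assumption, not confirmed by the commenter): PowerShell binds abbreviated parameter names by unique prefix, and the only `Set-ADUser` parameter beginning with "L" is `-LogonWorkstations`, so `-l` silently binds to it rather than to the AD "l" (locality/city) attribute, which the cmdlet exposes as `-City`. Illustration only, with a made-up user; not meant to be run against a live domain:

```powershell
# What was intended: set the AD "l" (locality) attribute,
# which Set-ADUser exposes as -City
Set-ADUser -Identity jdoe -City "Springfield"

# What '-l' actually did: PowerShell expands the unique prefix, so this...
Set-ADUser -Identity jdoe -l "Springfield"
# ...binds as the following, restricting sign-in to a host named "Springfield":
Set-ADUser -Identity jdoe -LogonWorkstations "Springfield"
```

Spelling out full parameter names in scripts avoids this whole class of surprise.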
GByteKnight@reddit
Right? This guy is either the luckiest sysadmin ever or he’s more full of shit than the bottom of a birdcage.
Frankaintmyfriend@reddit (OP)
Neither. I have rebooted more than my fair share of servers over the years. This is the first time at this company. First time in 7 years.
evolutionxtinct@reddit
Dude I’ve been at it 23yrs and have some burn marks dang
Frankaintmyfriend@reddit (OP)
No, sorry, maybe I didn't explain it correctly: I have taken down MANY servers over the years, more so in the first few years. This was the first time at this company (been with them for 7 years). I double-check everything now...or normally do.
And happy to report that so far, no write up.
MrFirewall@reddit
I took a whole office offline for 15+ minutes while working on building their new firewalls. I thought it a good idea to erase the new ones to redo the configuration. I was wrong, I wiped and rebooted the production ones.
Luckily, I had them just move the cables. 15 minutes later, they're on the new firewalls and 3 hours of post op later everyone was fine.
Mistakes happen. Just own them ASAP and provide a write-up on how it happened and what will be done to avoid it in the future.
Pazuuuzu@reddit
Yeah, the writeup will be something along the lines of "please be more careful next time"
Kakabef@reddit
Amateurs man.
GodMonster@reddit
I once powered down both core switches at a site at once to do a swap because we had downtime scheduled for the site, but we also had server maintenance scheduled during the downtime and the cluster wasn't brought offline, which used the core switches to connect to their data store for quorum. The whole cluster lost quorum and we had to spend about 18 hours rebuilding 26 servers from backup. I didn't get written up because it was done at 2am on a day that had started at 6am, but I was told not to do it again, very sternly, and got to spend my flight home rebuilding servers over the plane's wifi and VPN.
1h8fulkat@reddit
I always say, if you haven't taken down prod, it's for one of 3 reasons: either you're perfect (and nobody is perfect), you're a liar, or you don't do jack shit.
Maro1947@reddit
"We call this Lunch break"
turudd@reddit
Let me tell you about the time I DCPROMO'd a Windows 2000 backup domain controller during a 2008 upgrade….
killer2239@reddit
Lmao that was my first thought.
Baylordawg16@reddit
Agreed, we all take shit down at some point. And it's never a good time to take it down.
capyburro@reddit
I accidentally nuked the home directory on a server that wasn't in prod yet but was being built. Since then I've basically been crippled with doubt. I'm terrified when simple tickets come in. I just updated our dev servers after two months because I'm convinced something will break and I won't be able to fix it because I'm literally the only Linux sysadmin and this is my very first sysadmin job and I don't know shit. And the prod servers are painfully not updated because I'm scared to do a simple yum update.
3cit@reddit
Just the other day I took out our DHCP servers on a Friday, during homecoming, with our biggest expected football crowd of the year. I'd only been officially employed for 2 weeks at that point! I didn't even have a folder for my write up to go in.
Sintek@reddit
Yea.. they haven't written him up yet because they are working on his raise.
sinclairzx10@reddit
Come back when you take a Datacentre offline ya rookie … still no write up…
twoheadedhorseman@reddit
When I interview people for senior roles, if their biggest failure is "I took down production once," that usually raises an eyebrow.
TheLastOfTheWhite@reddit
Lmao yeah, I was not even 3 months into my first full time IT position and took down the edge firewall supporting the traffic of maybe 5k people for maybe 3 hours and got nothing beyond having to write a report.
Seranfall@reddit
If you made it 20 years without taking down production during the day are you really living?
frame45@reddit
This ☝️
TheGooOnTheFloor@reddit
I've never been 'written up' but there is now a section in the employee handbook that lists my name 3 times.
Shamr0ck@reddit
He probably has an actual test environment, not mad just jealous.
satchelsofgold@reddit
I take down prod weekly purely out of spite and then tell em "Nagios says no" when anybody asks if we were down just now
trouphaz@reddit
That's what I was thinking. I rebooted our prod database server instead of our prod app server because I connected my console to the wrong box. Luckily, the problem was that my boss had labeled it incorrectly, so I didn't get too much heat. Later, though, I corrupted our prod database's SAN disks by accidentally syncing some LUNs one way and some LUNs the other way. That took a full weekend to recover from, with very little sleep.
elecboy@reddit
In my second week as a SysAd, I changed a firewall rule on a FortiGate, which dropped internet access to everybody. I had to call a consultant to fix it, and it caused almost two hours of downtime.
I learned from that mistake; I was later a consultant at an MSP, and I got calls all the time for the same thing that happened to me from other SysAds, live and learn.
MyClevrUsername@reddit
If I was written up for every mistake I’ve made I would have been run out of IT decades ago.
whitewail602@reddit
That's why good orgs consider every mistake to be the org's mistake, fix it, and then figure out how to not let it happen again.
Tetha@reddit
From recent experience, up to a degree though.
For my current juniors, their mistakes are my mistakes. And I'm perfectly ready to square up with anyone in the company about this. And I build our infrastructure with 1 severe mistake tolerance wherever possible.
But the guy we recently let go... jeez. But that's the org's mistake I think. I or our team lead should have recognized that people started to straight up give up on them due to a lack of improvements.
Urgazhi@reddit
It would be silly to fire somebody after spending that much money and downtime to train them.
Candy_Badger@reddit
I have taken production down multiple times. I once just shut down the server with the VPN, which was the only way to get to the servers. Had to drive ~100 miles to power it on.
norcaldan707@reddit
Lol, yea, 10 min ain't shit..
Coworker changed all passwords.. 4000 users... Turned his machine into a mail server. Another coworker, leaning against a rack, somehow tripped and ripped out multiple fiber links..
I wouldn't even lose sleep over it.
Shit happens.. Accident? Carelessness or ignorance? All depends on the above.
oracleofnonsense@reddit
Pour out some Mountain Dew for the homies and move on.
A_Random_Encounter@reddit
Absolutely. I spent about a decade in Hospital IT - typically nothing strictly patient care related but some pretty big tangential systems still. In my first year I took the whole hospital down for about 20 minutes by sheer accident because I'm a clumsy fool. Intranet, voip, internet, everything.
I'm quite a bit more careful now.
itsmuddy@reddit
I once ran a deletion script against our entire DB and didn’t even realize it until I noticed it was taking way longer than it should.
Felt better when my boss did something somewhat similar a year later.
WraytheZ@reddit
10 minutes.. hehe, rookie numbers indeed. My first big oops... was massive, a country-level f-up for 20 minutes. Boss gave me a "well, no point shitting on you - you'll never make that mistake again"
Mistakes happen; it's how you deal with remediation and responsibility (owning up) that is important imo.
Delta31_Heavy@reddit
Yep! Try taking down the internet because you missed the end quotes in a PAC file. Or leaning on a power strip in a data center…whole rack down.
zipcad@reddit
and he got in trouble for it? sounds like a bad place
eastamerica@reddit
You gotta get those numbers up!
(actually, don’t. Good on you)
CriticalDog@reddit
Many years ago as a Computer Operations drone, I had to respond to red lights on servers that were found during my evening shift.
Found one on a blade console. Called HP support, and they had me UNSEAT the damn thing, then push it back in.
Turns out it was just involved with our VMware cluster and I took down about 15 production servers.
Didn't get written up, but it was a close, close thing.
bungee75@reddit
It's 10 minutes, that doesn't even count.
Necessary-Peanut2491@reddit
I once took down the Amazon Seller Central homepage for 45 minutes with a botched deployment.
Didn't get fired, but it was close (for unexpected and dumb reasons, lol).
newtrawn@reddit
ha! That's exactly what I was going to say.
Strict-Ad-3500@reddit
The key is to avoid a change management request. Instead of an "upgrade failure," you'll be the hero who saved production!
lustriousParsnip639@reddit
Came here to say the same thing. OP should have the spam can of shame on his desk until the next person fucks up.
daffy_69@reddit
Yeah, in my 20's I downed a production line that made $1M USD per minute, for nearly an hour.
Maxwell_Perkins088@reddit
25 years, no production downtime here. Any button push that could take down a system was done after 5pm. “What’s the worst possible thing that could happen and is it worth it ?”
A10010010@reddit
OP, add this to your resume and you’ll get hired on the spot.
tankerkiller125real@reddit
I took out an email server for damn near 3 fuckin days. This is basically high-school-level rookie stuff.
(Make sure your backup servers are connected to high-speed networking.... The previous IT guy had one connected to a 100Mb switch and I didn't figure that out until the restore was damn near complete.)
cagedgosling@reddit
Amen to that
wavemelon@reddit
Yeah, for that long a winning streak you should get an award, not a write up. Also. Everybody makes mistakes. If your boss doesn’t just laugh it off and cover for you with management then he or she’s an arse.
kali_tragus@reddit
So much this. If you want nothing to happen you instil a fear of making mistakes in your team.
You make a mistake, you learn from it. You make the same mistake again, ok, if you can't learn from your mistakes then maybe it's time for a write-up.
YetAnotherGeneralist@reddit
Perfect savant or never does anything? You decide!
Roanoketrees@reddit
Getting written up for that? That's nuts. It was an honest mistake.
Cisco-NintendoSwitch@reddit
Been in infra 2 years; took down our Linux environment by blowing up our RHEL Satellite server and had to recover from an array snapshot.
Am sure I’ll blow up many other things before I hit the 20 year mark.
weanis2@reddit
Right, do it right. Unplug the whole SAN! Now that was a spicy moment.
cats_are_the_devil@reddit
Honestly, the first thing I thought of. We all do dumb things sometimes. A 10-minute outage is child's play...
rubmahbelly@reddit
OP deserves a raise and two weeks of paid leave. Not a write up.
gotfondue@reddit
He's now one of us.
anonpf@reddit
lol fr.
jackmorganshots@reddit
Far too many years ago, when IT people made great money anywhere and I had far more hair, I asked an experienced and knowledgeable teacher on my course, earning not far above minimum wage, why they taught rather than did. Causing "over 20 man-years of downtime" was their answer.
0nc3@reddit
If you were in my team, I would try to cheer you up a bit, and afterwards you would have to tell everyone in the next team meeting what happened and how, so nobody else does the same thing again. Maybe write a note into the standard instructions for migrating VMs for future reference.
Else: why should this be a write up? No matter how exactly we plan, how exactly we document, there can always be issues while going through with it. Sometimes things are (even though they shouldn't be) different in productive environments than on the other stages (*), sometimes we just have a brain glitch, because we are humans after all (even though some IT people don't like to admit it). A write up for me personally would need a) intent or b) gross negligence. I don't think either of these applies here.
Mistakes happen. Realise, analyse, fix, document, carry on. You got this.
* Dunno if you noticed, I work in a department which does deployment like in '99, so please translate to modern environments.
Creative_Onion_1440@reddit
Sounds like you were trying to do too much at once.
Not sure if that's a write-up, but definitely a lesson to learn.
A write up for a one-time mistake like this is either overly strict, or someone above them is demanding action.
NoPossibility4178@reddit
It's not even trying to do too much, it's just deciding things on the fly. "Oh, an empty datastore, don't mind if I do!" Like, what are you doing?
iamsobluesbrothers@reddit
That was the lesson he seems to not have learned. Wait till what you are doing is finished before starting something else. You never know what unforeseen issues may pop up when trying to do too much.
perthguppy@reddit
If the company is in a regulated industry or has a mandate to implement things like ISO 27001, then change control is a mandatory policy and a breach of it most likely does require a write up.
I’m actually kind of shocked so many people are saying OP shouldn’t be written up. I get that we all make mistakes, but that’s why things like change control have become pretty standard at large businesses. OP was conducting two undocumented changes at once, and while the vmotion would have been covered as a pre-approved change, the unmounting of production data stores in most businesses I’ve dealt with would 100% require a change control be submitted and reviewed, and the change window set for non business hours.
uhdoy@reddit
Agreed. I've noticed a lot of my colleagues don't fundamentally understand the Change Management process and it causes a lot of frustration. Folks get frustrated at having to explain their highly technical change to someone who couldn't begin to understand the nuance. Change Management is not a process there to make IT's life easier. We're doing Change Management to reduce risk to the business and create an audit trail. So if I'm asking questions around impact, it's not to be difficult, it's so that I can get as close to understanding as I can before accepting some level of accountability when I approve the change.
MBILC@reddit
This was my first curiosity, was there an approved change control for this..
Someone who has been in IT for 20 years, even with no official change process, should at a minimum send out an email to stakeholders (their boss at least) and let them know what needs to be done, why, and when.
Solkre@reddit
A write up is a lazier PIP. Resumes going out for that. Even if it’s 6 months to move.
Creative_Onion_1440@reddit
I agree that change management policy can be an issue people run into in some orgs.
Obvious-Water569@reddit
Honestly, I don't think you're gonna get written up for this.
Squik67@reddit
If VMware allows you to unmount datastores with files still in use on them... that's a shame for the product lol!
guesdo@reddit
I was once paged to look at what seemed to be an attack by a malicious actor: we were getting thousands of requests per second that all resulted in 401s in prod. That triggered a LOT of alarms across the company (big US retailer). While tracking the source(s) to try to mitigate the attack, I found out the "attack" was coming from a private IP inside the VPN. Before panicking, I asked around the QA team channels. Apparently one of the testers forgot to change environments in his fuzz load test script and went for a coffee... 🤣
allthesnacks@reddit
Been in IT half that time and taken down prod 3X as much 🤣 shit happens. I've never been written up for honest mistakes; write ups are for recurring issues.
martynbez@reddit
It happens it happens! Learn from it :)
Embarrassed-Ear8228@reddit
About 15 years ago, on a Friday night, I accidentally shut down an on-prem Exchange server instead of logging off. I didn’t realize the mistake and left for the weekend. Over the weekend, I received a few puzzled text messages from my bosses asking, "Is the email down?" It then hit me what I had done, and I knew exactly what needed to be done. On Monday, I showed up at 6 AM, turned the server back on, and got things running again.
My boss asked, “What happened? Was something really bad?” I replied, “Oh yes, it was a major outage. I spent the entire weekend fixing it, and I got everything back up and running!” The boss responded, “Very good! You really know what you're doing. Keep up the great work!”
The moral of the story is this: it's important to remind non-IT folks that IT exists for a reason, and it plays a crucial role. Take advantage of situations like this to highlight your value and importance. Even in a negative scenario, you can find positive aspects and showcase your efforts. In the grand scheme of things, you’ve earned the recognition!
IdentifiesAsGreenPud@reddit
lol that's nothing ....
I was asked to add a VLAN to the vSphere trunk ports, and if you know anything about Cisco and IOS configuration, then you may know where this one is going. You'd think it just adds a VLAN - I mean it does, but it also removes all the other ones that aren't in the command.
Long story short - I removed about 150 VLANs from the trunk port, sending a whole cluster into a meltdown that lit up our alarm board like a xmas tree.
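The gotcha described above is real and well documented: on Cisco IOS, `switchport trunk allowed vlan <list>` replaces the entire allowed list, while the `add` keyword appends to it. A sketch (interface and VLAN numbers are made up):

```
! Destructive: REPLACES the allowed list -- after this, only VLAN 150 is trunked
interface GigabitEthernet1/0/1
 switchport trunk allowed vlan 150

! Safe: APPENDS VLAN 150 to whatever is already allowed
interface GigabitEthernet1/0/1
 switchport trunk allowed vlan add 150
```

(There are matching `remove`, `except`, and `none` keywords; `show interfaces trunk` is the quick sanity check before and after.)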
Want another one? Ever worked with Linux? You know what happens when you don't reboot a Linux server in a while? On the next boot it runs a drive scan (unless disabled) - which, btw, can't be stopped.
Anyway, I had to swap a few drives and THOUGHT I had the OK to reboot a couple of servers. I did not; I got the wrong number and rebooted some huge database and file servers.
The largest database server was running the drive scan for A WHOLE WEEK. Literally - 7 days of scanning... took down an e-commerce shop for a week. Well, we had to restore a backup (which was quicker), but that was a day out of date.
Anyway, like someone says - rookie numbers :)
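The "drive scan" here is ext2/3/4's periodic boot-time fsck, driven by the mount-count and interval fields in the filesystem superblock. A minimal sketch of inspecting and disabling it with `tune2fs`, run against a throwaway file-backed image (the /tmp path and sizes are made up) so no real disk is touched:

```shell
# Create a small file-backed ext4 image so no real disk is involved
dd if=/dev/zero of=/tmp/fsckdemo.img bs=1M count=8 status=none
mkfs.ext4 -F -q /tmp/fsckdemo.img

# Show the superblock fields boot-time fsck consults
tune2fs -l /tmp/fsckdemo.img | grep -E '^(Mount count|Maximum mount count|Check interval)'

# Disable count- and time-based forced checks (the surprise "drive scan")
tune2fs -c 0 -i 0 /tmp/fsckdemo.img
```

Recent mke2fs versions disable these periodic checks by default; on older distros they were enabled, which is exactly how a server rebooted after a year of uptime can sit in fsck for hours, or, on a huge array, days.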
yehlalhai@reddit
DAY 1 of my 22 year IT career:
Dropped a table from the SIT database. Yes, I had DDL access to the DB as a rookie tester.
My team lead had a minor heart attack when every test scenario started failing, and some screens wouldn't load and errored out.
Worse still, the replica of the SIT database was pushed down to dev/test and all data-struct libraries started failing…. 18 devs affected for 4 hours.
department_g33k@reddit
My policy when dealing with mistakes is to sit with the engineer and ask "What did you learn from this?" The answers typically go one of two ways, either they deflect blame, criticize the environment/tools, etc. or they own it, discuss methods for preventing future instances of it happening again, and share what they learned.
Their thought process is the single biggest influence in how I deal with it.
dsmooth74@reddit
Write up for that? That's an honest mistake by someone who, it sounds like, hasn't really made a habit of those kinds of mistakes at the company, so I think you're fine.
The amount of fuckups I did during my admin career lol. A couple months into one job I didn't apply the McAfee ePO tags correctly and overrode the default one used for most customers with the one specific to the customer I was working on, so all the exclusions were wrong for all of our customers. Survived it but got a telling off.
Lavatherm@reddit
20 years and this is the first? I've been in IT for 26 years and have had at least 5 stupid mistakes, plus 20+ moves where I didn't care how people on the user end felt, because it was needed at the time.
Xzenor@reddit
Where people work, mistakes happen. You're only human
shmehh123@reddit
Today my boss and I took down our colo for 5 minutes by accident. Caused a network loop somehow.. still not sure how, but one of our main SQL servers was completely fucked for 3+ hours, taking down a ton of DBs that run a lot of our sites...
The DBA was waiting on all the DBs to restore themselves, but what ended up fixing it was just restarting the SQL service, which we had suggested 5 minutes after we noticed an issue...
crankysysadmin@reddit
a write up? i'd never write someone up for this. i don't view this as a resume-generating event unless you didn't follow procedure.
techieguy07@reddit
You can't call yourself an IT tech without bringing down your company at least once.
SapphireSire@reddit
Sounds to me that you wrote up yourself.
hotmaxer@reddit
Sorry to hear that, but the lesson to learn isn't about the swap file. The lesson is: never perform a task that is not part of a change control. Always test it and have it approved by another technical person and your manager; this way you are shielded from any written warning.
Congrats on being sane after 20 years
apatrol@reddit
I brought down Compaq worldwide prod once for a few hours. Boss told me everyone gets one huge mistake in their career and to make the most of it. Don't do it again. And I haven't. I triple-check everything every time if it is on a prod system. I don't even open prod racks lol
malikto44@reddit
I can't see how one can get a write-up for this. Everyone makes mistakes.
The place where I see write-ups happen are if management is looking to get rid of the IT people (that siren call of the offshore contractors), if a manager already has an issue with the person and just wants to push them closer to a PIP and out the airlock, or (rarely) if the person is habitually unskilled and refuses to fix things.
CheekyChonkyChongus@reddit
Relax. I'm an operations manager, and a user logged an official complaint against me today because I told him no, I don't have time to help you recover a BitLocker key; you were sent a presentation on how to do it on mobile or any other device.
I'm operations manager, it's not even remotely my job.
It was funny.
battmain@reddit
Lol! Sounds like one of those entitled users. "He refused to help me! I thought that's what IT was for!"
That was like me getting a complaint because I told the user to call their Internet company for help in configuring their firewall for our equipment. User wanted me to configure Internet equipment beyond and outside our scope of responsibilities.
CheekyChonkyChongus@reddit
Oh yeah, for sure. Sadly most of my users are like that
Some were mad that I wouldn't help them because they couldn't connect from home due to problems with their own personal wifi..
nocans@reddit
learn and grow
accidentalciso@reddit
You would get written up for that?
The only write up needed is an incident report and root cause analysis. The focus should be on improving the process to prevent mistakes, not placing blame.
If this is going through HR for a write up, that sucks, and makes me worry about the culture at your employer.
rvf@reddit
I've gone down some painfully deep rabbit holes trying to do a root cause analysis of weird one-offs that could never be completely explained; by the end of it I probably would have preferred a write up.
runonandonandonanon@reddit
I am so sick of this attitude and it crops up again and again everywhere I work. Like dude I don't give a shit why it happened, let's place some blame and kick some ass!
Hashrunr@reddit
Firing people for honest mistakes doesn't help. It's how you end up in tech debt.
lordmycal@reddit
It’s the vendors fault. Was a bug. I just patched it though so we should be good in the future. /s.
Throwing your employees under the bus just encourages them to cover shit up in the future. They won’t come to you when there is a problem, because they could be written up for it
battmain@reddit
Or make up an unrelated root cause that sticks or blocks further post incident analysis.
Tymanthius@reddit
Then how do you prevent it from happening again? Why is always the right question. Not the only question tho.
Frankaintmyfriend@reddit (OP)
The culture is very relaxed and chill for the most part. It's honestly the best IT job I have ever had. I believe it will be more of a CYA kinda thing. Doubtful it would make it to HR. My boss was a co-worker for about 6 of the last 7 years and was just promoted to supervisor last year. So he is overly cautious.
merlyndavis@reddit
I’d say instead of CYA, it should be “Congratulations! Now you get to write the runbook on how to do this next time without causing an outage!”
ISeeDeadPackets@reddit
He should be cautious about setting the wrong tone and incentivizing people to cover up mistakes instead of admitting them.
yer_muther@reddit
So much this. If I get written up for making an honest mistake, that sends the message loud and clear that I shouldn't work there.
NotFlameRetardant@reddit
Or you write up your Sr. Sysadmin who has 7 years of institutional knowledge, who decides it's time to open up one of the dozens of messages from recruiters on LinkedIn and start interviewing with a place that's less micromanaged
thedarklord187@reddit
or at the very least keep problems hidden instead of telling people due to fear of repercussions.
battmain@reddit
This. Hurry up and fix it, or make up a root cause that will stick and shut down the analysis.
taemyks@reddit
Good time to implement a change mgmt program. Shit happens
schmeckendeugler@reddit
He's making a bad move with this write up. You should not feel bad about such a small mistake. Yes, small. He's stacking his bricks in the wrong plot. If I knew this person personally, I would tell them (me, that is; I'm not suggesting you do it) that they are WRONG with a capital W to do this: it will discourage free thinking, and furthermore it's evidence of a weak personality; they would be in danger of losing all my respect. In other words, I would rip them a new asshole for writing you up like this.
FYI 35+ years in IT, sysadmin for 20.
xdrift0rx@reddit
It's nice to see these replies standing up for OP. I was just let go after 1 write up and then 1 more unrelated mistake following after. I've never been in a place that immediately writes you up instead of giving you a warning first.
battmain@reddit
Chin up. Sour place to work for. Let someone else deal with their shit. Life's too short.
freon@reddit
OK, I think when most of us think of a "write up" we're explicitly thinking about a negative note being added to your HR file that can be used as evidence against raises/for termination.
It sounds more like your boss has to write up an incident report, and that the cause is going to be listed as "Frank fucked up" and the resolution being "I've told Frank he better not fuck up anymore". Which is fair.
tankerkiller125real@reddit
In which case they should maybe be investing in "Blameless Reports" to figure out why "IT Employee" could fuck it up, and how the org can change its processes to avoid it in the future.
Jarlic_Perimeter@reddit
Yeah, I think you got the right read here. OP is understandably a little shellshocked from causing downtime; my brain would always go straight to "I'm going to be fired, how am I going to eat?"
slackmaster2k@reddit
I think you're making a mountain out of a molehill. A write up that doesn't involve HR is not a write up. Your manager may discuss this with you, and may even write himself a summary, but that is not a write up.
You sound like you have a good work environment. You also sound like you scared the crap out of yourself and are now over-analyzing things, imagining these mysterious "write ups."
If you want to be proactive, you should try to work out how this mistake can be prevented in the future, as anyone can make it.
Also, one of the interview questions I always ask candidates is “what’s your biggest mistake in IT,” and I’m suspicious if it’s “I can’t think of anything.”
I once killed production for an entire 200-person company for two days. I've accidentally rm -rf'd root on a production Linux server with bad backups. In management, I once flushed six figures down the toilet on a failed tool implementation and a bad contract I shouldn't have signed.
Making mistakes happens. Just don’t make the same mistakes repeatedly. If you do end up being in an environment where there would be some HR action for a mistake like this, then it’s a sign that you’re working for a company that doesn’t value humans.
itishowitisanditbad@reddit
Not a write up then.
Shrug and move on.
An overly cautious person wouldn't write someone up for this.
That's arbitrary management for management's sake.
Your boss sucks.
TrainAss@reddit
Ya, this is a mistake that ANYONE, green or grey beard, can make. You weren't malicious; you were doing a task I'm sure you've done countless times.
All this is, is a teachable moment to double check before making a change. If this goes on your record and is a strike against you, I'd be worried about the culture at that place.
I took down half a server rack once, and once left an entire network share (complete with HR folders and data) wide open over the weekend, with everyone having read access. Oh, and I took down the entire print server after an in-place upgrade (made the mistake of changing the virtual NIC from the Intel one to the VMware one and didn't update the DNS record).
Was not written up. My manager was also a former co-worker and former sysadmin and knew that this stuff happens. You've gone 20yrs without a single write-up, that's damn impressive.
Hopefully this doesn't get you down. I'm sure you're kick-ass at your job.
mikeyb1@reddit
Then he should be cautious about throwing you under the bus. I'm not torching the trust of my team to cover my ass.
PeacefulIntentions@reddit
Instead of accepting a write up, get with your leadership and suggest that a problem-resolution process is better for everyone than just doing a CYA, which helps nobody.
If you don’t have a process and you are starting from scratch read up on the five whys and you can base your process on that.
MyClevrUsername@reddit
Writing someone up for a mistake like this is something he should be worried about. If I worked there and saw someone get in trouble over something like this then I would be looking for a new company to work for.
Gnomish8@reddit
If that's the case, it really should be a coaching conversation with a follow-up email. If it happens again, he can point to that and go, "We've already talked about this..." and in that instance you can probably expect a write up. If it doesn't become a pattern, then great, you learned. A win for OP.
At least that's how I'd handle it.
Damet_Dave@reddit
As long as there was a change control in place, and it isn't your 50th time doing boneheaded things that take down production, I can't see any reason for a write up.
When they start paying seven figures, they can start demanding zero mistakes.
ThemesOfMurderBears@reddit
The idea of a "write up" is weird to me. I have not been written up since I worked a shitty warehouse job, before I got into IT. You would get "dinged" if you were a minute late. Too many of those in a year, it's a verbal warning, then a written warning, then a final warning -- and you're fired.
Within the first couple months of me getting the job I'm at now, I accidentally executed a script that had a production impact. My supervisor printed out a meme that made fun of me ("When I test my code, I do it in production"), and that was it.
I know people at my current job who have been fired, so I'm sure there is a formal process if people are constantly stepping out of line. But I've never received anything like that.
drloser@reddit
That's true. If the change was risky, it should have been done outside office hours, and your company's change process should govern that. It's those processes that are at fault, not you.
perthguppy@reddit
If the company has defined change control policies, it would be a write up. If I caught my engineers unmounting production flagged datastores on a cluster during business hours without a change control, I would 100% write them up for ignoring policy.
djholland7@reddit
An RCA is easily answered with the 5 Whys, which 99% of the time result in a self-acknowledgement of the mistake. This is 100% OP's fault. It's a major interruption for other people: the shock of downed systems, the downtime, the lead time to get back to work, etc. It all adds up.
Does OP manager guarantee an amount of uptime annually? Hopefully that wasn't impacted.
Drakoolya@reddit
Write ups are not for one-off mistakes but for repeated egregious actions. No decent employee should get a write up for something like this from decent management.
solman96@reddit
God this entire thread is so cathartic.
I'm mostly a network/security guy, and all I can say is that until I learned the "reload in" command, life was significantly harder.
AutomaticGarlic@reddit
If people got written up every time they made a mistake, nothing would ever change. The entire IT department would freeze up in fear of job loss. The only write up you need is a kb article. Learn from this and fix your cleanup process so it isn’t so vulnerable to human error.
wonderwall879@reddit
"simple vmotion migrate, so no downtime." ha, that's what everyone says.
rcp9ty@reddit
The only time I've been written up in I.T. was years ago, when my coworker (a level-two tech like me at the time) kept asking me the same stupid questions about a computer I was working on that was badly infected. He kept telling me to just nuke it (which I couldn't do because of the content on the computer), and every two hours he would ask me the same questions. At 2pm (the 5th time), I said, "Either you do this for me or you can shut the fuck up." I said it loud enough that everyone in the office (accounting team, server/database team, level-three admins, level-one support team, IT manager, and IT director) went silent. I called my coworker Difficult David (not his real first name); everyone else called him Dickhead Dave.
ElectricOne55@reddit
I've made a similar mistake before, shutting down a server. I think what matters more is working together to solve the issue. But in tech you have so many know-it-alls who don't want to help and expect you to "show initiative," yet when something happens they want to throw you under the bus. At the same time, they're not there to help.
rcp9ty@reddit
In the end they laid me off because they didn't think I was a good fit, then took two years to find someone to fill my role, and the only reason they found someone is that the dickhead found a different job. I currently make 2x what they paid me, so I no longer care about that place, but at the time I was very depressed about being laid off.
ElectricOne55@reddit
Ya I felt that way with this weird startup that let me go after only 8 weeks working there for no other reason than culture fit lol.
rcp9ty@reddit
Better than a startup thinking you're not a great fit because of a theme song choice in an interview lol
edthesmokebeard@reddit
Everybody gets one.
gandalf239@reddit
Had a VIP accuse me of malfeasance when they couldn't access their email any longer...
We were transitioning from Odious Notes --> Outlook, and I forgot this person didn't have their own named account; they were using a business account to log in to Notes...
My bad. One needs their own account in Outlook in order to be delegated access to a site/shared mailbox...
I was banned from supporting them, but when their new phone came in we had to take a field trip to see this VIP. Because I wasn't allowed to touch the device but my boss was, he handled it while I gave him instructions.
Shipkiller-in-theory@reddit
Lotus notes ewwwwww
SaladRetossed@reddit
I took down a bank for 10 minutes. Teller machines, ATMs, phones, branch sites, all gone cause I accidentally flipped a WAN IP from static to DHCP. I thought I just flushed my career down the toilet. I thought the write up was coming and I was too stupid to work in IT.
It never came. My boss and I laughed about it later and I use it as a benchmark for mistakes now. "Hey at least I didn't take down the bank again".
You'll be ok :) mistakes happen
UnfeignedShip@reddit
I try to make people comfortable in senior engineering interviews by starting out with all interviewers talking about our largest screw ups and then asking how they’d have handled it, from an engineering perspective, a leadership/mentorship perspective, and finally a peer perspective.
Almost no one has a pre-canned answer for that kind of thing and so you get some pretty genuine unrehearsed answers, it gives you insight into their technical acumen, their soft skills, and depending on what they disclose and how it was broken and/or fixed.
If you’ve never broken something important in the environment… you’re either lying (so too dishonest)… statistically due (so I’m having you buy my lotto tickets)… or too junior (and I’ll be polite but will tear the recruiter a new one for wasting both of our time).
weightyboy@reddit
Do you not have a change management process in place? You should put a simple one in if you don't; it acts to cover your ass.
I once joked at a bank I worked at that if I raised the change correctly I could take a piss in the back of the mainframe and not suffer any blowback.
TowelPretend@reddit
Oh man, the number of times I've taken down a production facility, usually only for a quick minute, because I didn't realize I was modifying the main trunk port on a switch. Welp, let me walk over and power-cycle the switch; good thing I didn't write to mem.
But the icing on the cake was when we had just acquired an entire sales organization of about 70 users. It was a multi-million dollar company, and we were in the process of migrating them from one O365 tenant to another. Long story short, my partner on the project was trying to get AD sync working and decided, "you know what, let me just delete all the users out of O365 and allow it to re-sync; O365 should match up the mailboxes." NOPE. Calls started rolling in of users having nothing in their mailboxes, Teams, etc. His face dropped and panic began to set in. I looked at him and said, buckle up buddy, it's gonna be a long night.
THANK GOD for O365's 30-day retention policy for all mailboxes. After a few ChatGPT queries and some testing, I put together a PowerShell script that would go through and associate all the new users back with their old Exchange mailboxes. All in all it was about 4 hours of downtime. He swore never again to make a change like that without consulting me. If that hadn't worked, we had PST backups, but imagine having to upload all of that to the cloud; it would have taken a few days. If you haven't brought a company to its knees, are you even learning?! haha, cheers.
RBeck@reddit
The only writeup this could cause is going from Wild-West (do what you want whenever) to submitting change requests to people who don't understand them.
EvelynVictoraD@reddit
45 years in IT. Shit happens no matter how careful we are. Walk it off. It'll buff out. Admit nothing.
cm7272@reddit
Dude..... we all try to measure twice and cut once, but give yourself some slack... we all step in digital poo sometimes. They should understand your conscientious approach and hard work thus far. This is a test for the employer: if they throw a fit even though you're all good in the change-control sense of things, that's a different story.
azzokk@reddit
I held down a button on a network switch while trying to help a sysadmin move it so he could jam in a firewall. The switch... power... shut down. The whole (small) company was down for two hours while they fought to get the aging hardware back online. Shutting down the switch dropped some configs... port routing...
Six months into my first IT job, and I thought I was toast. Nope... my boss made it a learning lesson for the team. And we built redundant systems afterwards.
MustangDreams2015@reddit
I really don't think you should be written up. Shit happens, and you fixed it in 10 minutes. They should be thankful you're good enough at your job to see your mistake and quickly correct it.
kzbash@reddit
Say there was a bug and you saved the company in 10 mins. You’re the hero.
Pleasant_Deal5975@reddit
If I were your manager, I wouldn't write you up. But you'd owe the team lunch (at my expense), and you'd better be ready to be the joker of the day..... oh, fun times.
Upper-Affect5971@reddit
Ask the asshole writing you up if he’s ever made a mistake before.
That shit happens all the time.
architectofinsanity@reddit
Was it a mistake? Did you break protocol doing this change? Did you learn from it? How can we prevent this in the future?
If this wasn’t against change protocol and you just made a mistake - write up shouldn’t even be considered.
charleswj@reddit
A good way to teach employees to hide mistakes, or to cover them up or fix them quietly (and potentially not correctly), is to write them up for honest mistakes. OP probably wouldn't have been able to hide that they did it this time, but getting written up would tell them to try next time.
habaceeba@reddit
You're not wrong, but I think OP is assuming a write up. There may not be an asshole in this situation. I would really question my loyalty to a company if I got written up for something like this.
tankerkiller125real@reddit
I wouldn't question my loyalty, I'd be writing my resume and pushing it out to every decent looking job opening I could find.
Kitchen-Awareness-60@reddit
Why in the world would you have “loyalty” to a company in any situation? It’s a dog eat dog world. Act accordingly
rotoddlescorr@reddit
Same. The second I get some stupid write up because of a mistake is the second I start looking for a new job. Policies like this just cause people to hide their mistakes and blame something else.
dvb70@reddit
This does kind of depend on the work environment. For instance if you make a mistake like this during a change freeze then they get you on the fact you really should not have been doing anything. It's not the mistake but more the fact you ignored a work place mandate.
Frankaintmyfriend@reddit (OP)
Not during a freeze, but I do work for a utility company and it was during a pretty bad rain storm. Map servers went down as well as phone systems.
draeath@reddit
You're definitely correct, but:
Sometimes it's difficult to predict this, but you also usually don't want people doing nothing either, and what level of risk is the threshold can be difficult to define.
This sort of thing always deserves consideration, never going directly to write-ups. If the policy says someone must be written-up, that policy is shit. Judgement should always be used.
jdptechnc@reddit
Delete operations in the infrastructure shouldn't be occurring during a change freeze. Generally agree with you though.
jefe_toro@reddit
What's the point of a change freeze if you are still gonna say changes can be made?
3MU6quo0pC7du5YPBGBI@reddit
I've triggered a bug that restarted the BGP process on a core ISP router just by running a specific "show" command.
jefe_toro@reddit
Exactly. When a freeze is in place, best not to touch anything unless it is a break fix type thing. Weird shit happens and investors or company leadership isn't gonna care when you say all I did was this or that.
draeath@reddit
Even just logging into vSphere to look at non-production stuff carries a non-zero risk. "Oops, I typoed and acted on a production VM instead of the non-production VM I meant to touch!"
The only way to really do this, is to completely partition your production environment out from your test/development and other support infrastructure. Lots (most?) places can't or won't do this.
(/u/dvb70 tagging so I don't have to reply to both of you)
Bitey_the_Squirrel@reddit
All places have a test environment. Some even have a production environment.
OkCareer6502@reddit
This is perfect.
Unless there’s something we aren’t privy to, getting a write up for a 10 minute outage that was easily recoverable is ridiculous. Validated environments are a different story or if this guy works in a call center that relies on phones for everything, but main prod having a 10 minute outage is a hiccup.
S0phung@reddit
He made a very important distinction:
I did even worse than OP because our test and prod used the same disks. I deleted everything. Every. Thing.
It took a weekend to recover (it happened late on a Friday while I was working in the test environment). Thankfully it was the VMware desktop environment and not the server environment (which also used the same disks, but a different LUN). Anyway, I was fired. 10/10.
dvb70@reddit
Yeah, I was going to add that in my company they really do expect you to do nothing beyond defined admin tasks, like user account management. No infrastructure changes at all is the order of the day, with the exception of break-fixes.
FanClubof5@reddit
I love when the business decides they need a global change freeze. It means I can sit around and do nothing, with a really good excuse for why I'm not doing anything.
khantroll1@reddit
This is where I'm at. If this is a heavily regulated place, or there was some other mandate going on, then yeah, it's a policy violation and worthy of a write up.
If it isn't then, it's a "don't ever, ever do that again or you will be written up for negligence" situation, and an opportunity to improve your process.
Tech_Mix_Guru111@reddit
Yeah, but could this have been prevented with just a little more thought and thoroughness in his work? Absolutely, and that's what the write up reminds you to do next time.
The problem with mistakes is that they're subjective and can be explained away: we're learning, we're just people, etc. But if you never decide where to draw the line, you end up with a team of sysadmins making mistakes and never being held accountable or taking accountability for their own actions. That's what makes you a senior, something this sub continuously forgets, which is why mediocrity seems to be the norm now. I hate it for OP, but after 20 years he should have known to check and double-check. This isn't a skill issue, just laziness.
illicITparameters@reddit
It doesn’t matter, it’s a mistake.
If this is your mindset, please don’t become a manager.
Tech_Mix_Guru111@reddit
Well my solutions and processes are still in place long after I’ve left those gigs. If everything is allowed to be a mistake then that explains why new directors like yourself can’t ever seem to get a thing over the finish line because of some “innocent mistakes”.
If you don’t expect greatness you’ll drown in mediocrity
illicITparameters@reddit
Want a fucking cookie?? I've lost count of how many previous employers are still using processes and solutions I implemented.
You’re making a lot of assumptions for someone who is trying to kick someone’s back in for no reason. You aren’t fucking perfect, and you sound exhausting to work with.
You’re just a miserable person.
RayG75@reddit
I think you are wasting your time trying to explain common sense to someone who calls himself a guru lol. I've seen so many self-proclaimed gurus with more than 10 certificate badges in their signatures... who end up lacking not only knowledge but common sense. The view is obstructed for them, since their head is, you know where...
illicITparameters@reddit
I spent the last few years of being a sysadmin cleaning up the messes left by all the morons with all the Cisco and Microsoft alphabet soup. Some of the dumbest IT people I’ve ever met have had MCSA and CCNA’s.
battmain@reddit
Agree with that completely. BTDT. Experience can be a wonderful teacher. My favorite was a manager arguing with a colleague about 'hurry up and put the fucking SCSI drive in the IDE-only PC.' Note that this twat had two lines of certification acronyms in his email signature. He lasted 90 days, and my opinion to the tech VP was the deciding vote out. After that conversation, someone messaged me with the disable-request ticket.
Wait, there was more. Another one had a master's degree in computer science but could not think through how to do anything logically in our structured environment, and he was supposed to be my teammate, helping because I was by myself. At the end of 60 days, I went to the boss at the time and said he had to go. Instead of helping me, he tripled my workload through the user complaints I had to field and the 'completed' tickets I had to follow up on.
DoogleAss@reddit
To be fair, you're making assumptions in the same manner you're condemning the other poster for.
You followed their response with your own D-measuring, while somehow talking down to them for doing it first? "Want a fucking cookie?" I mean, come on now lol.
Second, you are assuming that working for them under their current processes is or would be miserable, but you don't actually know that, hence why it's an assumption.
If you want to get your point across, this isn't the way, my guy lmao
illicITparameters@reddit
No, I said THEY are a miserable person. I said nothing about their processes. Also, since they have absolutely no management experience, I refuse to defer to them for how to PEOPLE manage.
And no, if someone is going to talk like an arrogant dickhead out of the gate, they’re gonna get talked to the way they act.
DoogleAss@reddit
Ok, well, you're still being ignorant even in that case. All they did was state an opinion that you disagreed with, which they followed up with evidence they felt backed their point of view, whether true or not, and you decided to get butt-hurt.
You then made a childish rebuttal and an assumption that someone is miserable based off a two-message convo.
Lastly, good luck with that stance. You interpreted it as arrogant-asshole speak; they didn't actually do anything that warrants your response.
Btw, change management is a thing for a reason. If it had been followed properly, OP wouldn't even have been posting this, and that is why some companies choose to discipline. You have no idea what OP actually did in their workflow, and you have no idea what their policies are. Would I write OP up over it? No, but it's not my company.
I mean no offense, but based off what I read, you're likely the miserable one if you get that butt-hurt over an opinion posted on Reddit.
illicITparameters@reddit
“It doesn’t matter, it’s a mistake” is childish???
DoogleAss@reddit
No, asking someone if they want a fucking cookie for making a statement. Holy hell, brother.
I'm not here to debate who has change management and who doesn't. It is pretty ironic, though, that you would reinforce my point with your own. I said earlier that you don't know all the details of OP or their business policies, meaning there are unknowns, and now you're using the fact that change management is an unknown at some orgs as an argument. Can't make that shit up lol.
Are you without a doubt sure OP's company doesn't have change management? Because if not, your entire argument falls like a house of cards.
"For saying people make simple mistakes?" Yeah, no. I said that because you got butt-hurt for no reason and then chose to try and talk down to someone over a difference of opinion. That's not how you have a productive conversation as an adult, but you do you.
Tech_Mix_Guru111@reddit
One that expects shit to run well and engineers not to make stupid mistakes, especially those who are seniors in this industry.
illicITparameters@reddit
Get off your fucking high horse. You aren’t fucking perfect. You’re why people hate IT Professionals. Arrogant and ignorant.
Upper-Affect5971@reddit
When’s the last time you made a mistake? If you’ve been in this job long enough, you have fucked some shit up, accidentally.
Quit Monday morning quarterbacking
Tech_Mix_Guru111@reddit
I'm not Monday-morning quarterbacking. I've made mistakes, and I've run teams with sysadmins who never took responsibility for their actions and just dumped the work on seniors. So believe me, I know. But you strike me as the type of person who doesn't like to take accountability for their actions.
As a former admin of VMware and other popular hypervisors: you don't make moves in those environments without knowing what the action you're about to take will do, or has the propensity to do. Anytime you do work on datastores, you make damn sure you know what's affected or what's using that datastore. If you don't know, you don't move anything until you do. Don't get me started on the database impacts of vMotion.
Make mistakes, sure, but don't come at me as a 29-year veteran of IT with "I forgot to check what VMs are using that datastore." That's year-two newbie sysadmin shit. But in your world, we're all part of the team as long as there's a senior there to do the hard work you don't want to put effort into. Why should you, when you can just draw on years of others' hard work through innocuous platitudes?
Upper-Affect5971@reddit
You must be a delight to work with.
Tech_Mix_Guru111@reddit
You sound like everyone else in this sub. Ranting and complaining about everything and everyone bc you don’t take responsibility for your actions
Upper-Affect5971@reddit
You ever think maybe the problem is with you?
Tech_Mix_Guru111@reddit
No. Reddit is filled with people with disdain for their jobs and orgs, and nonconformists who come here anonymously to complain because they don't have the courage to do so in real life. So they come here and reinforce bad mentalities, and since you have a lot of upvotes you go about your merry way believing you're right, because people agreed and upvoted you...
In reality, shit has to work regardless of how we feel. We may not agree sometimes, but shit has to get done. Putting yourself before everyone else is indicative of a bad culture, and this is the result: people who believe their feelings are more important than anything else.
Upper-Affect5971@reddit
How long you been in the field?
It’s nonconformity that made this industry what it is today.
Tech_Mix_Guru111@reddit
There's a difference between challenging the status quo and legacy ways of doing things, and saying people shouldn't be held accountable, or coming here to complain because their feelings are hurt.
narcissisadmin@reddit
Today's post wouldn't have been a thing if change management were in place.
illicITparameters@reddit
I love how people think this is always a thing, especially at smaller orgs.
mvbighead@reddit
It really depends on the policies of the org. If op was operating outside of policy, write up is expected. If operating within, sure, mistakes can happen.
However, such accidents are generally why policies are created. So, over time, things become more stringent on what you can and cannot do during business hours. Make one too many mistakes, and you have some CTO with a chewed butt from upper management who is demanding that nothing happens during business hours to prevent unplanned outages.
Does shit happen? Sure. Should you avoid it and do better where possible? 100%. Me personally, vMotions are fine. Datastore removal/dismount, though, you really should be checking for any signs of life before removing the thing.
djholland7@reddit
Agreed, but you're wrong. Change management is crucial. This interruption by OP is a violation of the A (availability) in CIA. Still, a write up ain't deserved. I've fucked up big in the past: restarted an entire office building's PCs in the middle of the day.
Ensuring this doesn't happen again requires a true and honest acknowledgement of the mistake.
Zenkin@reddit
Does a write up make someone "more accountable" than them owning up to their mistake? I could see this going either way, but honestly if you're doing work then sometimes you're going to cause an outage. If they're doing this multiple times a quarter, okay, that sounds like something which needs to be formally recognized and addressed.
It's certainly their fault, but it's also a pretty odd issue which I've never seen before. Disconnecting a datastore without any VMs on it and taking down a whole cluster is wild.
Tech_Mix_Guru111@reddit
Idk, that's for the person being written up to decide, but hopefully they'll see the value in double-checking their work before hitting the switch next time. If they don't, and pass this off as "my manager is a dick" or go to their peers to convince them it should have just been a "mistake," they'll probably never be self-accountable.
Zenkin@reddit
Just going off what he posted..... his attitude seems pretty good? He isn't blaming others or otherwise trying to say it wasn't his fault. I don't know, if you're just hitting someone with a stick because that's protocol for an outage, it seems counterproductive. This wasn't deleting a prod LUN because he was looking at the wrong name (something I've done), and the impact was far greater than someone would normally expect, even if they were double checking their work.
Tech_Mix_Guru111@reddit
OP is fine. It's the people saying it's not a big deal that are trolling.
illicITparameters@reddit
No, but he’s just being an asshole.
Antique_Grapefruit_5@reddit
Absolutely. Write up people for NOT BREAKING STUFF...that's a general sign that they aren't working. Otherwise, failures are lessons learned. We've all been there and have the PTSD to prove it.
JazzlikeSurround6612@reddit
Yep, this was exactly my first thought too. A write up? What kind of man-child is OP working for?
Sciby@reddit
It’s quite literally a rite of passage for an IT career - everyone worth their salt has had one monumental fuckup that they tell at the bar after work.
OP is just a little tardy getting to their fuckup.
uhdoy@reddit
I think it really comes down to how change management is handled at your company. Where I work, if this change hadn't gone through the proper change management process, it would be grounds for termination. If it had gone through the process, everyone would say "shit happens," there'd be an RCA, and we'd all move on. There's a bit more nuance than that (is it a pre-approved change, has this person had other instances of not following the process leading to business impact, etc.).
MJS29@reddit
Was gonna say I’ve never seen a “write up” for a genuine mistake like this. Happens weekly here 😂
phorkor@reddit
I work in broadcast TV in a top 10 market. Anytime we go into "black", that means we have nothing on air. When that happens, we roll commercials which means free advertising, which is bad when you rely on commercials to pay the bills. I made a change to our monitoring system and fat fingered an IP for our SNMP config and once the change was made, our switches started rebooting. Turns out that we got hit with an IOS bug on our Catalyst 3850s which didn't release the memory and all our switches were pegged at 99% utilization for the previous 6 months (we had no monitoring for some stupid reason, so when I started here I took that as my first task to fix). When the monitoring system started trying to pull data, the switch said, "fuck off, I don't know you!" and tipped it that tiny 1% which caused the switches to tank. 15 minutes later things started getting back to normal. 15 minutes is about 5 commercial breaks, so you can imagine how much free advertising we gave out. Once we figured out what caused the issue, I nearly shit myself. I had only been there 6 months and thought for sure I was going to get fired. In that 15 minutes, I cost us a few hundred thousand dollars.
I went home that night and was talking to my wife about it and was down as fuck on myself. I loved the job, I loved the people I worked with, and it was the first job that I really felt at home at in my 20 years of working in IT. The next morning our VP of Tech comes in and asks me to join him in his office. He says, "Yesterday sucked. It sucked bad. I took some heat, but what I want to know is, what did you learn from this experience? We all make mistakes, and I don't mind going to bat for you guys. What I won't accept though, is a team that doesn't learn from mistakes. So again I ask, what did you learn from this experience and how can we avoid it in the future?" We had about an hour long discussion that morning and I will honestly say it was probably the best professional discussion I've ever had. 7 years later I'm our team lead and still working for the same VP and he has helped me tremendously in growing into a leadership position and especially how to remain calm when shit hits the fan.
So yeah, anyone that writes someone up for an honest mistake needs to be out of management.
800oz_gorilla@reddit
Yes-n-no. Some mistakes are understandable. But if the expectation was set not to screw with the prod environment during critical hours (or outside of a maintenance window) the write up could be very well deserved.
OP, I hope that's not your case and I'm sorry you are taking this one on the chin.
Genesis2001@reddit
Also, if they really are that quick to write you up for an honest first mistake, they're looking for a way to fire you. I've heard of this happening in friend circles around me... friends who either knew someone being targeted or were targeted themselves.
SknarfM@reddit
This depends on whether OP's company has a change control process. If they do and OP made changes to production outside of it, he could be in big trouble. If not, then sure, an honest mistake. His manager should probably let a one-off slide.
Ron-Swanson-Mustache@reddit
That's what I'm saying! Unless they hid something or blatantly failed to follow a procedure that caused the downtime, why would punitive measures be taken?
thebrax27@reddit
100%. A write up for an honest mistake is absolutely stupid. It's human. If you're making mistakes often, maybe, but a rare one? Absolutely not, unless they fully believe it to be intentional, but that would be grounds for termination.
LiberContrarion@reddit
Find an error in the write-up itself...and demand he get written up.
Remindmewhen1234@reddit
He hasn't been written up, he is assuming he is going to be.
minoltabro@reddit
This. I’m a manager and I get annoyed with someone who doesn’t take responsibility for a mistake or skates around an explanation of what happened.
WaywardSachem@reddit
This was my thought as well. If OP was displaying a pattern of it, that's one thing. But if it's their first mistake of this kind, that'd just be shitty boss behavior.
Or maybe OP is just being paranoid.
jackmorganshots@reddit
Too many managers seem to take the bond villain approach - shoot the experienced ones who fail you and wonder why bond keeps getting past the rookies who are left.
Big-Industry4237@reddit
Oooh I like this one
techw1z@reddit
huh this is weird.
on one hand I agree with you, honest mistakes shouldn't be written up immediately. On the other hand, I would totally fire a person if they screwed up like this 3 or 5 times in a row, and I think things that could lead to firing make sense to write up.
yes, I realize my own cognitive dissonance here...
ms6615@reddit
To me a “write up” is part of a disciplinary process for breaking policy. This is just being dumb and bad at your job. Maybe it’s common or maybe it’s a fluke that never ever happens. But tracking that type of job performance is different than enforcing company policy.
techw1z@reddit
English isn't my first language, so when I read "write up" I don't really think of it as immediate disciplinary action. Putting this event in a separate log where IT staff mistakes are recorded would count as a write up for me too, but I guess that's not quite how English natives use the term.
thx
freon@reddit
Especially (only?) in America, a "write-up" is signalling that the company is building a paper trail to fire you.
samtresler@reddit
Keeping a record of employee mistakes without disciplinary action is reasonable and smart.
I wouldn't bother with a write up if it wasn't associated with a plan for employee improvement.
20 years? Doesn't sound like OP needs to improve if this was just an honest mistake. Record the incident in the log and move on. If it becomes a habit it could be a write up on the 2nd or third offense encompassing "multiple mistakes".
But unless a write up can improve the business goals there isn't much point.
techw1z@reddit
English isn't my first language, so when I read "write up" I don't really think of it as immediate disciplinary action. Putting this event in a separate log where IT staff mistakes are recorded would count as a write up for me too, but I guess that's not quite how English natives use the term.
samtresler@reddit
Oh! No... here it's usually legal language for "we need a history of warnings before we can fire someone 'with cause'". Frequently affects a year end raise, or other action. Multiple write ups can occasionally lead to automatic termination.
Depending on the company it can be routine or could be very damaging to an otherwise great employee.
nj_tech_guy@reddit
There's a difference between a genuine mistake and negligence.
That difference is usually confirmed somewhere around the 3rd time making the same mistake.
Library_IT_guy@reddit
3 or 5 times in a row in a short period, OK sure. But 20 years in IT and doing a good job and one honest mistake that was only 10 minutes of downtime? Bit extreme.
elitegoodguy@reddit
I mostly agree with this. As someone on the other end... we don't know the full story. Is this a one-time thing? Or does OP have a history of quick actions that brought down the environment? Did he submit change control for the actions taken?
If we take this as the first incident, it would be more like a conversation in my office that would go more like "Hey man, thanks for helping bring systems back up, but you really need to be more careful next time"
If this is the 3rd time I've had that conversation then it's a formal letter sent to you, so it's in writing that we discussed it but not a "write-up"
4+ times then yes Write-up, PIP, termination might be needed.
Tarcanus@reddit
Seriously. As long as you acknowledge your screw up, fix the screw up ASAP, and do documentation or other due diligence to ensure the screw up doesn't happen again, you should be fine.
A good manager makes sure that stuff happens and only gets upset if you keep making the same mistake over and over again.
ebenizaa@reddit
Especially since it’s the 1st time in 20 years. If I was OP’s boss, I wouldn’t even consider a write up. Quick convo to get his version of events, and once I heard how he fixed it, I’d just nod and walk away saying “talk to you in another 20 years”
kirksan@reddit
Yeah. I’ve made mistakes that cost real money, as have folks who’ve worked for me, without any kind of writeup. 10 minutes downtime is nothing in comparison to an entire career, at most I’d make the person who did it buy a round of beer for everyone who had to help cleanup, and even then be mostly joking.
Bitter-Fee2788@reddit
The person I used to work with didn't get fired for telling a partner to dry an entire server rack, that the partner had spilt coffee on, with a towel and hairdryer. The spill was bad enough that it caused a short circuit for the entire building as well (luckily it was only the cooling fans, and the partner smartly called back as he thought that didn't sound right).
OP made a genuine mistake, he'll be fine... the person I worked with was completely incompetent.
anna_lynn_fection@reddit
Depends on if it's regarded as a careless mistake.
RunningMan66@reddit
You think that’s bad … when I was fresh out of school and working at a help desk for this managed services company. We managed several different systems (mainframe, AS400, HP UX, Novell NetWare, etc…) for primarily various government agencies.
Since I was a rookie, I barely knew anything about all these different platforms, but took it upon myself to learn as much as I could. I was given root access pretty much from the start to all of the HP UX systems to perform my tasks, and one day decided to change all of the ownership of files in my home directory to my user. It took a while to get the prompt back after running the chown command, only to find out I ran the command from the root directory and not my home directory!! Needless to say I owned the whole system, to the point where root couldn’t even log in anymore. They sent me home and told me they’d call me eventually.
The company spent thousands on flying in top consultants who spent days recovering the system. After a week, I got called into a meeting with the president of the company, and all levels of management, including my boss who was sweating bullets. I was sure I was not only fired, but also being held liable for what happened.
The president looked me straight in the eye and told me “thank you” because what happened should never have happened, and I had exposed a serious flaw in their operations. There is no way a rookie like me should have had root access to government servers from the start.
They did a complete overhaul of their procedures, and not only did I keep my job, but because I showed initiative and willingness to learn, I eventually got a raise and was promoted!
So don’t beat yourself up for one mistake. As many have said already, we all make mistakes. Some bigger than others :-)!
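For anyone who wants to see that failure mode concretely, here's a harmless sketch (paths are invented, and no `chown` is actually run) of how the blast radius of a recursive relative-path command depends entirely on the working directory:

```shell
# Harmless demo: the same relative-path command, two working directories.
rm -rf /tmp/demo                      # start clean so the counts are predictable
mkdir -p /tmp/demo/home/rookie /tmp/demo/etc
touch /tmp/demo/home/rookie/notes.txt /tmp/demo/etc/passwd

# What was intended: recurse over the home directory only
cd /tmp/demo/home/rookie
echo "from home: $(find . | wc -l) entries in scope"

# What actually happened: identical command text, run from the tree root
cd /tmp/demo
echo "from root: $(find . | wc -l) entries in scope"
# A real 'chown -R user .' from / would have re-owned every entry found,
# including the system files root itself depends on.
```

An absolute path (`chown -R user /home/rookie`) or a quick `pwd` before any recursive command would have made the mistake impossible.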
Rouxls__Kaard@reddit
Haha I’ve done far worse and still living write-up free! Welcome to the oopsie outage club :)
HockeyFan_32@reddit
I once refactored the value of zero on a Unisys mainframe.
Don’t worry so much!
Cheap-Ad1290@reddit
I've lost count of how many fuckups I've caused in my 17-year IT career!!
I once deployed a script via Ansible that took down a DC server cluster for an ISP, causing a 5 hour outage.
ApprehensiveVisual97@reddit
I was responsible for a high availability product and, being software, it failed. I so wish I could list the household names, but I’ll say manufacturing lines, e-commerce platforms, collaboration platforms, a mobile carrier (a big one) and more. When it worked, it was amazing. One e-commerce platform had 43 people working on the call - intense
Xenophore@reddit
I once made checks late for 400,000 Mary Kay consultants.
HWKII@reddit
I wouldn’t allow one of my managers to write you up for this. Shit happens. You fixed it.
If you’d hidden what you were doing, or lied about it - that’s worthy of a write up. You don’t write people up for a mistake.
merlyndavis@reddit
I spent a long holiday weekend rebuilding a NetWare server after the RAID array sh*t itself all over the server room floor. That was entertaining.
If I didn’t get written up for that, or for the time I forgot to save the firewall config before rebooting and blocked all traffic to our e-commerce site for the thirty minutes it took me to drive to the data center and fix it (because stupid me hadn’t set up any way to access the DMZ that didn’t go through the firewall), or for the time I wiped the CEO’s home directory and had to restore it from tape right before the quarterly investor call or….
Relax, it’s IT, Murphy’s Law is our primary rule for a reason. Just learn from your mistake. If you do it a second time, THEN you deserve a write up.
Complete-Dot6690@reddit
If that’s a one time thing and they write you up, then leave them. 18 years healthcare IT here.
dpkg-i-foo@reddit
My workmate once deleted some storage volumes that were used for financial servers which caused 3 days of downtime :D He still works with us... Pretty sure you'll be fine
Prostatortot@reddit
Congratulations! You've been promoted to Engineer. Your raise will be effective next pay period.
gmlear@reddit
Show me an admin that hasn't taken down a server and I will show you an admin that doesn't do anything.
My best was taking down an Exchange cluster trying to recover from the "I love you" virus in the early 2000s. Happened on a Friday at 4PM. Slept in the server room Fri, Sat and Sun night. Seven pizzas and two cases of dew and we rebuilt the entire system from the raid controllers up. LOL.
Six_days_au@reddit
Back in the day the person to accidentally dump CICS, would get the plastic gumby award, to display until the next person did it.
spittlbm@reddit
You get a beer, not a writeup.
FunScore645@reddit
You need to enter recover mode and specify recover files, type rm -rf
zmirza2012@reddit
Not the worst, I accidentally ran a test script on a Primary DC instead of my test server... downtime was 2 days, still no write up.
In my defence... I have no defence.
adamixa1@reddit
I am not sure if this deserved a write up. I did worse, basically wiping a live server, and still survived
battmain@reddit
Rookie. I took down Eastern Airlines when they were in their prime... During the day.
That was back when the keyboards were big and heavy enough to kill somebody with.
jlipschitz@reddit
Mistakes happen. Learn from it and move on. Don’t beat yourself up. I have been at it for 28 years and some change. If your manager is worth their stuff they will understand that.
LAKnerd@reddit
I remember my first time, after my third year in IT... It was a rough 6 months developing something that directed ACH payments. The day finally came for its first night run. As I was monitoring the databases it uses, I noticed one was oddly empty, and the job suddenly stopped the entire iSeries from processing any further nightly scheduled jobs. I cost the company tens of thousands in fines for late payments.
I about broke down in tears until the director of software development sent me a message "you messed up, you know what caused it, you planned for recovery, and I'm sure you won't make this mistake again."
Anyone who writes you up for a mistake you didn't think about is someone who hasn't been an operator or has been in management for too long. Even losing data is bound to happen eventually.
WorriedFlies@reddit
Pfft. I have PAGES of write ups.
ImpossibleLeague9091@reddit
Ya if they're writing me up for that I'm telling them off
Flatline1775@reddit
I've been in IT leadership for almost 20 years now and I've written people up a few times.
Driving over 100mph in the company car and getting a ticket.
Lying to me about the status of a project.
Lying to me to about the status of an issue.
Just straight up not showing up for work one day.
What I've seen that didn't result in a writeup.
Shit happens. If you get written up for this you should really start looking for a new job because your current manager sucks.
Solkre@reddit
How many memes can I post in the department chat before a write up?
Drew707@reddit
Quality over quantity. Lead with Goatse and see if you can get through Lemon Party to Tubgirl.
VexingRaven@reddit
Gotta keep count so the one that crosses the threshold is worth posting.
_THE_OG_@reddit
haha... depends on the meme...
McGarnacIe@reddit
This is practically a live DR test. Just shows that systems are resilient and 10 minutes downtime is acceptable. He did a great job!
bbx1_@reddit
Good lesson learned. This is why change control/change management is important.
civiljourney@reddit
I brought down our email system for a couple hours one time. Worked myself up pretty good over it because it was something I should have caught. Told my boss about it and he was like "ehh whatever, thanks for letting me know."
Life goes on, people make mistakes, and good bosses don't get worked up over that.
shiftdeleat@reddit
mate if you get a write up for one mistake (that is easy to make) then that is ridiculous. We've all done stuff like this. So many ways to misclick or not think through what might happen, its just so easy to do.
the_other_guy-JK@reddit
Oooooh I can relate to this one.
I was doing some significant SAN restructuring following a major outage (that happened while I was on vacation, of all things). Amusingly, this outage was in part due to failing backups.
I flipped two SAN datastores around, and instead of deleting the one that was actually empty, I blew up the other. I'm pretty sure the labels were switched too, which helped increase the confusion. I knew shit was bad when I watched VMs start dropping off vCenter. Worst 12:30AM Maintenance Window Ever.
Mind you, the original work was all approved by my director and this was honestly some of the last items to move around, I had spent the prior several days doing this work on other data.
I still got a write up. "For doing work not approved by management."
That, frankly, was complete bullshit. But it never really came back to bite me, as far as I knew. I'm long gone from that shop now.
seibd@reddit
Reminds me of the time I took down our entire VDI cluster by running a script in the middle of the day that unmounted every datastore from every host in the cluster. There’s no stress like looking out at a sea of cubicles and seeing everyone chit chatting because they can’t work.
Learned a lot from that mistake, had full management support throughout and after, and even used that as justification to fund a full lab environment. I work for good people.
dgmib@reddit
If you get written up for that you have an incredibly shitty manager.
I’ve been a C-Suite IT executive for 13 years. I’ve had employees make mistakes that cost the company six-figure sums.
I would never write them up for making a mistake like that. I tell them about a couple of my own fuckups (of which I have a great many to draw on), and then we discuss strategies to prevent someone from making a similar mistake in the future.
Something my mentor once told me: “You can’t fire your way to success.” Every team makes mistakes, every team has outages. The key thing is to learn from them.
Bordone69@reddit
Call me when one of your peers clicks through like three warnings to delete a volume on a NetApp and takes down the entire production environment. Took 24 hours to restore. Oh, and not even a write up; the dude brought in a single Costco cheese pizza for the 10 of us that had to clean up his shit.
prodsec@reddit
Change windows with approvals should help with this. Assume any vmotion can cause problems idk
theoriginalzads@reddit
20 years and this is the first time you took down production services?
You cannot be serious?
It is an undocumented requirement that you are not a real systems administrator until your first production system wreckage.
BathroomUpper9140@reddit
Once removed an RDM LUN from a live active Exchange server hosting thousands of users, reconnected the LUN and performed eseutil in world record time; the company was OK about it. Think I still haven’t got over that, and it was 15 years ago.
Majik_Sheff@reddit
20 Years to earn your first "Took Down Production" badge?
Do you wear PPE at your desk?
Frankaintmyfriend@reddit (OP)
Sadly, not the first time in 20 years, but the first time in the 7 I have been at this company
brianozm@reddit
I wouldn’t advise doing anything at all during a migration, just tempting fate. Time to sip a coffee or dream a dream :)
teeweehoo@reddit
Working at an MSP has taught me a few key lessons about working in production. Only do one operation at a time (migration, deletion, etc), and if at all possible only delete things during downtime / maintenance windows. Also, "disable" is always better than "delete".
thisguy883@reddit
A guy in my department decided to reboot all DCs for a site all at once and left for the day.
It caused an outage because DNS takes up to 30 mins to start after a reboot.
Well, if you take down all DCs, it will cause an outage for at least 30 mins. It's not a big issue, and it's a common mistake, so long as you let someone know, which this guy didn't.
So hours went by, with repeated reboots, because local site admins kept calling in and having other agents in our department reboot the servers, thinking that was what would fix it.
The next day, our supervisor wanted to know who the hell rebooted and told no one about it. Homeboy didn't say a damn thing, so we pulled the logs and saw it was him.
That was the first time I saw our client lose his shit and recommend that this guy get let go.
We are contractors who work for him. His number 1 rule is to own up to your mistakes and never lie.
jelflfkdnbeldkdn@reddit
Yeah, I've taken down 2 production environments in my first 3 years. Even though one was planned, the way it went down was not planned at all. Anyway, it's part of the job; we're human and can make mistakes sometimes. If you get treated badly for it, it's a sign to change jobs.
catherder9000@reddit
Hah, I took down half of the entire federal election enumeration system way back in the early 90's (91/92) for the YES/NO referendum (Quebec separation from Canada, Charlottetown Accord I think it was). I got a letter of recommendation and fast tracked for security clearances for other consultant jobs because of it. I owned my mistake, documented how to prevent it, and notified everyone up and down the chain of what to never do.
All that really happened was that a few million Canadians were delayed by a day or less in being sent their voter cards while the databases were repaired and the enumeration was re-counted in a handful of areas. Voters never even heard about it.
Taught me really early in my career to never run from mistakes, instead just own them and learn from them.
inputwtf@reddit
Getting written up for a 10 minute outage lmao.
say592@reddit
Unless there is some other factor, like you were told not to do it during working hours, you should have known not to do it during working hours (as in it's against policy), you have an extremely strict internal SLA (healthcare/people die kind of SLA), or it caused a customer to make an SLA claim, you shouldn't be written up.
Any company that writes you up for this and doesn't have some additional really good reason isn't worth working for.
iotic@reddit
Hopefully your manager realises the actual nature of the job - you can't work with fire without getting burnt occasionally. So hoping for the best for you and that you have a manager who has been there before.
Warsum@reddit
“You can’t work with fire and not expect to get burned every now and then.” Honestly that quote is so appropriate to our line of work.
battmain@reddit
Plus, you haven't worked in IT long enough if you haven't fucked up a production server during business hours, doing things properly. (Or mostly by the book.)
Gus_McCray@reddit
We support clients that involve life safety systems, so as long as no one dies we are all good. Biggest downtime we had was four hours. The guy who normally supported this piece of equipment was out for some training. The equipment was acting up, so I wanted to reboot it. Got to a screen that read "Cold Reboot". Selected it, and the system rebooted back to factory defaults! Was not a fun four hours.
usa_reddit@reddit
Let me tell you about the day that Jimmy dropped the Oracle database, thankfully on a Saturday.
cheknauss@reddit
Thanks for the warning. GL to you bro
Zirha@reddit
Just be glad you aren't the guy that hit the wrong button at Reddit today.
All joking aside, you will be fine.
MoocowR@reddit
Never worked anywhere with writeups before, I think it's weird and juvenile. Employers can't just have a normal conversation to find out what caused the mistake and how to prevent it in the future?
random74639@reddit
Can someone from the US explain whats a fucking “write up” for those of us from countries that don’t use kindergarten vocabulary in professional setting?
RunYouSonOfAGun@reddit
Reprimanded for not following policy, underperforming, or causing harm to a business typically.
moralboy@reddit
So did you get written up or what? Sounds like you got paranoid for nothing.
Bartghamilton@reddit
Lots of other variables here…Was there a change approved for the middle of the day? If not, then I get the write up. If yes, someone accepted the risk. But depending on your relationship with your boss they might have to just write you up to cover the outage with their management.
Natfubar@reddit
Probably making some assumptions but I think the lesson learned is really - don't deviate from the planned change.
Cleaning up old data stores seemed to be something that you did on a whim. Was that well considered? Was it meant to be performed at that time? What other unauthorized activities do you perform with your privileged access?
There's more to this than meets the eye I'd say.
Memitim@reddit
I dumped the entire local company network offline for over two hours while working as a contractor. At the end of my contract, they tried to hire me on full-time to help kickstart an engineering-focused team with a new manager that they hired.
These things happen. Mea culpa immediately, have a remediation plan that mitigates the human factor, and remain calm and professional throughout. You might be surprised.
Iwillcallyounoob@reddit
Ha, they wouldn't dare write me up for 10 minutes. A write up means they are 1 step closer to firing me. I'm not going to wait around for the next step. I'm not going to gamble my house or my car for some fuckwad. I'd update my resume and just take a little look.
Cyril2016@reddit
Shit happens. At least they know their IT stuff can fail but also be recovered by you. Often people take admins for granted because everything is running so smoothly. At times like this they appreciate having you there solving the issues.
mimic751@reddit
If it makes you feel any better, I cost Best Buy 3.5 million dollars in business by taking out a few stores when I rebooted all of their switches by accident. I also took down all the hospitals in a major hospital chain in the Midwest for 2 hours because someone put some kind of VLAN in a port group, and when I was merging port groups to clean it up, it took down the network. I just had to undo that and it started working again.
Outages happen. The best thing that you can do is write an email to your leadership, and maybe someone who oversees you like a senior, telling them what happened, why it happened, how you can prevent it in the future, and the lessons that you learned. If you show a willingness to grow, you generally don't get in trouble.
piccler@reddit
Who hasn't taken prod down before?
qejfjfiemd@reddit
Why would you get written up for an honest mistake? Like, I understand getting written up for being malicious, but we’re all human beings, we all make mistakes. This is just a learning opportunity.
ThrillHammer@reddit
Yeah don't sweat it everyone takes down prod at some point. That said, I would never drop storage during the day, don't care if it's "just unmounting", removing storage always has potential for bad things to happen.
WhoWont@reddit
Fired!
Zentriex@reddit
20 years and this is your first network takedown? I did that in my first 3 months. You're actually a legend haha
tbstoodz@reddit
You don't get written up for this shit in any company worth working for
dano5@reddit
if you get a writeup it's a shitty place to work, taking down prod is a rite of passage and meant to teach, especially when there's no data lost :)
Took down prod completely in my second year, and only because I was overly cautious the first year :p
Hi_Im_Ken_Adams@reddit
This is why change management exists.
You don’t just decide to do something else “along the way”.
thewhippersnapper4@reddit
Yep. This exactly.
sleepmaster91@reddit
Lol. We've all done worse than that, me included.
You'll be fine; otherwise your employer is petty as fuck.
cop1edr1ght@reddit
My worst mistake was emailing 10,000 people accidentally from a test system. I first realised it happened when the CIO came over to ask what the email I sent them was.
IT-Command@reddit
You can do better! My org has gone hard down for 4 hours because a networking admin who absolutely knows better plugged a VoIP phone into the network twice and caused a massive network storm
radelix@reddit
Lemme tell you the time I knocked out the single VPN concentrator for a client.
Worked for an MSP of sorts that specialized in last mile connections and vpns.
I was configuring a stack of Cisco 871s. Part of the process is I have to paste 2 lines of the gre config into the concentrator.
Make sure the VPN lights up
Run test pings to prove traffic
Wr
Reload
Prove it comes up
Slap it in the box and make it the shipping departments problem
I had a stack of 15 to do and we had enough public ips that I could do them all at once.
I run through the process above, do the writes on all 15. But I am on a roll.
I specifically set my tab in SecureCRT for headends to be red with large blocky text, as a trigger that I am on one and not the default pleasant green of an ssh or console connection
I hit wr then reload in quick succession. Acked the prompt. Then shit myself.
Started a constant ping. Waited...waited... Waited. Nothing. Go to the engineer for the account, he was packing up to go home. Tell him what I did and watched him die inside. He put his stuff down.
It was found the headend was in rommon when someone physically got to it. Replaced the flash card, loaded a new IOS, dumped in enough of a config to talk to our CMS, and all was well.
I bought the engineer a sandwich the next day.
It happens.
Good job, op.
Leopold_Porkstacker@reddit
20 years and you’re getting your first write up?
Are you worried that it’s going on your permanent record?
HittingSmoke@reddit
This isn't a write-up level issue. Far from it. This is a "Hahaha did you hear what Frank did? Dumbass!" issue.
thedarklord187@reddit
Was the host not in maintenance mode? It should have kicked all the VMs over to the other host while you were migrating.
diito@reddit
Getting written up for an honest mistake is a red flag for either poor management or an excuse to get rid of you. You never want people to fear making a mistake. If you do that, nobody will ever own up to them or be honest about why they happened. Everyone screws up from time to time. The goal should be to learn from mistakes so that you don't repeat them. That's management 101 shit.
On the other hand if they just want to get rid of you it's the perfect excuse to do so with "cause" to cover their butt for whatever the real reason is.
tanzWestyy@reddit
No sweat, bro. As long as you were honest about your mistake and identified where you went wrong, you should be fine. Owning up and taking responsibility is a good professional quality. Have no fear. :)
jbtrading@reddit
It took me all of 5wks to accidentally bring down a production system. Then I followed it up by walking into a glass door, breaking my nose and bleeding all over the carpet in front of my boss's office. Honestly - breaking my nose probably saved my job.
Secret_Account07@reddit
This wouldn’t be a write up at work. Maybe a verbal warning, but even then unlikely.
Calm-Aspect2795@reddit
I had a mgr who told me she would kill me with a knife. I retired from state IT. Whewwwww
beneficial_deficient@reddit
I'm not sure I understand why you'd get a write up; accidents happen. I've taken out a whole province by accident making changes on the ISP side. Human error is a thing; they've gotta understand that
DukeOfRadish@reddit
Shit happens, don't sweat it. The lesson is not about VMs. The lesson is to create and adhere to a change management process.
Scope, rollback plan, process review might have helped or minimized the impact.
TiltedWit@reddit
Who the hell gets a "write up" in this day and age.
andocromn@reddit
I'm honestly not even sure what that's supposed to mean. My manager once "wrote me up" for something. I sat in his office resisting the urge to tell him that it meant nothing to me. I still don't know what effect it was supposed to have.
Antnee83@reddit
It's usually used for high turnover, menial labor positions. Where the managers are simply too overworked to give a shit about any of the faces under them, so they systematically "process" people to get them out the door without getting HR all up in their asses.
I got wrote up when I worked for Burger King as a fucking teenager. The manager thought it would straighten me out, but it uh... did not.
andocromn@reddit
I still don't get it, why would that "straighten you out"?
MBILC@reddit
Could be company policy if it's a regulated industry, as someone above noted. Or there is more the OP may not be telling us and this was an event where the boss had had enough (is OP often doing changes with no change control or approval from their boss?)
mrkehinde@reddit
Patching an AIX 4.3 box that supported a remote call center at 2am. Issued a 'shutdown -fh now' vs a '-fr'. Blasted thing typically takes 20+ mins to come back online. Realized my mistake after a half hour. Hopped in my car to drive 3+ hrs to the remote office to literally push the power button. Sh!t happens.
IStoppedCaringAt30@reddit
Your boss would write you up for this? I'd look for a new job.
This is a learning experience and opportunity to create policy. A write up is absurd.
dadbodcx@reddit
Written up?
HerculesMKIII@reddit
You f'd up, but I don't see a write up happening over it.
MBILC@reddit
Well, it's the only one they are telling us about; many possible reasons why this happened...
Perhaps they have done this before but it just didn't have as big an impact... perhaps their boss got torn a new one because this impacted critical systems and phones, thus plenty of people were affected and wanting answers as to why this happened and who approved such a change during the day.
HerculesMKIII@reddit
I’d see the Christmas bonus taking a hit before HR go handing out a written warning
Feeling_Saucy@reddit
If they write you up for that then they can seriously go fuck themselves!
Blckdragon258@reddit
I made it 6 months for mine. Comes with the territory. 20 years has to be a record….
TrainAss@reddit
Won't be the last time Op. I've taken down half of a rack when trying to determine a model of a PSU.
Server was showing a failed PSU, no indication of the make/model of the server. They have redundant PSUs for a reason; pulled the failed unit and next thing I know half the rack goes quiet. Quickly got everything back in and started powering the unit back up when the boss came rushing into the server room to ask what happened!
Oops!
MoonOfTheOcean@reddit
If there's seriously a risk of getting a write up over this, you need to leave whatever kindergarten you're working at.
reedevil@reddit
Junior SA: OMG, I took too long to complete this task. I'll be fired.
Senior SA: Nice, another prod outage named after me!
Usual-Chef1734@reddit
This does not even count as a worthwhile story for 20+ year vet. You are good dude.
captkrahs@reddit
You’re not getting wrote up
chalbersma@reddit
Be sure to brush up your resume. A write-up is normally a sign that it's time to move on.
Fordwrench@reddit
I know nothing!
https://youtu.be/HblPucwN-m0?si=A9vcar0l6tlMbhfd
catcherfox7@reddit
Shouldn’t this be automated?
Sintarsintar@reddit
Shit, if I got written up for that I would be making my exit plan. The last time I got written up over BS like that, I got a job lined up, waited until it was the worst possible time for them, and walked out. But this was also a place that walked you out if you put in two weeks, so why bother.
Shoddy-Line94@reddit
Pfft Try -
(*) Do a firmware update on 5 computers in a row, not actually checking to see if it works on one first, and then bricking those 5 computers because of this.
(*) Load of students come in to do exam on computer. Decide to ghost a few PCs elsewhere. Forgot that the ghosting server is on the exam server (because the school was cheapskates). Ghosting process freezes up the server and the exam software doesn't run. Exam rescheduled for 60 disappointed students.
(*) Experimented with Shadow Copies, set it on all the servers. Problem is, I set it on the C: drive. Disks run out of space and Active Directory stops working. To top it all off, I'm absent due to surgery. Classrooms unavailable for days. Fuck knows how I kept my job when I returned.
(*) Used Group Policy to deploy Adobe software to all computers. Set in the evening. People naturally turn on their PCs in the morning. All the PCs lock up installing the software for at least 2-3 hours.
(*) My personal favourite - Windows 2003 allowed you to deploy configuration templates to workstations - essentially set permissions on files. Me and the boss agree the SYSTEM account doesn't really need permissions for all those files. We remove access. We stop EVERY computer in the network from running because the SYSTEM account needs to access certain files. We can't undo the changes via Active Directory. The IT group had to go to each PC, log in locally, and give the SYSTEM account access to all files to restore access. Downtime was horrendous.
30 years in IT. I'm considerably more cautious and test everything before I push it out!
mattyg2787@reddit
One of my favourite interview questions for senior roles is “tell me your worst f up on the job and how you fixed it” if someone gave me something like this I’d probably laugh
schnellwech@reddit
I would recommend never making more than one change to a system at once. That's a general rule that makes your life easier. As always in life there are exceptions, but most of the time it fits.
2c0@reddit
You say
"Whilst migrating the VM's I found a critical flaw. It caused the VM's to all reboot. It was only a matter of time before they were down permanently. I have solved it and it won't re-occur. Unfortunately there was an unavoidable and unforeseen down period of around 10 minutes"
MBILC@reddit
Don't lie...
Because now they could ask what was the critical flaw, have you reported it to the vendor (vmware) and how do we mitigate it in the future.
OP wasn't doing their due diligence or paying attention by the sounds of it; they got complacent. There are several ways to verify whether a datastore is in use or not.
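For instance, one pre-flight check is simply listing the datastore's files before unmounting: live `.vswp` files betray VMs that still depend on it even when the VM inventory for that datastore looks empty, which is exactly what bit OP. Here's a minimal sketch of that logic; the names and suffix list are illustrative, and in real life the file listing would come from PowerCLI or the vSphere API, not a plain Python list:

```python
# Toy "safe to unmount?" pre-flight check. The file listing would come
# from the vSphere API / PowerCLI in practice; here it's a plain list so
# the logic is clear. Suffixes are illustrative, not exhaustive.
LIVE_SUFFIXES = (".vswp", ".lck")

def safe_to_unmount(file_paths):
    """Return (ok, offenders); ok is False if live VM files remain."""
    offenders = [p for p in file_paths if p.endswith(LIVE_SUFFIXES)]
    return (not offenders, offenders)

# The trap from the post: the datastore's VM list was empty, but swap
# files from not-yet-migrated VMs were still sitting on it.
ok, offenders = safe_to_unmount(["vm-old/vm-old.vswp", "scratch/notes.txt"])
print(ok, offenders)   # False ['vm-old/vm-old.vswp']
```

The point isn't the exact suffixes, it's that the check looks at the files, not just the VM inventory.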
WashedPinkBourbon@reddit
If someone writes you up for this, polish your resume and get the fuck out.
MBILC@reddit
/Devil's advocate
Someone who has been in every aspect of IT for 20 years should know that anything you plan to do that "won't impact anyone" almost always does, because of oversight or not properly planning through and documenting the change.
With that, did they have approval from their boss or whoever manages them? Or is OP often doing changes whenever they want, without informing anyone, and it finally caught up to them? It affected critical systems, so likely higher-up people were impacted.
We know so many changes can break things and most end users won't flinch, but as soon as something impacts a manager or exec, they want to find the reason, place blame, and make sure it never happens again... which likely resulted in the write up.
I learned this myself in my early days, so even when my boss was the CEO directly, I would send out emails letting people know I was working on systems - while it shouldn't impact anyone, an FYI, and let me know if anyone has issues. But I also worked to do most changes during non-busy hours (we were a 24/7 online poker site so that was hard, but we had slower periods).
dengar69@reddit
Write up for 10 minutes of downtime? That's f-ed up.
MBILC@reddit
Critical infra taken down during business hours.....
Was it an approved change?
Was OP just doing what they wanted when they wanted?
When you take down critical infra and entire phone systems, plenty of managers and execs and others were likely impacted, and they wanted answers as to how this happened and who approved it.
I am leaning towards OP's boss got torn a new one over this, so the boss's only choice was to do a write up.
wesinatl@reddit
Fuck companies and their bs write ups. Show me someone who has never made a mistake and I will show you someone good at hiding the bodies.
fungusfromamongus@reddit
lol I hope that you do because management will realise that these are silly numbers to be writing up for. Cheer up bud. It ain’t going to happen.
Update us if it does happen
djholland7@reddit
You moved their cheese!
I don't mean to sound like an asshole, but you performed an unsanctioned change and interrupted people's work. This is not good. I don't think a write-up is warranted, but you should be extremely careful with change management. Especially with people that don't understand IT as well as you do.
I'd suggest the lesson learned be to improve your change management process.
MBILC@reddit
Question to ask: has OP been talked to before about doing changes during core business hours? If not, this is a failure by their boss, and likely their boss got torn a new one by many people because critical systems went down, so now they are covering their butt too by doing a write up.
djholland7@reddit
Respectfully, an IT pro of 20 years should know better. To say it's their boss's fault is lazy. Sure, if they're new to the industry, but not in this case.
Just own the fuck up and work to never do that again.
chance_of_grain@reddit
If you get written up for an honest mistake like that I'd leave. Mistakes happen and downtime was super low. Learn and move on
chaoslord@reddit
As much as I hate VMware now, we should all worship at the feet of whoever wrote the initial snapshot technology (maybe it wasn't even VMware?). It saves us (sysadmins) SO MUCH TIME AND COVERS OUR ASS SO OFTEN.
MBILC@reddit
Snapshots would not have mattered here. They took down the entire host because they disconnected a datastore that was still in use.
Which is odd, because I thought VMware won't let you disconnect or drop a datastore that has active connections to it?
davy_crockett_slayer@reddit
Why would you get a write up? Mistakes happen.
perthguppy@reddit
You’ve gotten complacent. Take this as the wake up call it is.
1) Only perform one change on a system at a time.
2) As much as we all hate it, change control would have saved you in this case. The vMotion is fine, but something like unmounting formerly-production datastores needs a change control and should be done off-hours.
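Rule 2 can even be enforced mechanically. Here's a toy sketch of that kind of gate; everything in it (action names, hours, the ticket argument) is invented for illustration, not a real change-management API:

```python
from datetime import time

# Hypothetical change gate: destructive actions need an approved ticket,
# and even then only run outside business hours. All names are made up.
DESTRUCTIVE = {"unmount_datastore", "delete_datastore", "reboot_host"}
BUSINESS_START, BUSINESS_END = time(8, 0), time(18, 0)

def may_proceed(action, now, ticket=None):
    """Return (allowed, reason) for a proposed change at wall-clock `now`."""
    if action not in DESTRUCTIVE:
        return True, "low-risk change, go ahead"
    if ticket is None:
        return False, "destructive change needs an approved change ticket"
    if BUSINESS_START <= now <= BUSINESS_END:
        return False, "destructive change must run off-hours"
    return True, f"approved under {ticket}, off-hours"

# OP's scenario: unmounting a datastore mid-afternoon with no ticket.
print(may_proceed("unmount_datastore", time(14, 30)))
# The vMotion itself (non-destructive here) is fine during the day.
print(may_proceed("vmotion_vm", time(14, 30)))
```

A dozen lines of policy like this is crude, but it makes the "why were you doing this mid-day?" conversation happen before the outage instead of after.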
MBILC@reddit
This.
We have all done it: "this won't impact anyone... well, since I'm in here, I may as well do this also"
It seldom ever ends well, as OP found out.
moventura@reddit
20 years ago we were testing using winpopup as a way of sending messages across the network.
I sent a message to a co-worker calling him a dickhead. Accidentally sent it to the entire school I worked at, including the principal.
That was my first write up
Thankfully it taught me early on not to play with prod during business hours
No-Percentage6474@reddit
That wasn't write up worthy. Maybe a good nickname. We had a guy restore a week-old snapshot of our main SQL server. Since then he was named The Destroyer. The name even followed him to the next job.
MBILC@reddit
Well, it could be write-up worthy if OP tends to just do changes when they want, with no change control and no communication to anyone who might be impacted, and this time it finally caught up to them after 20 years....
Could also be that their boss got torn a new one from others about taking down critical systems in the middle of the day and why such changes were being done mid-day... so now the boss is trying to cover their butt with a write up because they knew what work was happening.
Many potential pieces to this we do not have the information on.
Mymatejon@reddit
This, ladies and gentlemen, is why change control exists!
_Dreamer_Deceiver_@reddit
Haha 10 minutes in 20 years. Is that all?
TEverettReynolds@reddit
Why would you get written up for an accident? Are you in a regulated industry and now have a fine for being offline? Or did you lose customers? Or did you have to pay because of a breached SLA?
If not, tell them to fuck off; accidents happen, don't sign anything, and stick up for yourself.
If the company didn't want accidents to happen in the middle of the day, they would have policies and procedures to prevent it. Was there any change control on what you were doing?
If yes, the process failed to find this risk of you crashing the systems.
If no, then your company got exactly what they planned for.
~former IT Manager, current consultant IT Infra Proj Manager.
Jadoo_21@reddit
Welcome to the club mate
RefuseRound4943@reddit
ouch. IT happens. Best of luck.
opus1one1@reddit
Senior director here.
I would never "write up" an engineer for something going wrong in the course of performing their job.
There's no one to blame here. Something went wrong, and we should seek to understand why through a post-mortem, but assigning individual responsibility to this kind of thing sends the wrong cultural message.
If this turns into anything more than a teaching moment for you, and a better process and controls moment for your team/department, it might be worth a think on whether this is the kind of culture you want to be a part of.
HR is for things like payroll issues. Their opinions on the appropriate response to an infrastructure mistake are about as valid as my dog's.
MBILC@reddit
My first question for this though: did OP have a change request in, approved by their boss, to do this work?
If the company has no change policy, do they just make changes any time they feel like it, without informing their boss or others?
I learned a long time ago that even without an official change control system, a simple email out to stakeholders or potentially affected department heads was enough to cover my butt if something went wrong, or to deflect people from sending in 100 tickets if something did.
gtripwood@reddit
I took an entire data centre offline for 45 minutes once and didn’t get written up for that. I owned my mistake and called the major incident explained exactly what I did and how I had already put a fix in (ARIN ROA SNAFU) and everything would be back in a few minutes…. Didn’t even get a telling off….
I_am_not_Spider_Man@reddit
Meh, try destroying a whole domain for a company and then having 24 hours to rebuild it.
rrmcco04@reddit
The only write-up this deserves is writing up an RCA. Honestly, if everyone were afraid of an honest mistake, we'd never get anything done! I'm amazed it took you 20 years instead of 20 days, you should get an award!
fishweb@reddit
So let me get this straight….you think you are getting written up for having 99.9999999999999999999999999999999999999999999999999999999999999999999999999 percent up time? I actually don’t think I put enough zeroes in there. But do the math when they come to write you up and have a laugh about it.
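(For the curious, the actual math: 10 minutes of downtime over 20 years works out to roughly five nines, not seventy-five, but the point stands.)

```python
# Back-of-envelope availability for 10 minutes down over 20 years.
minutes_in_20_years = 20 * 365.25 * 24 * 60   # ~10.5 million minutes
uptime = 1 - 10 / minutes_in_20_years
print(f"{uptime:.6%}")                        # about 99.999905%
```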
ITnerdsunited@reddit
I really prefer having stories like this instead of the rants clogging the mainpage. I'm pouring one for your (not even a bad one tbh) mistake!
MBILC@reddit
same here.
Darkheart001@reddit
Deny everything, blame the network, must have been a blip!
FEMXIII@reddit
"Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand."
Ams197624@reddit
So... What exactly is a 'write-up'? Sorry for my ignorance but I've never heard this term before (I'm from the EU)...
MBILC@reddit
Just an official way for the company to note you made a mistake. It is often used in a three-strike scenario, to fire someone or put them on probation so to speak, usually for employees who keep screwing up and don't listen or learn.
In North America it usually consists of a verbal warning for a first offence, then a written warning, and if a third one occurs... that could be the end.
TofusoLamoto@reddit
in Italy we call them Lettera di Richiamo, a sort of formal complaint from your employer.
Ams197624@reddit
Ah, I see. I don't think we have those in the Netherlands.
mvbr_88@reddit
We do :) It's called "officiële waarschuwing"
frosty95@reddit
If they write you up for an honest mistake then they don't deserve you. This is just a slightly more interesting day than normal in the IT world.
Objective-Skill2926@reddit
It's not downtime if the national media don't report it... Rule to live by.
MBILC@reddit
Did you have a change request in approving this work, defining the scope and time frame? Or did you just decide to do it?
MiDikIsInThePunch@reddit
Sorry to hear about the snafu. Hopefully they recognize all the great work you have done as well as the criticism. If they are hard on you, it’s really up to the company to set standards for change control. Unless you violated a change control policy you should be good as SH!t happens.
juanmaverick@reddit
Mistakes happen… you learn from them and move on. Also, it could have been wayyyyy worse
Ruben_NL@reddit
Nah, you will be fine, especially if you take this as a learning exercise. Don't hide the mistake. You will be fine.
heapsp@reddit
It's poor management to punish people for these types of things. The people who do the most work are, by default, the people who will cause the most accidental issues.
Person who does nothing = causes no issues
You don't want to incentivize people to do nothing.
HOWEVER, you need to change processes. This was clearly some important work and it's important to cover your ass. "Hey boss, I'm doing X," and if it might result in some sort of downtime or destruction, you lay out the potential risks.
We would never be unmounting datastores and doing this type of work without everyone being on the same page and without downtime consideration.
TheFumingatzor@reddit
Shit bitch, what kinda rookie numbers are you posting here?
lenovoguy@reddit
10mins likely not enough to justify a write up
dickydotexe@reddit
Shit happens man, like most people said if this is your first time in 20 plus years having downtime you have done well :). I've been doing this for around the same time and had a few myself.
rchr5880@reddit
If I got written up for this I’d hold my hand up and say it was an honest mistake. If they want to push the write up I’d tell them to do one.
The company I am at now didn't even write me up when I accidentally deleted the whole payroll data store going back to when the company restarted. I was new and still finding my way around the system and was under the impression we had backups. We didn't!
Was able to get back everything with the exception of the last 2 months. HR manager was crying and everything!
They were just glad I jumped on it and didn't stop until we had something again. They took the attitude that it was a mistake. If I did it again, however, it would be looked at differently I'm sure.
fwambo42@reddit
If there's anything that people reading this thread need to take from it, it's this. That was the really bad assumption that affected this scenario.
rchr5880@reddit
Without a doubt…. I was new at the company (been there a couple months tops) so wasn’t overly familiar with the setup the previous guy assured me everything was setup correctly. Couldn’t have been more wrong!
atw527@reddit
If you are written up for something like that then I should be in prison.
googlequery@reddit
I wouldn’t write you up for that especially with a good track record and no prior incidents.
Just take accountability, apologize and have some measurable steps you can provide to prevent it from happening again.
Shit happens.
sujamax@reddit
Whatever your change control policy is, did you follow it? (Understanding also that the answer in many cases is, “we don’t really have a set policy…” If so, consider this question’s answer to be, “yes.”)
Also, were you honest and forthcoming about your actions once you realized?
If the answer to both of these questions is an affirmative, then I don’t see a good cause to write you up. Plus, said write-up would create a perverse incentive and is therefore questionable management.
But also, wait and see. You mentioned that no one has said anything yet. As much as you can, try to live in the, “what can I do now,” as opposed to worrying about what might later happen that you can’t control.
vitaelol@reddit
Hey man, shit happens. Don’t sweat it, it was an honest mistake. You simply learned another way of trying out the HA mechanism.
roncadillacisfrickin@reddit
A Tier 3 engineer said 'see them folks over there? That's 10 feet of beard and about 9 production outages; most of our operations procedures were written for them, ain't that right, 'no push Friday' Frank?'
ka-splam@reddit
"Yes, it's right that we don't have a good plan, procedures, test environment, automated staged rollouts, HA, failover, easy rollback, and instead we - almost religiously - ban certain kinds of work on certain weekdays, point fingers at people low down in the command heirarchy, and have a work environment where bullying and jeering and finding people to blam is the norm. Welcome to your new job and coworkers".
SeventyTimes_7@reddit
My only write-up ever was for sending out 40+ switches without them being quality checked to meet a deadline for a massive project.
We had a policy where any equipment going out had to be QC'd by a more senior employee. I was the most-senior for our networking department so I had to send them to either the COO or VP. Told both of them multiple times that the equipment was ready and needed to be QC'd over a week before the shipping deadline, but couldn't get either of them to do it. I was written up by the COO but I refused to sign it since I knew they wouldn't fire me.
nocommentacct@reddit
Not bad not bad
scrumclunt@reddit
I am pretty new in the IT space and fucked up a production firewall. Caused chaos for about half an hour before I got the backup running
HyBReD@reddit
lol, you're fine. you should be bullied a bit to sear it into your skull, but that's about it.
TheGreatAutismo__@reddit
You need to bump those numbers up. I took down production for 49 minutes once by applying a power policy via GPO and not scoping it to non-server machines.
Plus side, our domain controllers had a lovely little nap when they went to sleep. It was precious seeing the orange light pul….OH FUCK ITS ASLEEP!
“Autismo!? What was that?” Nothing dear.
Timmaayy562@reddit
If you get written up for an honest mistake that only cost 10 minutes of downtime, I'd look for a better company to work for, because that work culture sounds rough.
WarpGremlin@reddit
I'm on board with the, "are you really in IT if you haven't broken something important in your first 3-5 years" camp.
I'm going on 25 years and I broke something yesterday-- fortunately I caught it, stalled for time, CYA'd and fixed it before the end users could complain.
retroblade@reddit
If you get a write up for that I would be looking for another gig. Shit happens, no one is perfect and this should be a learning experience instead of getting disciplined. Times like this is how you find out how great your manager is. And the fact prod was down only 10 minutes, that’s rookie numbers.
Repulsive_Tadpole998@reddit
I remember an MSP I worked at a few years ago, we had an "I broke all the things" sign that would be posted on whoever was the latest to bring down someone's production. That thing was moved at least weekly. I remember when I did it for the first time I was so embarrassed, but we all got good laughs out of them later. 10 minutes isn't bad, especially with no actual data loss.
ISP issues are worse than that!
occamsrzor@reddit
That’s not great, but the more responsibility one has, the bigger the effect when mistakes happen, so the scale isn’t abnormal.
Reprimands should also be based on frequency, not result (or at least they should be). If you had a history of negligence, then a write up would be warranted. If you don't, then regardless of the seriousness of the result, no write up should be given.
If you get a write up, look for another job where management expectations are both reasonable and achievable
Gh0styD0g@reddit
No, you shouldn’t get a warning for a one off, if you do it twice then you deserve one.
Mistakes happen, we learn from them, if you penalise people first time they never dare to try something new.
mavack@reddit
Where is your change management process? If you don't have one, it's time to implement one. It doesn't need to be fancy, depending on your size.
If it's within your standard framework, no write up; if it's outside your change process and you deliberately avoided it, then maybe. I have made many errors over the years and have always been covered by the change process; root cause can be a lot of different factors.
19610taw3@reddit
My first writeup wasn't even IT related!
We were moving offices. One of the movers or construction workers hit the building's entry desk with a cart and damaged it. The desk apparently cost over five figures. I also managed to run over my foot with a cart near there.
The IT manager messages everyone asking if anyone hit the desk. I'm too honest for my own good so I mentioned that I did run my foot over but I didn't think I hit anything. I knew it was right under the cameras so I figured it would be caught on camera.
The holy hell that was released on me. It turned into a full on investigation with HR and the CEO. Like I had stolen from the company. I had an entry and they were threatening my job unless I was honest and told them I did it. I didn't get a promotion that was supposed to happen because my ethics were in question.
That's what hurt more than anything. I'm not much but I'm honest.
Finally, right about at the point where they were going to fire me or make me pay for the desk, I brought up that I was working with someone when it happened. He saw it happen, and I definitely didn't hit the desk. Just ran over my foot.
Turns out a few weeks later they found out one of the construction people hit the desk.
aggro_nl@reddit
I see this as nothing happened but a malfunction with the server.
Investigate it for a good hour and report back to the customer that everything is in order now!!
Ron-Swanson-Mustache@reddit
I wouldn't write anyone up for that. Shit happens. Punitive measures in response only encourages hiding.
Now if one of my employees tried to hide something, even minor, that would be a write up.
Doodleschmidt@reddit
25 years in IT and I also just got wrote up. I've forgotten about it, obviously.
SpilldaBeanz@reddit
try taking down the internet for 2 days
Penultimate-anon@reddit
This is where change control is your friend.
vc3ozNzmL7upbSVZ@reddit
What a strange environment.
Totallynotaswede@reddit
Shit happens; getting a write up like a child because of some oversight is crazy. As long as you've vocally made sure people understand that you're aware of your shortcomings and take responsibility, why would you need a write up?
Mindestiny@reddit
A writeup? For that? Last time I had a guy take down our network (rebooted a switch without writing the run config/confirming the backup was recent) he was terrified he was about to get canned. I laughed it off and the users dealt with it for an extra 20 minutes while we rebuilt some port VLAN assignments that got lost.
Being the cause of your own outage is a rite of passage. Wear it with pride. A good boss will use this as a teachable moment on process and procedure instead of some sort of punitive bullshit.
gumbrilla@reddit
Write ups for a tech error, nah, that would be insane.
It would be insane as then the change management practice does not match the appetite for risk.
If you can get a write up for a change going wrong during the day, then I'd stop doing changes during the day. Insist on testing the process in a non-live environment. Have someone review it. I would then have a statement from the business accepting the risk - every hoop in the fucking book.
A disciplinary notice for a change is stupid.
grapplebaby@reddit
I remember bringing down the network as an intern because I was incorrectly ghosting over the network lol
Sengfeng@reddit
Dang, I recently got written up for failing to do what management never told me was on my list of job duties.
Wish I had downed production, would at least feel like it was justified.
AlaskanMedicineMan@reddit
I will be surprised if you get written up for this. I once assisted a top-3 client in removing an unknown server - after stressing that removing an unknown server could have unforeseen consequences, letting my manager know, and getting the client company CEO's written approval to proceed.
Well it turned out that was their backup internal DNS server and their current primary DNS server had failed over quietly years ago without anyone noticing. This was literally my first call of my first day of being on the phones, it took down everything for that client. I ended up managing a crisis bridge call for 4 hours as we got the server back online and reconfigured with their security team.
My manager said I passed the sink or swim test and I did it on my first day. To this day that was probably my favorite overall job. I didn't get a writeup and in fact the client commended me for the multiple warnings I gave them and I ended up their preferred point of contact for server changes.
redbrick5@reddit
If you haven't already, just get in front of the issue. Communicate. Take ownership. Start writing an RCA. People will respect that and it will soften any blow. You're likely to take a ding on the perf eval for screwing up uptime numbers for the CIO.
wernox@reddit
Unless you tried to hide your foul up, I can't see why you would be punished.
dritmike@reddit
Bro. How long have you been around and not caused a critical system outage?
Cmon.
immortalsteve@reddit
I once brought down our entire forest in the middle of the day for a solid hour or two due to an oopsie woopsie, so you're fine OP. Anyone writing you up for this is missing the opportunity to update documentation and learn from it because you're just gonna be pissed off about it and want to move on.
thortgot@reddit
As many others have stated, this isn't a write up. This is an opportunity for better planning and documentation of processes.
BoltActionRifleman@reddit
Calls come in asking “what happened?”. Reply “We’re looking into it but it appears there was some kind of a power fluctuation that caused the UPS to malfunction and hosts went into protection mode, thus shutting down”. Or something along those lines.
Sgt-Buttersworth@reddit
Seriously if your boss writes you up they are an asshole. Shit happens. If you make the same mistake again and again sure, but a one off issue... Jeeze. If my team makes a mistake, my expectations are clear, own the mistake, fix it, and learn from it. And accept any friendly ridicule in the future about said mistake from the other members of the team.
TheBestHawksFan@reddit
Why would you get written up for this?
spazmo_warrior@reddit
Because some IT managers/supervisors are assholes. Ask me how I know.
xboxhobo@reddit
Dude I made a mistake that we had to tell all of our clients about. One of them almost sued us. I was not written up or even admonished. You'll be fine.
Cotford@reddit
So you wish to restart (the domain controller) now or later no no no NO NO NO NO FUCK. As my phone melts itself into my desk from all the ringing.
DSMRick@reddit
Mistakes are a normal part of any work. One of the things I used to say is that when you work on projects that are 10s of millions of dollars, 1% mistakes cost 100s of thousands of dollars. I'm not sure making a mistake is cause for a write-up. As a company you might consider why you are doing this sort of thing manually, but I think you would find yourself in good company with the current state of systems admin.
_TryFailRepeat@reddit
No offense, but if you haven't caused days of downtime by now, you've been faking it all this time. Noob.
cagedbleach@reddit
21 years in IT. I’ve had legal hold AD accounts permanently deleted, lost all of our direct attached iSCSI shares because I didn’t realize the backup software wouldn’t back up those drives, failed an Intune deployment to the entire company, set a login message that everyone’s desktops were subject to monitoring… hell, just this month I’ve taken down access to the largest file server in our company 3 times trying to demote legacy domain controllers.
It’s all in good fun but you really haven’t lived in your IT career if this is your first major screw up.
WaldoOU812@reddit
WOW...
20+ years and not a single write up? I seriously want to know who you were working for during that time.
mmiller1188@reddit
Over twenty years and this is the first time you broke production and it was only ten minutes?
I'm envious.
Remindmewhen1234@reddit
Is the work being done in Production?
If yes, do a freaking Change Control request.
That is my response when someone asks about doing work during the day.
Crot_Chmaster@reddit
If you get written up for this, I am officially offering, at no charge, to call your boss and tell them to lick my grundle.
Mistakes happen. 10 minutes is nothing. Your bosses need some GD perspective.
vlku@reddit
I'd laugh the fucker handing me a write up for this all the way out of the office... GTFO. Everyone makes mistakes
ElectricOne55@reddit
Ya, good attitude bro. I've made a similar mistake before, shutting down a server. I think what matters more is working together to solve the issue. But in tech you have so many know-it-alls that don't want to help and expect you to "show initiative", but when something happens they wanna throw you under the bus. At the same time they're not there to help.
Bradddtheimpaler@reddit
You caused a ten minute outage and that’s it? No track record of reckless stuff? If they write you up for it I feel like they’re being dicks. I wouldn’t come down with official disciplinary stuff if one of my techs caused a ten minute outage unless it was a recurring problem. Your explanation makes perfect sense; hard for me to find a reason to think you were being irresponsible. You fixed it right away and more important took responsibility for it and know what went wrong.
msalerno1965@reddit
If you even get written up in the first place, on the next job interview:
"I know how to handle HA and failovers in vSphere. Yeah, we had an incident one day, and the HA I set up actually worked, so..."
I mean if you hadn't done your job right to begin with, it'd have been much worse.
It's an "HA event". yeah, that's the ticket.
itguy3001@reddit
Only way I’d write up one of my guys is if they made the same mistake repeatedly or realized the mistake and then left without seeing it through.
corbei@reddit
This guy's just bragging
slayermcb@reddit
I knocked down the network for an afternoon by messing up a VLAN setting. Then I got thanked for fixing it, even after admitting I caused it.
Superguy766@reddit
Did you communicate these changes while in production to management?
Always ask yourself if the change you’re going to make might affect multiple users. If it does, always communicate it to management to cover your ass.
ALWAYS CYA!
zeliboba55@reddit
You are a hero because you fixed production in 10 minutes after a major bug. The rest is irrelevant and no one needs to know.
Redemptions@reddit
Wait, you get written up for making a screw up? Did you try and hide it or blame it on someone else?
BadSausageFactory@reddit
I act like they're giving me an award, because in the corporate world mistakes are expected and failure is a form of experience.
I am fortunate to work at a place that does not look for scapegoats and assumes everyone has good intentions.
TenaciousBLT@reddit
Yeah, you should never get a write-up for a simple human error that had minimal impact. There's being negligent and there's making an honest mistake that was easily rectified. If you had a pattern of issues that's one thing, but this should not be cause for a write-up unless there are other issues.
ianpmurphy@reddit
I got written up once, years back, for actually fixing a system. This was back when PCs were only just turning up in companies. We had a little tool which ran continually and did some sort of data transform on files. Unfortunately the guy who set it up didn't know much about how to optimise DOS memory usage when you had network drivers loaded. This is going back a long time. This just happened to be something I had a lot of practice with. However, it wasn't my system and I didn't have to deal with it. My boss had actually told me to never touch it, as it was delicate and failed a couple of times a week. It's Friday night, I'm still there working on other stuff, late as usual, when I notice the PC is displaying an error related to lack of memory. I took a quick look and realised I could fix it permanently with a tiny change. I implemented the change, having taken a copy of the startup config, called the guy whose job it was to deal with it, explained the situation, and we checked over the phone that it was operative again. He was happy, as he was in a bar somewhere. Monday morning I get pulled into a meeting and handed a written warning. It never failed again, and I never bothered to help anyone out again. If something failed I just left it and intervened when I was called.
That was the most dysfunctional place I ever worked at. My boss and everyone above him were probably the most incompetent people I ever had to work with. None of them understood even the most basic elements of IT.
CheeseburgerLocker@reddit
So you didn't get written up, as the title states. So not only did you cause Penelope in finance to leave for lunch early, you're also a liar!
artifex78@reddit
We should write them up for that.
0dev0100@reddit
A team I was previously on crashed prod for a day because we forgot to deploy an updated cache of data before the old cache expired.
The data source couldn't handle the 400k requests per minute (third party software we had to use) and promptly crashed. This killed the desktop application that ran the client facing stuff.
It was health software used to assign priority to patients, deployed with a 12-hour time difference, so no devs were available overnight.
We also couldn't access prod because of legal and data security reasons. I think we could watch someone deploy it through 4 layers of remoting into servers.
10 minutes is nothing.
whitewail602@reddit
I would immediately start looking for another job if I got written up for making a mistake.
bamacpl4442@reddit
Last night, I migrated an on-prem distribution list to O365 and broke single sign-on for all of our apps. Apparently, some genius decided to use a distribution list for those as opposed to an AD security group like sane people would do.
The fix was quick, but it impacted a bunch of people. C'est la vie.
ChopSueyYumm@reddit
You took down production for 10 minutes and successfully recovered. So, what’s the problem here? Thanks to your infrastructure, everything was restored within 10 minutes. Simply respond that, over the course of a year, a 10-minute downtime equates to a 0.0019% outage.
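The percentage in the comment above checks out; as a quick back-of-the-envelope sketch (plain Python, nothing from the thread's actual environment):

```python
# 10 minutes of downtime, expressed as a fraction of a full year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_percent(minutes_down: float) -> float:
    """Return downtime as a percentage of one year."""
    return minutes_down / MINUTES_PER_YEAR * 100

print(round(downtime_percent(10), 4))  # → 0.0019
```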
These-Maintenance-51@reddit
I was a storage admin and we had a site close. This was a while ago, so we had NetApps. Well, the site closed, so I deleted their file shares. After work, about 6:30, I get a call from my boss... the site closed, but some of the people were still working from home.
After some emergency support from NetApp, turned out there was some grace period where you could undo deletions basically. I had no idea about this, thought the files were gone, and I was getting fired.
Ahimsa--@reddit
Your boss isn’t Frank, is he?
mikeyb1@reddit
Unless you're generally not allowed to perform these activities during the day, I'm not writing you up for that. Write-ups are for disregard of procedure, not for technical mistakes. But if there wasn't a written and accepted procedure for that, you can bet your ass I'm asking you to write it. lol.
Titanium125@reddit
If you’ve never broken anything, you don’t have access to the important shit yet.
Mudman171@reddit
Can't make an omelet without breaking some eggs.
ZaMelonZonFire@reddit
Hey, shit happens. Just own it and move forward. We all fuck up sometimes.
Tarc_Axiiom@reddit
We learn in the fire. You will be stronger now.
Don't ever work in a place where you can be "written up" like you're in fucking high school.
drinianrose@reddit
In general, people need to be allowed to make mistakes. There's a difference between making a mistake and negligence, and assuming we are hearing the whole story here, this sounds like just a mistake.
If mistakes become a pattern, that's a problem, but one cannot be expected to be perfect all of the time.
My recommendation - be brutally honest about what happened. Trying to cover it up or sugarcoat what happened is worse than the mistake itself. In my career, when I've made a mistake I've tried to be proactive about it and letting everyone know that it was my mistake. There's no need for you to be written up if you own up and admit to the mistake in the first place.
Good luck!
RCTID1975@reddit
I wouldn't write you up for that.
Someone above me might want me to write you up for that, and I would, but I'd also tell you not to sweat it.
What happened sucks, but mistakes happen, and as long as you don't make that mistake again, we all just happily move on. Maybe joke about it once a month or so, but otherwise it drifts off into company lore.
daithibreathnach@reddit
I brought down an entire hotel by rebooting their "not used" DC VM.
TwiztedTD@reddit
I hope you don't get a write up. It's not like you broke HR policies or anything. You were doing your job and made a mistake. I personally would be put off by my employer if they wrote me up for that.
RFC_1925@reddit
The only people that don't make mistakes are the people that do nothing at all. As long as you learn from mistakes and don't repeat them, then you are excelling at your job.
NeverDocument@reddit
It sounds to me like you identified a potential downtime-causing issue; fortunately you were able to resolve it in 10 minutes, and here's your plan to make this system more resilient so in the future it is no longer a concern.
kados14@reddit
If they write you up for that, I'd be surprised honestly
secret_configuration@reddit
So the lesson learned here is to be perfect and don't make mistakes ever?
Shit happens, mistakes happen. If this is the worst thing you did (took production offline for whole 10 minutes) in 20+ years you are doing better than most.
bmelz@reddit
Let me guess, change management and ITIL framework is useless?
Advanced_Day8657@reddit
The host had an error, shit happens
Fantastic_Explorer@reddit
If you get written up for that, I’d be having a serious meeting with the boss, essentially saying it’s the first time in 20 years, you want more? They should be absolutely grateful that you have been so amazing for 20 years, something like this is just an honest mistake. If people can’t make mistakes, particularly small ones like this, then we have lost the humanity of working.
Happy_Kale888@reddit
If you don't make mistakes, are you really doing anything?
gracerev217@reddit
Came here to say the same as others: rookie numbers. Learn and grow, you won't make that mistake again!
rdldr1@reddit
I was recently written up for the first time in my 15ish year IT career. I did not die.
graysky311@reddit
Don’t sweat it man. All of us seasoned IT pros have been there. It’s important that you learned from your mistake, and super unfortunate that it had visibility during business hours. I feel you on that one. I once rebooted an entire fleet of WinTerms (300+) during the busiest part of the day because I accidentally clicked a single button in Wyse Device Manager. Thankfully, because we were using Citrix, nobody lost any data. They just got disconnected from their session and everybody was able to get reconnected, but it was a black eye for me and my department. The lesson I learned is don’t touch anything in production during business hours.
jcpham@reddit
Congratulations brother! Laminate it and post it on your office door as a warning for all ye who may enter
Ramdogger@reddit
The comments here are giving this youngblood some hope for when I inevitably screw up something major.
AgsAreUs@reddit
As a sysadmin, is it common to get "write-ups" for making technical mistakes? If so, that's insane. Everyone makes mistakes on the job. I have taken down prod twice in my ~20 year career. Once for some code as a developer and once as the person building out the AWS infrastructure. If I was presented a formal write up in either case, I would have refused to sign, etc.
SpotlessCheetah@reddit
Write up for what? That's not exactly your fault that the system tripped on itself.
Different-Top3714@reddit
Why didn't you do a change control. It provides complete immunity!
Skyyk9@reddit
I did do sysadmin work in the past. My first write up was as a QA person working into the night with a developer to get a project out the door.
I failed to test EVERY link on a GoDaddy page. Yes, I was working for GDDY.
I got written up. Not the development person.
jclind96@reddit
what the fuck is a write-up? are you in elementary school?
DrewonIT@reddit
I would only write my team up for this if it blatantly disregarded policy or protocol.
Accidents happen. You owned it, and to me, that's more important.
Now, if it happened a second time... 😉
warpsteed@reddit
Getting written up for this would be stupid, unless it's not the first time you've done it. Everyone makes mistakes regularly in IT. We'd all be written up a million times if all it took were a mistake anyone could make.
P00PJU1C3@reddit
That's it? Fuck, I destroyed a customer's SCO Unix server that ran their entire company by using the wrong flag on a command. Broke the fucking kernel.
Unable-Entrance3110@reddit
So, what's the end effect of this "write up" incident? Is this going to follow you for the rest of your life? Your doctor is going to open your file and be like.... ooof, looks like we are doing this thing without anesthesia!
VoicesInM3@reddit
write ups are hilarious to me.
You think your written-down remarks are going to scare me?
-eschguy-@reddit
At my place they don't consider you true IT until you've brought the company down.
jeetah@reddit
Imagine getting write ups
Otto-Korrect@reddit
Document it as a successful disaster recovery test.
Brave-Campaign-6427@reddit
10 mins is only slightly worse than what a 99.999% uptime target allows for unscheduled downtime in a year. I'd be okay with it.
lowNegativeEmotion@reddit
When you get the write up, don't somehow reboot everything again. Also, we talked it over and Jim's going to be prepping the coffee at the end of the day. It's really unlikely that you could screw that up, but we just can't take the chance.
Karen doesn't want you parking near the power pole either.
fraiserdog@reddit
At my job, that would have been a resume generating event.
I got suspended for 3 days for missing ONE failed backup for 700 servers.
Networker1980@reddit
If it bothers you, take the proactive approach and tell your boss that you understand the mistake, the reason it happened, and the situation he was put into and that you will ensure it won’t happen again in the future. I bet that write-up is then history.
Moontoya@reddit
Shit happens, people err
If you owned up and sought to make right, I'd be pleased not pissed.
Accountability and responsibility are massively valuable in people
At most, I'd have a private chat
Remember, praise publicly, critique in private
Tlargojones@reddit
If this is write up worthy, then I should be on the street with a cardboard sign.
JimmySide1013@reddit
“Write up”? A healthy work environment would dig into what happened, appreciate your ownership of the issue, and find a way to share this with the rest of the team so it can be avoided in the future. Live and learn, not revoke someone’s hall pass. This isn’t middle school.
bmanxx13@reddit
Doesn’t warrant a write-up imo. During the RCA is when you can get chewed out by mgmt if it’s bad enough. Good leadership simply wants to know the what, the why, and the preventative steps going forward, then they move on.
cammontenger@reddit
Why would you get written up for this?
JacksGallbladder@reddit
If you're written up, take that as a red flag.
Honest mistakes are honest mistakes. Corrective actions should be for repeat behavior.
Everyone is allowed to fuck up. The penalty should be fixing it, and learning from it. That's all there is to it.
WolfskinBoots@reddit
Hang in there man, I once simply performed a test to check if the Windows clustering was working. Some error happened and it took down our entire SQL Enterprise environment. The whole day was spent with Microsoft and every important person in the company listening in on the call between me and the Microsoft engineer. Down time was about 6 hours.
EastDallasMatt@reddit
I can't imagine writing someone up for causing a problem that was recoverable in 10 minutes. I also can't imagine working at a place that would.
KJatWork@reddit
If anything, it sounds like this is more on the management for not having adequate documentation and training in place for you to be successful. I've seen far worse in very large companies, and this wouldn't be a write-up unless the admin had done this same thing a couple of times already and was disregarding instruction and workplans; even then, the write-up would be about that disregard.
progenyofeniac@reddit
In my org, if you'd had approval for the change, there'd be a root cause analysis performed, they'd put some steps in the documentation on how to avoid this, and the world would move on. My manager would remind me to be more careful in the future, etc.
If this isn't a pattern for you, no reason to expect that to be a write-up.
hola-soy-loco@reddit
First thing you do when setting up a k8s cluster is turn off swap. So sounds like the true blame is on the guy who set it up 🤣
backcounty1029@reddit
If you were my employee, I would ask you to present your mistake, how you fixed it, and what you'd do in the future to prevent it, to the team as a learning experience. There would not be a write up for this mistake. If you aren't down for presenting to the team, then I'd do it myself in a constructive way so as not to demean your integrity and experience. These types of exercises build internal trust and confidence in group building, culture, and overall team experience/knowledge.
I'd also take the time to congratulate you on recovering quickly and owning your mistake and reiterate this in the team meeting/discussion with your approval. Employees that can own mistakes, use them to teach others how to prevent the same or similar mistakes, and move on are the ones I love to lead.
Sorry you made a hiccup but these things happen. If anyone tells you different, they are wrong.
WickedHardflip@reddit
If I got written up for all the stuff I have done over the years I'd be cleaning toilets. Everyone makes mistakes and all we can do is own them and be honest about how and why they happened. If management wants to write someone up over it, that signals it's time to look for a new job.
tacotacotacorock@reddit
How about you be proactive instead. Have you written up a root cause analysis and an action plan to prevent this in the future? Sounds like you lack documentation... Get a head start on this, this morning, so you can present it to your bosses when they ask what the fuck you did lol.
Xceptiona1@reddit
I look at things of this nature through the lens of learning experience. Taken to the extreme: would you fire that employee and hire someone new, or do you keep the employee who's learned that lesson instead of the new guy who has yet to learn it?
iwishiremember@reddit
Don’t stress too much (not sure what your age is, and I assume you live/work in the US?).
I once did rm -rf in the root of a customer’s VM because I was stressed and thought I was in a different location (customer was pissed), but mgmt just called me in for a review, and we had a perfect backup of the VM.
It was a mistake, and those happen to all of us. You learned your lesson.
Remember this from Buddhism:
“This too shall pass.”
bisskits@reddit
Accidents happen. Especially in risky environments like IT. You did nothing wrong, don't sign anything!
frogmicky@reddit
Congratulations, you've lasted longer than I did. I'm working on my second write up.
cronhoolio@reddit
Don't lie about it. We make mistakes. I was written up for the first time since 1999 in September.
I saw a guy lose his job by lying about a small mistake he made that was clearly logged.
I took down a call center one day during peak hours. Oops, my bad; I just wrote up the root cause and owned it. I've been consulting for that company for ten years now.
watchoutfor2nd@reddit
A better organization would see this as a flaw in their change management process and handle it through a retrospective to discuss how to ensure it doesn't happen again.
Ziegelphilie@reddit
Ten minutes of downtime gets you a writeup? Coworker of mine deleted production once and all he got was a t-shirt
Tymanthius@reddit
Um . . . if you get written up for that your boss is a fucktard.
Yea, you made a mistake. But it sounds like it was an honest one and you'd tried hard not to.
And 10 minutes is nothing, even if annoying.
Now, if you had a history of this kind of mistake, that's different.
dvb70@reddit
The times I have shot myself in the foot for doing some clean up of stuff that was no longer used.
blue_canyon21@reddit
Going on 22 years here. No place I've ever worked would do a write-up for 10 minutes of downtime... let alone an honest mistake. Your workplace seems very uptight.
EEU884@reddit
A write up for a mistake? Oh hell nawwww, especially if you put your hand up to it and took responsibility. It should come with a lifetime of mocking, but nothing official.
Kashek32@reddit
If this is all it takes to get a write up at your workplace, run fast and far my friend. This sounds like IT on a Monday to me.
Patrickrobin@reddit
Sounds like you had a tough day, and it's understandable to feel concerned about the potential write-up. Mistakes happen, especially in complex IT environments, and the important thing is that you've learned from the experience. Sharing your story and the lessons learned can help others avoid similar issues in the future.
BrainWaveCC@reddit
What does a "write up" mean in this context?
Because I'm pretty sure that everyone responding has a totally different view of what that phrase means, and it is causing a ton of angst for many.
wjar@reddit
I wouldn't even attempt a no-downtime-expected type of project during production hours. Who signed that off?
HTDutchy_NL@reddit
Only 10 minutes of downtime? Well, you used up your 99.9% uptime SLA for the week, but that's all the damage there is. No data loss, everyone got a coffee break and could continue about their day.
If this is the biggest cockup you've made in 20 years I'd say you've earned a reward.
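The weekly-SLA quip above is about right; here's the standard uptime arithmetic as a sketch (plain Python, not anything from the thread's environment):

```python
# Minutes of downtime an uptime target permits over a given window.
def downtime_budget_minutes(sla_percent: float, window_minutes: int) -> float:
    return (1 - sla_percent / 100) * window_minutes

WEEK_MINUTES = 7 * 24 * 60  # 10,080
print(round(downtime_budget_minutes(99.9, WEEK_MINUTES), 2))  # → 10.08
```

So a single 10-minute outage does indeed consume essentially the entire 99.9% budget for that week.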
Sleepytitan@reddit
Sounds like a successful test of your recovery processes to me. That’s a high value day.
ISeeDeadPackets@reddit
This reeks of someone getting their butt chewed from on high and not having the cojones to defend their people. Your problem is that it was a very visible mistake, so "something" needs to be done.
Illustrious-Chair350@reddit
My first write up was for a misspelling on the sign outside. I thought I was going to get a second one when I informed the person doing it that it was misspelled in the email that I copy-pasted from them.
ConstructionSafe2814@reddit
If they had to fire me for every mistake I made, I'd be fired well over 50 times (if not more) by now.
I don't think they will fire you, and if they do, it's not justified IMHO. I would only find it justified if this were the third time this week you'd caused downtime due to this exact same issue, after a meeting with someone explaining why it happened, what you overlooked, and what checklists to follow next time, and you still managed not to follow them.
But that's borderline "sabotage" or "recklessness" and a whole other story. What you had was just an honest mistake. Nothing more, nothing less.
illicITparameters@reddit
I’m a manager, and there’s ZERO shot I write someone up for a simple mistake. A talking to about being more careful, yes. But most folks already feel bad for making a mistake, so beating them while they’re down isn’t helping anyone.
I’ve accidentally done worse, and literally no one remembered 72 hrs later.
Lando_uk@reddit
I'd have thought unmounting datastores should be done under change control. If you had an open RFC with everything you were doing, then you're covered.
If you were doing potentially risky things outside of a change window, then a "write up" or slapped wrist is probably acceptable.
GullibleCrazy488@reddit
Impressed it only took 10 mins, esp with a phone system involved. The knock on effect usually travels far and wide.
coldbeers@reddit
Oh I’ve dropped far bigger bollocks than that lol, try not to feel bad about it.
Common_Dealer_7541@reddit
If the company is ISO or ASA certified, you will likely have to create an incident report and perform a post-mortem analysis. That is not a write-up and is not used for placing blame.
VMware is safety-conscious. It should not have let you dismount an active volume. Of course, if you were in the shell you can push past any of that, but if you were in the GUI/web interface you would have been warned at least once before the last unmount occurred. That's why you need an analysis. Find out what went wrong.
You did great. Don't let the bastards get you down.
EpexSpex@reddit
That's not even a slap on the wrist, mate. I wouldn't even sweat it.
At most it's an MI that's been logged and completed with a total downtime of 10 minutes, which on an MSR (Monthly Service Review) would probably only show as ~0.02% downtime.
Dreadedtrash@reddit
So you're going to get a write up because of an accident that you learned from? Sucks to be you. I remember we were moving to a new SAN and the old one had 2 controllers for redundancy. My boss asked me to figure out which one was the primary so we could disconnect the backup for rack space. I just pulled the plug on one and told him that is what redundancy is for. He laughed, I laughed, and all of the VMs went down and recovered on their own while we were busy unracking a SAN controller.
TotallyNotIT@reddit
I can't imagine a disciplinary action for something like this. A postmortem to discuss failure of change control, abso-fucking-lutely. But a write up? I haven't even heard that phrase since I worked fast food in college, let alone imagine having someone tell me I'm getting one.
Mackswift@reddit
This isn't write up worthy. It's an honest mistake and a chance to examine the process and lessons learned and improve.
mkmrproper@reddit
Sysadmin work sure shaped us into perfectionists. Make a mistake and you're expecting a write-up. I hope we don't do this to others when they don't meet our expectations.
Site-Staff@reddit
Shit happens. 10 minutes of downtime isn't that bad.
tgreatone316@reddit
You shouldn't get written up for that. I have done MUCH worse: taking down networks supporting 10,000 users in the middle of the day, etc.
Also, it is kind of odd that it let you even attempt to do it. With storage, ESXi tries to do a lot of fingers-and-toes checks and will fail that action if it even thinks a datastore could be in use.
Your manager kinda sucks for doing this. They should have your back.
Secure-Director-7309@reddit
Tell your boss to automate that stuff if they’re aiming for 100% success rate - otherwise, they’ll just keep handing you the same job until you mess it up again, so they have an excuse to fire you.
overworked-sysadmin@reddit
Don't let it get to you; you learnt something from it & no major damage was caused.
I once accidentally replicated a DHCP scope & caused a network outage for ~20 minutes, in the middle of the day.