Caused a big outage at work- how do I move forward?
Posted by VOXX_theLock@reddit | sysadmin | View on Reddit | 726 comments
I was configuring a port on one of our Cisco switches. I realised after configuring the port and running write memory (first mistake) that it was the wrong port.
Checked the label for that port; it said ‘phone-pc’, which would mean it’s configured as a trunk with 2 VLANs, one of them set as native. So I set it up as I normally would, and then configured the correct port.
Suddenly get a bunch of phone calls. User PCs slowing down, connections dropping. Emails from Darktrace coming through saying multiple IPs on our network are running vuln scans.
My boss was in a meeting with other high-ranking members of the company. He knew what it was pretty quick - an L2 loop. Turned that switch off and everything came back on; I went back and reverted the changes and everything’s working okay. But I still caused 30 minutes of downtime, during a big meeting with higher-ups, and on a Friday afternoon.
Feel like an idiot. I’ve been in the job for a year, finished uni a couple of years back. My role is IT Systems Engineer, but it's closer to T3 help desk/hardware tech. First experience with an L2 loop.
It’s knocked my confidence quite a bit if I’m honest, I’m not sure how to move forward in the same role.
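For reference, the kind of 'phone-pc' trunk template described above usually looks something like this on Cisco IOS; the interface name and VLAN numbers here are made up for illustration:

interface GigabitEthernet1/0/24
 description phone-pc
 switchport mode trunk
 switchport trunk native vlan 10
 switchport trunk allowed vlan 10,20

Whether a phone-plus-PC port should really be a trunk at all, rather than an access port with a voice VLAN, comes up further down the thread.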
havpac2@reddit
Every single one of us has done this. At least once.
N3dr4@reddit
Once on a Cisco I wanted to add a VLAN and forgot the 'add' in the command. This was on my uplink to the data center, so I had to go there physically to fix it.
I didn't ask questions; I saw the boss and told him: we have an outage, it's my fault, and I'm running there now to fix it.
I learned and it's all good
toadfreak@reddit
I'm just going to leave this here - "reload in 5"
ItsMeMulbear@reddit
Modern solution
toadfreak@reddit
You got me. It's been a while.
i_said_unobjectional@reddit
When I first saw the Juniper "commit confirm" command I thought that it was dark sorcery.
toadfreak@reddit
me when I learned "reload in x"
elkab0ng@reddit
⬆️
I rarely needed it but those few times I did it saved me a lot of high-speed driving in the middle of the night.
SnooWalruses7416@reddit
Middle of the night? I'm sure I've broken no fewer than 5 laws in broad daylight because of this.
lazyhustlermusic@reddit
Everyone needs more ‘commit confirm’ in their lives
MorseScience@reddit
I've never caused a problem like that. Want to buy a (suspension-type) bridge?
Spaceman_Splff@reddit
Yep, this was my first outage creation as well. Never made that mistake again. Fast forward 10 years: I’ve moved to the security side of the house, and my old boss made that same mistake, causing an outage. It happens.
DoctorOctagonapus@reddit
I swear IOS must have been designed by one guy, possibly neurodivergent, who wrote the syntax entirely on his own based on what he thought made sense, and Cisco did zero peer review.
gtripwood@reddit
I took an entire DC off the internet once making changes in ARIN. That was a sphincter puckering 30 minutes undoing that let me tell you.
Away_798@reddit
Yep, been there. It's just something everyone has to experience at least once in their career. That panic as the phone calls start coming in.
LostSpaceQ@reddit
At best once; for most it's a monthly thing, in my experience. I mean not necessarily the exact same thing, but something that brings down a network.
CobblerYm@reddit
Ever tried to plug a standard RS232 cable into an older APC UPS? Yeah they shut the entire device and all outputs off when you do that. Guilty!
clarkos2@reddit
When I discovered that, it was the last APC UPS we ever purchased.
awesome_pinay_noses@reddit
That's just shitty design.
DoctorOctagonapus@reddit
Honestly I wonder if they engineered it like that on purpose to force people to buy their cables.
"Buy our cables. Not only will a generic one not work, we wired it to the EPO circuit."
clarkos2@reddit
Absolutely they did. This is the same company that used 10p10c RJ45 connectors for USB because fk you that's why.
ipzipzap@reddit
Every single APC UPS came with a cable so I don’t know why we should have bought more of them.
frankentriple@reddit
I did support for them for about 6 months back at the turn of the century. The answer is yes they did. They actually bought the cables themselves in bulk and had an assembly line that took apart the backshell and swapped 2 pins and put it back together.
unstopablex15@reddit
What a shitty proprietary thing to do.
i_said_unobjectional@reddit
I had a bunch of configurable DB9s ( https://www.amazon.com/ANMBEST-Ethernet-Adapter-Modular-Converter/dp/B0CBJYM9FZ ) so I could keep a little baggy of them and configure anything with the addition of an Ethernet cable.
lemachet@reddit
I'd love to hear any other solid engineering reason as to why you must use their cable.
i_said_unobjectional@reddit
You didn't, the pinouts were easy enough to figure out with a tester. https://parts4pc.com/IEEE-1394_Cables/IEEE-Product_PC_Cable_Tester_BNC_DB15_DB9_DB25_RJ45_USB_IEEE-1394.html
But usually, why bother?
Back in the day serial pinouts were the wild west. They seemed to throw whatever leads were closest on the chipset at the closest DB9 or DB25 pin for console ports. Making a new cable to ship with your router/switch/gizmo was cheaper than adding a new layer to the circuit board. Why did the Cisco RJ45-to-DB9 roll the pins rather than use the null modem pinout? Who the fuck knows? I was just happy it wasn't a fucking Manchester cable.
jhdore@reddit
*proprietary design although they’re virtually synonymous.
RevLoveJoy@reddit
Fortunately they've been doing this for about 40 years.
lebean@reddit
It's the worst possible design you could ever make, "Let's make this connection a very standard interface that everyone is familiar with and there are hundreds of millions of 'wrong' cables in IT departments all around the world, but if you plug anything but our very special cable in the UPS drops all loads and goes dark!". Just aggressively stupid and poor engineering on APC's part.
That's why APC units should be blacklisted from any possible purchase consideration. No other vendor sets intentional booby traps that will blow up your systems.
MorseScience@reddit
There was a time, not long ago, when almost EVERY cell phone brand had a different charging connector (and sometimes multiple connectors within brands). And almost EVERY flat-screen TV needed a $200+ custom wall mount.
Luckily that almost completely went away.
And then the EU forced Apple to stop with those f**king Lightning connectors.
Just two examples. There are, of course, more.
Conscious_Cut_6144@reddit
Holy shit, I had no idea that was a shared experience. Been there!
stvdion@reddit
I have done the same exact thing - not a good feeling!
Ok-Bill3318@reddit
It’s not just older ones. I did it this year with one installed in December.
mickert_dev@reddit
Guilty! 😉
Specialist_Hornet798@reddit
Just learned that the hard way a few months ago 😅
Mr-RS182@reddit
I did this to an APC UPS by plugging in a standard Ethernet cable.
cjwebster93@reddit
This bit me the other week…
_3470@reddit
I did this as a junior in the middle of a work day. I was nervous as hell explaining it to my manager/director, but he was just confused as fuck why APC would set it up like that. He asked me to tape over the serial ports on them after that.
Oujii@reddit
The only reason I never did this (almost did though) was this subreddit, so thank you r/sysadmin!
chipchipjack@reddit
Dude I didn’t know that this was a shared experience! My second week of my first tech job they sent me around each closet to make sure the configs were right on each UPS (I didn't know I could just go to the web portal to do this) and I plugged my serial cable into the temp sensor port and everything shut down. Took down a 100+ computer testing lab that was running finals.
Terrible-Category218@reddit
There's nothing quite as unsettling as a suddenly silent data center.
splntz@reddit
I have been in IT for 25+ years and never needed to plug an RS232 into an APC unit, but now that I'm reading this I'm glad I didn't.
One_Monk_2777@reddit
This was taught to me the first day on the job while the team had just learned this the day before (the hard way).
i_said_unobjectional@reddit
Sun Sparcs used to kernel panic over the loss of the keyboard if they got unplugged. Only happened to me a half dozen times, shutting down million dollar systems while pulling cables.
brisquet@reddit
Even new UPS’s do this! Dang proprietary cables. I took down the boardroom equipment when I tried to configure it. Thank goodness there was no meeting at the time.
SAugsburger@reddit
I knew somebody that did that, but wasn't so fortunate in the timing.
VexingRaven@reddit
I remember reading this on the Reddit way back as a wee junior in a place with an APC UPS. I had wanted to check some setting on it and didn't have an APC cable. I did have one of those wonky "make your own pinout" cables intended for exactly this scenario, but I was so scared of screwing it up and bringing everything down that I just left it alone lol
GrapefruitOne1648@reddit
You didn't stay 'junior' for very long, did you? That's a very senior thing to do
GrapefruitOne1648@reddit
a regular network cable will shut it off too when plugged into the data port
tistom@reddit
I did this during production hours.
firestorm_v1@reddit
I found one of those proprietary serial cables in a parts bin at work something like 20 years ago. It is WELL labeled and I will probably die with the damn thing.
The cable comes in clutch from time to time for configuring IPs on the APC management cards.
Death_by_carfire@reddit
done it. felt sick to my stomach.
tdhuck@reddit
I want to know which engineer at APC thought that was a good idea.
BTW, I know exactly what you are talking about, all I can say is thank god for dual power supplies.
lemachet@reddit
It was the CRE - chief revenue engineer.
His job is to engineer ways to improve revenue
Computermaster@reddit
My money is on an engineer being forced to implement some clown suit's idea.
lemachet@reddit
Yea but that's not /your/ fault.
APC are assholes for that design
waflman7@reddit
I was so annoyed when I found this out. Also, happy because it meant we didn't have a faulty UPS like I thought.
anonymousITCoward@reddit
I deleted everyone's email... account... like, all the accounts in a tenant...
bk2947@reddit
The sound of that clunk and then complete silence in the server room …
Nutlink49@reddit
...on a Friday afternoon while you were daydreaming about the joint you were going to smoke with your girlfriend before the NOFX show in Tampa that you've been looking forward to for the last month, which, just like your PSUs, went up in smoke.
Suspect6307@reddit
The onosecond. The moment when your stomach drops and you realize you just made a lot of work for yourself.
Honky_Cat@reddit
The fact that APC made their port a DB9 is absolutely weapons grade stupidity (or trolling).
Then they double down with the next gen of console cables - I believe it’s USB-A to RJ45.
themaskedewok@reddit
Feels good to see how many others have done this. I of course saw a post about it after I took down an entire building and pissed off the CFO.
bgarlock@reddit
Been there. Back before virtualization was a thing, took down production. This made me follow vendor documentation more closely, because it was spelled out. It's just not intuitive and poor design.
BokehJunkie@reddit
Learned this lesson the hard way once. lol.
PoeTheGhost@reddit
Yep.
CobblerYm@reddit
Right! Mine was literally my first day. They sent me over to diagnose why some of the equipment was rebooting in the middle of the night, and I figured it was probably just really old batteries and the self-test would kill it all at 2:00am. I had to get someone to let me in the closet since I didn't have keys (it was a dental clinic at the university), and I was in there for about 2 minutes before I plugged it in. Then the deafening silence... the loudest silence I've ever heard. Followed by a knock on the door from the lady who let me in, shouting through the IDF door, "Umm, our computers just lost connection." lol.
Oh well, I've been there for 13 years now and nobody remembers it but me I think! But boy do I remember it.
AtTheRogersCup2022@reddit
That one fucking kills me
oilpanhead@reddit
I lost an Oracle database!
anomalous_cowherd@reddit
Including your boss. That's how he recognised it so quickly.
Qronics@reddit
Underrated comment here
MiKeMcDnet@reddit
Rite of passage for any good sysadmin.
Crazy-Rest5026@reddit
Yup. This is a learning lesson and not a "you're fired" mistake.
L2 loops happen, but you should be able to understand why you took down the network and don’t do it again.
Ok-Bill3318@reddit
My fave is the rs232 serial cable into apc ups
jajajajaj@reddit
Not meeeeee . . . and if you could look where I am, I'd say "but look where I am anyway." Don't sweat it. We don't have any bigwigs out here like "I'm just glad we have you, the guy who has gone a statistically improbable length of time without breaking production, so that we have someone to whom we can give this giant Publishers Clearing House prize-style check to."
Efficient-Sir-5040@reddit
Some of us managed to do something similar to this on coax Ethernet (or even non-Ethernet networks) at some point.
erskinetech2@reddit
Once ? .... yeah sure once lol
havpac2@reddit
Heh true. Just last week I took out an IDF: I dropped a few VLANs off the trunk by mistake, including management.
Had to schlep myself to the other building to plug in and fix it. Not too many complaints, just 50 tickets.
I also closed 50 tickets that day. Win win. I was super productive.
jhdore@reddit
Oh yeah. More than once. Normally things calm down pretty quick after isolating the loop, unless one happened to have a couple of Dell switches somewhere which then proceed to play merry hell with root bridges, ignore the actual root bridge with a defined priority and thus bend your network over a table to show it an extended good time.
rogueit@reddit
I would trust OP more than someone who hasn't caused a huge outage. As long as he keeps that fear now.
henk717@reddit
Or something similar. I've not been in roles where I'm directly configuring network equipment much; for me it's always been the sysadmin side.
But I did once reset the local admin password on the wrong machine, when I got interrupted and tabbed back to the management console of what I thought was the PC of the user I was assisting. It wasn't; it was a server we didn't manage ourselves but did monitor for them.
I owned up to my mistake and immediately reported it to the higher-level sysadmins, since at the time I was a level one. They informed me that resetting a local admin password isn't possible on that server, as it's the domain controller. I told them the reset did work, so now we had a problem on our hands. I had accidentally reset the domain controller's main admin account that services relied upon.
But thankfully, because I owned up to it instantly, one of the engineers still had the customer's original password due to a recent project he'd worked on for them. We were able to restore it before it caused any damage. They never found out, and my swift honesty got me off the hook with a "Be more careful next time".
Ghetto_Witness@reddit
Shame on them for using the built-in administrator account for any services in the first place.
Appropriate-Border-8@reddit
We have a User OU just for service accounts.
Murky_Bid_8868@reddit
Agree! Been there done it.
TheBestMePlausible@reddit
Once I took out production at a huge Silicon Valley powerhouse with a single missing semicolon, for 30 seconds, before someone smarter up the ladder quickly reverted it. Beyond being informed that that single snafu cost approximately $560,000 in missed income, I faced no repercussions or serious reprisals, and the head of networking didn’t either, other than being informed that his first job as the new head of networking was separating the internal and external networks lol
Great-University-956@reddit
My first time was with a cisco access point.
mobileaccountuser@reddit
or twice ... experience is the best teacher .. mistakes and all
Aim_Fire_Ready@reddit
Some of us are even willing to admit it! It’s the guys that pretend like their fingerprints aren’t ALL OVER IT that drive me bonkers!
os2mac@reddit
I deleted /etc on a running db server in production on a solaris host.
misplaced space. "rm -rf . /* " .
Some ahole had written a script that dumped its logfiles in /etc; I was trying to clean it up.
this is why change management and peer review exist.
tuxedo_jack@reddit
Yup. The big thing is how we learn from it and how we keep from doing it again.
Also - and I know it sounds anathema from me - if it's somebody's first major screw up, give them some grace. If they're doing the same thing for a second time, they really shouldn't be. If they're doing it a third time... well, there won't be a fourth.
unstopablex15@reddit
that's very gracious of you
tuxedo_jack@reddit
It's weird, because I know I've been in that spot.
If safeguards are BLATANTLY ignored, that's one thing. If it's an honest mistake, that's quite another.
unstopablex15@reddit
Agreed 100%
uber-geek@reddit
Things like this are why I implemented a "Read-Only Friday" policy. No major changes to anything the day before a weekend or a holiday. I use my Fridays for documentation, project alignment updates, and working on non-mission-critical things.
PantsOffDanceOff@reddit
It's like a rite of passage. Everyone gives you a hard time for causing a loop. But it's something we all have done at least once.
unstopablex15@reddit
I'd think we are using enterprise grade switches here that have STP turned on by default, unless we're talking about consumer grade unmanaged ones.
ImpossibleBend3396@reddit
And it’s almost never incompetence, just silly things. I once cleared the config on port 10 when it should have been 18. Was either the font or my dirty glasses 🤷🏻♂️
This is the ‘experience’ part of your resume. You’ll always double check your work going forward now I promise
i_said_unobjectional@reddit
The default out of box cisco switch configuration used to be vtp master with no vlans. Since the new switches usually had higher mac addresses they won spanning tree and vtp decisions. The guy installing a new switch and knocking down the campus happened so often it was a common joke.
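A common precaution before uplinking a fresh switch, sketched for Cisco IOS; the VLAN range and priority value are just examples of keeping a closet switch from ever winning root:

conf t
 vtp mode transparent
 spanning-tree vlan 1-4094 priority 61440
end
show vtp status
show spanning-tree summary

With VTP set to transparent and a deliberately poor bridge priority, a new switch can neither rewrite the VLAN database nor win a root election when it gets plugged in.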
Every-Progress-1117@reddit
It is a rite of passage.
Welcome OP, you are in good company
My first major fu was shutting down the Windows NT 3 primary domain controller for 1500 users... 30 mins of downtime while it rebooted. Our Lotus Notes servers had a minor panic as a result... so it probably resulted in improved productivity for a short while :D
OMGItsCheezWTF@reddit
At my previous employer it was called "earning your graph"
Everyone did it once (and the longer you worked there the more likely it was you had more than one)
Once the incident is resolved and the RCA completed, the ops team would print out the monitoring system graph showing the spike or flatline in whatever metric you destroyed and present it to you. People would have them pinned by their desks etc. It was a rite of passage for every engineer.
This is inevitable, we work with enormously complex systems and weird interactions, you will break something visibly and loudly eventually. The trick is to learn and de-risk next time, and break something else instead.
tudorapo@reddit
I did the exact same thing once! You learn to live with it.
End0rphinJunkie@reddit
Exactly. Anyone who says they haven't taken down prod is either lying or just doesn't have enough access yet. You owned it and reverted it, so try not to stress too much.
Still-Professional69@reddit
Every. Single. One. At least once. And if we’re honest, a few more as well.
Chalk it up as a learning experience and don’t do it again.
Ldejavul@reddit
I was once training a new guy on a switch. I remoted into our headquarters switch to show him a more advanced config before I wiped a fairly basic switch and moved it to its new home. One "write erase, reload" later, I'm training the new guy on SolarWinds config restores.
daryld_the_cat@reddit
switchport trunk allowed vlan.... When adding another vlan to a trunk. Guess what word is missing from that statement?
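The difference, for anyone who hasn't been bitten yet; the VLAN number is made up:

interface GigabitEthernet1/0/1
 ! replaces the whole allowed list - every other VLAN falls off the trunk:
 switchport trunk allowed vlan 30
 ! appends VLAN 30 and leaves the existing VLANs alone:
 switchport trunk allowed vlan add 30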
Sunsparc@reddit
Within about 2 weeks of my boss giving me switch access, I accidentally took down the uplink trunk port and had to make a mad dash to the downstream switch to pull the plug on it so it would reset. Only caused about 5 minutes worth of downtime, but I was sweating bullets.
footballheroeater@reddit
Shit your pants, dive into the deep end and swim.
Bradddtheimpaler@reddit
My first job they just let me do whatever I wanted. We didn’t even have a password policy when I started but decision makers were really resistant to things that made their lives even a hair more difficult. I decided I’d gather evidence for enforcing a stronger password policy by testing our users against some common wordlists, to show how quickly they’d be compromised if anyone gained access. My method for doing this was to open RDP up to everyone in the org, and then set hydra to our user list and a wordlist. I caught people immediately. I didn’t realize we had part of a password policy enforced though, and that part was account lockouts, so I locked out everyone in the AD at 10 AM on a Monday, including myself. It was only for fifteen minutes iirc, but yeah, shut the whole office and production floor down.
Legal_Situation@reddit
Absolutely, not to say it's a good thing, but I think it's experiences like this that reinforce good engineering. Regardless of whether it's network, collaboration tools, software dev whatever - it's kind of a good reminder of the effect our work can have sometimes.
Ignoble_Savage@reddit
Yep, it happens to everyone; it's really one of those defining moments. You should learn from it and move on.
Suspect6307@reddit
You've either done this or haven't been in IT long enough to have done this.
5panks@reddit
I once crashed our entire RDS environment while all of accounting was actively working in it, and it took us two hours to get it back up.
elkab0ng@reddit
There’s a legit level of brag in how big an outage you’ve caused once you’re a couple years in.
iamLisppy@reddit
and if you didn't, you haven't been entrusted enough with stuff in PROD.
RudeMathematician42@reddit
Not sure what they will do, but generally:
Everyone fucks up, and generally there should be systems in place to make sure that such fuckups are at least harder to accomplish (Infrastructure as Code, Four Eyes principle, all that good stuff).
In this case, there's a reason we invented Spanning Tree Protocol, and accidentally causing an L2 loop isn't really what you first think of.
Yes, you fucked up, but the system also allowed you to fuck up. This would be an excellent way to raise your own profile: Infrastructure as Code (Ansible and Terraform can both talk to at least some IOS devices), and also enabling STP protections.
Buckcity42@reddit
I wouldn’t be too concerned about it. You learn from your experiences. Everyone has faced similar situations in their careers at some point. If you haven’t made a mistake like this, I’d argue that you’re not truly learning and growing. While such incidents can be humbling, they serve as a valuable reminder to slow down and pay closer attention.
Plastic_Willow734@reddit
You’re not a real sysadmin until you’ve straight up broken shit. You earned your wings today.
LorektheBear@reddit
Show me a sysadmin who's never caused downtime, and I'll show you a sysadmin who's never done anything at all.
Master-Guidance-2409@reddit
man, homie is spitting fire.
locke577@reddit
A lot of the guys I work with wear the fact that they never cause any issues as a badge of honor.
They're also the ones who take months to decide to hire a contractor to do whatever they originally said they wanted to do.
And then the contractor fucks up production.
Damet_Dave@reddit
Also learned a few lessons including don’t touch shit on Fridays unless there is already an outage.
Change freeze Fridays is a thing at a lot of companies for a reason.
HoosierExplorer@reddit
"Never reboot a server at 4pm on a Friday."
Proceeds to update all ESXi servers because it's a slow day.
intj-geek@reddit
We call it "Read only Friday".
As you say, it's the real deal.
ComprehensivePilot91@reddit
Read only Fridays are a thing for a reason. I also have come up with Microsoft Minutes as a way to describe making changes in the cloud. People ask why isn’t it showing? Give it Microsoft minute or two. 😉
IlumInatI42@reddit
We call it "Cloud Minute". Funny thing is, for us that means maybe a minute or 2, but in some cases it literally means hours or even days.
TheMysticalDadasoar@reddit
I have recently subscribed to fuck shit up Fridays
And I finish at half 4 so Byeeeeeee
/S
mickert_dev@reddit
Did a f'ck-shit-up change just this Friday that actually worked out excellent and unf'cked shit! Yolo!
BubbyDaddy43@reddit
Came to say this. Welcome to the club brother👍
segflt@reddit
Yeah, and only 30 mins and lower impact for a first time. Pretty nice. OP has a great tingle sense building now.
darthenron@reddit
I still remember blowing dust out of mounted switch into a smoke alarm sensor and having an entire building evacuated because of it….
The goal is to learn from mistakes
Adept_Strategy_9545@reddit
May I have your attention please! The signal you have just heard indicates a report of a fire in this building. Please proceed to the nearest exit, and leave the building.
i_said_unobjectional@reddit
Had a system catch on fire in a telco central office. Used a big fan to clear the smoke. Pulled the smoke thru the 5ESS switches. I can't go back to White Plains, nor do I want to.
wallst07@reddit
It's like a SWE who never had a bug in production, not possible.
Except we can't write tests for a lot of things we do.
Jayteezer@reddit
On a Friday, during a board or C-level meeting. If we didn't make mistakes, how would we learn from them?
JoeyPhats@reddit
Not just broke shit. But broke shit on a Friday. Been there. 😆
AnotherCableGuy@reddit
You fuck it, you unfuck it.
footballheroeater@reddit
Words to live by.
ButlerKevind@reddit
And on rare occasions, you fucks it again!
rulebreaker@reddit
“Nah, this will be fine” 3 minutes later “Oh, fuck, why haven’t I done this during a proper outage window…”
VeryRealHuman23@reddit
“You know how I got these scars”
Downinahole94@reddit
Dell laptop batteries in the early 00s?
tuxedo_jack@reddit
Or literally any Sony laptop from 1997 onwards. The screws in those little bastards would crack the plastic casing and the shards would fly everywhere.
scriptmonkey420@reddit
Those D600s man.
Szeraax@reddit
blocked DNS requests going to all devices to prevent rogue DNS vuln. Didn't realize that default devices included domain controllers. Took down the entire factory for 3.5 hours till I realized that I caused it and got it fixed up.
Considered not fessing up. Got a written warning for that one. We know the feeling.
meastd_0@reddit
They become fun stories to tell in time over some beers with fresh young colleagues :)
Be honest up front, own it and learn from it.
KC_Buddyl33@reddit
I would say this: you seem to have already identified the main failure. You performed a change during a critical time of business operation.
The whole point of failing is so we learn. Since you've identified the timing as an issue, fix that going forward. Enact better change control, even if it's just your own, so that you can better identify those critical times and avoid them.
Proud-Ad6709@reddit
That's not a big outage. Yes, you made a mistake, but if what you say is true then you did it with no malice and you fixed it as soon as you could.
Next time I would recommend not doing work like that during business hours. If it can cause that much of an impact when it goes wrong, then do it when no one is around and run multiple tests afterwards.
A big outage is writing the wrong config to every switch when they're spread all over the country...
TexasVulvaAficionado@reddit
Come over to the OT side. It doesn't even count if you don't cause physical damage.
WindsingerEU@reddit
Added vlan with add - check
Wrong boot code - check
Typo in load balancer, causing L2 loops across 2 DCs - check
UPS with regular cable - check
No 'router bgp' on a PE instead of a CE - check
Pulling the wrong cable during a fiber cut in a ring, causing split-brain issues - check (in my defense, wrong label)
Believing an electrician did his job right - check
Driving to the datacenter because I locked myself out - check
Trusting IT that this small change is fine to push to production on a Friday - check
Someone wrote that a sys/netadmin who does not make mistakes does not work, and I wholeheartedly agree. In my 40+ years I made a lot of mistakes, some through overconfidence or trusting that others did their job right, and some outright mistakes.
We learn from them and tutor others and adjust procedures to prevent them.
It hurts our pride, yes, but we simply carry on. I feel that university people especially fall the hardest, since they are often overconfident in their abilities. It's part of the job.
mdkdue@reddit
You are all good mate. Mistakes happen, don’t kick yourself too hard. You understood the problem for a start, which means you are not a paper IT engineer. Learn from it and move on. All the best 👍🏻
exoteror@reddit
We always say at work that you need to break things to learn how to do them. We are not always going to be perfect, but you always need to make sure you are aware of what you are doing and understand the consequences where you can.
I took out a door access system a few weeks back, someone configured it badly in the first place but still...
taneshoon@reddit
Say what you did, you know what you did, you know not to do it again. Don't lie. You should be good if you check all those boxes.
Lying about it and not learning from it is what gets you fired.
spooninmycrevis@reddit
You'll be laughing about it in a few months. Learn from it and shake it off.
Important_Secret6839@reddit
Congrats you’re now officially a *real* admin!
DistractionHere@reddit
I accidentally power cycled our firewall on a Friday afternoon. Was racking a new Meraki FW and the power cable to the existing one got snagged just enough to cause it to reboot. The thing takes like fifteen minutes to boot, so I was just staring at it and sweating. At least the AC in the server room was good.
No_shot_98@reddit
I once tagged the incorrect vlan and took down our entire network. Shit happens. No sense in beating yourself up over it. If anything this just teaches you to triple check your work from now on. So there's the positive.
Horror-Debt-5290@reddit
Hi Brad!
mylightyear@reddit
I did this and took down a Perth metropolitan ISP for at least 5 mins. One of the larger ones in the 2000 era.
In my defence the test lab didn’t have STP protection configured, but still, that was a learning experience.
Didn’t get fired, didn’t even get yelled at. My bosses were great.
SoberArchitect99@reddit
Move to a fabric and get rid of STP.
Laroemwen@reddit
How did this cause a loop?
unstopablex15@reddit
I'm curious as well, considering STP is typically on by default on most enterprise switches.
lost_signal@reddit
So if you mix Cisco and other gear (cheap Dell stuff) and run Cisco PV-RSTP, you used to end up with the other switches trying to push the same topology to all VLANs, so you'd get conflicting root bridges.
https://i.dell.com/sites/doccontent/shared-content/data-sheets/en/Documents/Dell-Networking-and-Cisco-Spanning-Tree-Interoperability.pdf?forcedownload=true
unstopablex15@reddit
But why would you use a cisco protocol if you don't have all cisco gear?
lost_signal@reddit
To be fair, Force10 and later third parties figured out how to make it work.
The median person just plugs stuff in, configures a couple of VLANs at the access layer, and calls it a day.
unstopablex15@reddit
I hear ya. I guess if it were me, I'd use a protocol like MSTP that would play nice with other gear, especially in an enterprise network.
lost_signal@reddit
That is what we generally did yes
SuccotashOk960@reddit
Curious too. And a phone-pc port sounds like it should be an access port with a voice vlan, not a trunk.
lutiana@reddit
I am too. In Cisco land I'd say you are correct, but on some other brands maybe not. I know for Ubiquiti gear you have to configure the port as a trunk and then only allow the data and voice VLANs on the port.
unstopablex15@reddit
Last time I used ubiquiti equipment, they didn't even use the terminology 'trunk', I think they refer to it as tagged and untagged vlans on ports.
SuccotashOk960@reddit
OP specified it was a Cisco switch
lutiana@reddit
Oops, I missed that. Thanks for pointing it out.
datagutten@reddit
When configuring an access port with voice vlan I see the same MAC in both vlans which feels wrong, so I prefer to use a trunk, but I keep the voice vlan to get the LLDP tag.
SuccotashOk960@reddit
Don’t you run into other limitations like that? If you’re using trunk ports as edge ports you can’t use things like portfast just to name a thing.
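The access-port-plus-voice-VLAN approach being discussed looks roughly like this; the data and voice VLAN numbers are made up:

interface GigabitEthernet1/0/10
 switchport mode access
 switchport access vlan 10
 switchport voice vlan 20
 spanning-tree portfast
 spanning-tree bpduguard enable

The phone learns the voice VLAN via CDP/LLDP and tags its own traffic, the PC behind it stays untagged in the data VLAN, and the port is still treated as an edge port.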
Kronis1@reddit
I haven’t seen an L2 loop cause an outage in almost a decade. There are just so many protections that good spanning-tree configs utilize to stop it from happening.
wombleh@reddit
I've seen loops when the native VLANs are mismatched as the spanning tree from one VLAN is joined with another.
I seem to remember there was something in particular with some feature in Cisco per-VLAN STP for backward compatibility with the original single instance STP (802.1D), which ran in the native VLAN and meant issues there could impact all VLANs and make them keep re-converging. But this was a few years back now, so that may be nonsense.
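A couple of quick ways to spot that kind of mismatch on Cisco gear, using standard show commands; the VLAN number is made up:

show interfaces trunk
show spanning-tree vlan 10

The first shows the native VLAN per trunk so both ends can be compared; the second confirms which bridge is actually root for a given VLAN. CDP also typically logs %CDP-4-NATIVE_VLAN_MISMATCH when the two ends of a link disagree.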
HerrBadger@reddit
Hopefully you got a pat on the back from your colleagues.
We’ve all done it, you’ll laugh it off once the dust settles.
It’s the process of earning your stripes - you cause an outage, fix it, then you go on with your life.
Also, the cardinal rule - Don’t make changes on a Friday.
Double-oh-negro@reddit
Whenever you configure a Cisco, set up Reload in 10 and then set a 10 minute timer on your phone. Don't save the config until you know it works. If you lose control then a reload will save you. If no one screams after 10 minutes, go ahead and save.
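Roughly that workflow, assuming Cisco IOS; the timer length is just an example:

! schedule a fallback reboot to the last saved config
reload in 10
configure terminal
 ! ...risky changes here, nothing saved yet...
end
! still reachable and nobody is screaming? keep it:
reload cancel
write memory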
GhostlyCrowd@reddit
My First was rebooting all of prod instead of dev, 4 node fail over cluster with 75 VM's, Terminal servers for a few hundred users, SQL servers, the CRM everything. Takes a bit to come back up....
You're one of us now, time to get a tattoo.
"I took down Prod on XX/XX/XX"
Mountain-Cheez-DewIt@reddit
I can jump into the mix of someone who has done this before as well. It wasn't due to mislabeling, but rather unawareness of some cabling that a vendor had done. Rather than female terminations, they had done male terminations along the inside of cabinets. At the time, I thought the cable going into the dummy switch (also part of the issue) was just feeding to the countertop. I added a second cable back to the same uplink switch.
Interface read STP blocking, but I couldn't figure out why. I disabled this to see what would happen, caused a loop, and knocked out the network. They blamed me for it, but nothing really came of it. It's a shitty company to begin with, so I don't take it personally to my own skill. In the end, we all learned something that day.
redstonefreak589@reddit
Once accidentally removed all permissions from a system user in Cisco UCM and took down the phone system of the Fortune 300 I worked at. A 5 second goof resulted in several days of fixing issues. It happens, just gotta learn from it and improve moving forward! :)
Daphoid@reddit
Take a breath. You're fine. You kept your boss in the loop, he got you sorted and sounds like he didn't blame you; and service was restored. Years later you'll chuckle maybe.
Honestly I ask about times like this in interviews and I find it odd if you don't have a story or two because that means you weren't trusted with systems enough to fail. We've all done it. You now know to be more careful, ask questions if you're unsure with your planning, execution, or otherwise. Perhaps updated some documentation or had to show the team and do a root cause analysis.
You're doing fine.
Saditface@reddit
Shit happens. Learn. Move forward. If you weren't taking it seriously you wouldn't be on here making a confession. Ten Hail Ciscos and an Act of Spanning Tree Contrition. You’re not an idiot, you just got your first official L2 scar.
Belmodelo@reddit
OP, there will be a time when you're training the new guy and telling him this story. Good times.
sysadminmakesmecry@reddit
Ya, if you haven't caused an outage, do you even really work in IT?
Acknowledge the mistake, explain how you'll do better going forward, and just continue. You made a mistake, and that's okay.
axisblasts@reddit
Welcome to the club. Learn from it. Check things twice in the future.
I took out email and a few databases for 8 hours my first week at this company. That was 7 years and several promotions ago.
Let it knock your confidence and slow you down to double check things and verify. Then it doesn't happen again.
Minor issue. All I ask of people is to be honest when this stuff happens, fix it, and learn from it, and everything is fine. It was an honest mistake; even if it were larger, it was still a mistake. If someone tries to hide it or lie, however, that's a personality trait and much harder to recover from.
lost_signal@reddit
I've crashed 911, shrank a LUN on a file server, took out the camera network for a port, and locked myself out of a layer 3 switch and required someone go do a power pull to trigger a reboot of shame.
My boss was in a meeting with other high ranking members of the company. He knew what it was pretty quick- an L2 Loop
hold up, Wait, you guys don't run spanning tree (MST/PVRSTP?) for access switches?
Public-Actuary2437@reddit
What I tell my team is: we are the ones who press the scary buttons and make the scary changes. If someone is going to take things down by accident, it's going to be us. It's only considered a failure if we didn't learn from it.
If you're uncomfortable with bad outcomes, that's a good thing. That'll drive your attention to quality - but nothing will make you perfect, or free from the human condition. Be humble, and give yourself some grace. You're human.
EmployableWill@reddit
I feel like everyone has done something similar at some point. I accidentally restarted the server at our largest client one time and caused a several hour long outage
Nevertheless, they are still in business and no one died
lucky644@reddit
Your only real mistake was doing it on a Friday.
Respect ROF (Read only Fridays) going forward.
golden_tix@reddit
I labeled Microsoft edge as a virus and quarantined the browser on 3000 endpoints… and had to unquarantine manually
Ninesunz@reddit
Been there, done that. It's the best teaching experience honestly. Now you know why there are change controls and reviews. It's not to scrutinize methodology, it's CYA - you can't be blamed if it was approved.
AfterCockroach7804@reddit
You are now a network engineer. If you haven’t brought down at least one critical system are you really in IT?
ethanjscott@reddit
That’s not even bad
Proof-Variation7005@reddit
i definitely expected worse
techierealtor@reddit
Yeah I was expecting config nuke on a main trunk at the core or something. Network loops happen. Take your wrist slap and move on.
GallowWho@reddit
Config backups save lives (and careers)!
Relevant_Fly_4807@reddit
30 minutes of downtime is nothing!
Aim_Fire_Ready@reddit
30 minutes downtime? That’s a normal Tuesday in some envs!
SpaceChimps98@reddit
Seriously. I've taken the network down at multiple sites for several hours before. This seems very minor.
alwayz@reddit
Network schmetwork. You're not really in trouble unless you lose data.
SpaceChimps98@reddit
Can't lose data if you can't enter data.
avlas@reddit
30 mins downtime, no data loss, no privacy breach?
That's when you shout "YEA SORRY COFFEES ON ME TODAY" and you're good
DonStimpo@reddit
Yeah this is small fry outage. 30 minute outage is nothing
Sroni4967@reddit
the fact that your boss immediately knew it was an L2 loop tells you he's probably caused one himself at some point. honestly spanning-tree portfast + bpduguard on access ports would've caught this before it spiraled - might be worth suggesting that as a takeaway so something good comes out of it. 30 min downtime sucks but you learned what an L2 loop looks like in production and you won't forget it
CP_Money@reddit
Makes me wonder what vendor switch it is because it’s not Cisco with that syntax
Sliverdraconis@reddit
What you talking about? Cisco ios syntax is literally
Spanning-tree portfast
Spanning-tree bpduguard enable
CP_Money@reddit
Dude I meant the phone-pc syntax chill
Sliverdraconis@reddit
Lol, i thought you were talking about the spanning tree commands lol
CP_Money@reddit
Yeah in hindsight I see how that made me look like a moron LOL
GallowWho@reddit
I assume it was the name of the interface assigned to the port.
Phreakiture@reddit
What isn't?
This is off of one of my Cisco switches at home, though I am not using BPDUguard here. We do use it at work.
zephead98@reddit
It honestly makes me wonder why Cisco doesn't default to bpduguard when you enable portfast.
Ace417@reddit
If you do “spanning-tree portfast default” and “spanning-tree bpduguard default” you don’t ever have to worry about it
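Globally, that looks something like the following on classic Catalyst IOS (newer trains spell it with an 'edge' keyword); the errdisable lines are optional so ports recover on their own after a violation:

spanning-tree portfast default
spanning-tree portfast bpduguard default
errdisable recovery cause bpduguard
errdisable recovery interval 300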
DoctorOctagonapus@reddit
I caused an L2 loop once when I had to replace an uplink cable. I plugged in the new one before unplugging the old.
CeC-P@reddit
Blame everyone else for the bad design. I actually know nothing about network design so no idea if it was bad design but it sounds like it.
Btw I accidentally deleted the entire voice VLAN for my office but restored it from a screenshot I took ahead of time as a paranoid backup. In my defense, Ruckus' old web UI is the most misleading and mislabeled thing I have ever seen.
midy-dk@reddit
Just learn from it. I once shut the wrong port and cut off an entire site from the datacenter 😅
Throwaway555666765@reddit
This is why I believe in some level of automation workflow for network changes, if you have the resources to build it. At the very least, it would be nice to have something to build and push your config that shows you what the diff will be, so you can catch the port mismatch more quickly.
rrmcco04@reddit
Best thing to do with it is to own the mistake, take away what you learned, and move on.
We've all broken prod at least once (or a dozen times... or more) because that's frankly part of the job. If you don't have a true test environment (no one does), you're going to do that.
It's a bad way to start the weekend because you might be kicking yourself, but any IT supervisors worth their anything will be like, "well, that was interesting. We all learned here, have a good weekend"
GamerLymx@reddit
That reminds me of when I caused an outage by typing 'ip access-group ACL' instead of 'ip access-list ACL'... fun times.
Opposite-Optimal@reddit
Worked in IT for more than 5 minutes... yeah, it happens. Once took out a company's PBX system when a loose cable came out with me while I was working on a patch panel.
A lot of unhappy network engineers came running to the DC to figure out where it needed to go.
The way I saw it, I found a single point of failure for the company 😂
arracheur2homme@reddit
I'm a third-year infrastructure apprentice, with one year left.
Recently, while stressed, I reset the incoming fibre port on the router. If you feel dumb, think of me...
machacker89@reddit
Own up to it and move on. Learn from your mistakes
Fatality@reddit
Why wasn't there protection preventing a loop?
It's never the fault of the person implementing; it's the fault of the process that let it happen.
BBO1007@reddit
30 minutes? Those are rookie numbers.
My first time I deleted a user database trying to delete one user.
If you don’t already have the three envelopes, blame it on the sales guy.
https://youtu.be/uRGljemfwUE?si=GzFm8uPUaUjhBne9
Cb7_@reddit
Salesforce.com was on the right testicle.
Aildrik@reddit
To quote Grand Admiral Thrawn, "Anyone can make an error. But that error doesn't become a mistake until you refuse to correct it."
I think basically every profession is going to involve making mistakes as you learn. The key is to use those as valuable learning opportunities for growth. Explain to your boss the mistake and how you plan on avoiding it in the future.
Imagine being a plumber; at least you didn't flood someone's house!
schnityzy393@reddit
We've all done it, but also, no change Friday bit you in the ass. Double lesson!
Cb7_@reddit
Surely Friday is a good day to fuck up? 'Cos then you can spend all weekend fixing it without impacting too many users. Assuming standard office working hours, of course.
AttemptingToGeek@reddit
Monday no one will remember.
Normal_Story1153@reddit
You don't learn if you don't fuck up. Don't get discouraged, shit happens. We are humans; we make errors.
earthly_marsian@reddit
Here is how to approach it: own your mistake, and make a plan to check thrice and cut once when doing changes.
Changes should happen after working hours, not during.
Like others have said, we all have done it at different levels.
Spiritual_Cycle_3263@reddit
Don't worry. Someone at a company I worked for deleted DNS records for a 20,000 employee company.
One thing I teach everyone is, take your hand off the mouse, read the prompt first, then take action. I've been known to click too fast myself.
I also blame UIs for not being consistent about where OK and Cancel are located. Also, big changes should always require a confirmation dialogue box with a 3-second delay before the button becomes active.
jamesaepp@reddit
make your changes, whatever they are. if you lock out, wait for the 5-minute timer to expire. if you're happy, run:
if you want to back out early:
Ace417@reddit
I add the “write memory” under the archive to keep easily comparable times when changes were made and saved
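The commands elided above are presumably IOS's configuration archive/rollback feature; a sketch of the whole pattern, with a made-up archive path:

conf t
 archive
  path flash:backup
  write-memory
end
! start a config session with a 5-minute dead-man timer:
configure terminal revert timer 5
 ! ...changes...
end
! happy with it? keep the config:
configure confirm
! or bail out before the timer fires:
configure revert now

The write-memory line under archive is what the comment above refers to: it snapshots the config every time someone runs write mem, so there are timestamped copies to diff against.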
vareng@reddit
Reset timers have saved my ass more than once.
Accidentally set an ACL inbound instead of outbound on a router on another continent once… that would have been fun to explain to the team.
n0cn0c@reddit
I'm sure somebody else said this, but trusting the /label/ (description) on a port is a mistake. You need to inspect the actual configuration of the port and compare it against your understanding of what is appropriate. For example - sho run int gi2/0/48
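A few more standard IOS show commands that reveal what is really on a port before you touch it, using the same example interface:

show running-config interface gi2/0/48
show interfaces gi2/0/48 switchport
show mac address-table interface gi2/0/48
show cdp neighbors gi2/0/48 detail

The switchport view gives the operational mode and the access/voice/native VLANs, the MAC table shows what is actually plugged in, and CDP tells you whether the far end is a phone, an AP, or another switch.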
Connect-Ad6135@reddit
If it makes you feel any better, my first screw up in IT ended up deleting the entire company database. This was back in the days of SQL 6.. restore took almost a full day and since it was a financial firm, cost them a bunch of money.. still in IT almost 30 years later. It happens to all of us at some point and we learn and move on.
oelcric@reddit
One foot in front of the other brother. It happens to the best of us, my bosses always told me expect a fuck up and plan for a recovery
Link_Tesla_6231@reddit
Own up! We’ve all done it, I have too! Ran past the boss to fix it, ignoring him; after fixing it I went straight back to him to own up!
whiskeytab@reddit
here are the biggest things:
1) own the mistake
2) learn from the mistake
3) don't beat yourself up
yeah you made a mistake and it caused a problem, we've all been there. no one was hurt and nothing was permanently destroyed so go easy on yourself.
Individual_Fun8263@reddit
steveamsp@reddit
This needs to be up at the top of the list of replies.
Did OP screw up? Seems like it. That's fine, everyone makes mistakes. Next steps are: Fix it (already done). Figure out how it got broke in the first place (also appears to be understood). Move on and don't make that same mistake again.
Daneel_@reddit
Agreed, this should be the most upvoted. Owning up to it and being honest is the most critical thing. It shows you have integrity. People will respect you for it and usually just move on to getting it fixed. I've seen tiny issues become firable because the person simply wouldn't own up to it, even when presented with evidence that it was them consciously doing something.
We've all been there - brought down something in prod - but how to respond is what counts. Don't beat yourself up. No one died. It's fixable. Chin up and carry on, now with more wisdom.
lotekjunky@reddit
As long as you didn't need a change window, don't sweat it. In my environment, that stuff has to be done in the evening. If someone implemented an undocumented change, they would get put on a performance plan. You didn't mention any of that stuff, so you're fine... but since you are "new", figured I'd mention it. Also, if you were on my team, we'd prolly call you "Mr Loops" or something for a few days lol
the_good_hodgkins@reddit
Every sysadmin/network admin has fucked up at one point. I had a manager ask me once how to keep this from happening again. My reply was simple. Don't let humans log in. He was not amused. IDGAF.
peterox@reddit
You were down for only 30 minutes? Rookie numbers, my friend. Wait till you're in for several hours straight going into the next day, just to realize the time was off by 10 minutes on the mail server. Learn from it, move on, rinse and repeat. 😂😂😂
bitSwitcher@reddit
My first mentor in sysadmin told me, “ there are two types of admins, ones who have taken down Production, and liars.”
acquiesce88@reddit
No one seems to be saying this, but how about some change control, a backout plan, and not making changes in the production environment during production hours?
LowMight3045@reddit
Had to scroll to make sure this comment about change control was here. If your org doesn't have change control, then what you did was an acceptable risk. As others have written, people make mistakes. We try to minimize them, and you are to be commended for resolving this issue so fast.
TheRealJackOfSpades@reddit
It's a rite of passage. First you own up to what you did. Don't try to hide it.
Everyone's done it. I crashed the whole building by tripping over a power cord. Destroyed the firewall by typing
rm -r /* instead of rm -r ./*. We survived both, and no one ever mentioned it to me again.
sprocket90@reddit
I know a dude that plugged a cable into 2 different ports of the same switch, took out a whole floor of a hospital wing
For about 4 hours they took his closet key from him. Demoted to help desk.
Good luck
Nighteyesv@reddit
If they are having big meetings on a Friday afternoon that’s bad, is the company struggling? As for how you move forward, shit happens, learn from the mistake and get in the habit of double checking your changes before implementing them. 30 minutes is nothing, had one change that took us down for over a day, didn’t get any sleep while rotating through a dozen Microsoft engineers on an emergency call to fix it. Then once that was fixed my sleep deprived brain thought it would be a great idea to implement the second change I’d planned only for that one to go wrong as well. Didn’t sleep for nearly 3 days trying to fix them both and had a lot of angry management asking for explanations.
splatm15@reddit
We have all caused broadcast storms.
It makes you more careful and planned.
Thin_Weekend9564@reddit
One time I accidentally unplugged a network cable from our server rack while pulling other cables when I was new. It was the network cable to the main SCADA PLC.
Apparently I halted 7 production lines.
Downtime cost was around €200,000.
AFlyingGideon@reddit
Shouldn't an L2 loop be detected/blocked by STP?
Orashgle@reddit
If you still have a job it wasn't that bad
macgruff@reddit
Everyone who’s been around the block has a story like this. Thirty minutes? Bah… not that big a deal. The important thing is that you learned from it right? Don’t ever do that again, and what ever next thing you do, be more careful.
* Measure twice, cut once.
Zeldamike@reddit
Anyone who's been in this industry for very long has done this. Most of us more than once. You apologize show that you know better going forward and take the lesson to heart. It will be ok and you'll be better from learning a hard lesson.
Dry_Conversation571@reddit
This is where all of us share a story about how we fucked up and caused a major outage and some of our teammates chipped in to help out and then afterwards simultaneously gave us shit for our mistake and shared their story of when they fucked up.
It’s the circle of life.
poizone68@reddit
Just write up the Root Cause Analysis as you would with any outage, and make suggestions for how to improve the process.
pjustmd@reddit
You’re not in IT unless you’ve taken down production at least once. Welcome to the club. You move on by moving on; you learned your lesson, and now you know what not to do next time.
mattyyg@reddit
I deleted the voice vlan for the whole company once. We had no phones for about 20 mins, my director calmly helped me, and all was good. We all pull some dumb shit sometimes. All of us. It happens.
CaptainZhon@reddit
Won’t be the last time you bring down production
kerrwashere@reddit
I did this for a client once: I enabled trunking on two of the wrong ports because I wrote them down incorrectly, and blocked all traffic to an entire rack in the middle of the day. Changing it back took 2 minutes. Finding out about it took an hour.
It didn't take the entire org down, just half the room. You are fine, mistakes happen.
Alternative_Run643@reddit
This is an excellent opportunity to learn the company culture. You see the real faces of people when you make a mistake. Remember this moment.
bacvain@reddit
My man, if you don't make mistakes you'll never be a good senior administrator. Trust me, I've taken down an entire pharmaceutical line because of my Cisco configs, even though I checked and double-checked. Took the blame, resolved it quickly, figured out what happened, and made the proper move afterward. I'm good at what I do because of the fuck-ups of the past and what I learned from them. None of the amazing techs I know reached this point in their careers without dropping the ball a couple of times. We laugh about it now over a couple of drinks 😂
What you need to take away from this is due diligence and testing in a dev environment before deploying. Does it make you question yourself? Good, let it do that. You'll be even more careful and focused, and next time you'll take more precautions in whatever you're doing. 🫡
JohnnyFnG@reddit
Take it as an opportunity to say, “yes, I know exactly what caused this, I now know exactly the mistake I made, and I’ll ensure to be more careful. Thank you for your understanding and support.”
Acceptance, growth, and determination. Leadership should understand. It’s not like you did it intentionally. Expected Downtime is fine; unexpected downtime is a hit-or-miss experience. Hopefully they’re cool about it, don’t beat yourself up over it. If they are hypocritical or giving you grief, use that to motivate yourself to do better, not that you’re not good enough. You have this job because someone believes in you, now you just need to believe in yourself.
TLDR: show them you understand the mistake, that you'll be more careful, and that you'll try not to do it again. There is no such thing as 100% uptime.
brokensyntax@reddit
If you haven't laid low a fortune 500 or two with a misconfiguration, are you really even an engineer? 😅😅😅.
Honestly though, it happens, own it, think of ways the process or details could be better documented/communicated.
Present those in the inevitable post mortem.
The learning is the part that makes you more valuable than an agentic AI, or than just leaving directions with an intern.
EmotionalVegetable48@reddit
Try defragging the corporate Exchange mailbox store and confusing RDP disconnect with logging off, halfway through.
Zapped everyone’s mail. Restored dial tone service shortly after, but all prod mail took a while.
And I learned something that day too haha
Merdrak@reddit
Shit happens, easily reversed and it wasn't intentional.
Best way to move forward is to slow down and be methodical about what you do. Sure, it takes a few minutes to double check that you're in the right switch, right port, etc. - but that is the key. Everyone wants it instantly, but cover your own ass and be slow. Fast is fine, accuracy is final. Be slow in a hurry.
I've done similar and it's all about owning up to it, and creating a check-balance to avoid issues going forward.
Disastrous-Fun-2414@reddit
Sounds like you need to set up spanning tree and prevent this from ever happening again.
Striker2477@reddit
Shit happens bro. In reality you got it fixed… also… it was a Friday. I’m willing to bet most people are definitely not being productive for 30+ minutes on a Friday regardless.
Couldn’t have been a better day, apart from a holiday.
Savings_Art5944@reddit
That's how you learn. Hopefully, your boss blames the switch if anyone asks.
Next_Impression3901@reddit
Advice: instead of re-configuring by hand, do a rollback...
goingslowfast@reddit
Welcome to the club. Getting a bit gun shy isn’t a bad thing, learning to think twice and read what you’re about to commit is a good habit.
Ask your boss if you can help write the RFO/RCA. Doing that will help you better understand risks and mitigations.
Once you’ve been around the block a few times you’ll find that adopting jedi master like zen in the face of an outage is the quickest way to get back up and running safely.
tmstksbk@reddit
Stuff happens.
File it in the "don't do that again" box and move on.
Japjer@reddit
You don't do it again
You aren't the first person to do this.
BillsBells65@reddit
It happens to the best of us. Take ownership of your mistakes and learn from them.
jerepjohnson@reddit
Welcome to the club. Learn and don't make the same mistake twice.
imbannedanyway69@reddit
You've only done this once? Ha. I couldn't count how many times I've done this.
Learn from it. Come up with mitigating strategies to ensure it doesn't happen again whether it be documentation, procedures to revert the last changes, backups etc.
kayelex@reddit
Learn from it. One of the questions I ask on an interview is to describe what happened during an outage they caused and what they learned from it.
shitty_mcfucklestick@reddit
Best thing you can do from a situation like this is own it, learn from it, and then come up with a plan for how to make sure it will never happen again. For example, what could have prevented this in the first place? Better systems documentation? More education about how the switch works? Labels on cables? Etc. Spearhead that learning and putting more safeguards in place. To me, that’s how to take responsibility for a mistake like that, show you learned from it, and improve things for everybody with future prevention.
xangbar@reddit
I accidentally overrode a firewall config adding it to our Fortimanager once. Took down a client but they were appreciative we put in time to fix the issue and get them back up and running. It sucks but we've all been there. Don't let it ruin your confidence and view it as a learning moment.
varinator@reddit
Very good! You have now learned not to do this again, you will always remember it when doing similar things, you will be able to spot it if someone else does the same, you gained experience, and you can add more value to the business that employs you.
igiveupmakinganame@reddit
30 mins is nothing
Ill-Sentence7346@reddit
Everyone screws up at some point, and if the bosses are serious they understand it, because they've screwed up too. It would be a different story if you were messing up every week.
FluffyMumbles@reddit
Don't let it knock your confidence. It's not breaking stuff that's the problem (unless it's silly stuff, ALL the time), it's breaking stuff and then not realising or not owning up to it.
Sounds like you had a handle on it the entire time. You clearly have the right mindset. You'll break something again. You'll work through it, and be better for it.
I'd been in the industry for 20 years when I stupidly shutdown our exchange server (instead of logging out while remoted in), last thing on a Friday, then went home. That was a memorable weekend! CEOs screaming at me because their emails weren't working.
I also felt stupid. I never did it again.
mickert_dev@reddit
It's all about perspective, it could have been much worse. 😉
https://www.datacenterdynamics.com/en/news/northc-data-center-outside-amsterdam-suffers-fire/
kiddj1@reddit
Wait till you take a platform down that causes loss in revenue and your company gives out credits
It happens to us all.. we're changing the engine on a moving car
IndependentBat8365@reddit
A buddy of mine told me this story of when he was a new sysadmin pup. He worked for a loan processor that handled school loans for a set of universities.
He caused a $10 million outage, in that $10 million worth of loans were lost. At least electronically. He had to enlist help to manually scan them back into the system. That's not counting the lost time, delays in loan processing, recruiting additional help, etc.
He thought he was going to be fired for sure.
His boss said, “what? Fire you? You’re my most expensive employee! You’ve just had $10 million in training, and now you’ll NEVER make that type of mistake again! I can’t afford to replace experience like that!”
My advice: as long as you learn from this and can explain what and how you’re going to mitigate this risk going forward, I wouldn’t sweat about it.
mickert_dev@reddit
Bridged some VLANs at a customer site once by plugging a non-VLAN-aware dumb L2 switch into an edge port. It brought a large part of the LAN to a halt because of a loop. Their network contractor was pissed. My reaction was: you have some allowed-VLANs configuration in your future. Not feeling guilty, as I was wearing my DEV hat that day.
In your case the network admin has some RSTP / BPDU guarding config in their future. This is not really your fault.
But use something like 'configure terminal revert time 1' + 'configure confirm' / 'configure revert now' in the future. 😉
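For anyone who hasn't seen that workflow, it goes roughly like this (from memory - exact keywords differ between IOS versions, it needs the archive feature configured first, and the file path is just a placeholder, so check the docs for your platform):

conf t
 archive
  path flash:rollback-config
 end
! open config mode with a 1-minute auto-rollback timer
configure terminal revert timer 1
 ! ...make the risky change here...
 end
! if you still have access, keep the change:
configure confirm
! or bail out immediately instead of waiting for the timer:
configure revert now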
Designer_Solid4271@reddit
Get back to me when you’ve deleted about 70GB of customer data accidentally.
Or somehow failed to run a correct step in a process that resulted in $60k in lost revenue. I know that’s not much by business standards. But it does get management attention.
Yes. We’ve all messed up. The important part is what did you learn from it. A good management structure understands that.
diwhychuck@reddit
Man, this is why I like to participate in read-only Friday.
But always verify before you smash that enter key.
I like to use VS Code as a fancy notepad. I put all my commands in there so I can proof them.
drazydababy@reddit
Man, in terms of a mistake and the impact, this ain't too bad. Pick yourself up and carry on. You're still a great employee who does great work. It just so happened that a great employee made a minor mistake. No big deal. Carry on as the great employee you are.
AdministrativeMud238@reddit
I went to add a VLAN to a trunk port. Saved it, and poof went my wifi. Thought a reboot would bring back the boot config - it was corrupted.
BuffaloRedshark@reddit
Always save the running config to a backup before making changes.
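Something as simple as this before you start, so you always have a known-good copy to diff or copy back (the filename and TFTP server here are just placeholders):

copy running-config flash:pre-change-backup.cfg
copy running-config tftp://192.0.2.10/switch01-pre-change.cfg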
SOHC427@reddit
If you’re not making any mistakes, you’re not actually doing anything. We all make mistakes. Just suck it up, admit & be honest about what happened and learn from it. If you’re fired for that, it wasn’t a good place to work. Trust me. I’ve done and survived MUCH worse.
tmarkley91@reddit
Accept your mistake, learn from it and move forward
neoncracker@reddit
Been there done that. If they give you a pass just take it as a hard lesson.
MangoEven8066@reddit
Dont worry about it. Like others have said, no big deal. And it was only 30min!! Thats nothing ;)
Ok_Prune_1731@reddit
Please do not let corporate make you feel like what they have going on is oh so important. It's not.
You will be fine.
Inevitable-Star2362@reddit
Trust me, it could have been far worse. Count yourself lucky, mark it as a lesson, and move on.
bendsley@reddit
You take your licks and keep moving forward. Yeah, you may be reprimanded, up to and including being fired. It's happened to most/all of us.
That said, you should lay out what type of standard and routine changes are allowed during maintenance vs. during the day. Make sure your boss approves it. This way, IF this were to happen again and this was set as a normal, routine change for during the day, then you both signed off on it and it was a mistake.
You might also label the port better in the config so you know not to touch it going forward.
Impossible_Egg_1691@reddit
It's a rite of passage, and this time next year you'll be joking about it and consoling the next guy who does a major F-up. It's called experience. The important takeaway here is that you identified the problem, fessed up to your mistake, and fixed the issue in only 30 mins! That's the sign of a good technician and somebody you'd want on your team.
Ok_Business5507@reddit
Live and learn. Always fess up to a mistake. Learn from it. Understand what led to it and you'll be unlikely to make it again.
twolfhawk@reddit
You own up to the mistake, document the root cause and a method for preventing in the future.
Life moves on. This is like a clerical error in our world, something got filed wrong and it caused a delay. Don't beat yourself up. We all make mistakes. I have a habit of asking my new engineers "what's the biggest oopsie you ever did in IT" and then "what did you learn from it?"
That's my litmus test. When I was doing interviews I'd ask that. Anyone that said they never made mistakes was a red flag.
pondo_sinatra@reddit
I stopped the worldwide production of the largest soda company in the world for about 6 hours to the tune of about $9m (or maybe I have the numbers backwards at this point), and I've survived for nearly 30 years (now a CIO).
You have completed your rite of passage. Please accept this homemade cat5 cable loop as a crown.
Shibizsjah@reddit
lmao. This too shall pass. We have all done some f-ups.
Tscotty223@reddit
As everyone is saying, just own up to it; we've all made mistakes and you just have to move on. Admitting to doing it is one of the best things you can do, both for the company and for your sanity. Thank you for sharing. My first one of these types of things was a long time ago, when I moved the Novell public directory to a different location by mistake. This was a Novell network, and because of that I took the whole hospital down at a time when everyone was coming into work. I stayed there for 8 years after doing that. You'll be okay.
anothernerd@reddit
30 minutes no worries.
muchoshuevonasos@reddit
I once replaced a switch after end of day, thought all was good. Turns out I had misconfigured DHCP snooping on the replacement. Brought down a testing center for a couple of hours.
My boss figured it out and said "Yep, that'll do it. Ask me how I know."
docNNST@reddit
I did that with the trunk allowed vlan command - without the add. Then I learned about staging a reboot of the switch before I change anything that could break connectivity.
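For anyone who hasn't been bitten by it yet, the difference is roughly this (interface and VLAN numbers made up):

interface GigabitEthernet1/0/48
 ! appends VLAN 30 to the existing allowed list - what you usually want
 switchport trunk allowed vlan add 30
 ! replaces the whole allowed list with ONLY VLAN 30 - what takes the uplink down
 switchport trunk allowed vlan 30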
budlight2k@reddit
Been there, done exactly that. Seeing as the company had gotten rid of the network engineer, and I wasn't one, plus all the switches were old, it took all day and I had no idea what happened. I just removed the cable so it was out of the way, and did the same thing again later.
Everyone has to have the big one, it's a rite of passage.
Velik0ff69@reddit
Do not put yourself down, it happens to all at some point and I see way worse outages from time to time. If you have learned from the mistake, that is what matters the most.
nateshull@reddit
All you can do is learn; I honestly consider it a small outage. The biggest one I have dealt with wasn't caused by me, but I had to deal with the aftermath.. imagine attempting to make a change to 2 mirrored SANs, to make them striped, for the company's main Exchange server. Losing the data, then finding out the latest backup was a month old.. yep, email server down for several days. I had to write several scripts to capture everyone's OST file on their laptops and desktops before it was overwritten. Then we could convert the OST to a PST and rehydrate the data.
SupraCollider@reddit
You just do. Own it, accept that you aren’t perfect, don’t bother yourself with convincing the people that you work for that you don’t have flaws but rather that you have the integrity to acknowledge and correct a mistake. Be accountable.
This is a lesson in change management. Something like that should have never been scheduled for that time if there was something critical going on. It’s a procedural error that is an org issue and not a first year employee issue
v-irtual@reddit
I remember driving to the datacenter, plugging in an SFP that I was going to configure later into my UCS fabric interconnects, driving back to my office, realized all connectivity was lost, and drove right back.
Somehow that link got configured as a trunk with all VLANs on it....
BroccoliOscar@reddit
I had a coworker once blackhole google on our firewalls and it was inaccessible for like three days until we figured it out. It happens. Laugh, make note, and move on.
justaguyonthebus@reddit
Now you are real system admin. Own it, learn from it, grow from it.
SpaceGuy1968@reddit
I rebooted an app server at the first real netadmin job I had..... You could hear the low rumbles grow into howls from the cubicle farm......
My boss and I went out (he was a manager of IT).
He said "you did this" ? I admitted yes "I screwed up" ....
Don't do that again.....he also said something like......I've done worse and you will do worse in your career..... always admit the mistake or take ownership of mistakes because it makes solving the problems faster.....
johnsmithdoe15@reddit
Well, you IT guys will continue to think you are network engineers
frizzer69@reddit
Own it, apologise, learn from it, don't make it a habit. These are major learning experiences. I'm 55, have been in IT for my entire career, and I still vividly recall that stomach-churning feeling of hitting enter on something just as I realise what the result will be. I made some calls and then pulled an all-nighter fixing it. Lesson learnt: don't delete an OU full of computer objects with tens of GPOs linked. This was during my first Windows 2000 rollout, moving from NetWare NDS to Windows 2000 AD. The friends I made in those early days I still play poker with today, 30 years later, and they still bring it up. We all have a good laugh, because we've all been there and know exactly how it feels.
DaprasDaMonk@reddit
Can't these configurations be done after hours??? That way, if you make a mistake it won't cause too much of a circus.
2nwsrdr@reddit
In this case he’ll receive the calls on Monday.
afarmer2005@reddit
The best learning experiences in IT come from events like this - the key is to learn from it
A book or an online course will never teach you things like this
BadadvicefromIT@reddit
3:30pm on a Friday starts “don’t break shit o’clock” for this specific reason. Unless something is a mission critical issue we aren’t touching it till Monday.
AhYesTheSoldier@reddit
Reminds me of my first f**k up at my current place. And the positive comments. Hope you're feeling better man
VNJCinPA@reddit
Learn this from that experience:
Good luck
HTG-UK@reddit
Best thing to do is own up. Learn from it.
You are not a network engineer until you have taken the network down.
Second of all, the network must have some fundamental design issue to have permitted the L2 loop.
I have lost count of how many times I have taken networks down in my various roles, but over time you learn and get more confident in how to fix them.
AverageMuggle99@reddit
This is par for the course. Everyone’s done stuff like this. It’s how we learn.
zekerman50@reddit
I was a co-op. My boss brought in a friend's PC. C: OS drive. D: accounting data. Wipe C:, upgrade Windows from 3 to 95. Leave the accounting data alone. Sat down, went to a command prompt, and for some reason: format D:. So I reboot the machine not knowing I wiped the accounting data. Machine reboots into Windows 3 again. I'm wondering what I did wrong. I'm thinking I must need to do a low-level format and wipe the MBR. I'm a co-op. I barely know what the heck I'm doing. Do the same damn thing again, completely wiping the accounting data. That's when I realized my mistake. I was positive that was it for me.
I look at one of the guys at the bench with me, and I said "I screwed up, I formatted the data drive." He said "You better tell the boss." Just then, my boss, the department director, walks in to check on me. Boss: "How's it going?" Me: "I formatted the wrong drive." Boss: "No you didn't." Me: "Yes I did." Boss: "No, you didn't." Me: "Yes. I did." Boss: "Oh, that's not good." He turns and walks out of the room.
What do I do in the meantime? What does anyone do? You walk around telling people of your mistake hoping for comfort and support. I walk around telling the rest of the senior techs what I did. I'm new to all of this. I'm waiting for them to commiserate with me and console me. To a man, they all repeat what the boss said: "Yeah, uhhh, that's not good." Boss walks in. "You're lucky. It's OK. He's got a backup of the accounting data." I didn't know what to do. I thought my life in IT was over before it started. He said: "Finish the machine." Then, for good measure: "Format the right drive this time."
My name's Jim. Why does this matter? Wait for it..... I sit down, type format c:. Command not found. Really? I wish. It would have saved me some angst and embarrassment. Same thing again. Dammit! The guy I had initially told about the mistake is laughing hysterically. "Hey dumbass, you going to jim that drive?" He had renamed format.com to jim.com on every bench machine. For weeks after, everyone got big laughs repeating "You gonna jim that drive?" or even better "Make sure you are 'jimming' the right drive before you start."
Every one of us has made the mistake that makes you sick to your stomach when you hit enter. I'm the director of that same department now. I'll tell you what. You'll never make that mistake again. That's why they're called life lessons. You don't learn shit when things work, you learn when things are broke, whether you broke them or the system broke them or someone else broke them. One day, and probably many many days before you retire, you'll be a tech hero. You'll get things to work that no one else in your department can. You'll be the go-to guy or gal. When that happens, remember this day and show the person who brought the system down some grace. Jim
rire0001@reddit
I've been in IT since 1977, almost 50 years, and I've never made a single mistake. Not once. Oh, I probably came close before, but nothing that a little professional caution and reams of documentation, coupled with multi-level peer reviews, couldn't prevent. Follow your test plan in the SDLC, and you'll be perfect every time. That's what it's all there for, after all. Now if you'll excuse me, I'm due for my ivermectin booster
its_FORTY@reddit
This is just part of growing into your trade and is actually a very necessary part of that process. I’ve been doing this for just at 25 years now without ever being given a negative performance review—and I assure you I have made FAR worse blunders. Including one that impacted almost the entire Budweiser brand for 4-6 hours. Marketing, distribution, wholesalers, you name it. The only thing I didn’t take down was the actual brewery itself.
Just always try to take something positive forward from your mistakes, and most importantly, make whatever mental or process adjustments are needed to avoid repeating the same mistakes multiple times. You'll be just fine.
person_8958@reddit
I mean... not to put too fine a point on it, but you're 1 year on this job and a few years out of uni. This is honestly how things are supposed to be going at this stage. Don't beat yourself up. Just learn from it and move on.
Zerguu@reddit
How do you even get an L2 loop with STP (I assume)?
TheGenericUser0815@reddit
You're not a real admin until you brought down production.
No-Temphex@reddit
At least your first outage took skill. I was at a doctor's office in the middle of the day to connect something in the closet. The outlet the server's power went to had been overused, so it was a loose connection.
I was new, didn't notice, and stepped wrong. Made my changes and never noticed. Left and got a frantic call while I was driving to my next location. Went back and found the cord had completely fallen out of the outlet.
Good times... Lots of laughter when I got back to the office.
GhostandVodka@reddit
I'm confused on how this caused a loop. The loops already had to exist.
cosacee88@reddit
Just a mistake, not the end of the world. 30 mins is nothing and won't be remembered next month, maybe not even next week... move on, lesson learnt.
braytag@reddit
How do you think your boss knew right away?
Happened before, will happen again.
DDS-PBS@reddit
Own it. Learn from it. Never repeat it. Move on from it.
First-Structure-2407@reddit
I knackered a fibre connection once, took Virgin well over a week to come and re-splice it.
Refused to accept that I was responsible.
The problem went away.
Cool IT’ing dude .
swedish_bear12@reddit
We have all made a mistake at some time or another. Don't take it too hard.
I once introduced a configuration change to our SSH configs and didn't take into consideration that our old CentOS servers couldn't understand that config.
To make things worse, those old servers were added way back, and it took some time for anyone to figure out the superuser password so I could log into the console and revert the changes.
So our developers couldn't SSH to the servers and neither could our jenkins pipelines.
No downtime for our end users, but if something had crashed we wouldn't have had any means to troubleshoot and fix it.
Took about 3-4 hours to go through all servers and revert the changes.
90Carat@reddit
Learn from your mistakes. Learn how you dealt with it. Learn how to not make the same mistake twice.
We've ALL fucked up. Learn and move forward.
Jot down notes. I like to ask a question in interviews. "Tell me about your biggest mistake..."
data_wrangler@reddit
I was interviewing to be the head of engineering at a tech company and this was the icebreaker question in a group interview with a few senior software developers: "what was the first or worst time you broke a system in production?"
We went around the room. Some of the stories were hilarious, others made me want to give the person a hug. One of the best interviews I've ever done, and I've kept that question in my pocket since.
JoeMiner79@reddit
Call it awareness training!
Longjumping-Peanut14@reddit
You just became a real sysadmin - congratulations! 😄
Ragnarokk__au@reddit
See this is why I don't like today's culture. Failure and learning are seen as a negative.
You made a mistake but you also learnt from that mistake. And it was minimal downtime. Don't beat yourself up. Instead dust yourself off and move forward. We've all been there my friend.
You had professional growth today, you may not be proud of what you did, but be proud that you learnt and were able to fix the problem. Enjoy your weekend !
_Born_To_Be_Mild_@reddit
I shut down all the servers by mistake once. Thought Windows 2000 option was just for clients, didn't realise it included Server.
Sachi_TPKLL@reddit
Bro, that is nothing. I had 7 years' experience and took a snapshot of a prod SQL server at 3pm on a Thursday, and I worked for an investment banking firm. Every trade just stopped, and it took us 50 min to get the server back online. The price for the mistake was north of $20k, but the lesson was for a lifetime. Keep your chin up and don't doubt yourself. Good luck.
egpigp@reddit
Just shows that you’re busy!
MajStealth@reddit
My coworker, with 15+ years at this corp alone, took out our site last week with a bug that was known to him. Configuring a VLAN in a BAGG killed the primary core link to the other core, STP not doing its thing, both storages deciding it's time to shut down because why bother with the other working link, killing the whole VMware cluster. Took around 2 hours until most services worked again. Was a not-so-nice Thursday.
Inn0centSinner@reddit
They're not going to fire you over a dumb mistake especially after you've been there for a year already. Blame it on improperly labeled ports. Now if it was your first day on the job. Boy oh boy.
Hazzula@reddit
boss probably knew what it was because he has seen it or been there before. take it off your mind this weekend so you can come back stronger next week
evolutionxtinct@reddit
Bud, this is the first of many journeys lol. Own up to the problem, explain why you won't make that screw-up again, and find other common ways to not screw up on a switch again… like adding a VLAN to a port :)
cdewey17@reddit
I'm confused at the "T3" part of the post. Taking down a network is like a T2 prerequisite 🤣.
Th30gr3@reddit
I restarted every computer in our company through SCCM with a Google Chrome update. I thought I was done for when I had to talk to my IT director. His response? "I've done way worse, you're fine."
KLEPTOROTH@reddit
Welcome to IT.
Madh2orat@reddit
You just got expensive, on the job training. If they know what is good for them they won’t fire you. Own up to it, and move on. Just make sure you learn from the mistake and don’t do the same thing again.
RefrigeratorLive5920@reddit
One of my favorite interview questions for candidates was to ask what was one of the worst outages they caused. One storage admin powered down the active array on a active/passive SAN during an upgrade. A DBA on his first day clicked to reseed replication for a geographically distributed production database causing a 12 hour outage across multiple time zones. Lots of fun stuff. We've all been there. Generally the consensus is that the longer it takes to cause your first outage in a new job, the worse the outage will be. You were there a year and only caused a 30 minute outage, that's not bad going.
coffey64@reddit
Own it and move on. We’ve all done it.
Also, never change things on a Friday. Your brain doesn't work right on Friday afternoon or Monday morning. Avoid those windows like the plague.
Inside-Finish-2128@reddit
The bigger question is why didn't the network have protections in place to guard against an L2 loop?
GeneTech734@reddit
The only time I have ever seen someone get let go for causing an outage was because they went to sleep before fixing the problem and then lied about it afterwards. It was an obvious lie, too.
WorkLurkerThrowaway@reddit
Same. Everyone makes mistakes, if you lie and try to hide it, then you can’t be trusted. That’s when you get fired.
cdheer@reddit
This right here. When I’ve had blunders, I always tell my boss right away. I’ve never even been yelled at.
rocuspeter@reddit
I have accidentally shut down the servers of an important client. Be upfront that you made a mistake and let your team lead know; if you can fix the issue, do it, and if not, ask your team for help.
It’s a lesson learnt and that’s how you grow.
We have all been there, done some really crazy stuff. You will be fine.
fr33bird317@reddit
Welcome aboard.
3500K@reddit
I totally know how that feels. I was working for a large retailer and went in on a Saturday morning before they opened to add a switch for an upcoming project. In the chaos of power cables, I accidentally unplugged their (old) Nortel phone system. Normally they come back up without issue, but not today. I had to make an emergency service call to our phone vendor to come out and help me fix this thing (I have little to no knowledge of the BCM, I just look for the happy blinking green LEDs). Several hours go by, the store is now open, and the GM had the District Manager in for a conference call. I basically hid in the comm room until the tech showed up around 11. He just pulled the power, paused, and plugged it in again. A few minutes later all the phones were back up. Didn't I feel like the A-hole. The only good thing that came out of it was the tech telling me this is a common issue and just to cycle power if it doesn't come up the 1st time. So yah, I know how you feel. Consider it the IT baptism we all have to go through, once or twice. lol. Luckily, people forget, and in a week you'll be smiling about it.
NorthOfUptownChi@reddit
It does indeed happen to all of us.
I once accidentally typed "reboot" into the wrong telnet window (telnet, not SSH, that's how long ago it was) and reset our giant production server and file server into fsck'ing 1.5 terabytes of raid, which took about 45 minutes, while about 50 affected production employees in this graphic arts shop all stood around, looking at me, as they weren't able to do anything on their workstations.
I would have faked my own death then, if I could have.
Hang in there! You'll survive.
naednek@reddit
Did you learn from your experience? Good, move on, you won't do it again. Use this time to document your steps.
jeremyrks@reddit
If I got fired every time I brought down a network, office, production, etc... I would have had a lot more jobs, I guess...
Creegz@reddit
Don’t sweat it brother. Everybody makes mistakes. The important thing is you remember this and avoid it in the future. That said, sometimes you make the same mistake again in a different way.
mOUs3y@reddit
history -c && history -w
Thick_You2502@reddit
Take it as experience. You solved it the moment you noticed. It's a mistake, and you've been honest, taking responsibility and fixing it. Learn from it. Just don't let urgency or familiarity with the task make you overconfident and minimize potential risks. And ***T A K E  Y O U R  T I M E*** to understand and plan the tasks. Trust me, you'll be fine. I've had 36 years working in IT, and it's something that eventually happens to everyone.
kimura_hisui@reddit
From the sounds of it you owned up to it, and fixed it, Godspeed! 🤠
Ok-Leg-3224@reddit
Don't worry, I have seen, done, and fixed worse. We are humans and we make mistakes. Please remember that if you cringe at your previous self or the actions done by your previous self, then you are growing and that is a lot better than most of the adults in my life.
kimura_hisui@reddit
God Speed
Ark161@reddit
Own your fuck up, take your licks, put systems in place to make sure it doesn't happen again. We all screw up, we all cause outages. Anyone who says they have never caused an outage is either lying, has had super cushy jobs, or is somewhere on the neurospicy chart that made them quadruple-check everything.
Example: A few years ago, I took down every printer in about 12 major hospitals and about 30 clinics by disabling SNMPv1/2 on the printers. Fun fact: Windows print servers just think that shit is offline and unreachable. Fast forward a few months, I accidentally bounced half of our servers because I didn't comment out part of the original script I used to get uptimes. I owned both situations. Walked into my boss's office, who was with our director, and said "It was me. I fucked up because x,y,z. I am addressing it now and will put documentation in place so that it doesn't happen again with other staff. I will take any disciplinary action that is deemed appropriate, but wanted you to be aware of the situation and apologize in person". I didn't get written up because I owned it. We also have a system where someone has to double-check your shit before you run it against prod. It also led to people taking change control more seriously. Yeah, it was embarrassing. Yes, it bruised the shit out of my ego. All that said though, we all screw up; what is important is what you are going to do with the experience.
piekid86@reddit
I joke that my counter for taking down SAP resets each year, and they can't fire me unless I go over.
I don't even work on the SAP team, nor do I use SAP.
Sometimes I break it, though. It happens.
Humble-Rhubarb-9688@reddit
Own your mistake, take accountability. People will respect it. No serious admin has not caused an outage at some point.
But never make the same mistake twice.
czj420@reddit
Readonly Fridays.
Purple-Path-7842@reddit
Read only Fridays brother. Learn from it and move on. Shit happens.
AV4LE@reddit
Everyone who works in networking eventually does something like this. I accidentally shut down all the security cameras and door card readers in a prison for the criminally insane. It was only down for about five minutes while I was sprinting down to the basement to connect a console cable to the core to revert the config. Still stressful as f*ck.
rburner1988@reddit
Way she goes big guy. Welcome to the club. You're doing just fine.
HerbOverstanding@reddit
Dang, not bad, good job
teethingrooster@reddit
One time I deployed a task sequence to wipe and reload 100 devices when I was trying to run an in-place upgrade. 👍
jr5mc1lio03fbc4zqsf8@reddit
Nobody is productive on Friday afternoon, and every sysadmin has done this before at least once
rickside40@reddit
No STP?
1TakeFrank@reddit
Rule #1 No changes on Friday
joeyyoej555@reddit
Did you ever send ctrl+alt+del on a console that you thought was Windows, but it was the Linux main DB server for a factory, and shut down all the manufacturing robots? I have 🤣 and many more in my 16 years.. it's just what we do man, cheer up
Fast-Assistance-8936@reddit
I’m interviewing candidates for an IT role and one of the questions I like to ask in the initial phone screen is what have you broken in IT? I’m just listening for honesty and ownership. Everyone makes mistakes
bdashrad@reddit
If you haven't taken down prod, you're not ready to be a senior yet.
coyote_den@reddit
Linux and windows servers on a KVM, back in the CRT days. Turn on the monitor, flip the KVM to what I thought was the win2k server, hit ctrl+alt+del to log on before the monitor had warmed up.
When it did, it was showing the Linux server rebooting.
Over_Context_2464@reddit
We all make mistakes, some more catastrophic than others. The key is owning the mistake and learning from it. If your boss is a decent person they will understand, and they may have even done something like it in the past. They won't hold it against you, but they will expect you to learn from it, and will probably even help you understand it better.
Also, if you haven't done so already, produce an incident report with as much detail as you can, like what was done, when, and why. Even if your boss doesn't need it, it's good experience for yourself.
TrueBoxOfPain@reddit
If you don't - you are not sysadmin :)
supa-dan@reddit
I deleted an entire AD forest decommissioning Exchange. Every, single, user.
Luckily it was a non replicated DC and I was able to restore the whole thing from a backup.
You'll be ok, just be more careful in the future 😅
bukkithedd@reddit
Chin up, Sysadmin. If this was your first fuckup: Welcome into the fold proper! This won't be your last time taking down production.
Own up, and OWN the fuckup. Learn from it, remember it, and move on. And pay it forward to when you get to train new people.
boftr@reddit
Ask how we can improve our processes to prevent a problem like this from causing issues in the future, having thought out some options yourself first. Turn a lesson into an improvement in process to minimise risk.
Ambitious_Attempt220@reddit
Don’t sweat it, that’s how I got to be where I’m today, just take as a lesson and move forward
richsandmusic@reddit
Take responsibility, come up with a solution so that problem never happens again, reassure leadership that lessons have been learned and won't be repeated.
Whatwhenwherehi@reddit
You shouldn't be anywhere near "tier 3" let alone tier 2.
You are a novice and should be nowhere near VLAN configs.
Get help. Ego check yourself.
notonyourradar@reddit
First time? I've had plenty.
cdheer@reddit
Pfft. Rookie numbers.
Lemme tell you about the time years ago when I had a scripted update to my customer’s network, touching hundreds of routers (just adding a line to a route map). But for the sequence #, I mistyped a 5 instead of a 3.
Guess who broke electronic payments for half of their EMEA retail locations? points thumbs at self
It happens. Years before that, my boss (a very skilled engineer) was trying to debug something and accidentally hit enter after typing debug, which (at least in those days) defaulted to “debug all” and brought down a critical node in our network.
CarmeAce@reddit
As other people have said, learn from it and triple-check your work before hitting that write/send button from now on.
Also, if possible, follow the second-pair-of-eyes rule; the number of times a quick check by a second person before sending has saved me, or let me save someone else, is huge.
oopsthatsastarhothot@reddit
Apologize. Own the problem. Learn from your mistake.
trouphaz@reddit
Good. Sounds like your confidence needed a bit of a shake up. Learn from your mistake and move on. Now you should be less confident about making prod changes on Friday. Now you’ll feel less confident in just making that change and review and document the steps before you start to make sure you’re on the right port of the right switch. Now you’ll feel less confident to just make the change while your boss is busy with something important and will discuss impacting changes and when they should be scheduled with him. And maybe you’ll have him review the changes you planned before you make them.
Confidence is good, but sometimes we need our over confidence to get a reality check.
Don’t beat yourself up over this too long. Sit down and talk to your boss on Monday. Explain the steps you are going to take to make sure you don’t make this kind of mistake again. This is the learning that your boss should expect. Up until yesterday, you were a guy who got things accomplished but with a certain level of risk. That level of risk is now lower going forward and you are a better employee for it.
Keyboard_Warri0r@reddit
You made changes on a friday ?! Seriously though, we have all done it. Just own it, and learn from it. Guaranteed it will remind you to verify everything!
dRaidon@reddit
One of us!
biacz@reddit
Reminds me of the day when the network guy and I brought down the whole Azure infrastructure for our F500 company. 🤐 Shit happens. And also, get proper change management in place.
biacz@reddit
Or the time my inactive-user AVD remediation automation accidentally deleted 400 VMs with no backup. We now use OneDrive sync 🫣
ErrorID10T@reddit
Go to your boss, accept responsibility, and ask him to take some time to go through the problem with you and help you understand how to avoid doing it again. This isn't a career ending mistake, it's a place we've all been. The guys who own it and learn from it are the ones I like to keep around.
unstopablex15@reddit
You guys don't use STP?
Mikknielsen@reddit
Those who work hard make mistakes sometimes
LiterallyPizzaSauce@reddit
Your boss knew because he's done it before too. Now look where he is.
ducktape8856@reddit
I don't know a single admin who never caused a loop. Welcome aboard!
criostage@reddit
Everyone makes mistakes. All you can do is:
In the same situation as yours, my grandmother once said to me "only people who work make mistakes". Yeah, sorry, the translation to English doesn't sound as good, but getting the point across is what matters.
F0rkbombz@reddit
Learn from the issue, chuck it in the “fuck it” bucket, move on.
Avocado_submarines@reddit
The first time I caused an outage I was like six years into my career. I took it really hard. That Friday night I spiraled thinking about how I put my job at risk, my new family at risk, etc.
I got absolutely obliterated that Friday night. Just hammer drunk essentially just wallowing and mad at myself. I scheduled a meeting with my manager that Monday and apologized. He told me the same thing that everyone is telling you here. Everyone does this, we even had a team meeting and everyone shared their experience of taking something down. Everyone in the team, including managers, etc. all had a story.
I know it’s hard in the moment to not be upset but just try to be kind to yourself and remember:
everybody. has. done. this.
The important lesson is to learn from it and try not to do it again (or at least if you do it again don’t let it be because of whatever mistake caused this). Also try to remember this as you grow in your career and possibly have young direct reports that end up doing this.
thegamebws@reddit
Welcome to the club. It's a badge of honour, and it's exactly what makes companies realize the importance of IT engineers.
DerStilleBob@reddit
To move forward, you educate yourself on what went wrong and make a plan for how to prevent it in the future.
As an infrastructure person you grow through outages. I tell people who caused an outage and learned from it that the cost of the outage was an unplanned investment in their education.
27thStreet@reddit
Leveled up.
You will be better for this and 30m is a relatively cheap lesson.
iammortalcombat@reddit
Brother, I once implemented an L7 firewall and took down the entire internet for a uni for 2 hours during a school session. You were hired for a job - mistakes happen. Learn and move on.
Internal_Rain_8006@reddit
It’s gonna happen to everyone going forward my best recommendation that I’ve always done is verify the damn MAC address. I do it every single time. What’s on the server before and after you make changes and then take the time to do the arp conversion to make sure you know the IP address matches the Mac, which matches a switch port.
1h8fulkat@reddit
A 30 minute local disruption is nothing.
I took down the primary data center for a 100,000 employee Fortune 100 company by inadvertently causing a spanning tree recalculation, crashing the cores and the supervisor cards dying on reload.
Take your lesson and don't make that mistake again....but there will be others...and that's how you learn.
_kinesthetics@reddit
Straight out of uni, I broke prod a month and a half into my platform engineering role at a major agribusiness bank. Hour-long major outage and long arse PIR/RCA's afterwards for about a week. Simple setting got missed, that's all it took. Felt like a right dickhead. We've all done it mate.
Learn what you can from it, head down, and before you know, you'll be breaking prod again with confidence in no time!
PsychologyExternal50@reddit
It happens to everyone. Call it your initiation to the field! Just admit to it, be honest, and learn from your mistake.
bobdvb@reddit
I work in huge media businesses, there's an expression that I learned early on "I've f*cked up bigger jobs than this!"
And there's a better one that I've shared with many junior colleagues: You can't call yourself a real engineer until you've caused a major outage.
How about someone more prestigious?
* Albert Einstein once said 'Anyone who has never made a mistake has never tried anything new.'
How do you move forward?
* Acknowledge the failure
* Understand what led to the failure (root cause)
* Understand the corrective action that will prevent it happening in future
* Document the root cause and corrective actions
* Put it behind you as a lesson
It's like striking out in romance, you don't wallow in those failures, you move forward, otherwise it's not healthy. I've worked with operators who developed fear because they were intimidated by the implications of their actions. You can't let that happen, so if you have confidence in the knowledge you gained from a failure then you shouldn't be intimidated going forward. You don't have the confidence of ignorance, you now have the confidence of experience.
madferret96@reddit
30 min down time on a Friday afternoon?

420GB@reddit
You say you caused a "big outage" yet what you write sounds like you misconfigured an access switch? I don't think you understand the scale of "big outages". A big outage is when someone at AWS takes down us-east-1 again.
A couple of people in one single office location having network disruptions is not a problem to chew yourself out over. Any time you cause a remote outage that takes away your own access, though, that is a problem, because you can't fix it without remote hands or coming on site.
DeadAbrasiveness@reddit
Everyone in IT has made mistakes and I can guarantee bigger than the one you made. Own up to it. Apologise. Learn from it. Move on. Nothing more to be said.
Deathdar1577@reddit
💯percent agree with this. Also OP, welcome to the team. We all made mistakes somewhere along the way. Double check next time and you’ll be good.
FunkRobocop@reddit
Be open about it and learn from it, we all do have our mistakes
4SysAdmin@reddit
I started out in networking. Took down an entire manufacturing plant one time (VTP is a very dangerous thing). I fixed my mistake, learned from my mistake, and moved on. Stuff happens. Sure the plant manager was pissed for a bit, but who cares? You just have to learn from it and move on with life.
jamh@reddit
We didn't hire a guy because we asked him to tell us about when he's made a mistake and how he handled it and he said he hasn't made any.
Own it don't try to hide it, learn from it, and know that we're all human. It's how you respond when it happens that separates the initiated from the scared.
Aquarambling@reddit
Don’t be so hard on yourself, as many have said we have all done something, and not just once. The key is to learn from it, own it ,and acknowledge it happened. Then work out and understand why it happened and what could be done differently to not do it again.
Some times we have to deal with issues caused by others who don’t do that and we spend 10 times longer trying to trace and reverse the issue.
Skylis@reddit
Before you start freaking out, nothing you did should have caused an L2 loop without something else being extremely stupid in how the network is configured.
collinsl02@reddit
Indeed - turn on RSTP or STP unless there's a very good reason not to.
snowtax@reddit
We once had a consultant working on a SAN fabric for some new storage we were adding and they uploaded an essentially blank configuration for the fiber network. VMs suddenly lost storage. Backups failed. Servers went offline. It affected the whole organization in one way or another. What happened after? We demanded that specific person be removed from our project, but life moves on.
Dizzy_Bridge_794@reddit
You learn from your mistake and move forward
davidsoff@reddit
Only 30 minutes ;)
It's part of the job. I managed to take our entire site (a couple of million daily visitors) offline for a couple of hours by botching a DNS migration. Turns out domains need a . (period) at the end.
I learned very quickly that DNS TTL is merely a suggestion. The DNS servers of major Telcos apparently have a minimum TTL of 1 to 2 hours...
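The trailing-dot thing catches a lot of people. In a BIND-style zone file it looks roughly like this (the names are obviously placeholders):

; without the trailing dot the zone origin gets appended,
; so this actually points at app.example.com.example.com.
www   IN  CNAME   app.example.com

; with the trailing dot the name is absolute - this is what you meant
www   IN  CNAME   app.example.com.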
Only thing you can do is own it, and see if you can create automated guardrails or policies to prevent this from happening in the future.
I prefer to automate as much as possible, but for some things/systems this may not be feasible.
irve@reddit
A friend took down a hospital once and it made the news without the fact that he did that. Was okay.
DanielBWeston@reddit
I took out an entire building's network with something similar. These are the lessons we learn the best.
MairusuPawa@reddit
Congrats! It means you're actually working!
Cley_Faye@reddit
There's a series of steps to do there:
Mishap happens. That's in the name. As long as you handle things in a timely fashion and improve yourself/your workplace processes, you've done your job. The only question remaining is "how to fix the external impact". Anything inside, even "meeting with higher ups", is a local problem, and if there's no long-term consequences, any sane workplace should gloss over it. Customer impact is a bit more complex.
Or, if you want something a bit more reassuring, consider the uptime of global services like github, AWS, Discord, or even some general public phone/internet providers. You messing up, noticing it relatively quickly, and taking steps to fix things is okay.
Syondi@reddit
Own it. Learn from it. Don't do it again and everybody is happy. Everybody makes mistakes, its hiding it that is the problem.
collinsl02@reddit
Don't worry about it - when I was younger in my career I put HP commands into a Cisco switch, which wiped the desktop access VLAN from it - I was trying to move a port from being in the desktop VLAN to the printer VLAN, and so I put in something like "no vlan xxx port yyy",
which on an HP switch would remove the VLAN from the port, but on Cisco it deletes the VLAN using the "no vlan xxx" bit, then drops the "port yyy" bit silently as extraneous information.
I've done much worse since then, so the lesson here is that you WILL make mistakes, how you recover from them is the important bit. Try your best to ensure you don't make them in the first place by having an action plan/change document with detailed steps, do your research in advance if you don't know exactly what you're doing (and if it's a routine task you should have a standard operating procedure/playbook for it or better yet automate it), and if something looks wrong stop & assess.
When you do inevitably screw up though your main planks of support should be "I had a plan and I followed it and I thought it was right" - that's a heck of a lot better than saying "I had no idea what I was doing so I just did what looked right".
A good manager/boss/leader will approach this kind of failure of the plan with an attitude of "mistakes happen, fix it, work out what went wrong so we can learn from it, and move on". Bad managers will cast blame about like a fire hose and you shouldn't work for these kinds of people.
At the end of the day, pick yourself up, get back to work, and you'll feel better in a few days once you have your confidence back.
MozillaTux@reddit
Only 30 minutes of downtime? Amateur 😃 On a serious note, this was a great lesson to learn. Nobody but you will remember it in a couple of weeks.
StrikingPeace@reddit
Only 30 mins bruh
Rhysredditaccount@reddit
Lol, I literally did this three weeks ago. I freaked out for like 3 days and I don't even think about it now haha. Was so stressed I was going to lose my job etc. Our head network engineer laughed and said if that's all I do wrong in my career I should be very happy and proceeded to tell me much worse stories 😂.
You made a mistake, realised and rectified it. You're fine mate.
nishtown@reddit
Mistakes happen, it's how you own them and learn from them what matters most. If you try to hide or lie about it you'll be found out and no one will trust you.. own it.
Provide a post incident risk assessment and they will appreciate.
jacenat@reddit
Are you even in that profession if you have not caused at least 2 hrs of company wide outage?
NUTTA_BUSTAH@reddit
Hey, you just fucked around and that made you find out, now you know about L2 loops and you feel the stakes of the operations so you can make better decisions in the future, and just a minor 30 minute blip is nothing.
Welcome to the club!
DrazGulX@reddit
Document your mistake, write what solved it and hand it to a fresh tech whenever that is the case.
pressure_13@reddit
I’ve been on the job for 25+ years and have caused major system outages at least 3 times. Apologise when you need to and move on. It’s a learning experience to double/triple check before committing changes.
MeanPrincessCandyDom@reddit
A take I haven't seen from anyone else:
Did you follow the steps described in the approved change? Did you violate a Read-Only Friday policy? If there is no change process in place, this is on your manager.
iSurgical@reddit
If you haven’t taken a network down my accident you’re not an IT tech.
Document the issue, learn from it, and move on. A good manger won’t fire you.
bofh@reddit
We’ve all done stuff like this, to the point where I wouldn’t trust someone who says they haven’t because they’re most likely to be a liar or totally oblivious to the consequences of the actions they take.
Own it, apologise for it, learn from it (are there things you could do personally to ensure this doesn’t happen again to you? Are there things your team could do to ensure it doesn’t happen again to anyone?)
Ninevahh@reddit
If you haven't messed up like this at least once, then you're not really in IT.
TheNewl0gic@reddit
The ones that make no mistakes are the ones that don't touch anything. Just move on, learn, and do better next time.
Sarcophilus@reddit
If you haven't knocked over production, you've just not been on the job long enough. It happens to everyone at some point. I've restarted production servers on accident. I reimaged my boss's notebook that he was actively working on, on accident. A colleague of mine has redeployed dozens of PCs during the workday because he moved the wrong group.
Every error has led to a review and adjustment of processes. It's part of the job.
Everyone makes mistakes in their job. Ours just impacts more people at once.
Gnump@reddit
Well, the Denic just killed the whole German internet for several hours.
Shit happens. Learn from it and move on.
Rhyton@reddit
Enable STP if it's available.
See you back here when you break a production database.
Shit happens, learning from it is the real lesson.
Parity99@reddit
We've all been there. Own it, learn from it and move on. We're all human.
narcissisadmin@reddit
Do. Not. Lie.
Embrace it, only a complete fucktard would fire you after paying you XXXX $K to learn a lesson.
jmbpiano@reddit
Congratulations, you are now qualified to work on critical networking infrastructure.
Seriously, who would you trust more with a mission critical network- someone who's learned from experience a half dozen different ways a network outage can occur and the steps needed to resolve them, or the person who's never been involved in an outage at all?
ConstantSpeech6038@reddit
It is part of your origin story now. Everyone gets humbled from time to time.
andrewsmd87@reddit
I once brought down all email for a hospital with a fat finger op change in DNS. You just learn from this.
CrownstrikeIntern@reddit
If that’s the worst you’ve done you better step up those numbers son. (Killed video for a few cities for a few hours when I first started is isp land)
badaccount99@reddit
Dude, on day one, leaned against a server and hit the power button as we gave a tour of our extremely small datacenter.
Crap company with only one DNS server, down. EXT3 drive failures. AFS? Was that a thing? I can't remember. That was a long night as I rebuilt everything from scratch, because backups were an afterthought at that rinky-dink ISP. He somehow had a job the next day and outlasted me at that company.
Same company lied to me and put me as "cables" on their tax sheet. Two years later a lawyer contacted me with a new W2 and offered to pay for everything. F those dot.com wannabe guys.
Maksimitoisto@reddit
Happens to each one of us, important thing is to learn from your mistakes. Next time you'll do better and verify before deployments.
into-the-ether@reddit
Happens to the best of us pal
xqwizard@reddit
But did you die? Seriously though, own it, put something in place so it doesn't happen again, and move on :)
commissar0617@reddit
Why are you messing with prod switches during business hours? This should be done in a change window.
Thecardinal74@reddit
it's IT, if you aren't trial-and-erroring, you aren't learning anything
matt0_0@reddit
Everyone's giving you some positive reinforcement, let me hit you from the other side of the same advice.
Yes, everybody has that one massive screw-up, but! Your mistake and the outage it caused were so minor that this wasn't 'the one' for you! Keep yourself frosty, that big mistake is still to come!
jonathan5505@reddit
Not sure how long you've been in IT, but it happens. On my first day at an MSP in New Orleans 20 years ago, I blew up the RAID configuration for the volume that held the production Exchange 2003 database files that the SBS 2003 server was using. Longest 40 hours of my life waiting for it to restore from tape backup. It happens, but I'd be willing to bet you never make that mistake again!
bionic80@reddit
There are only three types of admins:
1) Ones that have fucked up and learned to stop fucking up
2) Ones that think their shit doesn't stink and have never fucked up, while fucking up ALL the time
3) Unemployed Users.
Geminii27@reddit
First time? :)
somesketchykid@reddit
Write up your own RCA, from the heart, on what happened, what you learned, and your take away from the lesson that you will incorporate into your day to day that will make it impossible to make the same mistake twice
You're not a real engineer til you break production on accident and then bring it back up under pressure. Congrats.
Send the RCA to your boss, directly and privately. Tell him he can use it if he wants but you wrote it for your sake and his so he knows you learned from the endeavor in earnest and will not repeat mistake.
rp_001@reddit
Everyone has done something like this or this same thing (I turned off a phone system that took thirty minutes to reboot)
Learn from it. Double check, backup files, confirm with colleagues , etc.
You probably will make the same mistake again or something like it. Sh t happens. Own it. Try not to make big mistakes too often.
cheeley@reddit
steveamsp@reddit
Absolutely. Everyone screws up sometimes. We're human, it's inevitable.
Don't hide from it. Figure out what went wrong, own the mistake, and learn from it to avoid that problem in the future.
goronfood@reddit
"I apologize for the downtime. I was testing my ability to troubleshoot and bring systems back up in an extremely stressful situation, and am happy to report that we were very successful.
In all seriousness, I learned an important lesson on how to never do this again."
Depends on your org, but levity + groveling seems to work pretty well for me
SynapticStatic@reddit
An L2 loop? Are you guys running spanning-tree? Wouldn't have taken everything down if it was, would've just shut down one of the ports and kept on truckin'.
Anyways, I wouldn't worry too much. Everyone has taken infra down at one time or another for some reason or another.
If anything, take it as a reason to slow down, double/triple check everything, and make sure changes are documented somewhere so you know what changed last before the break. Not everything fails this spectacularly or this quickly.
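In case it helps to picture the fix being suggested, here's a minimal sketch of rapid spanning tree plus BPDU guard on a Catalyst-style switch (the interface name and the assumption that these are IOS access switches are mine, not OP's config). With this in place, an accidental loop on an edge port err-disables that one port instead of storming every VLAN.

    ! rough sketch, not OP's actual config
    spanning-tree mode rapid-pvst
    ! err-disable any portfast port that hears a BPDU (i.e. someone loops or daisy-chains it)
    spanning-tree portfast bpduguard default
    !
    interface GigabitEthernet1/0/10
     spanning-tree portfast
     spanning-tree bpduguard enable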
DullNefariousness372@reddit
Just never mention it.
DullNefariousness372@reddit
And if they got a problem send them our way. You don’t get anywhere by not breaking shit.
RedditUser41970@reddit
Most of us have been there.
I brought down an entire retail company for 4 business hours once. Wasn't pleasant. Helped though that I was the third different person to take that company down after we acquired it...
Ultimately, recovering from incidents like this comes down to two things: how your employer treats you, and how you treat yourself. Your network had a rough day, but it recovered. Be like your network. You'll recover.
Five years from now you'll be bragging about the time you brought your employer to their knees.
pirutgrrrl@reddit
Congrats! You’re an admin. Welcome to the club :)
mytren@reddit
How do you think your manager knew what it was and assisted with the fix?
It’s a rite of passage. Welcome to your seniorship.
davidokongo@reddit
Welcome to the club 😎 You've now been baptized
Betazeta2188@reddit
Take ownership of the issue, learn from it. If there is anyone out there that hasn’t made a mistake, I call bullshit. Big or small, owning up will go a lot farther than trying to hide it.
bio88@reddit
It's like a rite of passage. We've all been there. Have some beers and laugh about it.
jamesowens@reddit
One time I did a “simple” find | grep command on a NAS containing user profiles.
I didn't know the NAS was physically located on the far end of a point-to-point VPN nor that it ran shares that were write targets for production experiments (science data). I made those disks seek all over the place, blew out the cache, and hogged all the bandwidth.
It was in scope for testing and was not marked as a VIP system but I still felt really bad for my fellow scientists when I learned what I had done. I learned a few lessons that day.
When you mess up, own it and fix it. If you can’t fix it on your own, ask for help, stay engaged and learn. Staying engaged will earn you respect. If you run, get defensive, or deny it you won’t live it down.
FawdyInc@reddit
Almost every engineer who has been around long enough has caused an outage at some point. The difference between juniors and seniors usually is not whether they’ve made mistakes, it’s whether they learned from them and improved their process afterward.
There’s a well-known story about someone expecting to be fired after a major incident, and their manager responding with something along the lines of, “Why would I fire you after we just invested that much in teaching you a lesson you’ll never forget?” This is part of the profession more than people admit publicly.
MilkSupreme@reddit
I remember one time on the core internet switch, instead of deleting a port from a VLAN, I deleted the VLAN.
That was fun.
post4u@reddit
We have a word for this if it happens on a Friday. We call it "Friday". If it happens on a Tuesday, we call it Tuesday. It happens. You're good. You owned it. Move on. Won't be the last time in your career. Get used to it.
Here's what a good boss will do in these situations: "We had a minor network issue earlier that caused some disruption. It's been resolved."
No blame. No names. It happens. Next task.
emperornext@reddit
sounds like a junior network administrator mistake. your company has weird job titles or shitty separation of duties/elevated roles
vennemp@reddit
You’re not a senior until you’ve taken down prod at least 3 times.
Wagnaard@reddit
If you've never caused an outage, you probably haven't had much to do in your career.
prtnrsncrm@reddit
We’ve all been there. Try to learn from mistakes and don’t beat yourself up.
MonkeyMan18975@reddit
Is there a better teacher than pain and suffering?
JJSpleen@reddit
Service Desk
mossman@reddit
"The burned hand teaches best. After that, advice about fire goes to the heart." - Gandalf
mjt5282@reddit
I worked with a senior manager a long time ago who preferred hiring sysadmins that, in his words, "got experience breaking other people's systems".
hkusp45css@reddit
I use the same line. I also use that line to save jobs ...
"Who do want in that seat, the person that just proved they can fix what they broke and who we KNOW will NEVER do THAT SPECIFIC thing again, or some stranger?"
hurkwurk@reddit
agony.
scriptmonkey420@reddit
I think I have caused at least one major outage at each job I have had. Gotta learn somehow. Not doing the same thing twice is the important part.
Tyr--07@reddit
Yeah exactly this. There's not a single system admin I've heard of that has been in the career for 20+ years and doesn't have stories of "Oops, okay so when I was a jr I did this and..." People are already laughing because they know of the problems it caused.
Mental-Past-7450@reddit
If you don’t break shit you aren’t learning.
Solaris17@reddit
Once I was RDPd into one of my offices and I was cleaning AD.
I had taken over from another admin and it was really still the "ear to the ground" stage of my tenure, so no rocking the boat while I learned the infra.
I had 2 windows open.
AD Users and computers
&
Group policy management
I am sitting on my couch and I just get done making some configurations and moving some GPOs to new folders and re working the tree.
I moved all the GPOs out of a folder.
I minimize the windows and run gpupdate.
Everything is as it should be.
I bring the window back up
I delete the old folder
I lose connection to RDP
I was in Users and Computers
😄
Thankfully this specific branch office was only like a 1 hour drive.
bites_stringcheese@reddit
Observe Read Only Fridays.
MyUshanka@reddit
Welcome to the club, dawg. 30 minutes of downtime on a Friday isn't a bad way to do it either.
JagFel@reddit
Any seasoned IT professional has done something like this at least once.
Most of us once every few years as we handle new tech. These become teachable moments, and it's how you can, and will, grow professionally.
It's so expected that it's part of my standard interview questions: "I want to know about your last notable screw up... and what you learned from it."
dat510geek@reddit
We all do this once, mate. And you forgot Friday is read-only Friday, no-changes Friday. Document, study n prep for next week.
BoredTechyGuy@reddit
Welcome to the club.
chibihost@reddit
This is your opportunity to learn how to do Root Cause Analysis. Learn how your company does it (if at all) and look into how other orgs do it to see what it can be.
Look into the 5 Whys method if you have nowhere else to start. Basically start with what went wrong, and ask why. Once you answer that, ask why again until you go five levels deep.
Look at public RCAs (Cloudflare posts some good ones when they have a major event); I think even NASA has things publicly available (it doesn't have to be a tech/cyber example). The goal is to learn how to discuss events without getting emotional or defensive. These discussions should be a blame-free zone or people don't take accountability, but that unfortunately isn't always how they are handled.
firereaver@reddit
We have all done this.
Be accountable for the mistake and own it. Tell your manager what you will do differently next time. Consider peer reviews for changes. Focus on blaming the process, not the person.
kicker69101@reddit
That's all you did? Come talk to me when you can state the percentage of the internet you've taken down.
ATek_@reddit
Ask yourself, how did your boss know what it was?
celestrion@reddit
Kicking the production network in the teeth while the lead admin is in a meeting is one of those things we all have done or will do. We practice perfection, but we never achieve it.
Maybe it's a "write net" before changes so you can roll back easily. Maybe it's having some sort of monitoring to point out when you've created a packet storm. Maybe it's documentation on which ports go where, and a process to update that documentation as part of maintenance windows. Maybe it's checklists before/after any network change.
When you go to your boss, hat in hand, with a diagnosis of what went wrong and a plan to fix it, you're framing this episode as unexpected training that the company's already paid for. You've made your commitment to not let that go to waste; they have that option, too.
I can't imagine a better way to argue that infrastructure changes need change windows. They're a pain in the neck, but they prevent worse pains. Lots of organizations get this wrong and make them so burdensome that they make the overall experience worse for customers of the network, but you can be smart about this to make everyone's life (especially yours) less stressful.
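On the "write net before changes" habit above, here's a minimal sketch of what that looks like on IOS; the TFTP server address and filename are made up for the example.

    ! copy the current config off-box before touching anything
    copy running-config tftp://10.0.0.5/switch01-pre-change.cfg
    ! ...make and verify the change...
    ! only once you're happy, save it to startup
    copy running-config startup-config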
AllAggies@reddit
Did you have/need an approved change request? 30+ years ago I did something similar. When my boss called me to his office to discuss it, I stressed to him I was sorry and had learned a lot about risk. Towards the end of the meeting I provided him the change control number showing that he had approved this change in the middle of the day.
We both learned a lot that day.
jerrysupervillain@reddit
In as broad and vague a way as I can appropriately disclose:
Some 30 years ago I pushed a change that reversed DHCP propagation and wrecked connectivity for residential customers and businesses in the group of small islands on the West Coast that my company provided service to.
This was on a Friday and we were in the middle of filling hiring gaps in key roles in our NOC and admin teams.
We sorted it out, but in the interim had our very sparse support staff and customer service teams absolutely hammered by really pissed off people. All because of me, a greenbeard, a few cubicles over having been confidently incorrect.
russr@reddit
Wellllll
Braedz@reddit
Does anyone follow change management at all? Lesson learnt.
Document what you want to change, test and have a backout plan. Get someone to review what you plan to do before doing anything in prod.
moffetts9001@reddit
One of us! One of us!
chuckaholic@reddit
Sounds like you learned something today. You're now smarter than you were yesterday. You should get a raise.
log1k@reddit
One time I bumped the off switch on a power bar that I couldn't see and took a server offline. Server attempted to boot, one of the drives was fried. Office was down for about 24hrs. I can't remember the config of it, but I think the drive that died held the Hyper-V VMs? Obviously no redundancy, but we did have backups thankfully.
It was a really stupid setup too. Large server tower, laying on its side way up top on a shelf. Cables zip tied (not velcro) everywhere. Power bar I couldn't even see (not a UPS) with a flimsy switch to turn the whole damn thing on/off.
A 30-minute job turned into something much longer.
kraphty_1@reddit
30 minutes for something like this is nothing. If they fire you for it, that's a place you don't want to be.
Reiterated- no changes on Friday. Add to that Monday as well.
Always double and triple check all changes Tuesday through Thursday.
owlbynight@reddit
Shit's complicated. Good job making it this long.
firestorm_v1@reddit
Everyone breaks prod at least once. What really matters is what you do next. It sounds like you did the right thing, owned the mistake, fixed the misconfiguration, and moved on.
Did you learn from it? Know how to not make that mistake next time? If you did, then good. Take a breather and carry on.
PAiN_Magnet@reddit
If you get fired for something like this, it's not a company/culture you want to work in anyway. Learn from your mistake, double check your work next time, have a back out plan.
D3mentedG0Ose@reddit
Every single person in tech has broken prod at least once in their lives, and if you haven’t, you will.
Own it, learn from it, and either fix it or assist with the fix. Then, document what happened so it doesn’t happen again
beren0073@reddit
Well, now you learned your lesson - no changes on Friday.
CosmoMKramer@reddit
Congrats! 30 mins of downtime, no data loss. Happens to all of us, no matter how long you’re in.
Gullible_Vanilla2466@reddit
I've lost an entire week's worth of data on a file server trying to fix broken backups, and locked everyone out on a Friday at midnight after accidentally flipping on a CA policy that targeted all users instead of admins. Still here.
Ginsley@reddit
Honestly just learn as much as you can from it and move on. Everyone screws up eventually, and honestly yours wasn't that bad. Nothing got erased, nothing was destroyed, and nobody died. Coworkers will probably give you shit for a little while but that's probably the worst of it.
FireCyber88@reddit
Not a big deal. If you're not making mistakes, you're not moving fast enough.
ireddit_didu@reddit
I’ve done worse. Much worse. And kept my job. You’ll be fine.
MastodonMaliwan@reddit
Oftentimes, the best way to learn is a painful way to learn.
slapjimmy@reddit
You'll never do it again now haha. Talk to us in 5yrs when the pain has worn off hehe.
Sliverdraconis@reddit
Been there, done that, but why a trunk? We do access ports with voice VLAN tagging for workstation ports with a phone as well. Cisco shop.
InfantryMatt@reddit
My company was like, oh, you said you were interested in networking, guess what, you are the network guy. We need this, this, and this; figure it out. I started configuring a 2960X and gave the mgmt VLAN the IP of our virtual servers. Bogged everyone down for hours.
CombatBotanist@reddit
Yea, like everyone else already said, we’ve all been there, and this seems pretty mild. For serious events at my job we would fill out a Correction of Error (COE) which lays out what went wrong and why as well as how it can be prevented from happening in the future. That last part is the most important and we aim to develop concrete and actionable plans for improvement. It’s not about blaming anyone, it’s about improving the system to make it more resilient and make it harder to make mistakes. Usually this document is shared with those impacted, or at least the higher ups, as an acknowledgment that something did happen and it’s being taken seriously.
zenjabba@reddit
Welcome to IT. Nobody is going to do anything but laugh at this, as we have ALL done it and will do it again even though we try not to.
mvdilts@reddit
Shit happens and we've all done something like this. Take a deep breath, walk away for the weekend. There was no permanent damage done, only a 30 min disruption on a Friday afternoon, and you now understand what you did and how to fix it. It could have been much worse. Learn from your mistake and you'll be golden. One of the worst things I did, when I was green on the network side of things, was enable QoS on trunk ports between cores and took down the campus for about 12 hours 😄. I'm still in the industry and was at that company for several more years after the fact.
Katonawubs@reddit
Welcome to being a sys admin. This is why we have "Don't fuck it up Fridays." No big changes or changes that could cause issues. Save those for Monday.
Top_Boysenberry_7784@reddit
That's not too bad. It was only 30 minutes and hopefully you learned something. I have made changes to wrong ports many times. If you have a backup of the config it helps a lot. Plenty of free/cheap ways; quick and easy options on a budget are Unimus or SFTP backups to a server every time you write. Also "reload in 5" is your friend in case you lose access.
That's nothing. Try making a change at a site halfway around the world and losing all remote access, which meant the site lost all outside connections like Internet/SDWAN/MPLS. The local IT guy didn't know English and had his wife bring his son to the office to interpret, as the son knew some English.
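Since "reload in 5" keeps coming up in this thread, here's a rough sketch of the workflow it implies, assuming the risky change has not yet been saved to startup-config (the interface is a placeholder). If the change locks you out, the scheduled reload boots the switch back to its last saved config; if it doesn't, you cancel the reload and only then write mem.

    reload in 5
    ! schedules a reboot 5 minutes out; don't save the config yet
    configure terminal
     interface GigabitEthernet1/0/10
      ! ...risky change here...
    end
    ! still have access? cancel the pending reboot, then commit:
    reload cancel
    write memory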
Visual-Fun-1161@reddit
I accidentally shut down our domain controller, print server, and file server, all within 2 days of each other, when I had first started my job. I meant to sign out but my brain fumbled and clicked shut off. I've also messed up a config on the firewall and traffic stopped routing once. Safe to say we've all been there and you'll be alright.
TheJesusGuy@reddit
I took my ENTIRE BUSINESS DOWN FOR THE ENTIRE AFTERNOON causing an L2 loop. You're fine. Switches didn't have STP and the server rack was spaghetti so I literally couldn't see, and they didn't want to replace the switches even after I'd asked for 3 years.
deteknician@reddit
We have a saying in my country that's something like this: one who doesn't make mistakes is one who doesn't do anything.
Sqooky@reddit
OP - we've all made mistakes like this. Mine was Defender quarantining one of the mission critical service accounts we have that's used in the HRM process & role granting process at our company, plus a few other misc tasks, like modifying users and computers in the environment on a general purpose web app, similar to AD U&C. Nothing outlandish, but if you've never had Defender quarantine an account on you, it's an absolute bitch. Very much reminiscent of what you felt as I don't know that any of us knew the SOAR feature was turned on.
DoctorOctagonapus@reddit
There are three types of sysadmin. Someone who broke production, someone who is going to break production, and someone who's so useless no one in their right mind will let them near production.
RelevantToMyInterest@reddit
Hey, if it makes you feel any better, i overwrote a prod db because I was careless.
On the bright side, SQL backups are working perfectly.
TyLeo3@reddit
Did someone die?
joshbudde@reddit
As far as things go? This is nothing. For reals. It's embarrassing, you didn't really interrupt anything long term, this is baby town frolics. Just have a chuckle, say "sorry guys! we need better documentation," and go on with your weekend.
MrBoobSlap@reddit
Sounds like they’ll be promoting you to IT Systems Engineer, Senior during your annual review.
rankinrez@reddit
The engineer who’s never caused an outage like this is the engineer who’s never done any work.
Don't sweat it. But there are learnings here, like why no EVPN or at least spanning tree? Why are VLANs being configured manually instead of as part of some automation workflow?
frozenwhites@reddit
Welcome to being a real system/network admin, my guy. It took you a whole year to F up? :) You’ll laugh about it later. Good job learning and understanding why this happened and how to do it right next time.
wunda_uk@reddit
I took out a large storage array; it cost the business half a mil, roughly. Wasn't my fault, a bug nuked the controllers, I just pushed the button. A few late nights, a nice overtime payout, and a chat with some senior stakeholders later, all was good. Stuff happens; it's how you react that matters more.
m00ph@reddit
Coworker took AA's website down for two hours once. The client engineer did finally fix the scripts so they couldn't deploy to the wrong environment.
Everyone has done it. Try to learn, and not make a habit of it.
woodjwl@reddit
Bruh, Fridays are read-only. Don't you know that? ;)
Shurtugal9@reddit
You learn, that's how you move forward. You own the mistake, you learn, and you move on. We all have done it. I took down an entire office because the config on a firewall wasn't actually committed, so when I rebooted, it disappeared. I owned it, fixed it, and then asked to go to additional training so it wouldn't happen again. You've earned your badge of honor by taking something down with a mistake. You will be better for having done it.
JPDearing@reddit
One of us! One of us!
It was 30 minutes, not 30 hours. Fess up, accept responsibility, show that you have learned something and will never make that mistake again, and move on.
As has been said many times before here in this Reddit, are you really a Sysadmin if you haven't broken production?
Good luck
beedunc@reddit
Just be forthcoming and don’t lie about anything. Stuff happens.
20-year IT person/network (Cisco) person here, at a new job that uses HP. My Cisco muscle memory did not serve me well: the command on Cisco - 'help'. The same thing on HP? Removes all ports from VLANs.
Luckily my boss was well versed in shitty HP, so he resurrected it ‘quickly’ (whole office down for 3 hours).
Stuff happens.
gumbrilla@reddit
Oh man.. Been there, done that. We all have, this is called experience. Learn from it, improve, and try not to do the same thing again.
There's a scene in the movie Jaws, where grizzled old men are sitting on their boat talking about the shit they've seen, and then start comparing scars. You just earned a scar! It's not a nice process to get one, and you'll not forget it, but you earned that scar, and when the immediacy of this goes away, you've got a reminder, and a lesson, and best of all a story.
This all probably maps to the grief curve, I can't be arsed to try to figure that out, but it's probably something like that...
oh. and p.s. don't do changes on a friday 😃
itishowitisanditbad@reddit
Exactly.
The one-offs are not important.
It's when it happens again and again that you have to question a person's judgement.
Honestly, just having OP's reaction to it is a huge green flag. It's the people who shrug that are the problems more often than not.
kizzlebizz@reddit
Not 3 hours ago I installed a network card in a server, because I wanted to use the 10G vs 1G interfaces, and it wasn't fully seated. The server wouldn't POST afterwards. I thought to myself, "Why the ever living hell would you do this on a Friday?"
williamp114@reddit
I made a similar-level mistake when I was a green JOAT. I'm still at the same company almost a decade later (though i'm trying to get out of here -- unrelated reasons)
You'll be fine, OP.
zanzertem@reddit
Everyone will forget in a week trust me
palogeek@reddit
Everyone has done this at least once. My first night in a NOC we had a satellite go spinny, and I managed to break BGP for the whole country while trying to find a workaround. Shit happens, live and learn.
Now you research spanning tree and BPDUs, write a PIR, and document what can be changed to ensure that this does not happen again moving forward.
Better yet, invest in loop free fabric.
If I didn't break shit badly early in my career, I wouldn't be the person other team members come to when random stuff breaks in weird and wonderful ways.
lbaile200@reddit
lol
Hot_Ambassador_1815@reddit
Own it. Learn from it. Laugh about it when you're drinking a beer
master_illusion@reddit
Welcome to IT…..
clubfungus@reddit
Don't let these other posters fool you. I have never made a single mistake in my life. Anyone who does make mistakes is subhuman. Some of us are just meant to be Soylent Green.
/s if not obvious
cmonspence@reddit
You have arrived, welcome! It’s warm here.
ecorona21@reddit
If you haven't caused an outage, you are doing it wrong.
Top-Examination-6800@reddit
It’s ok
Traust@reddit
I watched a project manager turn off the power to the entire server room, as we had no idea what a switched power panel in the room next to it was powering. Turns out it was the breaker between the UPS room and the server room and patch cupboard.
Everyone had a good laugh when I told them what happened since they knew it was a learning experience for him and gave them a short break while everything restarted.
Common_Arm_3316@reddit
This was one of the first mistakes I ever made in a professional IT capacity. I never forgot about it and learned a lot about spanning tree that day.
kellyrx8@reddit
Dont worry it happens to us all :)
Hell, I ended up clicking the wrong phone in Meraki and wiping my CEO's phone when he was overseas on vacation..... oopsie!!
iogbri@reddit
This is part of it, this is experience you got today. Every one of us has done it at least once.
I once made a change that crashed reports in an erp system for over 100 customers. Fixed it within 45 minutes but that was a stressful day.
ManLikeMeee@reddit
Man, join the club.
High five brother! (Or sister!)
Seriously, it happens to everyone, I say, if you ain't breaking something, you ain't working.
Just take backups, learn, take notes and pictures, measure twice, and most importantly, don't let it beat you down.
The stuff we do is actually really difficult when you think about it as it's pretty much another language...
ButlerKevind@reddit
Oh, this brings back memories of me running CyberCop on an NT 4 hospital network back in the day.
Started off scanning my workstation, then my peers asked me to scan theirs. Instead of using the scan I had queued up, I "tweaked it", ran it, and inadvertently scanned the entire Class B network, "allegedly" crashing multiple servers (which were revealed to have multiple vulnerabilities), in addition to EVERY PC on the network being emblazoned with the Windows Messenger Service banner "Your system was scanned by CyberCop".
Fast forward two years later, get "volun-told" to be the new firewall admin for the org.
Good times.
Rothuith@reddit
On a Friday afternoon?! Are you crazy?!
vCentered@reddit
Lol.
Lol because you don't realize but in ten years you will have fucked up in such amazing ways that you won't even remember this moment.
DehydratedButTired@reddit
Welcome to the club. It happens. Make sure to show the root cause and your plan to make sure it doesn't happen again and you are good. Also don't take anything down for a while if you can help it.
Also Read Only Friday is really important.
deadfulscream@reddit
We do RO Fridays at my work.
You only do break fixes, documentation, no major changes on a Friday so you don't have to spend the weekend trying to fix things. A good lesson learned.
My company, we just look at what happened, have a coaching session then call it a day.
megoyatu@reddit
Use Ansible and git, then you can roll back.
ThePoopShovel@reddit
I've been in IT for over 25 years. I could write a book about all the shit I have broken in that time. It happens, and will happen again. All you can do is own it and learn. How you handle mistakes is more important than the mistake itself. Take your medicine and move on. If your superiors punish you or fire you for it, you need to find another place to work.
toulouse420@reddit
But have you ever broken your change management system because you were too descriptive with regards to the change?
Tricky_Adeptness1643@reddit
We all fuck up. I work for New Zealand's second largest fuel trucking company, and with a simple group policy change in Intune I managed to uninstall the somewhat complex delivery platform from all trucks at once at 5pm on a Wednesday, and went home without realising. I spent 1am-9am remoting into every truck (100+) to redeploy and configure the application before the scheduling team came in, so there was minimal disruption, but holy moly.
colni@reddit
Don't make changes on a Friday
burundilapp@reddit
Only 30 minutes, little league, come back when you’ve had the entire company off for a day.
Sobeman@reddit
no change fridays
BoysenberryDue3637@reddit
Been there, done that. Unfortunately more than once. Hopefully you had a proper approved change control open.
f0gax@reddit
Learn from it and move on. That's all you can really do.
GiftFrosty@reddit
Own your mistake. Take responsibility.
DarkSky-8675@reddit
If you had any idea of the things I've broken. People make mistakes. Admit it, own it, tell the people you work for how you'll learn from it and then move on. Yep, you'll feel like a putz for screwing up, but tomorrow is a new day.
SGG@reddit
Agree with everyone else this is a rite of passage.
Accept you made the mistake, apologise (only once, and mean it) for making the mistake, explain the mistake, help to work on the solution, figure out how to make sure it doesn't happen again and/or how to lessen the impact
teslasnoot@reddit
As you proceed in your career, there will be others, and guess what: if you are growing, they will be bigger oopses. You'll get better at avoiding the smaller ones (most of the time) and the bigger ones will pique your interest once you get past the shame etc. And then you are hooked, being a real sysadmin till you can't do it anymore for whatever reason.
Isntagram@reddit
Well you could blame it on an intern
Skullpuck@reddit
You need to convert to Read Only Fridays. Set it in stone. Never ever make changes on a Friday. It's cursed.
intmanofawesome@reddit
On a Friday as well. It's why we have no-change Fridays.
AlexMelillo@reddit
A rite of passage. Well done lad
DrWarlock@reddit
You've just gained a level..... experience. You won't do that again, or at least you'll double check your work next time. Rite of passage.
The_Syd@reddit
Don’t beat yourself up too hard. We have all done that at some point or another. I misconfigured a trunk port on a switch and somehow managed to write the memory before it error disabled and took down the accounting team for over an hour while I worked on trying to correct my issue. Just learn from your mistakes and grow.
RoadBlock97@reddit
Let me tell you about the time I forgot that server drive numbering started at 0 and I had to replace a failing drive 1 in a RAID..... turns out RAIDs don't recover without a certain number of "healthy" disks. 20+ years in IT now.
Fix your mistake, do what you can to ensure you don't do the same again, and forgive yourself.
Commercial_Growth343@reddit
Start a new tradition in your office! A company I worked at did this many years ago: they had a trophy some intern left behind, so they treated it like an "I goofed" prize. Whoever caused the last outage had to display this orphaned soccer trophy on their desk as a badge of shame. This was passed around the desktop management team for years. Good times!
TheFrin@reddit
Don't sweat it - I took an international airport out for 4 hours by forgetting to type the "add" on the "switchport trunk allowed vlan add" command...
It happens. As long as you don't lie about it, learn from the mistake, and be gracious about the shouting you're going to get... you'll be fine.
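For anyone who hasn't been bitten by that one yet, the distinction looks like this (interface and VLAN numbers are just examples, not anyone's real config):

    interface TenGigabitEthernet1/1
     switchport trunk allowed vlan 30        ! REPLACES the allowed list with only VLAN 30
     switchport trunk allowed vlan add 30    ! APPENDS VLAN 30 to the existing allowed list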
Representative-Crow5@reddit
Don't worry, it happens to all of us. I once deleted the entire sales table in a production database. got a backup and life went on. downtime was an hour but no one really cared.
lewas123@reddit
Welcome to the club
LaughableIKR@reddit
Learning by doing sometimes. My co-worker deleted a 50TB LUN on the SAN. There was a guy at an ISP back in the early 90's who plugged the dial-up modems into an outlet, and every time someone turned the lights off when they were leaving, it would shut down 1/2 the modems.
Stuff happens. Randomly.
ReptilianLaserbeam@reddit
Aaah, I thought it was something worse. Don't worry, it won't be the last time hahahahaha. Remember for next time: READ-ONLY FRIDAY.
tshizdude@reddit
It's a rite of passage. Congrats!
AverageDummy2@reddit
It's official. You're a member of the club. LOL
Now what did you learn? That's the important part
I would say the lesson is always "show run" and make a backup before you do any changes.
Personally I have logging turned on in my mRemote client so everything I do is recorded at all times. So "show run" will be in my logs.
PretendStudent8354@reddit
Why would you configure that as a trunk? Configure it as an access port with a voice VLAN.
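A rough sketch of that access-plus-voice-VLAN approach, in case it's useful for comparison against the trunk config; the VLAN IDs and interface are placeholders, not OP's values. The PC sits untagged on the data VLAN and the phone tags its traffic onto the voice VLAN, no trunk required.

    interface GigabitEthernet1/0/24
     switchport mode access
     switchport access vlan 10    ! data VLAN for the PC
     switchport voice vlan 20     ! phone traffic rides the voice VLAN
     spanning-tree portfast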
IKnowCodeFu@reddit
Congratulations, you're a real sysadmin now! Did you do this on a Friday?
Markuchi@reddit
Everyone has done this, but what sets apart an average sysadmin from a good one is understanding exactly what went wrong at a very deep layer and then owning up to it, learning from it, and using your experience to explain to others what went wrong as a post-mortem so no one else will hopefully make that mistake. If you can find ways with technology to help mitigate, even better.
royte@reddit
Sounds like learning! What I do is own it, develop a method to not repeat it, and proactively reach out to supervisors and application owners. I've taken email down for a week, ERP down for a day, and a reporting server down for the busiest hour of the day.... all many years apart from each other.
ObiLAN-@reddit
Been there done that before.
Worst was in my early days: I was on a ladder trying to get some cables down off a cable trough. Misstepped and lost my balance, fell, and took out a couple of patch panels on the way down.
That was the day I got really good at punching down Cat cables lol.
lilelliot@reddit
Dude, it could be so much worse. I used to manage an app (as the developer + admin) that was running on local servers at 52 factories around the world. Without this app (client-server on Windows - .NET + MSSQL), products couldn't ship. At this time, many moons ago (~2006), we were using local admin accounts for system accounts, and this application had local-use logins for both the appserver and all the SQL Server DTS packages that used JDBC to pull data from various Progress databases. You may be able to see where this is going.
We promoted an IT director to become "CISO" and his team didn't communicate very effectively with anyone outside the datacenter sysadmin/netadmin team. That is to say, they didn't communicate very well with my team (I ran Enterprise Apps), and one day they decided to forbid any local accounts.
Long story short, they broke all 52 factories where this app was running, and they refused to back down from their mandate, so I had to go reset credentials on 104 servers to use new domain accounts, also change the connection strings in about a dozen DTS packages on each database server, and then re-run a bunch of data extracts.
Nothing was permanently broken, thank goodness, but it was a royal PITA making a perfectly sane change ... as an emergency fix.
BokehJunkie@reddit
One time I accidentally shut down ~30 production servers in the middle of the day.
During my first or second week working remote, after we'd had an "absolutely no work from home" policy for 10 years. lol.
Phlight_1@reddit
Welcome to the club. It has happened and will happen to anyone who does this for a living. Hell, to be honest, 30 mins is not bad at all. Life will go on and in a week no one will remember the 30-minute coffee break.
SenTedStevens@reddit
That's NBD. We've all been there. At least it was an easy fix.
I mentioned this a long time ago; I was really green with Cisco gear. I screwed up my commands, so instead of removing the VLAN on a port, I inadvertently deleted the whole VLAN for our back end data switches. I shit my pants and frantically ran to the server room and unplugged that switch and plugged it back in. Since I hadn't saved that config, everything came back up normally. But, man, was that scary.
foxfire1112@reddit
I bumped into a power switch and turned off a SAN in production in the middle of a deadline day. It was terrifying. That's when I got to learn a lot about how SAN storage worked, but still, in the moment I thought I was having a heart attack. Your 30 min goof up is really nothing.
nyantifa@reddit
This seems like a really minor fuckup. Don’t sweat it. If anything, you can use this as a “learning and growth experience” in future job interviews
NSFWies@reddit
A real good system wouldn't have allowed you to kill it so fast.
Glad it was able to be fixed so fast. Seems like you understood what went wrong.
Double check things next time before you make changes. Review plans with others. Do a dry run before you change live production systems, etc.
We've all done it. You're doing fine
SergioSF@reddit
How bad? Like...causing hundreds of thousands in lost productivity?
Your issue has been shared before here.
Downinahole94@reddit
Write down everything you did wrong and how you would do it differently.
Don't be a lazy ass, I mean really think about it: what did you do differently this time that led to this outcome? Obviously you have not been breaking things all year.
What didn't you know? Why didn't you know it?
Adrieckart@reddit
When I interview candidates for roles on my team, I always ask what is the biggest outage they caused, how they fixed it, and what they learned from it.
You can learn a lot from an outage, and you'll likely never make that mistake again.
If it's any consolation, the higher up you go, the bigger the blast radius your mistakes will have.
TrainAss@reddit
Congratulations. You've joined the club. We've all done something that has taken things down. It's a rite of passage.
You won't make the same mistake again, that's for sure.
BigJDubya@reddit
I once made a policy change in our MDM and inadvertently wiped 65 phones. Each one had to be setup again manually.
I'm coming on up ten years.
You'll be alright mate!
bit0n@reddit
I had a guy in my team do similar, but I had to fix it. I showed him the fix, showed him what he should have done, called him a knob, and reported upwards that it was a training issue that had been rectified. What really impressed me: the morning after, he came to me with an RCA explaining his error, how and why it happened, with a sensible way of preventing it. So then, rather than continuing to shield him, I outed him and put him forward for a recognition award. It's the way you respond sometimes that's important.
Ian-Cubeless@reddit
Everyone has a story like this. You caught it, fixed it, and now you know exactly what an L2 loop looks like in the wild. That's worth more than any textbook. Document what happened, what you did to fix it, and move on!
greenonetwo@reddit
Apologize, try to learn from this mistake, be more careful next time. Also, ask for a staging or test environment.
ohnotthatbutton@reddit
You feel bad, but in a few years you'll realize how much this doesn't matter. If no one was hurt or died, it doesn't matter.
catwiesel@reddit
yeah, that was a small oopsie and no (big) harm done. learn from it, and you're all better for it.
Mostlyamoron@reddit
In the way way back we had a tool, thought it was reporting for AD objects. Friend ran a report for all active devices in Computer containers company-wide. Ended up with 800-1200 devices, can't remember. He's done with the "report" so he hits delete to delete the query/report.
It starts deleting all active Computer objects in the domain. He immediately comes to me, I confirm, thankfully he didn't have access to Server containers. Calls start flooding in from all over the country. Only time I ever had to do an authoritative restore of AD.
It's been 20 years and we still give him shit about it. Also my favorite story to tell youngins about unintended consequences.
SpaceChimps98@reddit
Resolved within 30 minutes? And on a Friday afternoon and not a Monday morning? That's basically nothing. You're fine.
d3kelley@reddit
Rule #1: Don’t make changes on a Friday.
RuiFerreira_10@reddit
How do you think your manager knew the problem straight away? He's probably done the same! Don't feel too bad!
What you should do is take full accountability and learn from this!
chubz736@reddit
Man. You good. I wish I was in an IT systems engineer role.
fjlj@reddit
Badge of honor... Badge of honor... :)
LabRepresentative777@reddit
I did this on my first week on the job. Still working there
monkeyinnamonkeysuit@reddit
It is a difficult job, sometimes things go wrong.
Everyone has done it, multiple times. I have made this exact mistake, probably 15 years or so ago, though mine was very much a layer 1 problem, so arguably even dumber. Yours isn't so bad, nothing lost. I ask everyone I interview to tell me about their worst ever day at work. If they don't tell me about a story like yours it raises a bit of a red flag for me - I want to know they've got some scars. The best answers talk about how they reacted - owning up, focusing on the quickest route to resolution at the expense of ego, the post-mortem they did in terms of L&D, RCA and potential process changes.
Once you get out of your head on it you'll realise you are a stronger engineer for this.
Cryptic1911@reddit
Done that myself and knocked out the entire switch stack. Lost everything in our building with 100 PCs and IP phones, plus file servers for our 5 offices and a rack of Citrix servers hosting an ERP solution for like a dozen companies across the country. Good times.
patmorgan235@reddit
It happens, 30 mins is a pretty minor outage.
Do some more planning before making changes on high impact switches like that. Maybe come up with a checklist on how to do it right; write out the steps you want to do before you do them.
sadmep@reddit
You just earned your wings. Don't stress about it, you'll probably be fine. And I doubt you'll make the same mistake again!
mcshanksshanks@reddit
Just about 30 years in the field checking-in, you’re not a true IT Pro until you have an outage named after you. Carry on.
kizzlebizz@reddit
Welcome. One of us.
One night I was moving SCSI connections or virtual storage or something on our Exchange. Remounted the disks and Exchange showed a dirty shutdown. Started to run the cleanup on the first database; it showed a time of roughly 23 hours to fix. We had 5.
Open vCenter - delete VM. Open backup appliance - export to virtual machine. Took 12 hours to restore.
One time I moved a patch cable at our head end. Walked back to my office and saw, oh wow, a whole location is down... Ran back, moved the patch back. Shut down a whole location for 15 minutes.
beagle_bathouse@reddit
At each job, everyone gets one big fuck up. Just be honest, handle it well and learn from it.
chicaneuk@reddit
Everyone has been there. You're still very new to this. This is part of learning. If your boss isn't a complete penis, he will also know this and won't bust your balls for it.
toadfreak@reddit
Own it, document it, and (officially or unofficially) add steps to your process that reduce the risk of this in the future.
toadfreak@reddit
Then move on. Next. No time for yesterday's problems today.
Narcoleptic_247@reddit
"Damn, my bad." Take it on the chin and learn from your mistakes. We've all been there.
Vasillni@reddit
I once tried to reconfigure a switch stack of 3750s to enable SSH so it wouldn't rely on serial. The whole stack died. In a DC, in another country, about 3500 km (almost 2200 miles) away. Stopped switching on all ports. Had to call up the NOC at the DC to have them go and physically power cycle all the switches in the stack. Down for almost 1h. This was an online casino, just before peak hours. I had worked there less than 3 months. I still don't know what happened. But I rebuilt the whole network to be redundant without a switch stack. Still afraid to rely on stacked switches, over 10 years later.
MavZA@reddit
Shit happens. Just document this incident and the resolution. You’ll learn from this and be in the same spot your boss is one day. Guaranteed he’s done exactly this or similar.
DrinkYourGravy@reddit
If you aren't breaking things periodically, are you truly a sysadmin? The important thing is how you recover and it sounds like you were on top of things.
Pr0fessionalAgitator@reddit
1: we’ve all been there. Learn from it. Don’t continue to make the same mistake, if you can avoid it.
2: ‘Read-only Fridays’ are a thing for a reason. It means don’t make changes to production unless it’s critical, or you have a game plan.
3: When I started networking, this is what I did for a bit. Since you're using Cisco switches, type out all the exact CLI commands you're planning on implementing, and send them to your boss or a more senior network guy to review.
I still do this with a colleague, just to make sure I'm not missing something. Also, get a running config copy saved to a txt file before implementing changes while you're at it. Makes recovering from a backup easier.
There are also other options for some vendors, like last-good-configuration recovery, to look into…
You get the gist.
Initial-Expression91@reddit
Lol while deploying a new RMM and configuring server update policies, I inadvertently rebooted all company servers in the middle of the day.
It happens to all of us.
SaladClassic@reddit
Don't beat yourself up, we've all done it.
karokajoka@reddit
You’re not an actual admin until you’ve nuked prod. Congratulations.
enigmaunbound@reddit
Write a retrospective. Describe the individual steps and thought processes that led to the error. Critique those decisions. Describe how they could have not resulted in that error. Then ask a senior person to review this with you to find anything you missed. If all you do is the exercise, then it's valuable learning. If your leadership is worth their salt they will respect the effort. If your org has incident management this is a great Lessons Learned record for audit and accountability. It should also be training for new hires and used as interview questions for prospects. If they can't follow your path they may not be worth their salt. If they have a better solution then they are worth gold.
merked84@reddit
If you do this job long enough you will absolutely make a mistake worse than this. No one wants to screw up but it’s life, you’re being too hard on yourself.
CommandSignificant27@reddit
I did a very similar mess up a couple months into being promoted to Network Administrator. Own up to your mistake, make sure you understand your mistake, and keep moving forward. Mistakes are going to happen. This likely won't be the last time you cause an outage; it's part of the job. All we can do is learn from it and keep going.
Cisco-NintendoSwitch@reddit
Welcome to the Club, you just passed initiation!
jonisjalopy@reddit
Oh honey, we've all done significantly worse. You'll probably get an ass chewing, but we take those lol
ooglesnoopleboop@reddit
It sucks right now but it’ll be a funny story you and your boss look back on in the future. Been there, done that, don’t sweat it.
TheEvilAdmin@reddit
I've never done anything like this
derango@reddit
Crap happens. You're going to make dumbass mistakes. This one wasn't that bad. Just own it and move on.
DarkEmblem5736@reddit
Expanding on this guy's comments: ~30 minutes of chaos, people will move on from it. If you had caused hours of disruption... your 'honest mistake' would be more scrutinized. 30 minutes, a learning experience. People got their coffee and the world moves on.
IamHydrogenMike@reddit
Happens all the time, it just matters how we recover from it and if we keep doing it.
ZeroOpti@reddit
Along with owning it, document it so others don't make the same mistake. Prove to management that you're learning from it and trying to protect the company from future mistakes.
Pyrostasis@reddit
Welcome to the club my man!
You really aren't a sysadmin till you break something in production. Some of us break something small like a server and some of us break something big like the internet for the entire east coast. If you work in IT long enough it will happen to you.
The good news is it sounds like you owned it, you learned from it, you definitely will NOT make that mistake again. Look at it as a growth opportunity. Your company paid 30 minutes of downtime for you to learn some lessons.
Make sure you understand what you did, what you should have done, what could have prevented you from doing what you did, and then make sure you don't make at least that mistake again.
No one is perfect, everyone fucks up occasionally, the goal is to learn from the mistakes and make fewer as the years go on.
Icy-Comparison-6045@reddit
You gon break more, comes with the territory so you’ll be alright.
Mister_Brevity@reddit
Document what happened, what the problem was, what was done to mitigate; and how you’d refine the process to avoid a repeat. Mistakes happen, that’s how we learn.
hurkwurk@reddit
One of my greatest joys as a senior admin was the Friday when the junior backup admin came to me and said that he was trying to delete and reconfigure a backup job and accidentally deleted the entire backup configuration at 4pm.
I looked at my watch (that should tell you how old the story is), said to him with a smile, "Well, you have until Monday to have the jobs rebuilt and the backups run, enjoy the overtime," and left to go home.
The best part is I already knew that we could do a restore of this crap a dozen ways, and missing a full backup wasn't the end of the world and the risk factors were minimal, so giving this junior the chance to take a deep dive into the backup system and really get to know it? It was priceless.
By Monday, his confidence in using the system (HP's crappy enterprise backup software, if anyone is wondering) had grown tremendously, and the reports for the weekend backups arrived on time and all backups were complete.
I reviewed his configs: he'd made a few changes, nothing out of the ordinary, and had corrected a few things that were outstanding as well. All in all, aside from the overtime cost of it, it was actually good work and got us ahead of the game on the backup rotation. (This was an older virtual-tape backup to physical tape afterward for offsiting.) He learned more in that weekend than the prior 3 months :)
AmiDeplorabilis@reddit
25y ago, I was cleaning up a Cisco router config. It had been migrated through THREE different networks, apparently with almost no cleanup, just adding in the new IP scheme and on they went. I pinged each listed IP, cross-referenced it against notes and documentation, then moved on.
I deleted an IP route, saved the config, then waited about 20m. No screaming or frantic phone calls... all's well. I headed into downtown Frankfurt (Germany), 20 minutes away. I got out of my car, and my cell phone rang (yes, it was a Nokia 3310). It was the help desk where I worked, and they suddenly had no network or Internet.
Here's where it got strange. It had been at least 45 minutes since I deleted that route, but they had only lost Internet a few minutes before they called me.
I drove back and reinserted the deleted IP route... network and Internet were restored immediately. The network was undocumented, and I found no other traces of that network on our other devices.
With that, welcome to the club! Stuff happens, sometimes stupid stuff. Learn from your mistakes, even laugh at them, and keep growing. As long as it's not being done maliciously and/or intentionally, you're just like the rest of us!
brisquet@reddit
Don't worry about it, learn from it and try not to do it again. Ironically I had the same thing happen a couple weeks ago where our network guy was configuring a port and did the trunk port instead. I was on the call and he went "oops, can I get someone to go console into the switch right now?" I laughed thinking he was joking; he was not lol.
icss1995@reddit
Live, learn, laugh! The key here is you realized your mistake, owned it, and fixed it. To be honest you’re the type of person I would want in this role. Also the concern on how to move forward is just some of that imposter syndrome that sneaks up on us all from time to time.
I know too many too scared to make changes or too arrogant to own the mistakes they made.
Matazat@reddit
Hit em with the Shaggy. Wasn't me.
lutiana@reddit
Right, so give yourself ~30 minutes to feel like an idiot, take a drink of water, then go to your boss and apologize for it, tell him what you did, and ask him how you can avoid that in the future. Basically show that you can and are willing to learn from the mistake and take ownership of it.
That said, I'd probably trace down exactly what is plugged into that port and verify that the label is correct and the things are plugged in correctly (based on the label and description you give, it sounds like there may actually be a physical loop there somehow), and if not, then add it to the report you give your boss and work out how it should be labelled.
Again, take the initiative to better yourself, learn from this mistake and move on. If you can do this, then I'd say you'd probably have dug yourself out of the hole you're in by this time tomorrow.
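On the "trace down exactly what is plugged into that port" suggestion, a few read-only show commands are a safe way to sanity-check a label before changing anything; the interface name here is hypothetical.

    show interfaces GigabitEthernet1/0/12 status
    show mac address-table interface GigabitEthernet1/0/12
    show cdp neighbors GigabitEthernet1/0/12 detail
    show lldp neighbors GigabitEthernet1/0/12 detail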
axonxorz@reddit
This was a literal rite of passage at my old job. One division was a taxi fleet operation, so 24/7 was the uptime target.
New guy was with us for about a month before he inadvertently looped the local LAN. He was sweating a bit until I congratulated him. Hard to feel bad for the business when the extremely marginal cost of an STP-supporting switch was just too much.
HotMuffin12@reddit
I once accidentally wiped a whole RAID 5 array, 3TB of corporate data gone. Did I feel upset? Yes. Did I lose my job? No. Did I learn? Yes. Did I wet myself? Yes!
Chupacabraj182@reddit
Welcome to the club brother!
Deweyoxberg@reddit
1 - You are human. This too, will pass.
2 - Own it. Yes, a mistake was made. What did you do to resolve it, and what is your plan for prevention in the future.
3 - Hopefully you have change management. If not, now is a good time to start having those conversations.
4 - You never forget your first big oops. It keeps you nimble and humble. Use it as fuel.
Tonight? You get completely trashed/party it out/whatever you gotta do to "shake it off" as Swifty would say.
Then you get right back on that horse, but with new knowledge: You *WILL* screw up again. It is inevitable. HOW you handle it will separate you from the rest.
Long time ago, in my first real tech job, I was re-wiring the main patch room. Had it all documented, everything was precise and pristine. My last two cables I had to plug in... I accidentally had them switched with each other. Instant outage. Now? I triple check everything.
You'll be fine. The brain gremlins telling you all the negative crap is the real challenge. ❤️
valenx@reddit
Welcome to the club, don't worry.. it's a rite of passage.
slayer991@reddit
You learned a lesson. Triple-check any work on production before committing.
20+ years ago we had a server reboot script that pulled from a CSV. I had about 5 servers approved to be rebooted as part of a mid-day change. I edited the csv and then ran the script.
The problem: I didn't save the CSV file....and the server list? Was all of our servers. I ctrl-C'ed out after the first 50 rebooted.
I went and told my boss...and owned it. All the calls were directed to me (after I made sure the apps came back up).
He asked me if I learned a lesson. Yes, I did. I double and triple check everything before committing.
lateralUnilateral@reddit
Deep breath, you will be fine! I think most of us have learned the hard way about write mem first lol. Just don't hide the mistake, own it and put processes in place to prevent it from happening again.
Some_Objective_5783@reddit
At this stage of your career it's your boss's problem more than yours, and he should be expecting you to make mistakes like this if he's letting you configure ports on the fly with no change control. Don't beat yourself up; it's happened to everyone at some stage.
InspectorGadget76@reddit
Own it. Move on. Crap happens sometimes.
A good boss will think the same.
FelisCantabrigiensis@reddit
ah yes, "AI" being useful.
Meanwhile, have a think about how it went wrong and what could have helped stop it. Wrong labels on the ports? Better docs? Safer order of operations?
It will help you become more skilled, and if anyone comes asking then you can say you have thought about it and realised how these things can be improved.
I assume you have a regular one to one meeting with your boss to discuss current work, progress, future plans, any impediments, etc. That's a good time to go over it and you can say you made a mistake and this is how you think it can be avoided in future.
gzr4dr@reddit
You already have a fair amount of feedback but I'll add a couple of items. One, it has happened to us all. Two, take ownership of what happened and conduct an RCA. Don't grovel, but do accept that you made a mistake and highlight to your boss what went wrong and what you're going to do differently to ensure this doesn't happen again. Updating and sharing process docs goes a long way.
One of my orgs did short learning moments at the beginning of large meetings. This would make a good one, assuming it's in the culture of the org.
SemiDiSole@reddit
You create a write-up, apologize and move on.
It happens, the important thing is that you learn from it.
MagicBoyUK@reddit
There's a reason we don't make changes on a Friday afternoon!
Gummyrabbit@reddit
It happens. Our change control requires us to show our back out plan if things go south. I’m also a pessimist, so I always try my changes on a test system if possible. Also, if you’re new, get someone on your team to validate what you’re about to do.
fruymen@reddit
Shit happens.
Learn from it.
Maybe write down what you did, where it went wrong, and what you can do next time to not make the same mistake.
And be ready to make another mistake.
No one is infallible.
Jwatts1113@reddit
Did you know that if you deploy a software package that accidentally contains a reg key with the computer name, you can successfully rename multiple computers to the exact same name? And you can't undo the deployment, because all the computers that got the package think they are the same machine.
Don't feel bad, the scars are part of the job.
Big_Ad8785@reddit
Don't take it personally; mistakes have nothing to do with your ability and proficiency. I stopped a production firewall with a mistake once: first day as a system/network engineer, I applied a rule in the wrong direction. Shit happens.
Ok-Measurement-1575@reddit
That moment when your RDP session mysteriously drops and you briefly entertain the possibility that it’s someone else’s fault, despite sitting in the DC, on a step ladder, consoled into a switch, on a Friday.
Then you hear the main door bleep open behind you, followed by, "Are you working on anything at the moment...?"
newbies13@reddit
Everyone does it; if you talk to an IT person who hasn't broken something huge, they just haven't had enough experience yet. Don't repeat it and you're good.
Junior-Spring-5557@reddit
Welcome to the club! I still have emotional scars from when my account was compromised in 1998 as a Jr. Sysadmin. I'm still doing sysadmin stuff nearly 30y later. Four years ago, one of our electricians accidentally turned off our 5MW datacenter (Off button & on button look very similar, and he was wearing an asbestos-lined helmet, suit & visor)
It's fine to feel ashamed after a fail like this: it simply means you have a conscience. Imagine if someone did that and *didn't* feel some shame; that's more troubling.
Hopefully your employer isn't harsh on you. Shit happens, especially for complicated systems, especially under time pressure.
Some of the learning happens through pain. Everyone in this industry worth their salt knows that. If they don't, then it's a red flag.
Soccerlous@reddit
Any sysadmin who says they haven’t taken down production at some point is a liar.
Own it and learn from it.
Worst thing is to lie about it to save face. You WILL get found out.
ScroogeMcDuckFace2@reddit
it happens to everyone. learn from it and keep going
ThrowAwayTheTeaBag@reddit
Best thing you can do is 100% own your fuck up, document the issue for both work and yourself, and move forward. This might be the first, but it's not the last. The only way you could make it worse is if you started dropping excuses or dodging responsibility.
Zer0C00L321@reddit
I'm having that kinda day too. I'm gonna go home, kiss my wife, hug my son, drink a beer and pretend my job doesn't exist until Monday.
kissassforliving@reddit
I deleted a VLAN from a stack once….a production VLAN. It happens.
simonjakeevan@reddit
Own it. Fix it. Move on. Then quit making changes on Friday
lescompa@reddit
Happens because we are human. Own it, learn from it.
Background_Lemon_981@reddit
Yeah, just wait until you break networking enough that your cluster hosts start fencing and all your hosts restart again and again while you try to wrestle everything under control. Forget DNS. Your DCs are all restarting. You have to pull IPs from memory. Streeesssssssssssss.
PhillyGuitar_Dude@reddit
Welcome to the show. We've all made a blunder like this in the early years of a career (heck, sometimes later too!). Stressful for sure, but now you know.
kozak_@reddit
30 min outage? Not bad
Warrlock608@reddit
This isn't that bad. Take responsibility for the mistake and move on; it happens.
SGT3386@reddit
A tip an older boss gave me: you're allowed to fuck up once on a particular thing. Own the mistake as others have said, do your own RCA and present it as a way to show ownership and lessons learned. Then move on.
It's when you fuck up twice on the same issue that you'll have problems and trust is broken.
countsachot@reddit
That doesn't seem so bad honestly. No data lost, and 30 minutes isn't the worst. Learn from it and don't beat yourself up too hard.
Hcaz_Hcaz@reddit
I wouldn't sweat it. Everyone makes mistakes and that isn't as big of one as I have seen others make. We had a guy push an update out to switches for all sites remotely. Two other people were supposed to verify that update prior to push along with him. Needless to say, nobody looked it over. All switches were wiped once it was pushed out and the network was 100% down.
Same job, someone managed to delete the entire AD in the middle of the week ... that was a fun one.
I have made mistakes as well; just learn and continue on. If the company doesn't value you enough to realize mistakes happen, then you are better off elsewhere.
Wish the best for you!
ThatBCHGuy@reddit
Tbh, this is why change control exists. At the very least, everyone is informed that a change is occurring and exactly what that change is. That way, if things go south it's easy to pinpoint what changed, and ideally the actual change has been validated by more than just yourself.
The-Greatness@reddit
Congrats, you are one of us now. If you don’t cause the occasional outage, are you even a sysadmin?
LiteratureThat4566@reddit
If you're not breaking things you're not working hard enough. It happens, people make mistakes. Learn from it and own it (which it seems like you did) and move on. I once reset the wrong switch and caused an office wide outage. Just keep moving and keep learning and if your boss gives you too much shit then they are not a great boss. Everyone in IT has caused an outage at some point.
highlord_fox@reddit
It happens. People make mistakes. What is important is that you learn from that mistake, and focus on how to prevent it going forward.
Was the port mislabelled? Is documentation up to date? Is there a second check you can run against ports? Do a root cause analysis on the event: yes, applying those settings caused the issue, but what led you down the wrong path?
Don't feel bad, we all make mistakes sometimes. Own up to it and make your future better. At least you didn't accidentally press the circuit breaker reset for the wrong set of outlets on a UPS that powers the entire networking stack, who would do something like that I swear.
gethelptdavid@reddit
You threw an interception, it didn’t lose you the game. Get back on the field and win.
StandaloneCplx@reddit
Confidence alone isn't what makes a great engineer, so take this incident for what it is: a lesson. Everyone makes mistakes; the difference is how you process a mistake and use it to improve yourself. Use this as an opportunity to learn more about the basic issues and risks in the domains you operate in, and it will help you grow and improve your modus operandi and competences. It could also be a good opportunity to set up a lab network for training/MOP validation.
Ok-Mix9280@reddit
In your position for a year & you’re just now bringing things down? Humble brag - you’ve just learned an unforgettable lesson, and the chances of you repeating the error have now dropped to nearly 0%.
thebigshoe247@reddit
That's life. You made a mistake. You owned it and you learned from it. As long as you don't continue to make the exact same mistake moving forward, there's nothing to feel guilty about.
dai_webb@reddit
Yes it is stressful, and embarrassing, but you’ll learn a lot from it, and will probably never do it again!