How to emotionally be content with outages or issues
Posted by makeevolution@reddit | ExperiencedDevs | 25 comments
I have about 3 years of experience, and I'm now joining a new team. My team is responsible for client-facing features, and we have "superhero" duties, i.e. a rotation of who responds first to outages/issues. But I am always afraid of it. It distracts me from what I am currently doing, and I'm not sure whether I can figure out what's wrong. Since I'm one of only a few backenders on the team, I feel I need to be able to solve it. I feel pressured somehow, and I don't know why. Perhaps it's because the impact is huge and the time pressure is high?
I now obsessive-compulsively check our superhero duty channels for new messages, even when it's not my duty. It's tiring, to be honest. I am also gaining seniority, so I feel like I am expected to respond and pick up these issues even before the person on duty does. I think this is also because there's another person on my team who is super proactive and does this, and he gains a lot of visibility; he is on track to become a staff engineer because of it. I feel I now need to be at that level to progress here. That level of proactiveness and self-confidence is something I fear, yet need and lack.
How can I handle this emotionally? I do like the job, the people are kind and friendly, and the tech stack is very modern. I am learning new things every day, especially AI-related stuff. It's just this ad hoc aspect of the job that I can't seem to be at peace with. But I mean, whatever software dev job I do, these kinds of issues will always be there, right, especially as you gain seniority and are expected to be more proactive? So wherever I go this will always be there, and thus I need to grow up. Any advice would be appreciated.
drnullpointer@reddit
I think you need to learn about stoicism.
Stoicism does not mean you don't care about things. But you don't let those things get to you, because you only take responsibility (internally) for your own actions and inactions with regard to things you actually have control over.
Things may be failing around me, but I take pride that within this chaos I do the best I can every day to improve the project.
Responsible_Sir_7423@reddit
Outages stop being emotionally destabilizing when you separate your identity from the system’s behavior. The system failed, not you. You’re the person working the problem.
The mental shift that helped: my job during an outage isn’t to have prevented it, it’s to resolve it clearly and communicate what’s happening to the right people. Focusing on the communication task in front of me killed the spiral.
csgirl1997@reddit
Being overly dedicated to solving issues for your team is dangerous for both you and your company.
As someone who was you, read this: https://sre.google/resources/practices-and-processes/no-heroes/
but_why_n0t@reddit
Stop idolizing the staff engineer. He is excelling at something he likely enjoys doing, you need to find something YOU are passionate about.
belkh@reddit
superhero sounds like a terrible terrible name for oncall engineer.
your job as oncall is very simple, it's rarely hotfixing things on your own.
Monitor, review, mitigate by rolling back, if that's not enough escalate because the situation is fucked anyway.
what's the actual oncall description at your job and do you have runbooks on what to do?
Animostas@reddit
My philosophy is kind of similar: if I'm paged and I'm not immediately sure what to do, then it's all fucked anyway, so how much worse can it be?
makeevolution@reddit (OP)
Yes, there are runbooks/instructions, but I feel the need to always jump in to gain visibility and get a promotion/raise, even though the job role description/metrics don't mention at all that I need to do such a thing (put out all fires). I don't know, it's just this weird thing I have in me.
And what's frustrating is when the fire itself is something I can't prevent from happening again in the future, e.g. because of dependencies on other teams who don't want to change.
QuietSea@reddit
When you're on-call, especially with only 3 YOE, you're a first responder and not a trauma surgeon. Follow the runbooks and escalate as you're expected to. Do you guys follow an RCA template after outages occur?
tcpukl@reddit
It should be about triaging the issue for the culprit to fix.
BoBoBearDev@reddit
Study corgis. They are a fine example of how you can be happy, capable, and recognizable without appearing to be the most qualified guy for the job.
BoBoBearDev@reddit
Find your own niche, stop comparing. That other guy found a way to be more visible, you can find your own way.
For example, if you saw someone making big money in a retail store, you wouldn't open another retail store next to it; you'd sell ice cream to those people when they come out of the store.
Kaimito1@reddit
You feel like you're expected but I'd bet if you asked your lead he'd say otherwise.
I used to have the same issue (without that proactive dev tbf) of looking at support tickets when it wasn't my duty for that day.
The core thing that helped me is "trust that the person on duty can handle it. And if they can't, trust that they will call for help".
Keeping that in mind helps me not worry too much about it
Early_Rooster7579@reddit
Unless your work results in lives saved/lost, who cares? So what if someone can't do a CRUD for a bit. Money might be lost, someone may send an angry email, but life goes on.
Empanatacion@reddit
The professional advice is easy: Do not look at your oncall alerts unless you are oncall. Most staff engineers got there without being "super heroes" (which is a fucked up way to refer to that for more than one reason).
But that doesn't fix the anxiety you're feeling, which isn't really because of your job. You need therapy more than you need professional advice.
makeevolution@reddit (OP)
Yeah, it's just that with this guy being so proactive, I'm afraid he is setting a standard so high that it will be harder for the rest of us devs who aren't as keen as him to gain visibility/be rewarded for our efforts. Although there are no signs of that as of now, humans are obviously biased, and this may happen at the management level.
There are playbooks/steps on what to do during outages; it's just this guy's proactiveness that concerns me.
And tbh I do see that he is not doing this for the promotion. He just really likes his job; I can see he is genuine. So it's really not fair for me to put him in a bad light and say he is taking advantage of all of us on the team.
brrnr@reddit
Do you have 1:1s with your manager? Those should be the place you discuss career growth and actionable things to do to advance your career. This nebulous game of visibility chasing is exactly the type of thing that leads to excessive uncredited work and burnout.
ramenjosh@reddit
Calling it "superhero duties" is a bit of a red flag in the first place tbh. The best on-call rosters are ones where the team has actively invested in making it really simple for anyone on the roster to a) triage what's going wrong, b) mitigate the issue at hand (e.g. by rolling back a release or scaling up capacity) and c) escalate up the chain if needed.
How would you rate your team on those fronts? Do you have runbooks/tooling to help you do those things, and are they easy to follow/use? If the answer is that they're not great and you have people like future-staff-eng being rewarded for just putting out fires, then you don't really have a healthy on-call environment and it's not really your fault that it's stressful.
makeevolution@reddit (OP)
They call it "superhero" just to have fun with names, but it really is your usual on-call system.
Yes, there are runbooks and processes. It's not really about the process but rather the stress itself and this pressure I somehow put on myself to proactively respond and put out fires, even when it's not my turn. To be honest, there are clear metrics documenting the expectations for each job role in this company, and nowhere does it say I need to put out every fire to get a promotion/salary raise, but somehow I still feel the need to fix all issues to gain visibility.
Blue-Phoenix23@reddit
Hmm, I'm positive that whoever designed the rota for on-call didn't intend for it to be a distraction when you're not the one on-call, although whichever jackass named it "superhero" probably did. Do you actually even WANT to be a staff engineer at a place that has so many production issues, they have their developers constantly fighting fires?
This organization sounds like a mess. The reality is it's a disaster for deep focus to keep context-switching like this. If you want to progress your skills, review the queue for recent fixes in your down time if you'd like, to see how things were done, but jumping in randomly because you feel like you should won't let you progress your actual dev skills. And it certainly won't let you (or anybody) focus enough to improve code quality to prevent all this nonsense in the first place.
Marceltellaamo@reddit
The biggest trap here is looking at the 'super proactive' guy and thinking his self-confidence comes from knowing how to fix every bug instantly. It doesn't. True seniority isn't about having all the answers memorized; it's about staying calm, knowing how to triage, communicating the blast radius to stakeholders, and knowing who to pull into the room. You don't have to be the hero who fixes it solo. Your job as first responder is simply to stop the bleeding and coordinate the surgery. Does your team actually have solid runbooks for these client-facing outages, or is the 'proactive' guy just hero-coding everything from memory and making everyone else feel inadequate?
pierre_lev@reddit
You need to learn to set boundaries. A therapist or some reading and meditating can help.
But yes, boundaries are really important; your mind needs to rest.
shinto29@reddit
This is it OP. And it might extend to parts of your personal life too, I know it did in my case. Therapy helped a lot, it’s just a job at the end of the day.
Venisol@reddit
No silver bullet, but it helps to realize that our jobs mostly don't matter, at all, to anyone.
If you work on B2B software that gets used by some other company, the actual user of the software didn't decide to buy it, their boss did. They can't cancel on you, they don't get fired, their salary doesn't get reduced; they just do something else, or don't. Literally nothing happened.
Even if you bring down Netflix for a day, nothing happens. So what, people can't watch their shows? You are affecting actual people, but not in a very strong way. I sometimes run out of books, and because I order them from the UK, unexpected delays happen. I don't care, I can handle it. If I couldn't, I would make sure to order earlier. If things are actual problems, people adapt.
It's also not "you". You didn't bring down your app, you just happened to be on call. You're not responsible. Even if it was you, do you create more outages than other people? Even if you did, that's not solely your fault, since I assume you're at most number 3 on your team. So every person above you signed off on and took responsibility for the code you pushed into their code base.
If you're worried more directly about yourself, how many people at your company have been fired because of their on-call performance? If it's such a big issue and you have so much responsibility, surely you are allowed to decide to take weeks or months to build countermeasures and improve stability and uptime?
The person above you who decides promotions probably doesn't think about it until, like, a week before. They've got kids. It's also not their money; they got a budget from above. They also see maybe 10% of everyone's work.
It's no reason to get demotivated or black-pilled either. You can recognize these facts and still do a good job and be ambitious. You can check the on-call channel and help out for strategic, ambitious reasons.
Also, the guy you mention is not really confident or "good at on call"; he is good at triaging bugs in your code base from stack traces, I would assume.
It's not a special skill, it's just coding and experience. It's also something you get better at in specific code bases: if you've seen some config bug take down prod 6 times already, it's a lot easier. Especially if he has been on that project since the beginning.
Pleasant-Cellist-927@reddit
There are more ways to achieve staff engineer than giving up your personal life to stress over a codebase that isn't yours, and if that isn't true at your company then your company is shit.
If anything, I'd question why the fuck you'd have so many outages/issues that this guy is somehow gaining an entire job promotion off of being a first line responder, or how bad your on-call process and runbooks are that others aren't able to deal with even the simpler issues without him. What is your company planning to do when this guy leaves?
As for actionability on a personal level, the only thing I can tell you is that if the codebase isn't yours and you're not being paid to care, then you really do need to stop caring. I realise this advice is on the level of 'you're homeless? just buy a home', but learning to let go of things that aren't actually yours is going to be key to your happiness at work. You compare yourself to another engineer, yet it sounds like he's a Senior, so he's probably a level above you (since you say you're 'gaining seniority' but not explicitly that you're a Senior) and is heavily gunning for his next role, but you make 0 mention of everyone else on the team, let alone the department. You make 0 mention of how long he's been there, while you say you just joined. You need to start realising it's not a level playing field, and his level of 'impressive' will not look like yours.
Frenzeski@reddit
As someone who has been doing this kind of work from day 1 (now 17 YOE), let me explain the problem with your thought process. You are experiencing a hero complex: you see there's a need for urgent assistance and you want to help. Couple this with a system where this behaviour is rewarded. This is very, very common, and it leads to a couple of undesirable outcomes. First, you will burn out; the stress isn't sustainable, and over time the increased cortisol will impact your mood and sleep. Second, it doesn't provide a sustainable way to teach others or systematically improve the process.
I've been there. It's challenging at first, but once you've seen it enough times and recognise the patterns, you get used to it. That's not helpful now because you haven't gone through it yet, but just know it will get easier, and know how to avoid the pitfalls.