Need thoughts on “firefighter” processes

Posted by MoveInteresting4334@reddit | ExperiencedDevs

Hey all,

I’m working on a mission-critical internal app at a huge international bank. Our application is used by about 150,000 bank employees. Every sprint, someone is the firefighter and takes on responsibility for the initial handling of any prod incidents, as well as planning and executing deployments.

We recently finished a very rough rewrite of the application (the app is extremely complex, there’s little to no design documentation, the team didn’t have experience with the new tech stack, etc.). The difficulties are a long story, but suffice it to say we are now in production and prod incidents are not a rare thing.

Since our go-live, it’s common for the firefighter to be so swamped that it’s all they can do to record and track incidents as they come in, let alone triage or delegate them. Things were falling through the cracks or taking weeks to be addressed.

After my sprint as firefighter, it was so bad that I went to our scrum master and worked out a flow to distribute the workload a bit more and keep a single point of responsibility throughout the process. We ran this by the teams (devs and analysts) and management. We got buy-in across the board after a few small tweaks. It goes something like this:

The overall idea is that an incident should only reach the firefighter dev if it’s a critical issue that the support team can’t handle.

This is our first sprint implementing it, and I’m the firefighter. It’s already falling to pieces. In our staff meeting today, my manager and I had the following exchange:

Manager: There were some incident emails from users that weren’t responded to this morning. The firefighter needs to watch for those.

Me: With our new process, the analysts should be engaging with the users and pulling me in if necessary.

Manager: Management won’t care who did or didn’t respond, only that nobody did. Processes are great, but there are always exceptions. If you see them not responding, you need to.

I dropped it since it was a group meeting, and I don’t really disagree with her in principle, but I have several issues with this. With this approach, there’s no accountability. If analysts don’t respond, it just falls on devs. There’s also no single responsible party. What happens if the analysts are busy and assume devs will respond, and devs are busy and assume analysts will respond? Wouldn’t it be wiser to work within the process, like forwarding the email to the analysts and saying “hey, did you guys see this?”

How do your teams handle this? And any advice for getting teams to embrace accountability and organization when their norm has always been dumpster fire/all hands on deck?