How do you manage product maintenance?
Posted by JamesRigoberto@reddit | ExperiencedDevs | 37 comments
TL;DR: how do teams that focus on maintenance plan and manage their work?
We have one core product which has transitioned to maintenance. About two thirds of our tasks are maintenance related to this core product, be it for production or other environments. There is still some development going on for this core product and for other small internal applications. However, management and product teams still plan as if development is our main focus. We follow the Scrum framework. Maintenance is mostly done ad hoc. The team knows that we will have a lot of requests, but those are usually tackled by volunteers. This creates chaos and we are having a hard time getting away from the maintenance workload. It seems to keep increasing.
Our goal would be to be relieved from most of the maintenance and go back to development but until we get there, we need to improve our planning and structure. How do you manage such work?
titpetric@reddit
Some devs thrive in maintenance and upkeep, also covering missing architecture docs, testing and other things which may have been ad hoc.
With scrum, if the focus is still on development and other stuff, designate a maintainer, and designate a cohort, and eject them from scrum.
Track the effort with kanban, after a reasonable discovery/planning session. The point is to action the bulk of the work and let the maintainer prioritize the issues, or just pick out of a bag and work towards inbox zero. With time needed work decreases and they can rejoin scrum or move to other projects.
Give people that care the bag and just track the ticket status for the effort; good planning makes the bulk of the work known beforehand, just like any kind of development. SCRUM should die tho.
JamesRigoberto@reddit (OP)
What do you use instead of agile?
It is not in my power to replace agile. I see the benefits of it, especially for our product, since we were a small startup until the last few months. Nowadays, although development power is the same, the user base has increased a lot. And I have been wondering if it makes sense to leave agile in favor of other methods.
Unfortunately I don't have experience with any other methods and I don't have a background in management.
titpetric@reddit
Self organizing within a team is a power move, so basically it's Kanban. You wouldn't believe how far a google sheets doc goes 🤣 a notable public example is the guy who has gumroad, where he just publishes a new todo list for each quarter and lets people do the work without micromanaging, or even just managing I suppose. High trust orgs are a good place to be.
Usually I start with a design doc similar to a proposal/tender, listing all the things that I feel need doing, which usually ends up being 3-5 concerns, with goals and deliverables, and then it's up to planning individual work tasks and executing in a set order. I either log work done or break down work needed to be done onto a kanban board, then we adjust every two weeks.
The goal is inbox zero for everything, not week to week or sprint to sprint. Work may get refined as we go, which is superior to scrum refinements, where you may pick up a ticket which has been refined months or even years ahead of the work.
I tend to reference a Mark Rendle talk when it comes to "enterprise" software; he beautifully explains how subordinated agile is in that whole process. There are other arguments, like scrum driving devs to burnout, and I tentatively propose that if you have ownership/autonomy, the chances of burnout decrease substantially.
https://youtu.be/Y9clBHENy4Q?si=jD4Nzv3asBP38vjc
JamesRigoberto@reddit (OP)
I will take a look at the video.
In our organization there are plans and documents, in some cases too detailed. But it feels like we have a plan only until reality hits us.
The organization used to put a lot of trust in us, but I feel this trust is decreasing. We don't enjoy as much freedom anymore. Also, a lot of technical decisions are made not by engineers but by product people, who do not have an engineering background. On the other hand, they trust us to resolve incidents or deal with bugs. I guess that is because they don't have the knowledge or power to do so themselves and it is their only option.
I don't have a feeling of ownership for the latest features, where I didn't have a say in decisions or design.
Also, I feel our autonomy is being reduced. Lately any change requires many meetings.
titpetric@reddit
You are literally describing a low trust env where SCRUM inevitably thrives. Not my cup of tea, sorry you're in that.
JamesRigoberto@reddit (OP)
Not my cup of tea either. The environment has degraded a lot. I believe that management has not realised it and that they will be willing to change. I will give them a few months; if no progress is made, it is time to jump ship.
Antique-Stand-4920@reddit
Product managers need to prioritize maintenance work along with feature development. Then maintenance can be tracked and managed like anything else.
Convincing Product can be the hard part. Here are several things to consider:
- Let things fail. Failure hurts, and people are more willing to listen when they are hurting.
- Product often needs to plan things a year or more in advance. They assume and want predictability. It would be helpful if you can convince them that maintenance helps ensure their forecasts remain accurate by preventing or fixing things that would ruin their plans. In other words, if Product wants more predictability, they need to actively remove unpredictability.
- If Product has a great idea and wants to move quickly on it, the engineering team won't be able to do so if they are stuck dealing with preventable production issues.
JamesRigoberto@reddit (OP)
Those are good points. I will try to raise them.
I am really considering the "let things fail" option. It is hard though, because I like to do things to the best of my ability.
I am having a hard time understanding how much predictability product wants. We do spend some time doing poker sessions and we try to be realistic. However, there is no tracking for all the maintenance and they don't seem keen to improve it.
DeterminedQuokka@reddit
I think something causing that many tickets is mislabeled. Something actually in maintenance mode should be a lot lower key than that.
I would personally point to it as unstable instead and ask for the time to fix it, with the argument that it will free up time in the future that is currently being used to add duct tape.
JamesRigoberto@reddit (OP)
One of the reasons for the mislabeling is the lack of knowledge of the platform. There is no user manual or good documentation.
Also, anybody can report a bug, and even the people who are the first line filtering bugs lack platform knowledge. So it is not rare that during fixing there is a lot of back and forth just arguing about how things should behave.
DeterminedQuokka@reddit
If you are getting bad tickets I recommend creating an FAQ for it. You shouldn’t need to have the same conversation twice. You add it to the faq the first time and then you send people the link to the doc.
JamesRigoberto@reddit (OP)
I agree, but somehow not everyone is on board with spending time on documentation. Currently I spend more time in meetings repeating myself over and over than on anything else.
DeterminedQuokka@reddit
Take your meeting notes while speaking into a doc. The point here isn’t getting everyone to do it. It’s making it so you say things once.
templar4522@reddit
Personally I think scrum isn't the right fit when bugfixing, customer support and similar things are the majority of the workload. A simpler approach is easier. You are not bound by sprints anymore. You can still set intervals for retrospectives and maybe planning.
If you are stuck with scrum, you need to weigh how much of the capacity you can reserve for old bugs in the backlog, how much to reserve for emergencies if urgent tickets coming through is a normal occurrence, and just make peace with the fact that the sprint can and will be derailed if a big enough issue pops up.
Also, shorter sprints (1 week) can help be more flexible.
Another thing to have in place is a good triaging system for the tickets coming from customer support. Someone needs to evaluate the severity of bugs and issues, clean up the ticket so it has all a dev needs to pick it up, and eliminate non issues. QA or rotating devs could do the job, assisted by a product owner or whoever can fill that role and call the shots when it isn't clear what the expected behavior is.
JamesRigoberto@reddit (OP)
What would you recommend to replace scrum with?
Issue triage is very poor. There is no QA team. QA is done by product and the team is small. Usually product is the first line and then there is a dev. Sometimes the person from product is busy and sales pushes the issues. But we have no low priority bugs. For this year management created a KPI to tackle bugs in less than 3 days and product pushes for that. So everything is disrupted.
templar4522@reddit
Just simple kanban. But in your scenario you are still working on new features. So I'm not sure it would fit either.
From what you describe, you are in a situation where "everything is urgent", and the people in charge are resistant to change.
First, I'd set up some sort of triaging. You want to have a standard for what a bug ticket should look like. Steps to reproduce, expected behavior, maybe screenshots, and possibly other useful information that is contextual to your product or customer (e.g. configuration / feature flags).
Then the tricky part is getting those who fill in the tickets to follow this standard. If this can't be achieved, you need the person in charge of triage to chase and pester the reporter until they provide the information needed.
Speaking of who's doing the triage, I'd opt for a developer. Make it a rotating position so everyone is responsible and nobody feels like he got the worst job.
What you want at the end of triaging an issue is a well-written, actionable ticket and confirmation that there is in fact an issue. "Can't reproduce" could be enough of a condition to close the ticket as a false positive, but maybe you can't see the issue because the config is wrong or the client forgot to renew a service. Talking to the person filing the bug report is a good idea.
After triaging, which ideally should also classify the bug by severity (you can look up bug severity scales on google), you should push the triaged ticket for prioritisation by product.
Given your situation, I'd just push product to decide "right now, this sprint, next sprint, later". Concrete options are easier than a vague "prioritise".
This achieves several things, that hopefully align with product's interest too.
First, it gives you a tool to protect your sprint, at least partially.
Second, it gives some control back to product, in the form of controlling what goes in and out of the sprint. Coincidentally it also protects dev from unwarranted flak from those who feel neglected.
Third, you can include some of the non urgent issues in the planning.
You should still account for some time dedicated to deal with "right now" and "this sprint" issues.
Product might be your ally. Aren't issues disrupting the delivery of new features? Talk to them, maybe they can get on board and advocate for change too.
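As a rough illustration of the ticket standard and the four prioritisation options described above, a minimal Python sketch might look like this. The names `Bucket`, `BugTicket`, `missing_fields` and `is_ready_for_product`, as well as the 1 to 4 severity scale, are hypothetical and not taken from any particular tracker; a real setup would more likely be a ticket template in whatever tool the team already uses.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Bucket(Enum):
    """The concrete options product chooses from after triage."""
    RIGHT_NOW = "right now"
    THIS_SPRINT = "this sprint"
    NEXT_SPRINT = "next sprint"
    LATER = "later"


@dataclass
class BugTicket:
    title: str
    steps_to_reproduce: list = field(default_factory=list)
    expected_behavior: str = ""
    context: dict = field(default_factory=dict)  # e.g. configuration / feature flags
    severity: Optional[int] = None    # assumed scale: 1 (critical) to 4 (cosmetic)
    bucket: Optional[Bucket] = None   # decided by product, not by the triager


def missing_fields(ticket):
    """Return the required fields the reporter still has to fill in."""
    missing = []
    if not ticket.steps_to_reproduce:
        missing.append("steps_to_reproduce")
    if not ticket.expected_behavior.strip():
        missing.append("expected_behavior")
    return missing


def is_ready_for_product(ticket):
    """A ticket is pushed to product only when complete and classified."""
    return not missing_fields(ticket) and ticket.severity is not None
```

The point of `missing_fields` is that the person on triage rotation has an explicit list to chase the reporter with, instead of a vague "please add more detail".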
JamesRigoberto@reddit (OP)
Those are some good points. There is no way to eliminate the 3 day policy at this time of the year.
With a bit of luck we could convince management to update things for next year.
besseddrest@reddit
this feels like there is a much deeper issue in the code that has historically been band-aided, instead of like, root causing this and giving it the attention it needs. Which sounds about right for a possibly legacy product and maybe it is a good idea to like... pause on adding anything new. Maybe it's bleeding the company a bit more money now
It's hard to say w/o much context but i will say that sometimes the backlog could use a good grooming. Sometimes when you come across something that can be fixed, i just do it. Sometimes things don't need tix, pointing, and scheduling, maybe you know what the fix is and u just take care of it
JamesRigoberto@reddit (OP)
There is definitely a code/product quality issue. The platform has become huge and there is quite some complexity in it. There is no single person who is aware of all the functionality and there is little to almost no documentation. Decisions tend not to be recorded and are quite often changed.
We try to implement new tests, but it seems that management is more focused on pushing the next feature or the next bug. I would say that tests cover 30-40% of the code.
besseddrest@reddit
That's tough. Honestly might be a good idea for new development to have strict code coverage policies. for now I would power through it, but focus on finding that deeper, more critical issue. You never know, the fix could bubble up and address a bunch of lower priority bugs
JamesRigoberto@reddit (OP)
We have requested a few times that each new feature should come with a requirement to improve test coverage, so that when pokering, planning and reviewing we allocate time for this. However, it has always been denied.
besseddrest@reddit
in lieu of what, taking on more tasks?
JamesRigoberto@reddit (OP)
Yes, more tasks, which in their minds seem to mean more features.
besseddrest@reddit
honestly someone needs to put their foot down and that someone should be your manager, sounds like they aren't protecting you
every once in a while we would have a 'bug fix week' where the engineers just try to knock out as many tickets in the backlog as they can.
it's pretty helpful because initially you do a bit of grooming and u find a lot of tickets that are no longer applicable
JamesRigoberto@reddit (OP)
We have done those in the past. But also in the past there was no pressure on maintenance and issues piled up.
Because management doesn't want that to happen again, and at the same time they want new features, we find ourselves in constant firefighting mode.
besseddrest@reddit
i think the devs need to get together to find some statistical evidence to prove to management/product that the ongoing problem is going to worsen and if possible show that ignoring these issues might actually be costing your company $$.
E.g. a task taking longer cause you have to work around a known issue
but that's a minor example and it would need to be blatantly obvious that these issues, if anything, prevent the team from delivering quality code
JamesRigoberto@reddit (OP)
I have tried making that point, but since we don't track maintenance (to reduce overhead), it is difficult to come up with hard numbers.
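For what it's worth, getting a first hard number does not need much overhead. Here is a minimal sketch, assuming each developer jots down a rough `(category, hours)` pair per task. The category names and the sample log are made up for illustration, and `maintenance_share` is a hypothetical helper, not part of any tool.

```python
from collections import defaultdict


def hours_by_category(entries):
    """Sum logged hours per category from (category, hours) pairs."""
    totals = defaultdict(float)
    for category, hours in entries:
        totals[category] += hours
    return dict(totals)


def maintenance_share(entries, maintenance_categories=("bug", "support", "ops")):
    """Fraction of all logged hours spent on maintenance-type categories."""
    totals = hours_by_category(entries)
    total_hours = sum(totals.values())
    if total_hours == 0:
        return 0.0
    maintenance_hours = sum(
        hours for category, hours in totals.items()
        if category in maintenance_categories
    )
    return maintenance_hours / total_hours


# Made-up log for one developer over two days
log = [("feature", 6.0), ("bug", 3.0), ("support", 2.0), ("meeting", 1.0), ("bug", 4.0)]
share = maintenance_share(log)  # 9 of 16 hours, i.e. 0.5625
```

Even a coarse figure like this, collected for a sprint or two, is easier to argue with than "it feels like most of our time".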
spelunker@reddit
If your product truly is in “maintenance mode”, that should be enough to tell people “no” for feature requests. If they have a problem, tell them to go get you funding (or however your company allocates things).
JamesRigoberto@reddit (OP)
Good point. You made me see that product is not in maintenance mode, but the team is.
johnpeters42@reddit
Initial questions:
Why is the maintenance workload increasing? Are you (say) running into five issues in a month, only allocating enough manpower to resolve three, then the next month you have five new issues plus two carried over from the previous month? Or do you run into five, resolve five, then the next month you have seven new issues? If it's the latter, then why is the number of new issues per month increasing? How many issues are basically "repeat the fix from previous issue X"? How many of those can be made self-repairing, or at least self-reporting?
Have you explained to management and product teams that the product requires more maintenance manpower to remain viable? If so, then what did they say?
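The two scenarios in the first question can be told apart with very little data. Below is a small sketch; `simulate_backlog` is a hypothetical helper, and the numbers are the ones from the example above, not real data.

```python
def simulate_backlog(arrivals, capacity):
    """Return (open_issues, resolved) at the end of each month.

    arrivals: new issues per month.
    capacity: maximum number of issues the team can resolve per month.
    """
    open_issues = 0
    history = []
    for new_issues in arrivals:
        open_issues += new_issues
        resolved = min(open_issues, capacity)
        open_issues -= resolved
        history.append((open_issues, resolved))
    return history


# Scenario A: steady 5 new issues a month, manpower for only 3.
# The backlog grows by 2 every month even though nothing got worse.
under_staffed = simulate_backlog([5, 5, 5, 5], capacity=3)
# -> [(2, 3), (4, 3), (6, 3), (8, 3)]

# Scenario B: everything gets resolved, but arrivals themselves grow.
# The backlog stays empty while the effort spent keeps rising.
growing_inflow = simulate_backlog([5, 7, 9, 11], capacity=11)
# -> [(0, 5), (0, 7), (0, 9), (0, 11)]
```

Scenario A is a staffing question; scenario B is a quality question, which is why it matters to know which one you are in.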
JamesRigoberto@reddit (OP)
From the point of view of planning, we do not allocate any manpower to maintenance. We try to fix issues within 5 days of them arising. If they cause critical problems or block people, we try to fix them within a day. We don't let the bug bucket grow. I see multiple reasons why the number has increased, to name a few: no proper testing, no proper planning (some of the issues are users wanting a feature to behave differently), and not enough features (some issues are manual data updates; we are starting to script this, but developers still have to run the scripts). There have been domains in which the same issue has appeared multiple times. Although not many issues could self-heal, some of them could benefit from an auto reboot.
We have tried talking to management and product; unfortunately the company is becoming too hierarchical and doesn't listen to lower levels.
Usually there is no pressure to release features, so I don't see the need to bump development estimates. For months it has been the case that not many new features are released or planned.
johnpeters42@reddit
So where is management/product directing their "mainly development" mandate? On other projects? Whatever it is, the idea is that you can plan for a certain amount of maintenance behind the scenes, regardless of what they plan and what you report back to them.
JamesRigoberto@reddit (OP)
The quarterly process usually goes like this: at the beginning of the quarter product plans in X epics. Of those epics only 1 or 2 would be completely defined and ready to work on. At sprint planning there is little room to introduce new stories since not enough of them have been finished. So features are always delayed and postponed.
Product and management don't consider maintenance when planning. There is no proper tracking of how many hours we spend doing those tasks. And sometimes, to change one line of code, you have to spend hours in meetings to gather what the user wants and how the product is meant to behave.
You are correct, the idea is that we do maintenance behind the scenes. The problem is that this is not behind the scenes any longer and takes up most of our time. Also, there is no proper tracking of all of this maintenance work.
Comprehensive-Pea812@reddit
core company mistake: not budgeting for maintenance, or budgeting too little.
they should at least budget for 50-50, and an unused maintenance budget can be used to improve the quality of delivery.
a company being stingy with its maintenance budget will see its system rot and its people trying to leave
JamesRigoberto@reddit (OP)
Unfortunately the team is very small. We are 5 developers managing a large number of services. Each sprint one of us will be filtering issues and solving as many as he can, and another person is in charge of the release pipeline. When in those roles you have some spare time, but not enough to focus on other tasks.
We have asked to hire somebody to increase the automation of the QA process. But growth is more vertical now. They are looking for managers.
mxdx-@reddit
Once an application has been live for a while, issues inevitably surface, be it CVEs, deprecated libraries, or necessary updates. Unless there's a dedicated maintenance team, these tasks should be handled like any other sprint work: tracked as tickets, added to the backlog, refined, and prioritized. A good rule of thumb is to reserve a portion of sprint capacity for maintenance. Over time, this reduces tech debt and makes such work more occasional than recurring.
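To make the rule of thumb concrete, here is a minimal sketch of splitting sprint capacity up front. `plan_sprint` and its parameters are hypothetical, and the 50% / 10% shares in the example are placeholders; the right split depends on how much maintenance the team actually sees.

```python
def plan_sprint(capacity_points, maintenance_share, emergency_share=0.0):
    """Split sprint capacity into maintenance, emergency buffer and features.

    Shares are fractions of total capacity (0.0 to 1.0). Whatever is not
    reserved for maintenance or emergencies is left for feature work.
    """
    if maintenance_share < 0 or emergency_share < 0:
        raise ValueError("shares must not be negative")
    if maintenance_share + emergency_share > 1:
        raise ValueError("reserved shares exceed total capacity")
    maintenance = round(capacity_points * maintenance_share)
    emergency = round(capacity_points * emergency_share)
    return {
        "maintenance": maintenance,
        "emergency": emergency,
        "features": capacity_points - maintenance - emergency,
    }


# Example: 40 points of capacity, half reserved for planned maintenance
# and a tenth kept free for urgent tickets that arrive mid-sprint.
plan = plan_sprint(40, maintenance_share=0.5, emergency_share=0.1)
# -> {"maintenance": 20, "emergency": 4, "features": 16}
```

Writing the reservation down like this also makes it visible in planning, so feature commitments are made against the 16 points that are actually available rather than the full 40.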
JamesRigoberto@reddit (OP)
We do have those common issues, and most of the time they are pushed to be resolved within days, so there is not much planning there, just putting current work aside and dealing with them as they come.
We usually have 1 or 2 story points per sprint for refactoring. But I feel it is not enough.