What is the golden rule of doing a production patch?
Posted by Numb-02@reddit | ExperiencedDevs | 59 comments
My new team kinda just patches prod whenever they feel like it.
I figured we should stick to the release cycle and only patch prod for super critical stuff.
They're always patching prod just to add logs and get info faster.
Is that even reasonable?
If not, why not?
I'm on the fence, but if it's wrong, what points can I use to explain it to my team?
JohnnyBlackRed@reddit
Are you trying to solve the wrong problem?
You're trying to stop the team from doing ad hoc releases, which they do to fix bugs and to get the information needed to fix bugs.
IMHO there are two (un)related problems:
Fixing bugs should always be encouraged, and the release cycle should provide for it. How big a change you can safely make depends on the maturity of the developers and the tooling (mainly the tooling): with good tooling (testing, linting, static analysis, etc.) you can make changes with little or no risk of breaking things.
Missing insight into the application: they're adding debug logs to production code, which is a big no-no, and it points to a big gap in production tooling. There is no APM, no log monitoring, and/or no good way to reproduce errors in local development (or a testing environment).
If you fix these two problems, the problematic ad hoc releases should go away and developer productivity will go up.
JWolf1672@reddit
It really depends on the system and the needs of the org/industry.
Where I work, some of our stuff goes to prod almost immediately, other stuff goes weekly, and some has a lengthy release cycle that needs product-owner buy-in and customer notice of downtime.
I don't think there should be any golden rule for when to release, there is too much nuance depending on what that system does and how it's been architected for a golden rule to exist without having a million caveats to it.
pugworthy@reddit
Never on a Friday
Personal-Sandwich-44@reddit
When possible, my team, and any team I've been on recently works towards a process of continuous delivery.
The faster you can write code and then ship it to an end user, the better, so with that mental model, "patching prod whenever" is entirely fine. What are the arguments against it?
MoveInteresting4334@reddit
Not OP and overall I agree with you, but my team is a notable exception. I work at a large bank on a mission-critical application. Deployments to prod require a lot of approvals, verifications, etc., and we must inform business users of the upcoming deployment and the possibility of an outage. We need explicit rollback plans in place before the deployment and analysts on hand to immediately test what was deployed in prod.
There’s just no way this song and dance could be done on a continuous release basis.
tech-bernie-bro-9000@reddit
same. also worth registering with your users that as a tech team you are doing work.
i don't think continuously shipping code really helps business users the way people think it does. like, fix things and patch, sure, but daily minor updates are mental overhead for people working a clearly defined manual process, like what millions of front-office bank workers do across big banks.
have always always been skeptical of people overly indexing on quantity of ships vs quality
one quality ship that actually addresses spec'd problems without bugs VS 30 ships to eventually fix a poor rushed initial ship.... ask your business lead which they prefer
owari69@reddit
I feel like you’re conflating continuous delivery of code with continuous tiny product changes by nitpicky/bored product and design folks.
We do quasi-CD on some of our apps, and the goal is really just to avoid letting a bunch of changes accumulate in the pre production environment, and to make our automated tests a reliable safeguard for releases.
Shipping a bunch of code at once makes it harder to troubleshoot bugs. So we ship small slices of code as they’re ready alongside robust testing (unit, integration, E2E), and then use feature flags to handle rollout to end users once the whole feature is ready.
Az4hiel@reddit
I don't want to be rude or anything, but that seems like a lack of imagination. There are myriad ways to lower the impact of a failed release: can't you deliver to 1% of users at a time? If that's still millions of dollars on the line, how about 0.001%? How about deploying to a single user at first (then 2, 4, 8, 16...)? Can't you keep the old version of the app running in parallel with the new one, and have them switch transparently on error (so in the worst case the user waits 200 ms instead of 100 ms)?
Like I get it that if you are neck deep in tech debt (which is normal for banks) you probably have more pressing problems than setting up infra for all of this - but I wouldn't classify it as impossible by design.
There are also genuinely effective ways to handle compliance if you are able to actually cooperate with the legal team (this does require empathy and care on both sides so I do think it's indeed quite difficult).
I do work in Fintech too - not in the bank but banks use our products directly and we deploy multiple times a day.
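The gradual-exposure idea above (one user, then 2, 4, 8, 16...) can be sketched as a simple promotion loop. This is a hypothetical sketch, not any real product's API: `set_rollout_fraction` and `error_rate` stand in for whatever hooks your router and monitoring actually expose.

```python
import time

def progressive_rollout(set_rollout_fraction, error_rate, *,
                        start=0.00001, threshold=0.01, step_minutes=30):
    """Double the share of users on the new version until 100%,
    falling back to the old version if the error rate spikes."""
    fraction = start
    while True:
        set_rollout_fraction(fraction)     # e.g. 0.001% of users at first
        time.sleep(step_minutes * 60)      # let real traffic exercise it
        if error_rate() > threshold:       # worse than the agreed budget?
            set_rollout_fraction(0.0)      # everyone back on the old version
            return False
        if fraction >= 1.0:
            return True                    # fully rolled out
        fraction = min(1.0, fraction * 2)  # 1 user, then 2, 4, 8, 16...
```

The interesting property is that a bad release only ever hits the current fraction of users, and the blast radius early in the schedule is tiny.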
MoveInteresting4334@reddit
Not sure if you’ve ever worked at a big bank but I appreciate the assumption that I or any other dev up and down the experience ladder has any impact or say on any of this whatsoever. I also have never met or spoken with anyone remotely related to the legal team and I’m not sure we even work in the same state.
Az4hiel@reddit
So is it impossible for your team to lower the impact of the failed release (and so move towards CD) or is it just workplace politics? If it's politics then I don't think that the term "notable exception" is justified or in fact in any way special - there are multiple poor working environments that make devs unproductive in some way and it's just one of them.
I don't suggest that you or any dev are responsible for the state of the company in any way - that would be naive - there are some modern banks though (at least in Europe where I am from) that do have more modern culture (at least on the teams I had the pleasure of working with).
Again I do not work in any bank directly and I see how you could disregard my opinion because of that - I did work for the last 8 years with bank products and on banking products (psd2 stuff lately) so take it as you will.
piecepaper@reddit
Dave Farley talks about having practised CD on a high-speed exchange (fintech). It's possible. By deploying more often, each release carries less risk and is much safer than a big-bang deployment.
MoveInteresting4334@reddit
Oh, no disagreements here. If it were up to me, we would establish stronger automated testing and rollbacks, and then move to a more continuous deployment. But that is all way, way above my pay grade. I couldn’t even tell you which parts are required by regulation and which are just wanted by management.
Personal-Sandwich-44@reddit
Totally agree, and I understand there are specific environments where you can't do CD, due to compliance or regulations or whatever.
But OP said none of those, so I got the feeling they were someone that was used to release cycles and thought that was the default, and any deviation from that is "bad", whereas I'm of the opposite opinion, so I was curious what specific arguments they had against it.
MoveInteresting4334@reddit
Absolutely. If you can, you should.
wrex1816@reddit
If devs randomly just ship to prod (and have the privileges to do it, which is crazy) without reason, cause, or any process or quality control, that's NOT what Continuous Delivery means. All you're doing is what OP's team is doing, except you're giving it a fancier name to make it sound OK.
Sign of a very immature team if it's full of cowboy devs who have never had to learn lessons from being burned. Been there, wouldn't do it again personally.
Personal-Sandwich-44@reddit
Huge jump to go from “we do CD” to “yolo patch prod”
We have code reviews, code owners, a solid test suite, e2e tests for critical areas that run on a prod canary before promoting, solid observability and an easy mechanic to rollback a code deployment.
wrex1816@reddit
Sure.
GammaGargoyle@reddit
Gating releases without a proven reason for doing so is a bad idea in most cases.
I think your release process should be iterative. Release to prod as frequently as possible until the point where you fail, then back off and implement the minimum viable safeguards and cadence. Rinse and repeat until optimal.
Jumpy-Zone1130@reddit
Never ever ever do it on a Friday at 5PM
SpiritedEclair@reddit
What is wrong with continuous delivery?
Numb-02@reddit (OP)
How is this related to continuous delivery?
Generally there is a release cycle. After a sprint, we deploy changes to production per the release schedule.
This is about not following the release schedule, but doing production patches on the go for non-critical stuff.
LossPreventionGuy@reddit
continuous delivery means no release cycle. it goes when it's ready; no reason for code that's ready to go to wait until Thursday just because
Bingo-heeler@reddit
There's some merit to choosing the appropriate time to release, especially for "new" features.
Of course, all of this depends on the environment, the specific software, the specific change, and how long until there is a more optimal window. The less your team controls, the more slowly you need to move, because you have to coordinate with others and compete with other priorities. If no one is waiting for the feature, there's no harm in pausing to get all your ducks in a row or to line up more support from other teams.
LossPreventionGuy@reddit
bad sign. your stuff should always be independent of other stuff.
feature flag the things that can't be turned on, but ship them.
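A minimal sketch of what "feature flag it, but ship it" looks like in application code. The in-memory dict here is a stand-in for whatever flag store you actually use (LaunchDarkly, a config service, etc.), and the flag and function names are made up for illustration:

```python
# The new code path ships to prod, but stays dormant until the flag
# is turned on globally or for specific users.
flags = {"new_checkout": {"enabled": False, "allow_users": set()}}

def is_enabled(flag_name, user_id):
    flag = flags.get(flag_name)
    if flag is None:
        return False                  # unknown flags default to off
    return flag["enabled"] or user_id in flag["allow_users"]

def checkout(user_id):
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"    # deployed, but only live when flagged on
    return "old checkout flow"
```

Deploying the code and releasing the feature become two separate decisions, which is what makes shipping incomplete work safe.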
DrShocker@reddit
When I left my last job they were starting to ask us to plan out the next 6 months of 2 week sprints and what we'll work on...
Numb-02@reddit (OP)
Oh okay. I have always worked in release cycle schedule.
It is new to me. That makes sense.
Sorry for questioning it and thank you.
TeachMeHowToThink@reddit
It was a total paradigm shift for me when I went from companies that do a release cycle to companies that release each PR as soon as it’s through QA. It’s such an objectively superior process it shocks me that there isn’t more talk about how outdated release cycles are.
coyoteazul2@reddit
I'd say it depends on how sensitive your product is. What happens if a patch introduces an error and you need to rollback? Rolling back a single release is easier than rolling back a bunch of them
Finding the error's source gets harder too. The error won't necessarily manifest immediately after release, so you may have a couple of hotpatches applied after the one that actually introduced it, which makes it harder to pin down the release at fault. If patches are only applied once per sprint, you give the error more time to show up and it's easier to know which release introduced it.
TeachMeHowToThink@reddit
These are actually some of the strongest arguments I make in favor of incremental releases over batched releases. Assuming you have a decent logging/alarm system, any bugs introduced by a patch should be identified pretty quickly, and then you have a much smaller amount of code that needs to be investigated to determine the cause. It's not sequential - you don't need to also roll back all the other releases that have happened in between, just the one that was identified to be the cause.
squeasy_2202@reddit
And feature flags.
MoveInteresting4334@reddit
I’m not sure I follow you. First, you say that rolling back after a single release is easier than rolling back a bunch. I’d argue that rolling back a release with a single change is much easier than rolling back a release with a bunch of features.
Then you say that finding the error source gets harder because the bug might not manifest immediately on release. True, it might not, but why is a large scheduled release any better here? It also contains several different feature changes that you have to dig through to find the error.
It seems like the issues you mention with continuous release are also present in timed releases, but in many cases worse.
lordbrocktree1@reddit
Maybe rather than pushing against them doing continuous delivery (which doesn’t have anything to do with sprints by the way), introduce the idea of canary releases to prod. Things like istio will let you do a slow rollout to prod. Start with like 1% of traffic and if you aren’t getting bug reports, slowly migrate to 100% of traffic to the new release.
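A toy version of that traffic split, just to illustrate the mechanics; with istio you'd express the weights declaratively in a VirtualService rather than in application code, and the function names here are made up:

```python
def route(user_id: int, canary_weight: int = 1) -> str:
    """Deterministically send `canary_weight` percent of users to the new
    release; the rest stay on stable. A given user always lands on the
    same side, so their experience is consistent across requests."""
    bucket = user_id % 100           # stable bucket per user, 0..99
    return "canary" if bucket < canary_weight else "stable"

def widen(current_weight: int) -> int:
    """Promotion schedule: 1% -> 2% -> 4% -> ... -> 100%,
    advanced only while no bug reports come in."""
    return min(100, current_weight * 2)
```

If bug reports do come in, you set the weight back to 0 and only 1% (or whatever the current step was) of users ever saw the problem.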
SpiritedEclair@reddit
I won’t repeat what the other commenter said. The way we work is via trunk-based development: everything we do gets merged into mainline and goes to prod.
So the way we do ci/cd is complicated but that’s because we deal with millions of customers using our platform concurrently (cloud provider).
The way we manage our content releases is via a) feature flags and b) staging.
By the time we are ready to start a content release, everything has been tested in beta and gamma environments.
When we enable feature flags, we have environments that actually test the changes. So our lower envs capture all states our upper envs can be in.
Jmc_da_boss@reddit
Do it often, do it fast, do it with small changes
RestInProcess@reddit
It all depends on the need for the patch and the risk to the end user. If the application is mission critical and they're risking bringing the whole thing down, then they need to stop. If it's low risk and not a critical application, then why worry about it?
Usually this will all be determined by upper management, and the first time there's an hour-long outage of a mission-critical app will be the last time, I'm sure.
When I got to my current job I was adamant that publishes should never be done on Friday because then stuff breaks over the weekend and it'll create a mess we don't want to deal with on Monday. My team doesn't care. Most of the time they patch, edit, and publish right to production. It drives me nuts. It just happens to work most of the time. There are lots of habits that need to change, but that aspect is fine as long as things are documented and communicated.
Esseratecades@reddit
It seems like there's some misalignment here.
Release cycles make sense in more risk averse environments because they force changes to go through the process, sometimes multiple times, to ensure safety.
However, frequent pushes to production make sense in less risk averse environments as they allow for faster value realization and make it easier to address bugs found in production.
But what's important is that there's still a process. Jim throwing his code in production just to get logs is acceptable if it goes through code review, testing, and CI/CD to get there. If Jim is literally bypassing these things to force his code into the production build, that's a problem.
A lot of people in this thread seem to think you're reacting to the former, but I've also been on teams where people have attempted the latter.
writebadcode@reddit
Frequent prod updates are usually a sign of either a very mature or a very immature dev org. In a mature org, there are safeguards and tooling in place like canary deploys and feature flags. In an immature org, devs just ssh in and deploy changes with few guardrails.
For the middle of those extremes, you often find practices similar to what you’re used to, release schedules, etc.
If there are safeguards in place that make it easy to detect and revert bugs, you’re probably dealing with the mature version, so no worries. If not, start advocating for some better safeguards and practices.
DowntownLizard@reddit
Unless you have downsides to continuous delivery like an app pool refresh in IIS because you dont have load balancing or something like that, I think it's the way to go. Have good logic tests that cover your bases, be smart about what you are pushing out, test what you push out obviously, and make use of feature flags or just leave the logic unreferenced or inaccessible.
It's how CI/CD is meant to be: your main branch is what is deployed, and you integrate into it often to avoid merge conflicts, duplicated work, and long-lived branches. Faster feedback loop and easier code reviews. You catch issues earlier in the process, rather than after the whole feature is built and having to say, "It should have been done this way, but it's too late unless you redo most of it."
inputwtf@reddit
The problem with doing a release cycle is that as the changes "pile up", it gets more stressful because you're always worrying about which change could be the one that broke something, and it can get very, very difficult to revert a change once it hits production if subsequent changes relied on it logically, as opposed to just fixing a merge conflict.
Releasing changes one at a time when they are complete, means that you can roll back a change and know that it's only that change that was the problem.
Dimencia@reddit
Golden Rule: Never do it on a Friday
Logging would be kinda OK to hotfix, because it's very low risk, but I would discourage it for mental-health and precedent reasons. Hotfixing non-critical things can set an expectation that a lot of your work has very tight time constraints, needing to be done, PR'd, and pushed to prod as soon as possible. Part of the advantage of a release cycle is giving the team more relaxed timelines: work doesn't need to be done until deploy time, and it can sit for a while, giving the team time to review it carefully, think it over, and potentially improve it or fix bugs. Think about all the times you've been in the shower, not at work, and realized some code you wrote had a bug, or came up with a better way to do it, and consider that hotfixes never get those.
Numb-02@reddit (OP)
I totally get you.
I have been part of such an environment and mindset since the start of my career.
It's so interesting to see there are people on both sides: those who follow CI/CD and see release cycles as outdated, and others like me who are just used to release cycles and never really worked with continuous deployment.
I guess the question I'm asking won't make much sense to people who follow continuous deployment, and their argument is valid as well.
soundman32@reddit
Ha ha. We have that rule, yet we've just released to prod at 4pm on a Friday. I'm on a hugely popular commerce project, which would probably lose £2M over the weekend if it didn't work.
Dimencia@reddit
Yeah, there are always exceptions, unfortunately. But avoid it as much as you can, anyway
Revision2000@reddit
My last team had CI/CD in place; we merged to main, it was automatically rolled out to the environments until it ended up on production.
Feature flags were used to guard against “too early” or “incomplete” functionality being made available.
Since we would deploy all the time, we could fix and improve stuff all the time. No need to wait for a specific window, no need for “hot fixes”, no need for esoteric branch strategies. Also, usually no need to panic if we’d missed something.
At other clients I’ve had to deal with release windows. Those always cost way more time and cause way more headaches than the “problems” they solve, as you’re trying to coordinate completed functionality across multiple teams, while managing branches and patching up old versions still running on production. Sigh
badlcuk@reddit
Well, in my current role we can't, for various reasons that may not apply to you. Let's pretend that instead of calling this "patches to prod" you call it "daily releasing". What's the problem the current process is causing?
uno_in_particolare@reddit
I genuinely don't mean this as a snarky remark, everyone's experiences are different - but I'm really confused about this question
Continuous delivery in 2025 seems like a real no-brainer, akin to having unit tests or code reviews
Of course, there are exceptions, and it's totally valid to discuss the viability of CI/CD in a specific context... But I don't understand the surprise at a more "generic" level, or am I missing something?
Numb-02@reddit (OP)
I did not have any such experience before, so it's a new thing for me.
All the companies I worked at before had a release-cycle process, not continuous deployment.
2fplus1@reddit
It kind of depends on what you mean by "patch". Are people just ssh-ing into a prod server and modifying files in place and restarting services manually? That's amateur hour. Please don't do that. If you haven't had a catastrophic outage yet, it's just a matter of time.
If by "patch" you mean that they commit code in small units, push it, and it runs through a proper deployment pipeline with tests, canary deploy, feature flags, good monitoring/observability, and automated rollbacks? In that case, yes, you can and in many cases should be doing that daily or more (my team averages 4-5 prod deploys per day per developer with effectively zero incidents). This is basic Continuous Delivery and there's plenty of data showing that this generally produces a better outcome than infrequent deployments in terms of both feature velocity and system stability. See the DORA report, Accelerate, David Farley's "Modern Software Engineering", etc.
throwaway_0x90@reddit
Hmm, I'm pretty sure patching production is something you only do in an emergency.
This team definitely isn't following best practices.
The pros/cons really come down to how reliable you need production to be. If you don't care that some unforeseen accidental bad patch could bring down PROD, then I guess nothing matters.
But if you worked at, say Amazon, and you did a quick patch to PROD to get more logs about an error and it turns out it logged too much too fast, consumed a bunch of resources and slowed/stalled PROD... you'd be fired.
Cautious_Implement17@reddit
I think you would be surprised at how little ceremony is involved in releasing changes at companies like amazon. new features that are visible to customers get tested before launch, but that can be totally decoupled from shipping code to prod.
a simple logging change could be shipped to prod the same day. if it caused an availability drop, it would be automatically rolled back.
kagato87@reddit
Oh hell no! Patching prod directly is asking for trouble.
After the developer thinks it's good and merges, the pipeline compiles and runs unit tests. Yes, we get failures at this step.
After the pipeline build succeeds QA has their go at it. Yes, they find issues.
After QA approves it, server admins get an RC and deploy it to a prod-test environment. This deployment is staged and deployed using production methods.
Once that is approved, patches are scheduled for prod. Prod is limited to Monday-Wednesday unless there is a blocker (so that if something does go wrong in prod we're still not violating "read only Friday").
Once upon a time, when I was still new to the company, someone tried to bypass the test steps. I resisted, hard, and eventually got ordered (in writing) to do it.
Let's just say the outcome gave me the leverage I needed to enforce the current policy with no exceptions. Two stages of testing every time. Yes, we can turn that around in a matter of hours in an emergency.
DamePants@reddit
Thank you for the reminder to get things in writing. I'm in a patch-happy org and stuff is rarely tested. Everyone thinks their change is fine, yet I've seen too many P1s caused by exactly that mentality.
GlasnostBusters@reddit
Tuesday, after EOD.
This gives you 2 days prep and 3 days to fix release issues.
No_Stay_4583@reddit
Doing it right before the weekend?
geeeffwhy@reddit
i think if you have a well developed and well tested fallback strategy, this is all good. continuous delivery is a thing. as long as that process lets you detect and revert problematic versions, what’s the problem.
now, setting that up and making sure it’s working is an undertaking, and some of your development process should adjust to use things like feature flags. your deployment strategy should probably be blue/green, and so on, but it isn’t impossible to release patches with high frequency.
Jaryd7@reddit
As I see it, prod should only be touched for hotfixes and during the planned release cycle. For everything else you should have a dev environment where you can deploy whatever you want, whenever you want, without affecting prod.
Northbank75@reddit
If it’s well tested before being pushed what is the problem? What is the problem that this is creating for you?
If I identify a bug that I can fix in sixty seconds and it doesn’t require any heavy lifting, telling everyone to wait for x date isn’t a thing …
i_exaggerated@reddit
What's a release cycle?
Do you all have good monitoring and automated/easy rollbacks? If so, patch what you need to.
davvblack@reddit
what does “patch” mean to you? how much of the process do they skip (eg automated test, pr review, lower env deployment, manual qa)?
it’s not bad inherently to deploy fast and often. how often are things breaking? the practical answer is you should produce features and value as fast as you can while breaking things at the perfect rate that your customers just barely don’t mind.
this is different for different stages of company. for startups with 6 months of runway and two paying customers, it is optimal to yolo.