Beyond GitHub’s basics: what guardrails and team practices actually prevent incidents?
Posted by dkargatzis_@reddit | ExperiencedDevs | 11 comments
GitHub gives us branch & deployment protection, required reviews, CI checks, and a few other binary rules. Useful, but in practice they don’t catch everything - especially when multiple engineers are deploying fast.
From experience, small oversights don’t stay small. A late-night deploy or a missed review on a critical path can erode trust long before it causes visible downtime.
Part of the solution is cultural: culture is the foundation.
Part of it can be technical: dynamic guardrails, i.e. context-aware rules that adapt to team norms instead of relying only on static checks.
For those running production systems with several developers:
- How do you enforce PR size or diff complexity?
- Do you align every PR directly with tickets or objectives?
- Have you automated checks for review quality, not just review presence?
- Any org-wide or team-wide rules that keep everyone in sync and have saved you from incidents?
Looking for real-world examples where these kinds of cultural + technical safeguards stopped issues that GitHub’s defaults would have missed.
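For the PR-size question above, one common pattern is a CI step that fails when a diff exceeds a team-chosen line budget. A minimal sketch, assuming a unified-diff input; the 400-line threshold and the helper names are illustrative assumptions, not anything GitHub enforces by default:

```python
# Hypothetical PR-size guardrail: fail CI when a unified diff
# exceeds a team-chosen changed-line budget.

MAX_CHANGED_LINES = 400  # assumed team norm; tune to taste

def changed_lines(unified_diff: str) -> int:
    """Count added/removed lines in a unified diff, ignoring file headers."""
    count = 0
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content changes
        if line.startswith(("+", "-")):
            count += 1
    return count

def check_pr_size(unified_diff: str, limit: int = MAX_CHANGED_LINES) -> bool:
    """Return True if the PR fits the budget; wire this into CI to block oversized merges."""
    return changed_lines(unified_diff) <= limit
```

In practice you'd feed this something like the output of `git diff origin/main...HEAD` and exit non-zero on failure.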
rayfrankenstein@reddit
PRs don’t prevent incidents and code review causes more problems than it solves. Just get rid of code review altogether.
dkargatzis_@reddit (OP)
Interesting take. Personally I invest quite a bit in PRs and like to own the merge button myself. For me, thoughtful reviews and that final ownership help catch subtle issues and keep changes aligned with the bigger picture.
garfvynneve@reddit
It’s not the change set in the PRs, it’s the change set in the release artefact.
You can have small pull requests but if you only release once a month you’ll always have a bad time
dkargatzis_@reddit (OP)
Absolutely. Frequent small PRs don’t help much if they’re batched into a big monthly release. Shorter release cycles and continuous delivery matter just as much as PR size for keeping risk low.
ArchfiendJ@reddit
You need a strong lead and culture alignment.
If you have a lead pushing for code quality, small PRs, etc., but half your devs are code workers who just build what they’re told, then it’s doomed to fail.
If you have a team that strives for code quality, product quality, fast delivery, etc., but can’t agree on the “how”, and a weak lead who just does top-management reporting, then nothing will get done either (or worse, it will spark conflicts).
dkargatzis_@reddit (OP)
That’s a great point, without strong leadership and cultural alignment, no amount of automation or rules really sticks.
I’ve also seen that when guardrails are designed and owned by the team (not just pushed top-down), they become part of the culture instead of feeling like extra process.
It keeps the “how” evolving together with the team instead of relying only on a lead to enforce it.
Ciff_@reddit
dkargatzis_@reddit (OP)
In the AI era, it feels like the live meeting becomes the single source of truth.
drnullpointer@reddit
> How do you enforce PR size or diff complexity?
Having lots of small PRs does not usually make the changes easier to review.
> Do you align every PR directly with tickets or objectives?
Most PRs should be linked to tickets and objectives.
Some PRs (refactorings, reformats, etc.) may not require a ticket or objective. Ideally, I would like to 1) spot a problem that can be quickly solved, 2) solve it, 3) immediately post a PR.
Any additional bureaucracy makes it less likely that I will actually do anything about the problem.
> Have you automated checks for review quality, not just review presence?
The only thing I personally do is track reviewers who let production issues through. I then target those reviewers for "reeducation". But I know of no automated way of doing this.
> Any org-wide or team-wide rules that keep everyone in sync and have saved you from incidents?
Lots.
An example: I instituted a checklist of things to verify on each code review. Every author must ensure these rules are met and every reviewer needs to verify these things in order to accept a PR.
Some examples:
* Any user/operator-visible functionality needs to have documentation. When a PR updates functionality, the documentation has to be updated as part of the PR.
* All processes need to have metrics. If it can fail or succeed, it needs to have a metric reported.
* Errors cannot be ignored. An error needs to be either fully handled or fail the process.
* Any new data set added to the system needs an estimate of how large it will be and how quickly it will grow, and it needs an automated retention policy.
And so on.
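A checklist like this can also be made machine-checkable. One sketch, assuming the checklist is embedded in the PR description as Markdown checkboxes (the convention and function names are assumptions, not the commenter's actual tooling): CI fails while any box is left unticked, so reviewers must explicitly confirm each item.

```python
import re

# Hypothetical: the PR template embeds the review checklist as Markdown
# checkboxes; CI fails if any box is still unchecked.
CHECKBOX_RE = re.compile(r"^\s*[-*] \[(?P<mark>[ xX])\]", re.MULTILINE)

def unchecked_items(pr_body: str) -> int:
    """Count checklist items still marked '[ ]' in the PR description."""
    return sum(1 for m in CHECKBOX_RE.finditer(pr_body) if m.group("mark") == " ")

def checklist_complete(pr_body: str) -> bool:
    """True when every checklist box in the PR body has been ticked."""
    return unchecked_items(pr_body) == 0
```

This only verifies that each item was acknowledged, not that it was honestly done — which is exactly where the cultural half of the thread comes back in.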
Over time, as we have things fail, we tend to add more checks to the list to make it less likely to fail.
dkargatzis_@reddit (OP)
You guys clearly invest a lot in continuously improving workflows, really solid practices.
Curious - how big is your team, and do developers generally embrace these rules?
gjionergqwebrlkbjg@reddit
Fuck off with your advertising.