AI first teams - how are you dealing with code reviews?
Posted by GraphicalBamboola@reddit | ExperiencedDevs | View on Reddit | 46 comments
So my dev team has gone all in on AI, and it has worked really well so far (surprisingly, against the current narrative).
We have really dropped the "code" quality bar - but have instead increased the "functional" quality of the product by investing more in QA (way cheaper).
We have shipped features almost 40% faster than we used to - with no significant drop in "functional" quality or user-reported issues.
Now we feel we are not moving as fast as we thought it would allow us to - and the reason is that code reviews still take time and are the main bottleneck in the pipeline - so my question is for the AI-first teams out there:
- How are you dealing with Code Review bottleneck?
- Have you dropped code reviews altogether?
- or at least dropped the quality bar on reviews so you don't have to review each line of code (and live with the average code AI generates)?
- How are you dealing with the risk of security issues if code reviews become more high-level, rather than every single line being looked at? (especially on backend)
What is your longer-term plan for your team?
One-Wolverine-6207@reddit
The shift that helped me was changing what gets reviewed. Reviewing every line of AI-generated code is a losing game; you end up either rubber-stamping it or drowning in review load.
Instead I review the risk surfaces: anything touching auth, payments, database migrations, deploy config, or external API contracts. Everything else goes through automated gates: unit tests, integration tests against a staging environment, type checks, and a required status check on the PR. If the automated gates pass on low-risk code, I don't read it. If they fail, or if the diff touches a risk surface, I read carefully.
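The gate itself can be as dumb as a path check. A minimal sketch (the risk-surface paths and the helper name are made up for illustration; adjust to your repo layout):

```python
# Hypothetical risk-surface gate: low-risk diffs that pass CI skip human
# review; anything touching a risk surface is flagged for a careful read.
RISK_SURFACES = (
    "auth/", "payments/", "migrations/", "deploy/", "api/contracts/",
)

def needs_human_review(changed_files):
    """Return True if any changed file touches a risk surface."""
    return any(f.startswith(RISK_SURFACES) for f in changed_files)

# Feed it the output of `git diff --name-only origin/main...HEAD`:
print(needs_human_review(["payments/refund.py", "README.md"]))       # True
print(needs_human_review(["docs/guide.md", "tests/test_utils.py"]))  # False
```

In CI this runs as a required status check, so the "don't read it" path is only reachable when the check itself has passed.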
The other thing that matters is that the agent has to work in isolation. One agent, one feature branch, one PR. No shared working directories, no multi-agent commits in the same branch. Once you lose that isolation, review becomes impossible because you can't tell who did what.
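Git worktrees are one way to enforce that isolation mechanically. A self-contained sketch (repo and branch names are made up for the demo):

```shell
# One agent, one branch, one working directory - demo with a throwaway repo.
set -e
git init --quiet demo-repo
cd demo-repo
git config user.email "agent@example.com"  # identity so the demo commit works anywhere
git config user.name "agent"
git commit --allow-empty -m "init" --quiet
# Each agent gets its own worktree and its own branch; it never touches
# another agent's files, so the eventual PR diff maps to exactly one agent.
git worktree add ../agent-feature-x -b agent/feature-x
git -C ../agent-feature-x branch --show-current  # -> agent/feature-x
```

The PR is then opened from agent/feature-x alone, so "who did what" is answered by the branch name.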
I can share the workflow I use if anyone wants to see it, it's a full CI/CD setup built specifically for this problem.
GraphicalBamboola@reddit (OP)
If you can share or inbox me the flow please, this sounds great
boring_pants@reddit
Just stick to your guns. You've already decided that short-term velocity trumps code quality. So stop wasting time on code reviews.
If humans aren't the ones writing the code then humans don't need to be as familiar with the code, and if humans don't need to be as familiar with the code, having humans review the code is pointless.
Dannyforsure@reddit
Probably just skip ahead and fire all the devs as well while you're at it.
boring_pants@reddit
I mean, you need to keep someone around to write the prompts.
Dannyforsure@reddit
You just get ceo agent to tell director agents to tell pm agents to tell dev agents what to build. Simple really!
boring_pants@reddit
Old: Who watches the watchers
New: Who prompts the prompters
Dry_Hotel1100@reddit
You don't get an answer, because the software service company has been closed.
Now, the medical staff uses Claude code in order to create their medication app themselves.
;)
hegelsforehead@reddit
I also wonder what other kinds of measures need to be invented to ensure software quality. Code review was an important step, and it seems like we're missing some kind of check in this new SDLC when we skip it. That said, AI code review is a stupid-ass idea in such a flow, so I'm not suggesting that.
0x6rian@reddit
I'm assuming your team has gone spec-driven? How do you like it? My team is gravitating towards it. I'm open to more functional code, as you say, but I have mixed feelings about whether I enjoy it as a way of working, and apprehension about the long-term effects of writing less code and understanding less of the code being pushed out at a faster pace.
popovitsj@reddit
How do you measure being 40% faster?
fasnoosh@reddit
Related question, has the definition of “shipped feature” changed?
GraphicalBamboola@reddit (OP)
Not really, that's where we are not allowing any compromise - the definition of good has changed from "code is great quality" to "functional quality is satisfied"
pr0cess1ng@reddit
So you went from 1 bad metric to another? Great lol
Medium_Ad6442@reddit
It is MFA. Measure from ass
GraphicalBamboola@reddit (OP)
E.g we estimated a project before adopting AI tools and then delivered 40% earlier than expected delivery
loganbrownStfx@reddit
lol come on man. Sample size of one project estimate is a crazy way to derive that number
GraphicalBamboola@reddit (OP)
If it would have taken us more time than the estimate and I presented that we were 40% slower, I'm sure you wouldn't be calling out the sample size then.
But anyways, what should I call it then, we were not faster at all? It was a fluke?
loganbrownStfx@reddit
I absolutely would lol, estimates are estimates.
UnderstandingDry1256@reddit
We're still figuring it out, but the best approach so far is to accept 100% of the code as a generated black box and focus on quality criteria.
Reviews do not make much sense anymore. Generate an architectural overview, generate a potential-vulnerabilities report, do it with different models, and compare the results - it will produce a way better review than any flesh engineer.
rupayanc@reddit
we shifted reviews to focus on intent verification first, meaning "does this PR do what the spec says" before getting into how. that sounds obvious but it wasn't our old reflex, and it's the only way to keep reviews from becoming a 3 hour archaeology dig into AI generated code.
HiSimpy@reddit
Great question. AI-first throughput usually breaks review SLAs first. Teams that hold quality set explicit review classes (critical, standard, fast-path) and owner-based queues so high-risk changes do not wait behind low-risk churn.
bake_in_shake@reddit
Have your team run the repo through repowatch.io before it goes for review, your surface area will reduce significantly as they try to beat their own scores. I'll be adding drift between scans soon as well.
Leading_Yoghurt_5323@reddit
i wouldn’t drop reviews, i’d change what gets reviewed. less line-by-line style nitpicking, more focus on risk, architecture, security, and whether the change actually should exist
FlamingoVisible1947@reddit
Wow you're producing shit code 40% faster at the cost of infinitely more expensive debugging and this is what you say has "worked really well"?
You don't belong in this sub.
HasFiveVowels@reddit
Assuming that using AI immediately implies shit code is an opinion that I expect to hear from juniors and those who have insufficient experience with AI to know how to use it well (and so they assume that such a thing isn’t possible)
TheBoringDev@reddit
Eh, my experience has been watching people “learn how to use AI well” and seeing their bar for what constitutes good code fall through the floor in real time. I’m not saying it’s not possible to produce good code, but assuming shit code is probably closer to the median.
boring_pants@reddit
OP is literally saying they've dropped their code quality bar.
GraphicalBamboola@reddit (OP)
I did tell you, we have shipped a big project and have not seen any drop in functional quality compared to before. So what exactly are we debugging?
micseydel@reddit
I would love updates at 6 and 12 months on that. Dropping code quality can definitely have short-term benefits, and depending on how you're doing the things you're talking about, it could all just be a matter of tech debt.
GraphicalBamboola@reddit (OP)
True, that's something we can only find with time
SodhiMoham@reddit
I totally resonate with what you are saying. In fact, I have written a blog post about the same topic.
But to answer your question, you can use AI to do reviews. You can have custom skills that review the PR in such a way that it does the following:
- help you understand the change
- the flow, and which file is responsible for what sort of change
- the regression risk
- verdict
A sample PR review generated by AI looks like the following https://getainative.com/claude_action-text-to-markdown-blobs_review
I found myself going into depths that I had never reached before AI.
I hope this comment helps
Old_Cartographer_586@reddit
I'm the only one on the team I lead who can code without AI at all. We are actually reinforcing code reviews. Most of my job has turned into code reviews.
The quality I see in my job is definitely worse than what I used to see at my old role.
Honestly, I feel like we aren't shipping as fast, because they break, re-break, and re-break it. Yesterday I reviewed three different PRs that had simple syntax errors that I know for a fact were coded by Claude Code.
igharios@reddit
We do a lot of reviews pre-coding, and do quick checks post-coding. You need the model to tell you what the changes are or will be, and validate there.
If you find bugs, go back and update your specs, prompts, architectural documentation, ... anything that the AI uses to generate the code.
Keeping up with code volume is a lost battle
Richard-Degenne@reddit
Easy.
"@claude please review"
You did say you were an AI-first team, didn't you?
GraphicalBamboola@reddit (OP)
Funny, we do have AI review pipelines to deal with any nitpicks - that is how the generated code quality is closer to average and not shit
Richard-Degenne@reddit
If you think the difference between shit code and average code is nitpicks, I have very bad news for you.
CanIhazCooKIenOw@reddit
Code reviews are now focused on "weird stuff" and architectural discussions and less nitpicky because more than before, use and edge cases are properly covered by unit/integration tests.
hegelsforehead@reddit
There's also test exhaustion here. An agent is so capable at generating tests that we end up with thousands of them, and, as we are not writing the code, at times we cannot properly judge which tests are noise and which are truly valuable. Sieving through them is another layer of work that we had hoped to automate away, so it's not a proper solution.
HasFiveVowels@reddit
If only there was a system that could effectively summarize large amounts of data…
CanIhazCooKIenOw@reddit
I agree that the pendulum might've swung too much to the other end where there's a good chunk of useless tests but hopefully it improves.
This to say, it's a new way of doing code reviews that we are still adapting.
Kaimito1@reddit
Does this mean the actual code quality is going downhill sharply? As that sounds like a short-term deal where long term you might suffer. Depends on the product you're maintaining though, to be fair.
Big no. It's there for a reason. If your PRs are too large to review in good time, then your issue is the PR sizes, not the reviews altogether.
Nope. If you read a PR and you don't understand it, ask the PR owner for clarification. If the PR owner cannot explain it, then you are essentially just adding sloppy code debt and more risk. "It just works" should never be an answer.
0xPianist@reddit
The way forward is better models that write less code with fewer issues.
We are dealing with all this stuff by doing code reviews and using human knowledge 👉
Dannyforsure@reddit
> how are we dealing with code reviews
By doing them. Next question please.
hoppers2k9@reddit
We're seeing this bottleneck too. I'm considering proposing WIP limits so that people are encouraged to review before picking up new work, but I do worry it's restrictive - I want to treat everyone like grownups.
https://www.atlassian.com/agile/kanban/wip-limits