Does AI-generated code change the review process itself?
Posted by Lost-Albatross5241@reddit | ExperiencedDevs | 65 comments
I've been trying to put my finger on something with AI assisted code review.
Not the usual "AI writes bad code" thing. Obviously it does sometimes. I mean the review process itself.
Sometimes the code looks fine enough that you don't reject it right away. But something feels off and you can't exactly explain what.
So you read it again. Check another file. Ask the AI what it was trying to do. Maybe run a test, grep around, look at a related function. Then somehow you're back staring at the same block again.
And idk, at some point it feels less like reviewing and more like orbiting the decision.
You're doing review-like activity, but you're not really making a clean call.
That's the part that feels weird to me.
The "human in the loop" is technically there, sure.
But if the human is stuck between "this is probably fine" and "I still don't trust this," is that actually the loop we want?
Have you seen this?
Especially with infra / backend / security-ish code?
So how do you deal with it?
Or is this not really a thing and I'm just overthinking it?
ninetofivedev@reddit
The way you describe writing code makes me think you never should have been writing code.
I am all in on AI workflows, but I was always very intentional about everything I did. Your entire post just reeks of "vibes".
lokaaarrr@reddit
That may (or may not) be a bit harsh.
It was unclear to me whether, in the example, the OP was the "author" of the PR or the reviewer, with another human having used AI.
If you are the nominal author, and you can't be sure, stop what you are doing, this won't end well for anyone.
If you are the reviewer, the burden is on the author to make it clear to you the change is correct, not on you to find any serious flaws.
ninetofivedev@reddit
Anyone who says "It looks fine enough, but something feels off"...
That's not engineering. No, I've never felt that way. I've looked at code, I figure out what it's doing, and I make the determination if that is right or wrong.
Izkata@reddit
"Something feels off" is called a code smell. It's a signal for experienced developers that they need to dig in more at that part of the code.
Lost-Albatross5241@reddit (OP)
Yeah exactly. This is the thing people keep skipping over.
Code smell used to be a signal. Not perfect, but still a signal.
AI kind of launders the smell out of the code. It can look clean while the reasoning behind it is garbage or just missing.
So now what? Deep review every single PR like it might be cursed? That sounds nice until you have an actual job.
Strict-Protection434@reddit
what were you doing before? approving pull requests and praying? giving a glance and saying LGTM? the fact you have a "deep review" and some other form of reviewing is extremely worrying to me.
lokaaarrr@reddit
As a reviewer, it's not my job to dig into the guts of your weird change. If it seems off, I ask for better documentation in the PR description. It should be obvious what the change is doing and why.
ninetofivedev@reddit
... Looking at code and understanding what it does is in fact the job of the reviewer. If you don't do that, you're not reviewing the code.
If you don't know if it's working as intended, you don't have to dig. That's why you ask the author.
lokaaarrr@reddit
But the author needs to make it easy. Present what’s going on clearly. If it’s ambiguous, give it back to them to clarify. I’m a reviewer, not an archaeologist.
ninetofivedev@reddit
Apparently you don't know how to talk to people either and ask them questions.
If the author has to explain every line of code, we're going to be here all day.
Makes it a lot easier to just reach out to them for clarification. Your method is akin to brute force. I'm telling you to be event-based.
Strict-Protection434@reddit
AI should not change how you review code; it shouldn't even change the size or contents of your pull requests. if you are unable to review every line of code in a pull request, that is a very serious issue.
AI or no AI, your team members should be sending small, easy to review pull requests to each other... what's the complication here?
this makes absolutely no sense. you are a programmer, code is absolute. if you cannot review code, there is something seriously wrong with your development processes and/or with you as a developer. is the business logic a spaghetti mess that makes it impossible to know whether or not issues will occur down the line in production in several months? there just isn't a lot here to work with.
Unit tests, manual testing, reading code.
drguid@reddit
AI has zero ability to check its code for errors without prompting.
So many businesses are going to the wall because they have adopted untested technology.
Example: I build stock trading systems and AI gave me code that went in a time machine to find out what future stock prices were. A smart human was required to see this, but so many corporations have fired all the smart humans.
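Roughly the shape of the bug, reconstructed and heavily simplified (names invented; the real code was more elaborate):

```python
import pandas as pd

# Reconstruction of the lookahead bug: the buy "signal" for day t
# is computed from day t+1's close, which no live system can know.
prices = pd.DataFrame({"close": [100.0, 101.5, 99.8, 102.3, 103.1]})

# Buggy version: shift(-1) pulls tomorrow's close into today's row,
# i.e. the backtest trades on information from the future.
prices["buy_buggy"] = prices["close"].shift(-1) > prices["close"]

# Legitimate version: decide for day t using only data up to day t.
prices["buy_ok"] = prices["close"] > prices["close"].shift(1)

print(prices)
```

In a backtest the buggy column looks brilliant, precisely because it's cheating.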
Lost-Albatross5241@reddit (OP)
The code can look smart but the mistake is not a syntax mistake. It’s a reality mistake.
And if the human who is supposed to catch it is just rubber stamping because AI wrote it, then the whole loop is fake.
That’s basically the thing I’m trying to understand here.
lastesthero@reddit
the experience you're describing — orbiting the decision rather than landing on it — is real and downstream of one specific thing. when a human writes the code, the review is implicitly evaluating their reasoning, and you can ask them about it. when AI writes the code, the reasoning isn't accessible, so the review collapses into "is this code correct in isolation," which is a much harder problem and doesn't have a stable termination point.
i_exaggerated's "ask the author why this piece of code is needed" is the right test. if the author can't answer or answers in copy-paste model output, the MR shouldn't merge. that's not anti-AI; it's "you, the engineer, are still the accountable reviewer of your own PR."
the practical changes we made:
1) PR description must contain a "what i changed and why" written in the author's own words. if the description reads like model output, the reviewer rejects on procedure and asks for a rewrite. that filter alone removed a surprising amount of low-quality AI churn.
2) AI-generated tests are reviewed extra carefully because they're optimized to pass, not to catch regressions. an AI-written test suite often "covers" a function in the line-coverage sense and asserts nothing meaningful (see the sketch below).
3) infra/security PRs get a human pair regardless of authorship. the cost of a "probably fine" infra change in production is too high to absorb the orbit-the-decision tax.
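to make 2) concrete, a minimal invented sketch of the pattern (function and test names are hypothetical, not from any real PR). both tests give full line coverage; only the second would ever catch a regression:

```python
# invented example: line coverage is identical for both tests,
# but only the second one pins the behavior.
def apply_discount(price: float, rate: float) -> float:
    return price * (1 - rate)

# the AI-flavored test: executes the code, asserts nothing meaningful.
def test_apply_discount_runs():
    result = apply_discount(100.0, 0.25)
    assert result is not None  # still passes if the math is wrong

# the test a reviewer should insist on: pins the actual values.
def test_apply_discount_value():
    assert apply_discount(100.0, 0.25) == 75.0
    assert apply_discount(100.0, 0.0) == 100.0
```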
Mundane-Charge-1900's stratification is the cleanest version of where to spend reviewer time. style and typos are cheap to be lax on; "is this solving the right problem" is where the slope of returns is steep.
tndrthrowy@reddit
First off, I’ll admit I am bad at code review. I skew heavily biased towards the trust I have in the submitter. If I know them to be a good dev, I know they must be submitting the specific changes for a good reason. I’m basically a rubber stamp if there’s nothing egregious, and there really never is with the good devs. And that equation doesn’t change with AI tools they might be using.
On the other hand, I’ve had the same experience you’ve described with weaker devs even before AI tools. Sometimes the code looks vaguely ok on its face, but dig deeper and there are problems. From bad resource management to “this actually doesn’t deliver the bug fix or feature requested”. So IMO, AI is just another tool that good devs use well and bad devs use poorly, and the code reviews I’ve done recently seem to reflect that.
tetryds@reddit
I'm the exact opposite. I read everything line-by-line; doesn't matter if you are an intern or the CTO. The biggest difference with AI is that the code review volume has doubled but the comment volume has tripled. I only let code pass when it meets the standard, so devs get caught in the pattern where they write code very quickly but then have to wait for my review, fix tens of comments, and wait for my review again (to fix more comments, and so on). This is slowly putting pressure on them to write better code, so they take more time to do it right, but comments come anyway.
YakaryBovine@reddit
I’m surprised a PR with tens of comments wouldn’t prompt some kind of higher level discussion. Surely that implies that either the design was not agreed on beforehand, or that the submitter didn’t follow style guidelines?
tetryds@reddit
Adding comments for style is a waste of time. They warrant a single comment: fix styling across the entire changelist.
Yes, that is exactly the point. What gets tons of comments are things in between: code that follows the usual practices, but not in the best and cleanest way. When code is not reused, or low-level APIs are used for things instead of creating or leveraging abstractions. This is the exact type of thing AI gets wrong the most.
YakaryBovine@reddit
That makes sense, thank you.
lasooch@reddit
This is actually important. Like, you shouldn’t just box tick a review from a good dev, it may still have issues (whether due to their decisions or due to mistakes). But that signal that ‘this is someone I trust’ is yet another thing that disappears thanks to LLMs. You might still trust the person you know was good before LLMs, though even they may have grown too reliant since, but not so much for people whom you haven’t worked with in the before times.
I wonder how much extra review time that loss of signal leads to. I think most people wouldn’t even consciously realise this happens, but I’m sure it does. I wonder how that compares to the productivity gains of vomiting code up faster.
Lost-Albatross5241@reddit (OP)
Yeah this is exactly the thing I keep thinking about.
Before AI, “this code smells weird” was actually a useful signal. Not perfect, but useful.
Now a lot of generated code has no obvious smell. It looks clean enough that you have to spend more time proving to yourself that it's actually okay.
Have you noticed reviews taking longer because of that, or is it more just a trust thing?
lasooch@reddit
I noticed reviews taking longer... not sure how much of it is strictly because of that.
But yeah, LLMs tend to produce stuff that looks decent enough, which can lead to mistakes being harder to spot (this also holds for text). It sucks.
gyroda@reddit
Yeah, with devs I trust there's a lot more benefit of the doubt. I'm mostly checking for tests and things that might get in the way in the future (and names, because naming things is hard). But part of that is that the devs I trust tend to write code that's easy to read.
The less-capable/experienced/trusted devs get a lot more scrutiny in part because their code is harder to read. This is, by itself, a reason to push back, but it also means I need to pay closer attention in case there's some weird stuff in there. And often the harder to read stuff is harder to read because it's over complicated where a simpler solution exists and that complexity breeds bugs.
I've seen people, both with and without LLMs, submit code for review that clearly won't work. The big problem I have with the LLM usage of these people is that it makes it much easier to create code for me to review. But this is mostly a people issue - I've had people do it before LLMs were a thing, and they can usually be coached into better practices.
itix@reddit
IMO, the focus should be on building testable code and better test cases. If there are deeper problems, why are they not covered by tests? It is great that we can identify issues and give feedback, but we should not rely on that alone.
keelanstuart@reddit
I think peer reviews and revision control for vibe coded software (as well as a lot of software itself that's sold on app stores) will just kind of go away... PRs - why? Just run it through more testing and have the AI fix it. Revision control? Why? "AI, look at this software and make version 2.0 for me with these features." People will generate all bespoke software on their own personal devices. The value will shift more to protocol and format interoperability.
If it seems like I'm kidding... I am, but only kind of. I think there's a huge amount of software and software development processes that are going to just go away sooner or later. It will still be important for "mission critical" (read: medical device, aerospace, etc.) purposes, but not most things... but I hope like hell those embedded systems don't use AI code anyway.
ooleary@reddit
And where does the testing come from? How do we validate that?
keelanstuart@reddit
The testing is done by humans who use the software. When you think about it, it's one of the few jobs that will be left largely untouched in this revolution. Rather than trying to ensure that software is flawless at release, it will simply be very quickly patched. That's my hot take.
PressureAppropriate@reddit
I don't even bother anymore. There's just too much of it... I run it, if it does what it was supposed to do: LGTM and move on. That is the sad truth of the job now.
Lost-Albatross5241@reddit (OP)
Tbh I appreciate the honest answer.
And I think this is probably what happens more often than people admit. At some point you can’t deep review everything, so you run it, maybe skim it, LGTM and move on.
Do you feel like more stuff slips through now, or has it mostly been fine in practice?
PressureAppropriate@reddit
No disaster so far. Bugs do slip by here and there but that’s comparable to what a human would produce…
Lost-Albatross5241@reddit (OP)
“no disaster so far” is the annoying part.
It might mean everything is fine.
Or it might mean the weird stuff just hasn’t exploded yet.
Especially with infra/security things. Some bad decisions don’t fail loudly on day one. They just sit there like a little time bomb.
PressureAppropriate@reddit
Absolutely, and I couldn't care less. Project is cooked anyway (barely any sales, massive layoffs recently). My turn is coming soon, I have no doubts.
i_exaggerated@reddit
I ask the author to explain why this piece of code is needed. And then they don’t respond because they don’t know, so the MR sits open. Or they respond with 100% AI and I give up, so the MR sits open.
I don’t accept “this is probably fine” if nobody understands it.
Lost-Albatross5241@reddit (OP)
This is the part that bugs me.
If nobody can explain the code, then what are we reviewing exactly?
At that point the MR is not really “please review my change”.
It's more like “please audit this AI output for me because I don't fully understand it either”.
That's a very different thing imo.
subma-fuckin-rine@reddit
then someone else approves and its merged anyway 🙄
Wonderful-Habit-139@reddit
Stand your ground. Slop will not be merged in.
If they want to be "more productive" and not be blocked then they just need to write better code with the help of their AI.
nosayso@reddit
Don't get hung up on the AI of it all. It's code, like the code you or any of your teammates make. Review, give comments, refine, same exact process.
Sheldor5@reddit
no.
some people generate code without properly reviewing it (it works so who cares) and so outsource all their work (reading, understanding, questioning, improving, cleaning up, ...) to their colleagues, who then have to explain why they block PRs so often / for so long ...
nosayso@reddit
Okay... I'm not sure what your point is... but in that case they'd be equally dysfunctional with or without AI.
Wonderful-Habit-139@reddit
You guys keep talking in ideals and not in practicals. Do you also say "Ideally the code generated by AI would be perfect, and thus you get more speed benefits" instead of facing the actual reality of software engineering, which includes reviews, the maintainability of the codebase, and the fact that AI code requires many corrections, negating (or even worsening) any potential speed gains?
Lost-Albatross5241@reddit (OP)
Exactly. “AI makes coding faster” is only true if you stop the clock right after the code appears.
But real software doesn't stop there.
Someone has to review it, maintain it, explain it, debug it later, and own the mess if it’s wrong.
If all the speed gain turns into review debt, I’m not sure we actually gained much.
nosayso@reddit
No, everyone knows AI generated code isn't perfect, but I've been using Claude every day and can definitely attest I'm continually surprised at how good it is, like 95% there first pass and the other 5% easily caught and fixed, it's definite productivity gains without a drop in code quality. I am absolutely not talking about ideals, this is practical outcomes of every day use.
New_Enthusiasm9053@reddit
No because they'd produce less. Less dysfunctional is way more tolerable than more dysfunctional.
Lost-Albatross5241@reddit (OP)
Yeah this is closer to what I mean.
On paper I agree with “code is code, review it the same way”.
But AI changes the amount of code someone can push without really owning it.
Before, even a weak dev had to spend time writing the bad code. That limited the damage a bit.
Now someone can generate a lot more “looks fine” code, understand less of it, and push the review cost to everyone else.
So maybe the review process is technically the same, but the situation around it is not the same.
Jmc_da_boss@reddit
Makes it so every review is a deep review sadly :(
Lost-Albatross5241@reddit (OP)
Right, and “every review is a deep review” sounds responsible until Monday morning happens.
PRs pile up, people are waiting, nobody wants to be the blocker forever.
So deep review slowly turns into skim, vibes, LGTM. That’s the scary part imo.
Sheldor5@reddit
human written code: at least I can trust that my colleague put some thoughts into it and knew what he was doing and also we both know who is responsible
AI hallucinated code: none of the above
ham_plane@reddit
I don't find that colleagues who are meticulous and proven tend to abandon that when they use an LLM. My general instinct is that people who I trust to have standards, well, have standards regardless.
Wonderful-Habit-139@reddit
I tend to see exactly that. People that are meticulous have a much worse experience using AI and having to correct it over 50 times each PR, and they get slowed down.
So either they stay meticulous, and figure out that they don't gain much from using AI, or they stop being meticulous for the sake of pumping out PRs, as sloppy as they are.
But before AI, it was much easier to be meticulous, because you're writing the code directly; you're not going to write code the "wrong" way if you can just write it at a much higher level in the first place.
Lost-Albatross5241@reddit (OP)
Exactly. That's a big part of what feels different to me.
With human written code, even if it’s bad, there’s usually some signal that a person understood the change they were making.
With AI generated code, that signal gets weird. The code can look clean, but you don’t really know if anyone understood it.
That changes the whole review dynamic imo
nosayso@reddit
A) you can't trust your colleague implicitly; that's why the code review process exists, and why most work is in nailing down requirements
B) whoever submits the PR is responsible for it, AI or not - that does not change
If human written code was perfect it wouldn't need review and we wouldn't need tests. I don't know why so many people seem to think you need to lower your code quality standards when using AI.
Sheldor5@reddit
A) my colleagues have a brain
B) I hope so until you get blamed for blocking bad PRs and so you give up and start LGTMing ...
never claimed that human written code is perfect, but at least it was produced by a real brain instead of a random text generator ...
nosayso@reddit
This is bad process that has nothing to do with AI; it's not a new problem. But directing this ill will at me is profoundly unnecessary. This is a normal professional conversation; hostility is unnecessary.
Sheldor5@reddit
not AI, but AI vendors and their propaganda, so CEOs/managers start to force AI and expect their devs to become 10x devs, and if they don't, they will find a way to blame the devs and not the AI
many stories here on this sub ...
iiiio__oiiii@reddit
With AI-assisted PRs, we add a high-level document explaining the feature, how it is designed, and how it is implemented. Human reviewers mostly focus on this document and verify the actual implemented design (at a high level). Then an AI reviewer ensures the HLD matches the implementation. Humans sometimes also read the critical implementation lines. At least, this is what we are trying now.
And personally, my review has shifted to asking AI about the background concepts, whereas previously I needed to dig myself. It is very helpful for giving pointers that I then clarify until I am satisfied with how things are designed. Like Spring Security and its filters, WebAuthn libraries, Jackson serialisation, React state, OAuth flows, CORS, Kubernetes custom metrics and its HPA, etc.
Flashy-Whereas-3234@reddit
The PR description MUST contain the WHAT and WHY of the changes, including any NOTES around unusual modifications.
The description must be SUCCINCT as it is for time-poor senior developers who can read the code but don't necessarily understand why you would modify it, and you need to persuade them the changes are logical.
That's basically also a prompt I use in an AI workflow, and it yields pretty good PR descriptions which I then modify further because I don't hate my colleagues.
I have an "if you haven't read this, then why am I reading this?" policy with AI content. This also applies to PRs and PR descriptions. I won't accept overly verbose garbage PRs or code that pollutes the codebase. I get shit comments like "// Fixes PGG-223 by ensuring the turbo encabulator is primed". No. No ticket references. Do it again. You know better.
At a code quality level, you should be following SOLID and particularly SRP (DRY can fuck off), because it's a lot easier to understand your intent if the behaviour is small and compartmentalized. Massive class, massive function? No, go do it again.
And tests. JFC you can literally just ask for them to be written now. Add the fucking tests. And write good tests. Why are you using reflection? Why are you mocking flat data objects? No, idiot. You know better. Do it again.
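To make the "mocking flat data objects" complaint concrete, the pattern I keep rejecting looks roughly like this (invented example, not from a real PR):

```python
from dataclasses import dataclass
from unittest.mock import Mock

@dataclass
class Invoice:  # a flat data holder, no behavior worth faking
    total: float
    paid: bool

def outstanding(inv: Invoice) -> float:
    return 0.0 if inv.paid else inv.total

# The pattern that gets a "do it again": a mock standing in for dumb data.
def test_outstanding_with_mock():
    inv = Mock()
    inv.total = 50.0
    inv.paid = False
    assert outstanding(inv) == 50.0

# What I ask for instead: just construct the real thing.
def test_outstanding_with_real_object():
    assert outstanding(Invoice(total=50.0, paid=False)) == 50.0
```

Same coverage, fewer moving parts, and the second test actually breaks if Invoice changes shape.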
AI forces you to think at a high level, and if you can't do that and just release all agency to the AI, then I don't know what you're employed for.
hibikir_40k@reddit
It's not as if human code reviews are ever any good when reviewing other humans. You'd not believe the shit that people approve.
If you don't trust the code, and it's all fuzzies making you not trust it, you have to explain to yourself why you don't trust it. Is it undertested? Badly factored? Is it inventing a concept that didn't exist? Your standards for understanding what is in front of you, and why you don't like it, just have to go up. Just like you have to be able to call BS when an AI code review hallucinates a problem.
lokaaarrr@reddit
IMO, the burden is on the author to make it clear the PR is correct. And correct in every way:
- it attempts to do something that we want to be doing, and should be doing
- it does the thing it attempts to do
- it accomplishes the goal in a way that is fairly close to the best way we can reasonably do it
- it follows our basic design patterns
- it will be safe to deploy the change, including safe to revert the change after deployment
- it follows the local style (this should be automated)
- it has the locally correct test coverage, and they pass (again, automated)
throwaway_0x90@reddit
Create a CI/CD pipeline that you can trust: a set of tests where you know that if they all pass, the product is at least somewhat decent. The only thing left is a deep dive into security.
Empanatacion@reddit
I'm not going to get a lot of upvotes for this, but I find the AI helpful in explaining the code I'm looking at. Not "Does this look okay?" (we've got Copilot doing code reviews automatically).
But I can ask it questions about the code. Tell me what this PR does and what was the approach. Which part of the PR is where they did X. Did they cover the Y issue? Show me where.
So it helps me get more quickly to the part where I need to engage my brain.
Maybe it's because I'm on a large team with only seniors and staff, but the quality of our code has been improving, not turning to slop. The code I've been seeing is solid, and definitely with more and better tests around it. The AI has the patience to write tests for all the edge cases that would be a PITA to write.
BoBoBearDev@reddit
I have the same feeling with human code.
Mundane-Charge-1900@reddit
You have to be more critical in the review about some things while you can be more lax with others.
More critical: is this the right approach? Is it solving the right problem? Is it testing the best way?
Less critical: style, are all the fields being asserted, typos
Asking questions is also faster and more self-serve. If you don't like the answer, ask more detailed questions or escalate to the engineer who operated the agent.
ChrimsonRed@reddit
I usually give it a glance and make sure nothing major sticks out, have Claude give it a look over, then give it a final look over more carefully.