I've been trying to work towards "Level 5" AI-assisted development for a year. I'm not quite there, but this is the flow that's working for me.

Posted by rgeade@reddit | ExperiencedDevs | 29 comments

To be clear, this is not "vibe coding" but a pragmatic system for managing coding agents at scale. It's the structured process I've been working on since I first started using Claude Code nearly a year ago.

The thing that made the biggest difference wasn't the AI model I was using (although I've primarily used Opus 4.6 for dev with Codex on reviews), it was the specs. Specifically, writing a real per-task spec and then running it through a separate agent that's prompted to poke holes in it, NOT validate it. Adversarial spec review sounds dramatic, but it's just a second agent trying to break the plan before you build anything.
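For anyone who wants the shape of that loop, here's a minimal sketch. It is not the OP's actual tooling: `Review`, `adversarial_spec_review`, `stub_critic`, and `stub_revise` are all hypothetical names, and the critic and reviser are plain callables standing in for the two agents. The required sections are made-up examples too.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    """Result of one critic pass; an empty objection list means approval."""
    objections: List[str] = field(default_factory=list)

    @property
    def approved(self) -> bool:
        return not self.objections


def adversarial_spec_review(
    spec: str,
    critic: Callable[[str], Review],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, List[Review]]:
    """Run the spec past the critic until it has no objections or rounds run out."""
    history: List[Review] = []
    for _ in range(max_rounds):
        review = critic(spec)
        history.append(review)
        if review.approved:
            break
        spec = revise(spec, review.objections)
    return spec, history


# Hypothetical stand-ins for the two agents. A real critic would be a
# second model prompted to break the plan, not a keyword check.
REQUIRED_SECTIONS = ("acceptance criteria", "rollback")


def stub_critic(spec: str) -> Review:
    lowered = spec.lower()
    return Review(
        [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in lowered]
    )


def stub_revise(spec: str, objections: List[str]) -> str:
    missing = [objection.split(": ", 1)[1] for objection in objections]
    return spec + "".join(f"\n{name}: TODO" for name in missing)


final_spec, history = adversarial_spec_review(
    "Add rate limiting to /login.", stub_critic, stub_revise
)
```

The point of `max_rounds` is that the critic is prompted to find fault, so it may never fully approve; capping the rounds keeps the token spend bounded and hands the leftover objections back to a human.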

The process is ever-changing, but this is the current flow I've landed on:

strategy → phases → sprints → tasks → per-task spec → adversarial spec review → build → adversarial code review → staging → prod

The adversarial spec review was actually the last thing I added. I was doing adversarial code reviews when Codex came out and pitted a Codex agent against Opus. Adding the spec review caught far more issues earlier in the process, and has substantially reduced my review cycle count during code review.

Is anyone else landing on something similar? I'm by no means at "dark factory" yet, but I spend far more time speccing than I have in my entire career.

I also wrote a longer form of the above to share the details of my flow, but I'm still looking at ways to improve the process. Ultimately, it becomes a function of token cost, and at some point the review cycles can double or triple that.
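As a rough illustration of how review cycles scale the bill, here's a back-of-the-envelope sketch. The function names and every token count are assumptions chosen for the example, not measurements from the OP's setup, and it assumes each review cycle costs about the same.

```python
def task_token_cost(
    build_tokens: int,
    review_tokens_per_cycle: int,
    spec_review_cycles: int,
    code_review_cycles: int,
) -> int:
    """Total tokens for one task: the build plus every review cycle."""
    total_cycles = spec_review_cycles + code_review_cycles
    return build_tokens + review_tokens_per_cycle * total_cycles


def cost_multiplier(
    build_tokens: int,
    review_tokens_per_cycle: int,
    spec_review_cycles: int,
    code_review_cycles: int,
) -> float:
    """How many times the build-only cost the full task ends up costing."""
    total = task_token_cost(
        build_tokens, review_tokens_per_cycle, spec_review_cycles, code_review_cycles
    )
    return total / build_tokens


# Hypothetical numbers: a 100k-token build and 50k tokens per review cycle.
one_round_each = cost_multiplier(100_000, 50_000, 1, 1)   # 2.0
two_rounds_each = cost_multiplier(100_000, 50_000, 2, 2)  # 3.0
```

Under those assumed numbers, one spec review plus one code review doubles the cost of the task, and two rounds of each triples it.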

allknowinguser@reddit

Curious what “Level 5” means. I’ve seen others here mentioning “Levels”.

aidencoder@reddit

It means you've ascended to third-eye AI paladin and now get to donate EVEN MORE money to Anthropic

rgeade@reddit (OP)

oh, they are getting plenty of donations after the max billing changes.

pydry@reddit

How far off the rails you're letting the LLM take you.

The original reference I know was just earlier this year, Dan Shapiro https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/

micseydel@reddit

It means OP is a larper, not an experienced dev.

It's vibe coding. Dress it up however you want.

Fair assessment. I'd say the distinction in my mind is that I'm not prompting with a prayer and hoping I get a working prototype. The amount of planning that goes into it is more than I've spent in 20 years of software development.

The models are not deterministic, so there's a bit of prayer, clearly.

Ubersmush@reddit

What is the difference in a shipped product if it has been vibe coded or not?

moreVCAs@reddit

lmao

brrnr@reddit

The quantity and severity of bugs and also whether or not they can be confidently remediated in a reasonable timeframe without regression

Being able to troubleshoot bugs and fix them is still table stakes for any engineer, whether or not the code was written by hand or through agents. Quality still very much matters. Go fast and break stuff doesn't scale beyond small projects.

In theory I'd agree, but in practice - what I see from my coworkers - the mental model of the system becomes fuzzier and fuzzier and the ability to perform root cause fixes rapidly deteriorates to symptom management.

boring_pants@reddit

That code written by humans can be maintained by humans. Code not written by humans cannot.

So it comes down to how interested you are in maintaining your product and fixing its bugs after shipping it.

The level of downtime and how much flaky shit just doesn't work.

yet op clearly stated that it’s not vibe coding. curious 🧐🧐🧐🧐🧐

Ah, not when you add levels, then it is instantly science!

This is just like when incels came up with all the alpha/beta/sigma male nonsense. Continue doing all the shitty things you were doing but pretend you're climbing an imaginary ladder.

Well, the level does tell you how far gone they are.

potatolicious@reddit

At what point does a human look at and understand the code? At what point is everything being validated against a reliable source of truth? Because ultimately that is the difference between vibe coding and not.

The process you have going does pretty significantly reduce risk on a per-task basis - adversarial spec review and adversarial code review both demonstrably improve outputs in aggregate, but you're still looking at the exact same risk profile and failure modes of vibe coding, just with somewhat improved probabilities at each stage.

I am someone who writes the vast majority of code by LLM now, so this isn't coming from a place of "lol LLM dumb"... but why do you want to be a dark factory? What is that accomplishing?

One thing that I can't stress enough about this strange new world we're in: you still have to know stuff. You still have to understand things, in detail. You still have to know how your code works. No model we have today alleviates that responsibility.