I've been trying to work towards "Level 5" AI-assisted development for a year. I'm not quite there, but this is the flow that's working for me.
Posted by rgeade@reddit | ExperiencedDevs | 29 comments
To be clear, this is not "vibe coding" but a pragmatic system for managing coding agents at scale. It's the structured process I've been working on since I first started using Claude Code nearly a year ago.
The thing that made the biggest difference wasn't the AI model I was using (although I've primarily used Opus 4.6 for dev with Codex on reviews); it was the specs. Specifically, writing a real per-task spec and then running it through a separate agent that's prompted to poke holes in it, NOT validate it. Adversarial spec review sounds dramatic, but it's just a second agent trying to break the plan before you build anything.
The process is ever-changing, but this is the current flow I've landed on:
strategy → phases → sprints → tasks → per-task spec → adversarial spec review → build → adversarial code review → staging → prod
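For illustration only, here is a minimal Python sketch of the per-task part of that flow as a small state machine. The `Stage` names, the `GATES` set, and the rule that a failed review sends the task back one stage are all assumptions made for the example, not a description of any particular tooling.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Per-task stages, in the order given in the flow above (names are hypothetical)."""
    SPEC = 1
    SPEC_REVIEW = 2
    BUILD = 3
    CODE_REVIEW = 4
    STAGING = 5
    PROD = 6


# Assumption: the two adversarial reviews act as gates that can reject a task.
GATES = {Stage.SPEC_REVIEW, Stage.CODE_REVIEW}


def advance(stage: Stage, gate_passed: bool = True) -> Stage:
    """Return the next stage for a task.

    A failed gate sends the task back one stage (spec review -> spec,
    code review -> build). That send-back rule is an assumption.
    """
    if stage is Stage.PROD:
        return stage  # nothing comes after prod
    if stage in GATES and not gate_passed:
        return Stage(stage - 1)
    return Stage(stage + 1)
```

The point of modeling it this way is only that a task cannot reach build without passing spec review, and cannot reach staging without passing code review.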
The adversarial spec review was actually the last thing I added. I was already doing adversarial code reviews, which I started when Codex came out by pitting a Codex agent against Opus. Adding the spec review caught far more issues earlier in the process and has substantially reduced my review cycle count during code review.
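A rough sketch of what an adversarial review loop could look like, in Python. Everything here is hypothetical: the prompt wording, the `review_until_clean` helper, and the toy reviewer and reviser that stand in for real agent calls. In a real setup the reviewer and reviser would be different agents (for example one model writing and another critiquing), which this sketch does not attempt to wire up.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical prompt: the reviewer is told to break the spec, not approve it.
ADVERSARIAL_PROMPT = (
    "You are reviewing a task spec. Do not validate it. "
    "List every ambiguity, missing edge case, and unstated assumption."
)


@dataclass
class ReviewResult:
    findings: List[str]


Reviewer = Callable[[str, str], ReviewResult]  # (prompt, spec) -> findings
Reviser = Callable[[str, List[str]], str]      # (spec, findings) -> revised spec


def review_until_clean(
    spec: str, reviewer: Reviewer, reviser: Reviser, max_rounds: int = 3
) -> Tuple[str, int, bool]:
    """Alternate review and revision until the reviewer finds nothing.

    Returns (final_spec, rounds_used, clean). `clean` is False if the
    round budget ran out before a review came back empty.
    """
    rounds = 0
    while rounds < max_rounds:
        result = reviewer(ADVERSARIAL_PROMPT, spec)
        rounds += 1
        if not result.findings:
            return spec, rounds, True
        spec = reviser(spec, result.findings)
    return spec, rounds, False


# Toy stand-ins so the sketch runs without any agent or API.
REQUIRED_SECTIONS = ("acceptance criteria", "rollback")


def toy_reviewer(prompt: str, spec: str) -> ReviewResult:
    # Flags each required section that the spec does not mention.
    missing = [s for s in REQUIRED_SECTIONS if s not in spec.lower()]
    return ReviewResult([f"missing section: {s}" for s in missing])


def toy_reviser(spec: str, findings: List[str]) -> str:
    # Appends a placeholder line for each missing section.
    additions = [f.split(": ", 1)[1].capitalize() + ": TODO" for f in findings]
    return spec + "\n" + "\n".join(additions)
```

The `max_rounds` cap is there because each extra round costs tokens, so the loop needs a point at which it stops and hands the spec back to a human.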
Is anyone else landing on something similar? I'm by no means at "dark factory" yet, but I spend far more time speccing than I have in my entire career.
I also wrote a longer form of the above to share the details of my flow, but I'm still looking at ways to improve the process. Ultimately, it becomes a function of token cost, and at some point the review cycles can double or triple that.
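To make the "double or triple" point concrete, here is a back-of-the-envelope Python sketch. The linear model and the 0.5 overhead per review cycle are made-up assumptions for illustration, not measured numbers.

```python
def cost_multiplier(review_cycles: int, overhead_per_cycle: float) -> float:
    """Total token cost relative to a build with no review cycles.

    Assumes (hypothetically) that each review cycle costs a fixed
    fraction of the base build cost.
    """
    return 1.0 + review_cycles * overhead_per_cycle


# With an assumed 50% overhead per cycle:
two_cycles = cost_multiplier(2, 0.5)   # 2.0, i.e. double the base cost
four_cycles = cost_multiplier(4, 0.5)  # 3.0, i.e. triple the base cost
```

Under those assumptions, two review cycles already double the spend, which is why catching issues at the spec stage and cutting later cycles matters for cost.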
allknowinguser@reddit
Curious what “Level 5” means. I’ve seen others here mentioning “Levels”.
aidencoder@reddit
It means you've ascended to third-eye AI paladin and now get to donate EVEN MORE money to Anthropic
rgeade@reddit (OP)
oh, they are getting plenty of donations after the max billing changes.
pydry@reddit
How far off the rails you're letting the LLM take you.
rgeade@reddit (OP)
The original reference I know of is from earlier this year, by Dan Shapiro: https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory/
micseydel@reddit
It means OP is a larper, not an experienced dev.
aidencoder@reddit
It's vibe coding. Dress it up however you want.
rgeade@reddit (OP)
fair assessment. I'd say the distinction in my mind is that I'm not prompting with a prayer and hoping I get a working prototype. The amount of planning that goes into it is more than I've spent in 20 years of software development.
aidencoder@reddit
The models are not deterministic, so there's a bit of prayer, clearly.
Ubersmush@reddit
What is the difference in a shipped product if it has been vibe coded or not?
moreVCAs@reddit
lmao
brrnr@reddit
The quantity and severity of bugs and also whether or not they can be confidently remediated in a reasonable timeframe without regression
rgeade@reddit (OP)
Being able to troubleshoot bugs and fix them is still table stakes for any engineer, whether or not the code was written by hand or through agents. Quality still very much matters. Go fast and break stuff doesn't scale beyond small projects.
brrnr@reddit
In theory I'd agree, but in practice - what I see from my coworkers - the mental model of the system becomes fuzzier and fuzzier and the ability to perform root cause fixes rapidly deteriorates to symptom management.
boring_pants@reddit
That code written by humans can be maintained by humans. Code not written by humans cannot.
So it comes down to how interested you are in maintaining your product after shipping it.
pydry@reddit
The level of downtime and how much flaky shit just doesn't work.
moreVCAs@reddit
yet op clearly stated that it’s not vibe coding. curious 🧐🧐🧐🧐🧐
boring_pants@reddit
Ah, not when you add levels, then it is instantly science!
This is just like when incels came up with all the alpha/beta/sigma male nonsense. Continue doing all the shitty things you were doing but pretend you're climbing an imaginary ladder.
pydry@reddit
Well, the level does tell you how far gone they are.
potatolicious@reddit
At what point does a human look at and understand the code? At what point is everything being validated against a reliable source of truth? Because ultimately that is the difference between vibe coding and not.
The process you have going does pretty significantly reduce risk on a per-task basis - adversarial spec review and adversarial code review both demonstrably improve outputs in aggregate, but you're still looking at the exact same risk profile and failure modes of vibe coding, just with somewhat improved probabilities at each stage.
I am someone who writes the vast majority of code by LLM now, so this isn't coming from a place of "lol LLM dumb"... but why do you want to be a dark factory? What is that accomplishing?
One thing that I can't stress enough about this strange new world we're in: you still have to know stuff. You still have to understand things, in detail. You still have to know how your code works. No model we have today alleviates that responsibility.
Just-Ad3485@reddit
What does level 5 mean, or dark factory?
I think the adversarial review is a good idea
Strict-Soup@reddit
Is level 5 like in Scientology where you get to a certain level and you can fly?
Kazumz@reddit
Like Stage 2 on a BMW
daze2turnt@reddit
It’s called context engineering. Attempting to get more consistent output from LLMs
MonochromeDinosaur@reddit
What are the levels?
0dev0100@reddit
In what way is this not vibe coding?
Crafty_Independence@reddit
Well. There's a new one for buzzword bingo
SnugglyCoderGuy@reddit
It's just vibe coding but with more words and more steps
bystanderInnen@reddit
*AI BAD*