How do you stop a codebase from degenerating into an unmaintainable AI-slop mess?
Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 81 comments
What techniques help to reap the benefits of AI code without it accumulating into massive technical debt requiring costly re-writes?
pip25hu@reddit
Lots of manual code review, mostly. Yeah, I know that face you're making right now.
AurumDaemonHD@reddit
So all of the joy of writing code is gone; all that is left is... debugging...
LordHousewife@reddit
Maybe a bit of a hot take in these parts, but if you’re properly reviewing everything that is written and fixing poorly written code along the way, writing it with an agent isn’t much faster than doing it yourself, especially when you consider how good AI autocomplete has gotten. When you write the code you have the added benefit of understanding what is being done which makes reviewing the final changes much easier.
Imaginary-Unit-3267@reddit
Bro 90% of writing code is debugging anyway.
cutebluedragongirl@reddit
This dude gets it.
fishhf@reddit
Bro it's called test driven development.
johnfkngzoidberg@reddit
I write perfect code every time. My fingers transcribe wrong sometimes.
Firm-Fix-5946@reddit
review isn't debugging and generally the former needs to happen only after the latter is complete, but you're still not wrong that they're both less fun than actually writing the code
Ok-Internal9317@reddit
I think agentic coding just speeds up the typing/code-generation part; you're just plain doing more in the same amount of time. So obviously there's more debugging.
FinBenton@reddit
That's how all my projects start; then I make the AI plan a large refactor and do it to organize everything.
DeltaSqueezer@reddit (OP)
I'm planning such a refactor now. I'm going to try to get the AI to do it, but if that fails, I wonder if I should have just written it all by hand in the first place. The AI is great for rapidly standing something up and then getting it to build out features fast, but I'm thinking I need to take more control at an earlier stage somehow if I want to avoid having to re-write the whole thing by hand later.
Imaginary-Unit-3267@reddit
Test-driven design. Create functionality one piece at a time and test it one piece at a time too, with each piece being committed only after you test it and find that it works. And every few commits you step back, read the whole code base (with your AI, but not requiring it do this FOR you because they're not smart enough to do it right yet, certainly not local ones), work with it to plan a small refactor, do that, then move on to the next small piece. Do not wait until it's thousands of lines strewn across multiple files before refactoring. Do it regularly as a form of basic maintenance, and use your own brain for it.
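The test-first loop described above, as a minimal Python sketch (`slugify` is just a hypothetical example function standing in for "the next small piece"):

```python
# Step 1: write the test for the next small piece FIRST (it fails at this
# point, because slugify doesn't exist yet).
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

# Step 2: only then write (or have the agent write) the implementation.
import re

def slugify(title: str) -> str:
    """Lowercase, drop punctuation, join words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

# Step 3: run the test; commit only once it passes, then every few commits
# step back and plan a small refactor before moving on.
test_slugify()
```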
Firm-Fix-5946@reddit
great advice.
so, basically treat the AI like a human coworker and follow best practices for human software dev.
which can be easier said than done if you're feeling impatient and you and/or management want to go faster. whether AI is involved or not, tbh.
Imaginary-Unit-3267@reddit
I think that's just best practices for literally anything. If you're navigating an unfamiliar space, you move a little, stop and reorient, move a little, stop and reorient - if instead you run a long distance before bothering to see where you are, you'll probably get lost. It's the same principle really.
FinBenton@reddit
Hmm, actually I mainly use gpt 5.4 with codex and I have never had any issues with refactors; works every time, and after that it's much easier to keep things tidy and organized for the future.
total-context64k@reddit
You could follow the process I developed and shared; I've been using it for months to help me develop pretty complex software. :)
fugogugo@reddit
how do you make sure the refactor doesn't break anything?
kevin7254@reddit
Tests? That is not really specific to AI. Tests to guard yourself
FinBenton@reddit
I mean I just test stuff out once it's done, but generally I haven't had issues with that; they have worked perfectly first try most of the time. Maybe once I had to ask for a small fix after.
davidy22@reddit
Every time I've tried to make AI do a refactor across multiple files it visibly butchers something. Fortunately, that's avoidable with the solution that I would have recommended for OP's original question, which is use your human eyes to check everything that comes out of the machine
truedima@reddit
Tbh, I mostly "draft" and manually iterate. I used to use claude extensively (due to mandate and AI-booster bosses) at work, and I was never happy with the quality or the "feel" of it (esp just reviewing in the end).
So I rather went for spending more time iterating in planning mode, structuring todos, having it spit out small sketches/diffs of functionality, critiquing, trying alternatives and going one by one and manually touching up/fixing as we go point by point. Or in case of larger "jesus take the wheel" experiments I'd often just rewrite once it was clear where what belongs. Those two flows made me way happier and still more productive (sometimes, ~50:50). And that's also what I try to do with my local models. It's better anyways, because they are less capable and I can't be arsed to fix whatever random rampage an agent might have done for 60mins "thoughtlooping" on some red herring.
For chore tasks that I can't be arsed to script, like "irregular" mass renamings or test data generation etc., I don't mind being lower touch most of the time.
But maybe I'm just old.
Pwc9Z@reddit
I seriously believe that we're headed towards a future where codebases do not even get maintained long-term, but rather re-vibecoded completely because it's going to be less work
StupidScaredSquirrel@reddit
I think a good chunk of "apps" will be just vibecoded on the go, used, and canned on the spot. If code is so cheap to generate I can see apps being tailored not to the user but to the request they just made.
droptableadventures@reddit
That's basically what OpenClaw is doing, and why it's such a bad idea.
StupidScaredSquirrel@reddit
Openclaw didn't become famous because it was good. It became famous because people liked the idea of what it could do if it were any good.
I'm sure in the future any modern OS will have a built in system that writes and executes code in the background to provide a service to a custom request.
droptableadventures@reddit
Yep, that's basically every wildly successful Silicon Valley product in the past 15 years right there!
Due-Function-4877@reddit
I'm staying on Linux in that "modern" future. What you're suggesting is dystopia. Operating systems by definition should provide the bare minimum. If I want an agent process running, I should make that choice.
StupidScaredSquirrel@reddit
I think you are being dramatic. Almost all operating systems ship with some non-essential starter kit of apps, including Linux distros. This is sometimes done as a courtesy and sometimes as adware.
Having a tool shipped with a distro that you may or may not use isn't "dystopia".
Vivarevo@reddit
How is this even safe?
mild_geese@reddit
It's not, but if there's lots of software that isn't used by a lot of people, there aren't many software monocultures, so attacking a large number of people requires attacking a lot more software.
ILikeBubblyWater@reddit
Doesn't need to be, for short-term profits.
Fast-Satisfaction482@reddit
An attacker can't have zero days stocked for a software that does not yet exist.
Pwc9Z@reddit
Fuck it, we ball. Now, hand over the control over your machine to a text generator
lxgrf@reddit
That's the neat part - it's not.
inrea1time@reddit
Tokens are being subsidized like crazy right now; this will be much more expensive in the future. And any app with any real complexity can't just be easily vibe-coded from scratch, even with strong specs, and be expected to work. That's not even considering having a stable, tested system that your customers can depend on.
erwan@reddit
That's what people said when Rapid Application Development arrived in the late '90s.
Result: many apps were used for many years in an unmaintainable state until a painful rewrite in a maintainable way.
kyr0x0@reddit
Hard steering, but most importantly: split your codebase into small modules where every module is responsible for a single thing, and a small API contract ensures that no cross-module boundary is violated. Helps a ton!
o0genesis0o@reddit
Only move as fast as you can understand the architecture (module decomposition, control flow, dataflow). You need to hold full control over this aspect at every commit. An AI agent can help you enforce this to some degree if you can articulate it, but at the end of the day, it's all on you.
It's no different from leading a team of real devs, honestly. We have people who cut corners, patch data through, delete tests. And we have those who love to over-engineer the heck out of the codebase for no reason. Both of those behaviours are there in LLM coding agents.
GoofAckYoorsElf@reddit
Aside from manual code review I have linters, prettifiers, unit tests, integration tests, review agents, double check review agents, critics agents all prompted and preconfigured on different aspects of clean code.
I'm pretty impressed what you can achieve with multi agent AI coding and the proper tool chain. Of course it's not perfect. But honestly, show me one human developer that writes perfect code!
Edzomatic@reddit
The old school way. I open the web chat, manually paste snippets of relevant parts, and explain what I want
This way I'm forced to actively think of what I want exactly and how to do it instead of outsourcing my thinking to an AI. I do use antigravity sometimes and I find it great at giving insights or recommendations on how to refactor a bad piece of code, or doing one very specific task quickly
Skeptic-AI-This-User@reddit
I wouldn’t even consider this the “old way”.
The actual old way is opening up your code editor and debugging it yourself. Of course there’s even older ways, and if you needed to you could use them as well.
Edzomatic@reddit
By old I meant when ChatGPT released 3.5 years ago, which given the progress in AI and the 10 coding editors that pop up every day feels much older.
Although I'm glad I learned to code before then and explicitly mentioned not outsourcing thinking to AIs
Skeptic-AI-This-User@reddit
Sorry, just having a hard time considering "3.5 years ago" as "old". But given the pace at which things are progressing, I suppose that can make sense.
Edzomatic@reddit
Yeah I didn't mean old literally. Nice username though
Skeptic-AI-This-User@reddit
Thanks! I wanted to go full-on Yoda with "Skeptic AI This User Is" but didn't have enough space
total-context64k@reddit
I use the Unbroken Method which is built into my coding harness. It provides basic engineering/development practices natively so agents do full investigative work before making changes to the codebase. I have multiple >100k LoC codebases that I maintain (including the harness itself) and managing them is no problem at all.
JollyJoker3@reddit
It looks like it concentrates on carrying info from one session to the next instead of formalizing documentation, best practices and architecture. I'd want to see epics broken into stories and tasks, modular architecture, language-specific rules, linting, naming consistency checks, etc.
SadBBTumblrPizza@reddit
check out crosslink or chainlink on github. Local issue trackers for ai agents. They do exactly that.
total-context64k@reddit
Some of this is domain specific, and some doesn't really fit. Agents don't need an agile process; todos and plan documents are fine. Domain-specific content belongs in AGENTS.md. I would love to see you develop and test the concepts though; these are proven to work for me over months of use. :)
inrea1time@reddit
I have seen a lot of the issues you describe and tried to overcome them with instructions, agent files, etc... I will def take a look at your project. I have several decently sized codebases and my projects are moving along, but I spend a lot of time fixing unintended changes, code, misunderstandings, over-design and the constant band-aids the AI puts in.
total-context64k@reddit
Yeah, band-aids and patchwork were a common problem for me too. This helps because it requires the model to do real investigative work and present a plan before making changes. It requires spending more tokens of course, but I'd rather spend more tokens and have it done right than spend them anyway cleaning up later (or having to clean things manually).
I use this method for all of my Open Source work, I don't do a lot of manual development myself anymore.
inrea1time@reddit
Have you tried creating and maintaining more detailed code documentation and pointing the model to it? I think it helps with tokens. It can also help with detecting drift if it's detailed enough.
total-context64k@reddit
Take a look at the docs/ directories and code comments in any of my projects. :)
Manitcor@reddit
Docs and code-as-doc only really work for people; we make leaps of logic and assumptions the machines do not.
You need excruciating detail as a reference point for the expected intents. There are concepts that just get lost in commit history, and even if they are recoverable it will take a lot of tokens and trash; it's easier to maintain an authoritative corpus and make updates/revision/sync part of CI.
brstra@reddit
It won’t degrade if it’s AI-slop from the beginning.
CondiMesmer@reddit
I don't let it plan the architecture. I will have it help occasionally but I get the final say. Architecture is mostly where it turns to slop, and it will gladly generate 10x the code that is actually necessary, so you can have it make a couple of passes.
For me, most of the improved quality doesn't come from better models, but from the various copilot instructions/agents/skills/etc. I've built up.
Pleasant-Shallot-707@reddit
By being very specific with your architecture, feature requirements, and direction
gatewaynode@reddit
I have the AI document extensively (TODO.md, BUGS.md, PRD.md, LESSONS.md) and always generate an ARCHITECTURE.md file for anything over 500 lines. Rotate and date the files as things get bigger, include regular review and analysis tasks, keep individual source files under 500 lines in size, decompose and modularize as regular tasks.
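A minimal sketch of the rotate-and-date step, assuming flat markdown files in the repo root (the file names follow the comment above; the size threshold and naming scheme are my assumptions):

```python
from datetime import date
from pathlib import Path

def rotate_doc(path: Path, max_bytes: int = 50_000) -> Path:
    """When a doc grows past max_bytes, archive it under a dated name and
    start a fresh stub, so the agent's working context stays small."""
    if path.exists() and path.stat().st_size > max_bytes:
        archived = path.with_name(f"{path.stem}-{date.today():%Y-%m-%d}{path.suffix}")
        path.rename(archived)
        path.write_text(f"# {path.stem} (rotated, see {archived.name})\n")
        return archived
    return path

# Run as a regular maintenance task alongside review/analysis passes.
for name in ["TODO.md", "BUGS.md", "LESSONS.md"]:
    rotate_doc(Path(name))
```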
decrement--@reddit
I even have it generate a codebase.md that has a description of each file and what it does.
gatewaynode@reddit
Oh nice. I need to try that.
VoiceApprehensive893@reddit
If it's slop, then it's replaceable, contained slop in a module that I can replace.
marathon664@reddit
Stick to trunk-based development. Every PR should be the smallest unit of work that gets toward a goal; no commits to main (only use reviewed pull requests); ideally do some test-driven development. The same strategy as before.
Ok-Definition8003@reddit
Instruct the AI to write code using a high-quality dev process. Intervene when necessary.
AI code is initially pretty bad. With a standard structured dev process it can get to OK. With occasional interventions it's good.
Elegant_Tech@reddit
If vibe coding, break everything down into small steps and test to ensure it's working as intended before moving on. For every feature, ask it to plan each step. If it breaks the work into phases, do a single phase at a time: plan, then implement each phase. Start new conversations if the context window breaks 100k (if you have a large context). As a test, I fully built out an idle defense game with coded SVG graphics, tons of features and UI polish with Qwen3.5 that you can play for months. It has content across the first 1000 waves with numerous systems. Unless you can run 400B parameter models or larger locally, ask the AI to plan first, then implement second. Again, keep the work small in scope and test each step. Temp <= 0.4, Top K = 20.
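For reference, those sampling settings map onto an OpenAI-compatible request body roughly like this (the endpoint and model name are placeholders; note that `top_k` is accepted by llama.cpp-style local servers, not by the official OpenAI API):

```python
import json
import urllib.request

payload = {
    "model": "local-model",  # placeholder: whatever your server has loaded
    "messages": [
        {"role": "user", "content": "Plan the next phase before implementing it."}
    ],
    "temperature": 0.4,  # <= 0.4, as suggested above
    "top_k": 20,         # llama.cpp-style servers accept this extra field
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",  # placeholder local endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment once a server is actually running
```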
Terminator857@reddit
GLM 5.1 is very good at code reviews. 3x better than anything else, including Opus.
I have lots of unit tests, 2-3 unit integration tests, heavy integration tests, and end to end tests. I frequently refactor when I detect code smell.
SnooPaintings8639@reddit
The same way you keep a system healthy when you have 5 juniors and a mid on your team: you have to be good at architecture and overall system design.
zoupishness7@reddit
You don't avoid rewrites, you do clean-room rebuilds. If you do it right, each time you do, you'll have a program that's smaller and faster, with fewer bugs, that was built for less than the original.
xAdakis@reddit
Technical Specifications and other Software Engineering Practices
You create documents that plan out the architecture and API of your code and the general approach to implementing it. You can use AI pretty effectively to build and refine this document. This document becomes your source of truth on how your code is organized and functions.
You then ask the agent to create a "granular AGILE-like" (important phrasing) implementation plan to turn the technical specification into a real package or application. Review that plan against the technical spec to ensure it is doing things sensibly.
Next, ask the AI/agent to implement it. When the AI is finished, test it yourself and look for any gaps or inconsistencies.
Then give your AI/agent this prompt: `Perform a comprehensive code review on all packages and applications in this workspace. Take into consideration all files and configuration, NOT just recent changes. Fix any and all identified issues. Make logical commits for all changes you make.`
It SHOULD run through the entire workspace, identify gaps and inconsistencies itself, and even do some optimization. Run that prompt multiple times with a fresh context until it does not discover any issues.
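That repeat-with-a-fresh-context step is just a loop that runs until the reviewer comes back clean; sketched here with `run_review` as a hypothetical stand-in for whatever agent CLI or API you drive:

```python
REVIEW_PROMPT = (
    "Perform a comprehensive code review on all packages and applications "
    "in this workspace. Take into consideration all files and configuration, "
    "NOT just recent changes. Fix any and all identified issues. "
    "Make logical commits for all changes you make."
)

def run_review(prompt: str) -> int:
    """Hypothetical: start a FRESH agent session, run the prompt, and
    return how many issues the agent found and fixed."""
    raise NotImplementedError  # wire this up to your agent harness

def review_until_clean(max_rounds: int = 5, review=run_review) -> int:
    """Re-run the review, each time in a fresh context, until a round
    finds nothing (or we hit max_rounds). Returns rounds used."""
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        if review(REVIEW_PROMPT) == 0:
            break
    return rounds
```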
Finally, ask it to update all documentation, including READMEs, JSDoc documentation (if typescript/javascript), and code comments, etc.
In general, this usually leads to well organized code that is easy to understand.
It is also extremely important/useful to define a "style guide" for the programming language you're using. For example, include https://google.github.io/styleguide/tsguide.html in your system prompts/instructions and tell the agents to strictly adhere to it. Perhaps add another review pass to ensure compliance.
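Concretely, that can be as simple as prepending the guide to the agent's system prompt; the exact wording below is just one way to phrase the instruction:

```python
STYLE_GUIDE_URL = "https://google.github.io/styleguide/tsguide.html"

# One possible system prompt embedding the style-guide requirement.
SYSTEM_PROMPT = (
    "You are a senior TypeScript engineer.\n"
    f"Strictly adhere to the style guide at {STYLE_GUIDE_URL} for all code "
    "you write or modify. If a requested change would violate the guide, "
    "flag the conflict instead of silently ignoring it.\n"
)
```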
When all that is said and done, treat any and all changes like a merge/pull request to your repo and go over everything with a fine-toothed comb yourself before deploying/publishing.
If you ever need to make a change, edit the technical spec first and repeat the whole process.
CorpusculantCortex@reddit
Context and documentation. If using AI, you need to make sure it writes and updates comprehensive, coherent context documentation so that future revisions (and your team) always know what has been done and why. I also find it important to set rules and templates so there's a cohesive approach to solutioning, rather than some random best-fit code for every task. It might sound silly, but it helps with code review and future dev if there is a standard approach to similar problems, with shared libraries and functions for repeated tasks, rather than code spaghetti where four different functions do the same thing in different ways in different modules.
Pretty much all things you should be doing if coding manually, but I think it gets forgotten with AI because it's the boring part, and AI makes everything else seem so easy and quick that it's simple to just ask it to make a module to do the thing you need and forget to bind it to your codebase.
Stepfunction@reddit
LLMs are best when making a single, focused piece of functionality, so making projects modular, with few dependencies between parts reduces the chance of the quagmire.
Imaginary-Unit-3267@reddit
Fun fact: humans are also best when making a single, focused piece of functionality, hence modularity being the most important part of software development in general.
ParaboloidalCrest@reddit
Micromanagement.
MrWhoArts@reddit
The current wave of vibe-coded applications, broadly defined as software generated primarily through AI prompts with minimal traditional engineering, has introduced a measurable shift in how quickly products can be prototyped and launched. However, the gap between rapid creation and durable production systems remains significant. Early data from developer platforms, internal engineering reports, and industry observations suggests that while a large majority of AI-assisted applications reach a functional prototype stage, only a small fraction achieve stable, scalable production use.
A realistic assessment places successful production outcomes for vibe-coded applications in the range of roughly 10 to 25 percent. The remaining 75 to 90 percent either fail before deployment or degrade shortly after release due to technical or operational issues. This failure rate is notably higher than traditionally engineered software, where structured development processes and testing reduce early collapse, even if long term product success is still uncertain.
One of the most consistent patterns is the gap between prototype viability and production readiness. Vibe coding excels at generating working interfaces, basic logic, and integrations in a short time. In controlled environments, these applications often appear complete. Once exposed to real users, unpredictable inputs, higher traffic, and integration edge cases quickly reveal weaknesses. Applications that perform well with a single user or limited dataset often fail under concurrent usage, where state management, data consistency, and latency become critical.
Bug density is also materially higher in AI-generated codebases. Internal benchmarking across development teams indicates that vibe-coded applications can exhibit two to five times the number of defects per thousand lines of code compared to manually reviewed and engineered systems. Many of these defects are not immediately visible, as they emerge under specific conditions rather than standard test flows. This creates a false sense of reliability during early testing phases.
Security vulnerability rates present another major concern. Studies and audits of AI-generated code suggest that between 30 and 60 percent of vibe-coded applications contain at least one significant security flaw at the time of initial deployment. Common issues include improper input validation, insecure authentication flows, exposed API keys, and insufficient access control. Because AI models tend to prioritize functionality over defensive design unless explicitly instructed, many generated systems lack the layered protections expected in production environments.
Scalability is a frequent failure point. Vibe-coded systems are rarely designed with infrastructure constraints in mind. Database queries may be inefficient, caching is often absent or incorrectly implemented, and background processing is typically overlooked. As a result, applications that function smoothly at low volume can experience rapid performance degradation or complete failure as usage increases. Retrofitting scalability into these systems often requires substantial re-engineering, reducing the initial time advantage.
A deeper issue underlying these technical problems is the absence of engineering depth. Vibe coding abstracts away many foundational decisions, which can lead to architectures that are inconsistent or poorly structured. Without a clear understanding of system design, developers may accept AI-generated solutions that are internally contradictory or fragile. This becomes especially problematic when maintaining or extending the application, as small changes can introduce cascading failures.
Product-market fit also plays a role in the high failure rate. The speed of vibe coding enables rapid idea generation, but it also encourages the creation of applications without sufficient validation. Many projects reach a functional state without confirming real demand, resulting in technically working products that fail to gain users. In this sense, vibe coding accelerates both innovation and the rate at which unviable ideas are exposed.
Despite these limitations, the advantages of vibe coding are real and measurable. Development cycles that previously took weeks can now be reduced to hours or days. This has significant value in early-stage exploration, internal tooling, and low-risk applications. Teams that combine AI-assisted generation with traditional engineering practices tend to achieve better outcomes, using vibe coding as a starting point rather than a complete solution.
The current state of vibe-coded applications reflects a transitional phase rather than a stable endpoint. The technology has dramatically improved accessibility and speed, but it has not yet replaced the need for disciplined engineering. Success in this space increasingly depends on how effectively teams bridge the gap between rapid generation and production-grade systems.
Polite_Jello_377@reddit
Fuck off clanker
-dysangel-@reddit
Use types. Make sure they're being used properly and the agent is not cheating with them. Keep things modular. Make sure to remove dead or conflicting systems to stop confusion both on your and the agent's part. It's very easy for cruft to accumulate in vibe coded projects
awitod@reddit
Lots of planning, iteration and attention to detail. Don't build a house of cards in the first place.
It will get easier to extend as you go, as the designs become self-reinforcing if they are consistent, or much harder as you fill an ever-growing bowl with slop.
CalmMe60@reddit
Some architectural work might help?
VonDenBerg@reddit
I really think this depends on what you're doing... but a poly-repo strategy has been huge for us.
We do Data Analytics pipelines so it's easy to segment it. As little context as possible with each repo.
I maintain a larger map of how things are connected and subagents that reference that when architecting a change.
StirlingG@reddit
It will slow down progress a lot, but you need to put in the time to have a multi-model review of every plan before and after implementing it. For bug fixes I usually have the models review the plan 2 or 3 times, and then review the implementation 2-5 times, until they don't find any issues. For massive refactors to clean up the codebase, I'll sometimes do 20 or more review sessions back and forth between claude and codex. I also tend to ask my models: could you have fixed any of these problems in a way that adds less entropy to the project, or is better from first-principles thinking?
Equal_Passenger9791@reddit
I refactored one of my projects (an image generation model training project) to go from an infinitude of different code files testing many different small things, architectures, sanity checks and training files, to just being a single launch_pipe.py that accepted a clearly defined configuration file as an argument.
But your accumulated experience will guide whatever you're making; you have to start somewhere or you'll be stuck in pre-launch optimization hell forever.
Maybe sit down and think about how your project evolves over time? Do you scatter a litter of old files (that's me)? Growing layers of dependency? How can you make it more compact and cleaner? Or do you need to? Will your project need infinite development and maintenance, audit and debugging? Or is it a one-shot thing that will be done forever and put in cold storage after 3 days?
stopbanni@reddit
Do .bak backups, and have a SPEC.md.