Monorepo vs Polyrepo for AI-driven development

Posted by PmMeCuteDogsThanks_@reddit | ExperiencedDevs | 15 comments

Short background: our system has always been in a monorepo (15+ years), but over the last couple of years there has been a push toward a polyrepo approach. As a result, we now have about 10 repositories, with different test strategies and no shared resources. I consider it a total mess. While I’m sure everything can be improved with better tooling, I can’t help but think: why even bother?

More importantly, I feel that for AI-driven development, a monorepo is even more advantageous. Our monorepo is well documented, with Claude files at every relevant level—about 150 files in total. I can open Claude at the root and get strong system-wide support. If I want to focus on something more specific, I can open Claude at a deeper level. Common capabilities can be shared in the root .claude directory.
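To picture what "opening Claude at a deeper level" means, here is a minimal sketch of which instruction files would be in scope for a given working directory. It assumes the files are named `CLAUDE.md` and that a session sees the files in its working directory and in every ancestor up to the repo root. Both are assumptions for illustration, not a description of how the tool actually resolves them, and the directory names are made up.

```python
from pathlib import PurePosixPath


def instruction_files(cwd, existing, name="CLAUDE.md"):
    """Return instruction files from the repo root down to cwd, in that order.

    cwd:      directory relative to the repo root ("." for the root itself)
    existing: set of repo-relative file paths that exist (hypothetical data)
    name:     instruction file name (assumed, not taken from the post)
    """
    # Build the chain of directories: ".", "services", "services/billing", ...
    levels = [PurePosixPath(".")]
    for part in PurePosixPath(cwd).parts:
        levels.append(levels[-1] / part)

    found = []
    for level in levels:
        candidate = (level / name).as_posix()
        if candidate in existing:
            found.append(candidate)
    return found


# Hypothetical monorepo layout with instruction files at three levels.
repo_files = {"CLAUDE.md", "services/billing/CLAUDE.md", "web/CLAUDE.md"}
print(instruction_files("services/billing", repo_files))
print(instruction_files(".", repo_files))
```

Starting at the root yields only the system-wide file, while starting deeper adds the more specific ones on top of it.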

I recognize there may be some home bias here, but I really value the ability to create a single PR for system-wide changes, have centralized PR management, and rely on a single commit hash to represent the entire system state.

The main complaint I hear is essentially: “shared responsibility becomes no responsibility.” I’m not saying a monorepo is without problems—it’s a compromise. But as we move toward more AI-driven development, I feel it simplifies many aspects.

Technically, I could replicate this setup by cloning multiple repositories into a tree structure or by using submodules. But again—why bother?

Question: Does using a monorepo improve the effectiveness of AI?

Disclaimer: I used AI to proofread this post, English isn't my native language.

Esseratecades@reddit

These are the kinds of decisions that miss the forest for the trees.

Put the hype down. AI is not a first-class citizen in your codebase. You and your teammates are. It doesn't matter if it's better for AI. What matters is what's better as a product solution that you can understand and maneuver through.

Is your stack one in which all of the components are always deployed, redeployed, or torn down at the same time? Then it's best as a monorepo. Otherwise, a polyrepo is better.

Whether it's better for AI or not is irrelevant.

titpetric@reddit

LLMs have limitations such that everything you can do to make them work well with a codebase is also what you would do to make humans work well with it.

Cognitive complexity, line count, file size, package size, structure, small contexts, couplings, dependencies - in all cases, I have not yet found an intervention that would have no overlap between LLMs and dev teams, so if something is done for one, it should cover both positively.

Esseratecades@reddit

While that's likely true, it's a better use of your time to ask about how to help people instead of LLMs, even if the answer is the same.

titpetric@reddit

Maybe your time. :)

PmMeCuteDogsThanks_@reddit (OP)

>Is your stack one in which all of the components are always deployed, redeployed, or torn down at the same time? Then it's best as a monorepo.

Indeed, this is the case. While the monorepo has ~15 applications all in all, including a large multi-module Maven project with source dependencies on common libraries, deployment is always done as an atomic unit. Again, this may be a side effect of using a monorepo, but it is not a problem that needs to be addressed. The fact that we can take a single git commit hash as the input for setting up new environments is used in many ways.
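For comparison, the closest polyrepo equivalent of that single commit hash is a manifest that pins every repository to a commit, reduced to one identifier. The sketch below shows one way to do that, under stated assumptions: the repository names and commit values are placeholders, and SHA-256 over a sorted manifest is just one reasonable choice, not something any tool in this thread prescribes.

```python
import hashlib


def system_fingerprint(pins):
    """Reduce a {repo name: commit hash} manifest to one hex digest.

    Sorting by repo name makes the result independent of the order in
    which the pins were collected.
    """
    lines = [f"{repo} {commit}" for repo, commit in sorted(pins.items())]
    payload = "\n".join(lines).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Placeholder manifest: names and commits are invented for illustration.
pins = {
    "billing-service": "1a2b3c4",
    "common-libs": "5d6e7f8",
    "web-frontend": "9a0b1c2",
}
print(system_fingerprint(pins))
```

The manifest itself still has to be stored and updated somewhere, which is the extra bookkeeping a monorepo commit hash avoids.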

ugh_my_@reddit

One day the makers of git will realize what a terrible job they did with submodules

I suppose the question becomes why write custom CI/CD tooling if you can just create another repo.

Or the alternative: why create dependency management tooling for multiple repositories when you can have a monorepo.

It is easier to track and update dependencies than to tweak monorepo CI/CD based on which files changed. The dumb way is to scatter some Makefiles around the repo and run them selectively (if not make, then task, lefthook, atkins or another custom runner).
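The "run them selectively" idea can be sketched as a small mapping from changed files to the nearest directory that owns a Makefile. This is a minimal sketch, not a real CI setup: the function name, the directory layout, and the choice of "nearest ancestor wins" are all assumptions made for illustration.

```python
from pathlib import PurePosixPath


def makefile_dirs_to_run(changed_files, makefile_dirs):
    """Map each changed file to its nearest ancestor directory with a Makefile.

    changed_files: repo-relative paths of files that changed
    makefile_dirs: set of repo-relative directories containing a Makefile
                   ("." stands for the repo root)
    Returns the sorted, de-duplicated list of directories to build.
    """
    to_run = set()
    for changed in changed_files:
        directory = PurePosixPath(changed).parent
        while True:
            key = directory.as_posix()
            if key in makefile_dirs:
                to_run.add(key)
                break
            if directory == directory.parent:
                # Reached the repo root without finding a Makefile.
                break
            directory = directory.parent
    return sorted(to_run)


# Hypothetical layout: two components with their own Makefiles.
changed = ["libs/common/src/util.py", "apps/web/pkg/ui/button.ts", "docs/readme.md"]
print(makefile_dirs_to_run(changed, {"libs/common", "apps/web"}))
```

The changed-file list could come from `git diff --name-only` against the target branch, and each returned directory would then get its own `make -C <dir>` call. Files with no owning Makefile, like the docs change here, trigger nothing.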

FitNerdDude@reddit

Multi-root workspaces solve this easily enough in Cursor.

Though worktree support isn't there.

Entuaka@reddit

Same with Kiro

throwaway_0x90@reddit

This should be voted on by your teammates.

calab2024@reddit

“Why bother?” This. People argue about folders with vague reasons all the time.

AI can be started in any subdirectory to limit context regardless of source control, so that’s not a real constraint.

I push my teams to be VERY specific, with metrics, about what problem is happening: “we’re spending $X on Claude tokens because of Y” or “builds take 45 minutes because of Z”, and then to propose three solutions for each. If they all point to the same answer, “monorepo”, fine. But usually it’s simpler stuff: better tooling, rewritten .claude files, caching, action optimization.

Thank you. I also realize that this is mainly a problem of (poor) organisation. As it stands, our existing monorepo is basically left as is and considered legacy, while everything new is to be done in dedicated repos. Which is a total WTF, because the monorepo contains literally 99% of the code base. This mess is mainly the result of new tech leaders who have zero interest in understanding the existing setup, and the existing CTO couldn't even comprehend the problem.

The old CTO has been shoved out, and as a disgruntled founder I'm looking at making a return and trying to clean up this mess. Whatever I say will be done, but I still want to tread lightly.

The main problem I want to solve is: "I want to be able to start Claude from a single point of entry, and be able to query and manipulate all levels of the system". While the problem could be stated in terms of an underlying desired outcome, I see that it helps POs independently query the system to understand functionality. So rather than prescribing a solution, the idea is to present the problem I want to solve and ask for solutions. If a polyrepo approach also solves it, fine, but to me so many things are simplified by just literally having all code in the same directory structure.

roy_malcolm@reddit

You’re right to assume that it probably shouldn’t matter much. If you’ve got multiple test suites and no shared assets, decoupling makes more sense. Cross-repo context is also easily doable with tools that have good harness plugins. We use Linear for tickets and Notion for docs, both of which Claude Code can read and write to easily.

Axmirza2@reddit

I’ve worked with both. I prefer monorepos, but specifically for LLMs, if you use something like Sourcegraph or GitHub’s MCP server, Claude can figure it out.

Fresh-String6226@reddit

Multiple repos can also work well if you just teach your agent to work across several of them locally.

A monorepo may be easier organizationally if you’re a larger organization with a dedicated set of folks to maintain the tooling. If your company is not large enough to justify having people dedicated to creating monorepo infrastructure and taking care of the health of the monorepo as it grows, don’t do it.