Git Monorepo vs Multi-repo vs Submodules vs Subtrees: Explained
Posted by sshetty03@reddit | programming | View on Reddit | 196 comments
I have seen a lot of debates about whether teams should keep everything in one repo or split things up.
Recently, I joined a new team where the schedulers, the API code, and the Kafka consumers and publishers were all in one big monorepo. This led me to dig into the various options available in Git, so I went down the rabbit hole to understand monorepos, multi-repos, Git submodules, and even subtrees.
Ended up writing a short piece explaining how they actually work, why teams pick one over another, and where each approach starts to hurt.
seweso@reddit
I never understood the desire to create more repos than there are teams. Can someone explain why you would ever want that in the first place?
r1veRRR@reddit
Because every single Git tool is first and foremost good at treating a repo as a single complete project. Monorepos require a bunch of extra faff to work with multiple projects in the same repo.
So, if a team writes two different, separate projects, why would they need to be in the same repo? What do they gain?
seweso@reddit
Most tools can work on subfolders just fine, doesn’t matter what level you are in, everything is the same.
And subtrees exist.
Kungpost@reddit
What do you do if you have 20 teams that have their own infra apps and then you introduce a platform team that will take over the responsibility for each of the 20 teams infra apps?
seweso@reddit
If I were the platform team I would subtree the infra folders of those 20 teams/repos into the platform repo. And as a normal dev team I would want infra to also remain in the team's repo.
So I don't get the issue here.
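Roughly, and with made-up repo names and paths, that would look something like this from the platform repo (mounting each team repo, or an infra-only branch they split out, under its own prefix):
git subtree add --prefix=teams/team-a git@example.com:org/team-a.git main --squash
# later, pull in whatever team-a changed on their side:
git subtree pull --prefix=teams/team-a git@example.com:org/team-a.git main --squash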
Kungpost@reddit
TIL about git subtrees.
Alright, at first glance that seems like a great solution. Seeing as you seem to be used to it, I have some follow up questions: how do subtrees interact with CI/CD, and how painful are merge conflicts in practice?
seweso@reddit
Subtrees have no influence on CI/CD. Those tend to just check out the entire repo. And pipelines can work/trigger on folders.
Merge conflicts I haven't tried. I tend to stay away from merge conflicts. Infra isn't the kind of code where you work for days/weeks on a branch. You pull, change and push.
Kungpost@reddit
So when I look into this, I find that you have to do something along the lines of
git subtree pull --prefix=teams/team-a team-a main --squash to pull down commits to the team-a infra directory. Is this how you would do it too?
mirvnillith@reddit
To scope tagging/versioning. Or does a monorepo have a single global versioning sequence?
edgmnt_net@reddit
Do you absolutely need different release cycles for components? In many cases you really don't and they're not really independent components.
mirvnillith@reddit
You’re talking about seeing the monorepo ”product” as a distributed monolith? I.e. release managed as a whole but deployed as multiple services.
I’m not sure if a monorepo really solves the ”alignment problem” people claim it does. Will the changing team really complete transitions across the repo? Or will they build something compatible and push tasks to others to catch up? I think the latter, and then service repos would work just as well (while, IMHO, helping keep developers focused on the service at hand).
But I’m certain I’m against team repos!
edgmnt_net@reddit
Not all of the artifacts need to be deployables, but yeah.
Ideally yes, every dev takes care of an entire vertical slice / makes all the needed changes. Less ideally, there will be collaboration and contributions made by more than one person (perhaps using Git trailers to show co-authorship). Separate services won't work well because they don't let you make and record atomic changes for one thing, if splitting work is even possible. In many cases I think it's just wishful thinking that you can just isolate teams.
mirvnillith@reddit
I’d think that ideal would be a major limiting factor on size of the monorepo system as usage of something is usually context dependent and the providing contributor would find it hard to know enough of these different contexts to correctly implement their change consequences. (sorry for the long sentence ;)
edgmnt_net@reddit
I get your point, but this may be more reasonable than it appears. This is essentially how many open source projects work, including the Linux kernel: you are expected to make all related changes and avoid breaking anything. Secondly, while I get that companies may want to set a lower bar, this sort of thing is kinda essential for vertical scaling in software development IMO. Especially because you want to move fast, to deal with complexities beyond other fields and all that, so you can't really expect to work like an assembly line. And the other side of this coin is that devs who can do that kind of work have good opportunities. But if a company simply wants to pump money around and get a lot of random work done, yeah, maybe it's hard to find enough qualified people with the given budget. Those aren't the best jobs, though.
mirvnillith@reddit
Yeah, I guess my limitation is my current team surroundings where very high autonomy has bred isolation and ”PR-sniping” whenever you need an external change. But I also see a risk of knowledge spreading never really covering the need when being responsible for the long tail of a monorepo change.
That’s why I asked about more stretched out changes where compatibility is maintained while usage is gradually upgraded. And those upgrades could then be distributed to where the knowledge is.
angiosperms-@reddit
You can do hacky releases with a monorepo where you are also tagging the component.
The biggest reason not to use monorepos, if you are using GitHub, is that GitHub native support for monorepos is non existent and I do not expect it to ever happen because too many companies are charging money to make all those features work with monorepos now. For releases or required checks you are fucked unless you want to manage a custom solution or pay for something.
FortuneIIIPick@reddit
Martin Fowler popularized microservices and the notion of service per repo was born out of that. Overnight, monorepos, along with SOA (Service Oriented Architecture) went from being the norm to being disparaged as relics of the past.
seweso@reddit
Yeah, like I said, I never understood micro services.
It always felt like a way to handle multiple teams working together, giving each team agency etc. But then shit doesn’t stay micro.
Great way to bill more hours though.
edgmnt_net@reddit
I personally think that even one repo per team is often too much. Realistically a cohesive project may have many people working on it, far more than the average team. And thinking that you can somehow isolate teams from one another is wishful thinking.
FortuneIIIPick@reddit
I agree, I think domains make more sense and are far less ephemeral in nature.
tecedu@reddit
We are a team of 3 (was 6 at some point); different libraries and code were used across different projects and common projects. So create a different repo, make it a package, there is already a CI/CD pipeline set up if you used the template, and then, based on its pyproject.toml version and name, you just add it to your working project's pyproject.toml. Do pip install and it's installed.
No having to keep everything in sync; if a project works on an old version of the utils and pipelines packages then just pin those.
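A minimal sketch of the consumer side, with placeholder package names and a made-up internal index URL:
# in the consuming project's pyproject.toml: dependencies = ["team-utils==1.4.2", "team-pipelines==0.9.1"]
# then a normal install resolves them from the internal feed:
pip install -e . --index-url https://pypi.internal.example.com/simple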
BinaryIgor@reddit
Usually just resorting to the defaults; if somebody is accustomed to the repo per service and that's how they've worked in the past, they usually repeat that, even if all services are managed by a single team. Got to be especially vigilant when the project starts!
more_exercise@reddit
It also separates the concept of "change project 1" from "change project 2" when a single team maintains more than one project
kilobrew@reddit
It allows a huge number of teams to do a huge number of things while only touching what’s necessary. As long as everything is properly versioned, tested, and released it works fine. The problem is people suck at paperwork.
ase1590@reddit
For a larger project, a main git repo that pulls in submodules is the correct way to go.
Any project sticking to a monorepo at that stage has a team afraid of git and/or has terrible project management that prevents coordination with other teams.
There are so many developers I have seen that cannot use git and instead rely entirely on the github desktop gui interface
bazookatroopa@reddit
Almost all large tech companies use monorepos. Git doesn’t scale well without you building out scaffolding and that’s best built around a mono repo or other tools like Mercurial. I love it for small scale though.
civilian_discourse@reddit
The largest, most complex and fastest moving software project in the world uses git (Linux). Git was literally built for it.
bazookatroopa@reddit
Linux’s repo is not a monorepo since it’s only the kernel. It also is only 40 million lines of code while some of the large monorepos are billions. The complexity of the software is not directly related to the limits of Git, which are mostly based on volume of code. GitHub tooling is already struggling with some of the largest repos like Linux kernel even with support built on top of Git.
Also when Git was made Linux was only about 2 million lines of code. Git at its core is not keeping up with the size of modern codebases without vertical scaling and infra layered on top. Especially now with AI tooling being more efficient working in a single repository and generating even more code faster.
civilian_discourse@reddit
I guess I’m not convinced that monorepos are an intelligent way to organize anything. I also don’t see any connection between AI and massive monorepos.
I get the impression that when you say large companies do this monorepo thing, the connotation is that if there was a better way then they would have done that instead. However, in my experience, large companies don’t optimize for long term efficiency, they optimize for short term solutions. Meaning, they plug holes with money.
So, sure, git doesn’t scale if you’re doing a bad job of breaking your repository up into manageable pieces and instead just trying to brute force everything.
bazookatroopa@reddit
I personally love Git as a tool and it’s my go to for most projects. It just has trade-offs that submodules don’t resolve. Submodules are good for specific use cases like 3rd party dependencies or rarely updated internal dependencies, but can become a shortcut instead of building robust infrastructure to handle performance and dependency management.
I have worked from small startups using many microservices split across multirepos with submodules and version management hell to large orgs with massive repos. The large orgs optimize for reduced risk too since failures hurt trust and cost them much more than it costs a small company. They have more short term demand for robust solutions here than a small company so you actually find they have better solutions without even needing to think ahead.
civilian_discourse@reddit
I agree that submodules can be difficult, but I have a hard time following you from there to monorepos are great. For instance, if you use a package manager, you don't have to have monorepos or submodules.
It just seems to me that embracing the monorepo is a total rejection of the SOLID principles at a high level. I'm by no means arguing that there should be tons of repos; there is a balance between the benefits you get from having code in the same repo and the benefits you get from having code separated between repos, but the idea of the monorepo seems to me to be a complete rejection of any balance.
bazookatroopa@reddit
I agree that package managers are a big improvement over submodules in many ways, but they are not a complete solution and still introduce similar coordination problems. Even with internal packages, you often end up dealing with version skew where different services are pinned to different releases, making it hard to know which versions are compatible or safe to bump. Transitive dependencies can create a tangled web where updating one library forces a cascade of changes across many others. There is also operational overhead from publishing, tagging, maintaining changelogs, ensuring compatibility, and orchestrating releases in the right order. In other words, package managers shift the complexity around rather than eliminating it.
Monorepos are not inherently opposed to the SOLID principles because SOLID concerns how code is structured and how responsibilities are divided within the software itself, not how that code is stored or versioned. A single repository can contain well-designed, modular, independently testable components that fully respect SOLID, just as multiple small repositories can contain tightly coupled or poorly designed code. The repository model is simply an infrastructure and operations choice, not a design philosophy. In fact, monorepos often make it easier to maintain good design boundaries because all modules are visible to shared tooling. Teams can enforce ownership and dependency rules, run global static checks, and perform atomic refactors across related components. The benefit of the monorepo is that governance and automation become easier to layer on top since everything is accessible, consistent, and changeable in one place.
civilian_discourse@reddit
The larger a project/organization, the more operational overhead there should be. Operational overhead is a necessary thing when things scale large enough.
The intentions behind SOLID are more fundamental than code structure. It should be no surprise that there are similarities to the way you manage code complexity and other forms of complexity, I mean it would be wild if there weren’t.
We may have to agree to disagree here. Everything you’ve said about the monorepo just smells like overwhelming systemic complexity.
bazookatroopa@reddit
We can agree to disagree, but the main benefit of a monorepo is atomicity… you can make coordinated changes across all services in a single commit, keeping everything consistent and enabling shared tooling.
civilian_discourse@reddit
There is an inverse correlation between the scale of an organization working in the same repo and the capacity to coordinate synchronously/atomically safely across the organization. Or, in other words, synchronous coordination gets harder and more unrealistic the more people and things that you are trying to coordinate. The solution to this is to find places to break the organization down into smaller groups that can coordinate safely and effectively while using more formal asynchronous forms of coordination between these groups.
This to me is a fundamental law of coordination, you can find it referred to sometimes as the O(n²) communication problem. Attempting to subvert it is far more dangerous and reckless than acknowledging and embracing it.
bazookatroopa@reddit
I think we’re using the term atomicity in different ways. Atomicity in a monorepo isn’t about forcing synchronous coordination between everyone in an organization. It’s almost the opposite. Atomicity means that when a cross-cutting change needs to happen, it can be done safely, completely, and in one place, without requiring a huge sequence of meetings, dependency updates, or staggered rollouts across dozens of repositories. It reduces the coordination burden because developers don’t have to align version bumps or chase inconsistencies across separate codebases. The O(n²) communication problem is real, but monorepos exist in part to mitigate it through tooling and process: automated testing, ownership rules, code review gates, and CI systems handle consistency asynchronously. People aren’t all coordinating at once; the infrastructure enforces consistency automatically.
That’s why many of the world’s largest engineering organizations operate with monorepos that contain billions of lines of code. It isn’t because they want everyone to work in lockstep; it’s because the monorepo model allows atomic, automated coordination without manual synchronization between thousands of teams. In that sense, atomicity scales better than trying to coordinate versioned multi-repo changes through human processes.
It’s similar to why databases evolved from ISAM-style record storage to ACID-compliant transactional systems. In the ISAM model, every application had to manually handle consistency, locking, and rollback logic. That approach worked at small scales but quickly broke down as data and concurrency grew. The shift to ACID transactions didn’t make databases “more synchronous”; it automated consistency so that developers didn’t have to coordinate manually across every operation.
civilian_discourse@reddit
Okay, I'm starting to understand I think. You're saying that it's possible to automate coordination rules to such a high degree that anyone can make changes across any part without necessarily needing to talk to anyone as long as it all passes the automated checking, right? So then the burden of making sure things don't break falls on the people updating and adding into that automated infrastructure more than it falls on the people who are making logic changes.
I think the work I do tends to be so much more on the visual/interactive/subjective side that such automated testing is impossible without some degree of visual inspection involved.
bazookatroopa@reddit
The database analogy is not perfect as some coordination is still required, but almost. Manual review processes are also typically part of the change management process. This kind of model also requires a lot of effort to build out the infra.
Your approach works far better without heavy custom infra investment. Most orgs don’t have this level of automation either, outside of the largest engineering orgs where the scale makes it worth building, so we can both be right based on the needs of the org and the kind of work you are doing. Your model definitely works when teams are working on completely standalone microservices, interfaces are stable, you want full release-cycle independence between teams, etc. There is less complexity because you aren’t solving every team’s infra problems at once, which means less initial and ongoing investment, with the trade-off that each team needs to self-manage (and that gets expensive and risky at massive scale).
Adventurous-Date9971@reddit
Choose monorepo vs multi-repo based on change patterns and the infra you can realistically run, not dogma. If >20–30% of work touches multiple services each quarter, a monorepo with strong guardrails usually wins; if interfaces are stable and teams ship independently, multi-repo is simpler.
What makes a monorepo work: clear ownership (CODEOWNERS), per-folder CI gates, incremental builds (Bazel/Pants or Nx with remote cache), and dev ergonomics (sparse/partial clone). What makes multi-repo work: contract tests (Pact), version policies, and a release train for shared libs.
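For the dev-ergonomics part on the monorepo side, the sparse/partial clone setup is only a few commands on a reasonably recent Git (repo URL and folder names are placeholders):
git clone --filter=blob:none --sparse https://git.example.com/big-monorepo.git
cd big-monorepo
# only materialize the folders this team actually works in:
git sparse-checkout set services/api libs/shared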
A hybrid I’ve used: platform monorepo for shared libs/schemas and wide refactors; product services in their own repos pinned to platform releases; schedule quarterly syncs and maintain deprecation windows.
I’ve used Bazel and Nx for builds; for cross-repo interfaces, Kong for routing and DreamFactory to generate stable REST from databases so teams weren’t blocked on each other.
Pick the model that fits your change topology and the ops budget you’ll actually fund.
ase1590@reddit
and almost all large tech companies I've been at have poorly managed cross team coordination. Hell, I'm in a battle now with teams within a certain fortune 100 company trying to yell at devs to stop just randomly adding things with NO coordination with other teams.
bazookatroopa@reddit
If they’re a planet-scale tech company they usually have over-aggressive auto-detection built into their infra to prevent that at merge time and require the other teams to approve + the dependent areas' test infra to pass
ase1590@reddit
Sure.
but the problem here is that teams have become silo'd,
so the approval chain is now vertical, and while one team can implement things, it can now be done without talking to the other sides. Like I said, tech solutions can't fix human behavior issues lol. It's a fool's errand.
bazookatroopa@reddit
I think we’re in alignment on this. That can’t be fixed by automation regardless of the VCS or infra and requires more cross-team collaboration/ leadership.
BinaryIgor@reddit
Having worked in both, I definitely prefer mono repos, or at least a mono repo per team/domain. It often makes tracking various dependencies easier, as well as just visualizing and understanding the whole system you're working on.
On the flipside, mono repos can get pretty huge though - but that's a concern for very few systems.
Commenting on the article:
You can do the same with a mono repo; it just makes CI/CD a little more complicated :)
TheWix@reddit
I'm in a giant mono repo right now and I hate it. The backend is C++, the middle layer is C# and the front end is React. The build takes 2 hours and the git history is annoying to work with.
I prefer repos per app/domain, not team. Teams are too ephemeral.
seweso@reddit
What does the mono repo have to do with your bad build system? How on earth do you even get to the point of a 2 hour build? That's a feat. You can parallelize your build with a mono repo just the same.
And even if you don't use subtrees, most tools allow you to look at the git history of just one folder. So I don't get the git history annoyance. That also has little to do with a mono repo.
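Even plain git does this with no extra tooling (the folder and file paths are just examples):
git log --oneline -- services/scheduler/
# or follow a single file across renames:
git log --follow -- services/scheduler/app.py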
TheWix@reddit
Fair point about this app and its build process. It's a C++ app for FX. I don't work on the C++ side but it's a combination of build and unit tests. It's awful.
A changeset will often cross multiple folders and with several dozen devs in the monorepo it becomes hard to get a picture of how your app is evolving. It's especially bad with shared dependencies. You are inheriting changes without necessarily being aware of it. You need good unit testing coverage to catch that and more often than not they are lacking.
Then all the customized processes you need for builds and deployments so they target specific folders or tag conventions.
For me the juice isn't worth the squeeze just so it's easier to manage shared dependencies.
ilawon@reddit
I have the same problems where I work but it's not a monorepo. Maybe the problem lies somewhere else?
We additionally have the issues of integrating all these services that depend on each other into something we can deploy and test without any breaking change.
TheWix@reddit
If they are libraries and you are publishing them through a package manager at least you can see the dependencies being updated.
If you have many APIs that all depend on each other then it could be that you have a distributed monolith.
ilawon@reddit
From experience that only exacerbates the problem: a simple change will require at least two PRs and we end up having to do release management of packages in addition to the problem it's trying to solve.
It's a single product comprised of dozens of micro-services. We can kinda group them in individual, self contained, functional units but, more often than not, a single feature can span quite a few of them and it's hard to coordinate changes.
TheWix@reddit
If it's a shared package it should be a separate PR because it affects more than one library. You should also ask how much code you should be sharing between microservices.
Additionally, if you have services that depend on each other you don't have microservices, you have a distributed monolith.
Microservices are about extreme decoupling and independence. When you start sharing code, or depend on another service at runtime you lose that. This might be the cause of your issues.
When I do microservices they very, very rarely call another microservice, and the only libraries they share are thin ones for cross-cutting concerns. These will rarely change and when they do they better be thoughtful and well tested because they can break multiple services.
ilawon@reddit
That's taking it too far, in my opinion. How do they even work together? Is each and every one of them a product?
TheWix@reddit
I should clarify this by saying avoid 'synchronous' communication between services.
Each microservice is not a product. They are just parts of a larger product.
The issues you are describing are exactly what happens when you diverge a lot from that independent requirement of microservices. It's why I caution people about them. Monoliths are fine. Distributed monoliths are an anti-pattern.
ilawon@reddit
There's still a contract to be fulfilled. And if you don't treat it as a product within the product you risk having changes that break that contract, and I don't mean the communication details only.
I think there's a video somewhere of a developer explaining to a PO how hard it would be to say happy birthday to a user in the front-end because of all the places that had to be updated.
TheWix@reddit
I guess it depends on what you mean by "product"? They should be owned by a single team, and that team is responsible for the quality of the microservice. But I'd never expose a microservice directly to a client.
That's the tradeoff with Microservices. If you don't need the independence then go with a monolith. Too many people think they need it but in reality they don't and end up making distributed monoliths which are the worst of both worlds.
ilawon@reddit
Independent. Defined contract, has a product owner, backlog, release process, etc.
If other components depend on it, it's a product if you need to create a change request (separate epic/user story). If the teams just go ahead and change it directly when needed then it's just a component.
edgmnt_net@reddit
It's best if people take care of everything and entire vertical slices when making more impactful changes, you simply don't let them merge breakage. Things like strong static typing and semantic patching can help tremendously with large-scale refactoring (unit tests aren't the only way to get assurance). Which becomes very difficult if you split stuff across a hundred repos, in those cases people just don't do refactoring anymore, actively fear it and you get stuck with whatever choices you made a year ago.
Several dozen devs is nothing, really. Projects like the Linux kernel have thousands of contributors per release cycle and do just fine with a single repo because they do it right.
JDublinson@reddit
Try building Unreal Engine from source
martinus@reddit
We sometimes have 5 hour build times. The problem is we need to build on lots of different platforms and scaling some of these isn't easy. It sucks.
BinaryIgor@reddit
Yeah, the git history could definitely become a nightmare - with mono repos, you must have conventions there, otherwise it becomes a mess; in single-app repos, you don't have to bother that much, since they are much smaller and focused.
As far as builds are concerned, that obviously depends on the particular mono repo setup; but usually, you change only a small part of the mono repo and only that part and its dependencies need to be rebuilt
TheWix@reddit
You'd think... I've experienced this at three companies so far.
You also get into a weird situation of apps being versioned together, but not? You can have n apps in a single repo, all on different release cycles, but when you create a tag you aren't just tagging one app. All the apps are versioned together because tagging is repo-wide.
Monorepos kinda fight the paradigm a bit when applied to non-monoliths. You need to create more processes around it to make it work.
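The usual workaround I've seen is per-app tag prefixes, which works but is exactly the kind of extra convention I mean (names are made up):
git tag api/v2.1.0
git tag scheduler/v0.7.3
# changelog for one app only, between its own tags:
git log --oneline api/v2.0.0..api/v2.1.0 -- services/api/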
BinaryIgor@reddit
True, but you definitely can make it work; there are tradeoffs to both models, as usual; I've worked in mixed model as well - a few "mono" repos and it worked pretty well.
But if you have lots of technologies and independent apps, it probably makes sense to have many repos :)
TheWix@reddit
I like the mini-monorepo approach, where the apps have a similar scope rather than just shared dependencies, because it means they will likely change for similar reasons, and keeping them together makes more sense.
Difficult-Court9522@reddit
And I hate the global (planetary) mono repo. There is just too much history. I can’t see if anyone touched my shit if there are 1000 commits a day.
Kered13@reddit
Then you need better repository history tools. You should be able to easily view only commits that touched a given file or directory.
Difficult-Court9522@reddit
Yea. We’re still using mercurial instead of git :(
Kered13@reddit
There are tools like this for mercurial.
hg log [FILE] on the command line will show you the history of that file or directory. hg serve will launch a handy local server that lets you browse the repository history in your web browser. If you use TortoiseHg (I love it) then you can launch this server from there as well.
thatpaulbloke@reddit
And that's the part that has soured me on monorepos - if I have a set of utilities in a single repo with responsible teams that are actually doing version control then they are getting a notification that Utility1 just went from version 1.3.12 to 1.4.0 with change notes of "here are a bunch of changes to Utility2". Even more fun than that is when someone has made a breaking change to Utility3 so now Utility1 and Utility2 both just went from 1.4.0 to 2.0.0 without any actual changes to either.
If you end up in a situation with five hundred repositories then it can get unwieldy, but if your repos are vended out from a control space (personally I use Terraform Cloud, but there's dozens of options) and thus cleaned up when out of use it's not really that bad.
Hacnar@reddit
I've worked on a monorepo that combined a big C++ desktop app, a C# server app and a js frontend. We've set up multiple pipelines, each for its given part of the repo. Each had its own triggers, so if only frontend was needed, the rest of the pipelines didn't have to run.
Ayjayz@reddit
Build time is completely independent of how many code repositories you have.
UMANTHEGOD@reddit
If my builds took more than a minute or two, I'd probably just kill myself (in a video game).
sionescu@reddit
What is "the build" ?
edgmnt_net@reddit
Yeah, then you switch to separate repos and you run into other problems. You can no longer make atomic changes and you have to make 5 PRs that need to be merged in a very specific order to avoid breaking stuff. Stuff like CI is also difficult, how do you even test pieces of a larger logical change (there are possible answers, but if you say unit tests I'm not convinced).
To deal with large projects I'd say stuff like incremental rebuilding becomes essential. And history is as good or bad as you make it, if people are in the habit of just dropping huge changes (possibly containing binaries or various other data), then it will suck anyway.
recycled_ideas@reddit
A successful monorepo requires two things.
The second is the most important thing. If you have direct dependencies between code and you update the dependency you need to update the code that depends on it at the same time. If you can't or aren't allowed to do that then a direct dependency is a disaster and a monorepo is a terrible idea.
RabbitLogic@reddit
For number 2, anyone should be able to file a PR but codeowners are an important tool to maintaining code standards for a team/group of teams.
Also not everything has to be a direct dep; you can have a published package live in the monorepo which is only referenced by a version published on a package feed.
recycled_ideas@reddit
If you have separate code owners and a package feed what the hell is the point of having a monorepo?
UMANTHEGOD@reddit
I'd say CODEOWNERS is often very necessary to run a monorepo successfully.
RabbitLogic@reddit
Developer experience is vastly improved if you work in the same repo with easy access to the abstractions you build on top. If team A owns a library a developer from team B is more than welcome to propose a change to said library but ultimately it is a collaborative effort.
recycled_ideas@reddit
Based on what evidence? Sure it can be helpful to be able to see code you're working with from time to time, but if you're digging into code that someone else owns often enough that it's beneficial to be in the same repo someone isn't doing their job properly.
This is the whole damned problem.
Monorepos aren't free, the companies that use them have had to make major changes to git just so they actually work at all.
There are advantages to them, but those advantages come from developers being able to easily see the impact and make changes across projects. That's why places like Google do them and why companies like Microsoft do not (because they don't want or allow those sorts of changes).
You have to have a reason why you want a monorepo.
sionescu@reddit
The most successful monorepo on this planet (Google's) has neither of those.
recycled_ideas@reddit
Google absolutely allows changes across projects and teams, that's why they made a monorepo in the first place.
sionescu@reddit
It allows it, with strict code owners' approval.
recycled_ideas@reddit
Google requires PRs for everything, which is sensible and rational and not remotely out of line with what I said.
But Google explicitly uses a mono repo because they want breaking changes to be fixed up immediately by the person making the changes (with the input of others if required). That's the whole damned purpose.
If you're not going to allow changes across projects in the monorepo then breaking changes will break the repo and you can't have direct dependencies. If you don't have direct dependencies then what's the benefit of a monorepo in the first place? Just to not have to type git clone more than once?
sionescu@reddit
No, it's completely out of line. You said "If you have separate code owners and a package feed what the hell is the point of having a monorepo?". That means you believe that a monorepo and strict code ownership are in conflict, and I gave you an example of the most successful monorepo in the world, which goes precisely against what you said.
In the Google monorepo, all engineers are allowed to propose a change, anywhere. That requires strict approvals from code owners: usually one person, rarely two in sensitive parts. Code ownership is essential in a large monorepo.
recycled_ideas@reddit
If you are using a package feed you have no direct dependencies and your code gains nothing from being in the same repo. Period. If you're using code ownership as a counter argument to my original statement (developers need to be able and allowed to make changes across projects) you're talking about a different kind of ownership than Google uses.
Again.
The entire fucking reason that Google has a monorepo is so that if a change is made in a dependency that any downstream errors are detected and fixed right away.
The PR approval process they use is largely irrelevant. You could argue that in a company with strict PR processes all any developer actually can do is propose a change.
Ravek@reddit
If you’re making a breaking change to a dependency then yeah you need to update the dependents. This isn’t suddenly different if you use git submodule or git subtree.
recycled_ideas@reddit
Well no, but if you can't make that update then you should be using library versions and not a monorepo.
centurijon@reddit
It comes down to how big the team is, honestly. Monorepo is great until you have 10 different devs trying to merge their own features at the same time and 3/10 didn’t pull updates first. That’s when it’s time to split into multirepos
codesnik@reddit
but deployment could become LESS complicated.
BinaryIgor@reddit
In mono repo you mean? I guess there are some abstract build tools like Bazel; but I would argue that they add lots of complexity
bearfromtheabyss@reddit
monorepos r great but coordination between packages gets messy
we use https://github.com/mbruhler/claude-orchestration for our build pipeline:
(build_pkg1 || build_pkg2 || build_pkg3) -> integration_tests -> @release_review -> publish
parallel builds (||) for independent packages, then sequential integration. the workflow syntax makes the dependencies explicit which helps new devs understand the structure
TheoreticalDumbass@reddit
I've heard git submodules are bad, I'm starting to think I've been lied to
jacobs-tech-tavern@reddit
I find it such a weird sort of religion what kind of git strategy someone has. Most places I work, there'll be someone who really really really fucking cares about it, and then everyone will just go along with what they want. I've never really felt it mattered that much.
actinium226@reddit
You suggested that monorepos are best for small teams but I disagree. Some time ago I worked for an auto manufacturer and all the code for multiple vehicles was in one large monorepo. What was really great about it was that you could use a single githash to refer to "the code on the car." The devops team did an amazing job where you could push your change for your ECU (Electronic Control Unit, automotive term for something with a microcontroller, there can be dozens on a modern vehicle) and trigger a full build, and within like an hour you have something you can OTA to a dev car.
Many Silicon Valley companies like Google, Facebook, and X/Twitter are well known for having large monorepos.
sionescu@reddit
The article is misusing the term "monorepo". It means that the entire company is using a single repo, not that a project is using one repo instead of one per component.
FortuneIIIPick@reddit
So, I disagree with the claim that the term is misused.
I just checked with Gemini, OpenAI and Grok; all three agree most people do not assume monorepo means one repo per company. They also agree a monorepo is more than one project or library per repo, organized per team, per domain, or per organization, and at the FAANG level it may be the entire company.
This also follows my experiences with the term at many companies.
sionescu@reddit
AI bro. Blocked.
Used_Indication_536@reddit
I’ve found that a lot of articles these days conflate terminology left and right. It makes understanding wtf the author is talking about so difficult sometimes.
BusEquivalent9605@reddit
Git: fine grain track all of your changes. Revert to a given system state in a single command.
my team: let's create 50 small, independent repos with nonsense names that require certain versions of each other and nowhere is a full working state ever documented
Cheap-Economist-2442@reddit
We must work at the same place
conventionalWisdumb@reddit
It was nice seeing you both at the last all-hands!
IlliterateJedi@reddit
Sounds like you need a Galactus service to tie your nonsense named services together
TheWix@reddit
Too fine-grained is problematic, as is too coarse-grained. Also, sharing too much between apps isn't ideal. I agree with sharing cross-cutting concerns, but we tend to share domain objects and things like that as well. Depending on your architecture, that isn't ideal.
In C# devs automatically create a project (translates to a dll) per layer which is overkill, especially for smaller projects. Folders and namespaces are often enough.
FullPoet@reddit
I have never seen this in all my time in a professional C# dev.
Do people just really go on the internet and extrapolate their few poor experiences as a universal norm?
daringStumbles@reddit
Project not repo. App is bundled into a solution of multiple csproj files.
This is incredibly common and is what I've seen across 6 different companies in completely different sectors.
So common, that I once had a "senior" .net guy (he was at least 60) spend 20 min reaming me out because he couldn't open a dotnet core project because I didn't add the sln file that just wraps the csproj files, because he didn't know dotnet core doesn't need the sln file.
FullPoet@reddit
They specifically said project, so a .csproj project. I did not mention repo, at all.
daringStumbles@reddit
I made an assumption you were equating them because of how widespread the practice is
FullPoet@reddit
? Are you the same type who confuses monorepo and monolith?
It isn't that widespread, if at all.
Again, I think people have this weird issue where they think that what they see locally or regionally is true everywhere.
lelanthran@reddit
Depends on how long your time was. I worked on a large Windows app in 1999, and even though it was in C++, it was still one .dll per layer, with each layer limited to calling functions only in its own layer or the one immediately below in the hierarchy.
This was a pain when a layer had to call 2x layers down as it meant a wrapper of some sort in the existing intermediate layer, but it was almost trivially easy to maintain, especially as each layer was a versioned .dll.
TheWix@reddit
Been working on .net for 20 years. I've seen it everywhere. Fair enough, if it isn't true and I just have bad luck.
MadRedX@reddit
I've been working at .NET shops for 5 years and I've seen both sides.
One place was primarily using Java backends for everything but one Customer Info Web API that was in .NET for some reason. It might as well have been a Java structured project with all the folders.
Next place used ASP.NET only, and they were a single project too with a folder.
Current place uses a project per layer - it initially single sourced the data layer for both a website and desktop app, but then the devs said "fuck best practices, we're going fast" and all of the abstractions & respect for layers have been bastardized to hell.
walterbanana@reddit
People take "Don't repeat yourself" too seriously. I don't care if 5 microservices have copies of the same couple of functions repeated. Dependencies are more painful to deal with than copied code if the scope is limited
edgmnt_net@reddit
I think even coarse-grained services are pretty hard to get right under usual circumstances. Because most projects are morally just cohesive apps and it's difficult to split them into independent parts that are robust enough to avoid going back and forth with changes between repos. It also happens that some splits prompt other splits due to legitimate needs for sharing, so coarse-grained has a tendency to devolve to fine-grained. And eliminating sharing is far from straightforward: sure, you can implement the same logic over and over in 20 different services, but they're heavily coupled among themselves anyway and the result is very brittle. To some degree you can treat that as a contract, but whether it's reasonable or not is debatable.
Good open source libraries out there have long-lived contracts and you don't go changing things back and forth between your code and the lib. This is because they're inherently robust and general, unlike the average enterprise project which has very shifty requirements and ad-hoc purposes.
SirClueless@reddit
I agree with this. Splitting things up into separate repos only works if you commit to backwards compatibility at the interfaces between them.
If you aren't willing to commit to that, putting things together in the same repo and testing and releasing them together is the cheaper and easier option. Doing that is a luxury afforded by working at the same organization for the same stakeholders, it should really be the default in more places.
wellnowidontwantit@reddit
If you have good devs, everything is enough. If not, then you look for ways to enforce some boundaries. That’s probably the main problem: “it depends”.
xtravar@reddit
Yes but, even good devs cut corners sometimes. Part of setting up boundaries is for myself.
serrimo@reddit
How the fuck else do you put 20 years of large scale microservice on your CV?
Shogobg@reddit
It’s all the same service, just copied 40 times.
chickpeaze@reddit
one repo per version
tanaciousp@reddit
do we work at the same place? Copy pasta architecture drives me nuts
spaceneenja@reddit
It’s micro and the principal says to do it. End discussion.
kenlubin@reddit
Why have one microservice to handle authentication when seven will do?
kintar1900@reddit
Funny you should say that. The application I inherited when I took my current position was billed as a web application backed by a micro-service based API. Upon opening the API repository, I discovered a pseudo-TypeScript (because most things get cast to any at SOME point in the call chain) monolith HTTP application with a full router and middleware stack that is compiled to a single minified JS file and deployed as 68 different lambda functions, each of which is passed the full HTTP call via API Gateway HTTP proxy integration.
I am truly in awe of this gigantic pile of gilded crap.
FenrirBestDoggo@reddit
Uhm sir, thats called a distributed system, duh
serrimo@reddit
Holy! Why didn't I think of that?
I could have put 25 years of massive scale distributed system as well on my CV easily too.
BrainwashedHuman@reddit
Don’t have a choice if that’s what jobs are requiring to consider your application.
Packeselt@reddit
🥲
OrphisFlo@reddit
That's a distributed monorepo. All the inconvenience and none of the advantages of a monorepo.
equeim@reddit
Full working state is the current commit in the master branch which also includes specific commits of all submodules. If you merged it then it passes your CI and tests (which you of course have, right?).
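Concretely, reproducing any recorded state is just a couple of commands (the URL and commit hash are placeholders):
git clone --recurse-submodules https://git.example.com/product.git
# or, in an existing clone, jump back to an older known-good commit:
git checkout 3f2c1ab
git submodule update --init --recursive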
Mazo@reddit
I often say we have a distributed monolith rather than microservices for the same reasons
fumar@reddit
Yeah that's what we do and it sucks.
BlueGoliath@reddit
Git modules, contender for the most half baked garbage feature in existence.
RadicalDwntwnUrbnite@reddit
They're like the regex of version control. You have one problem, so you use modules, now you have two problems. I'm sure they have their place but I've just never encountered a repo where they made life better with them than without
BinaryIgor@reddit
Yeah, they have a lot of quirks to be aware of, but they are often quite a useful way of sharing some code when you don't have, or don't want to maintain, infrastructure for shared libs, for example; or for the kind of code/files where sharing and versioning is simply not supported yet
donalmacc@reddit
At work we use Perforce (games). The solution is to vendor everything. We have compiler toolchains in source control. Need to build the version of the game from 12 June 2021 for some random reason?
p4 sync //... && ./build and go make lunch. They're a great idea, but they come with so many footguns that I genuinely don't believe that anyone defending them has tried just vendoring everything instead!
BinaryIgor@reddit
What do you mean by vendoring in this context? Not familiar with the term
thicket@reddit
“Vendoring” usually means including all the source for a dependency, so your project retains control of build and versioning. Many of us avoid it because it can be binary-heavy and you don’t get any bugfixes in the dependency. OTOH, your build never breaks because of some upstream change.
donalmacc@reddit
Agreed, but in the context of games the compiler toolchain is 3GB compared to the 600GB of content that is required to boot the editor…
Only if you don’t update, which is just as true if you’re using a package manager or submodules. Updating is simple: delete the old directory, add the new one, and submit to source control.
thicket@reddit
I'm totally sold on vendoring when conditions are right. And, like you say, when it's done right, there are whole classes of problems that just... disappear.
It's also worth talking through pretty carefully with new or naive developers. Sometimes people's first instinct is "I need this thing, so I'll just copy it over here in source control" and that kind of thinking can cause big problems.
When I'm interviewing candidates and ask a question, I almost never want a "definitely A" or "definitely B" answer. Most of the time I'm looking for "it depends..." and a conscious list of trade-offs involved. It sounds like you guys are very conscious of the trade-offs involved in vendoring and have found use cases where it's the superior solution
SippieCup@reddit
Git-lfs technically versions binaries. It just doesn’t diff them.
donalmacc@reddit
Git LFS is a bolt on. It removes the D from DVCS for git, which is (apparently) one of the main reasons to use git.
SippieCup@reddit
That’s why it’s a technicality.
But I agree that Perforce is a better choice for games or stuff with heavy assets where you might switch between those assets a lot.
edgmnt_net@reddit
It's awful and causes a lot of trouble. Like how do you even review a PR that changes the compiler and drops thousands of files in place? There are ways, such as checking if unpacking the compiler yourself results in no diff, but it still kinda sucks. Much better if you have proper dependency management and you simply point at the upstream source or something like that.
The better way to do that is to have some sort of fully-persistent proxying cache for dependencies.
lood9phee2Ri@reddit
It's an odd term, I don't like it either, especially as to me it sounds like almost the exact opposite of what it means. "Vendoring" has come to mean roughly when you pull your upstream deps into your own tree and potentially maintain them yourself instead of using an external dependency on some upstream project maintained by a vendor (or in the modern era some open source project).
i.e. if you need vendor's project foo, you basically fork it into your own codebase in /vendor/foo or something instead of using it as an external dependency.
But by the sound of it, you're deciding to rely on an external vendor instead of keeping a local copy. That is exactly not what it currently means.
Advantage: you're shielded from some potential upstream bullshit, no matter what happens upstream you have your own working copy.
Disadvantage: you don't pick up upstream's non-bullshit automatically, if you make local changes you're stuck maintaining it yourself etc.
In context various open source projects themselves often have minor "vendored" dependencies.
https://stackoverflow.com/questions/26217488/what-is-vendoring
Given how cheap git cloning is, I tend to do something in-between: have a local git repo cloned from the upstream for safekeeping, but don't munge it into my own main repo. And I don't like monorepos; they're unwieldy and no-one gets them "right" because they're not a multinational corporation with an entire full time team just looking after the precious monorepo, they just cargo cult them because they heard Google does it or something.
edgmnt_net@reddit
Vendoring means dropping the dependency right into the sources, but if you don't vendor it doesn't mean you can't keep a copy elsewhere. Caching dependency proxies can do that. You only keep a URL/handle and maybe a hash and build recipe in the repo, then if the upstream source disappears you still have it mirrored on the company's servers.
There are two somewhat different notions of a monorepo. Google is probably the one where they just shove literally everything into the same repo, even tooling and completely separate projects (sometimes only to avoid multiple clones). Another is for monolithic projects and their repos, like the Linux kernel. The latter is just very straightforward and poses few problems.
donalmacc@reddit
I don’t love the term either but we’re stuck with it!
We use Perforce for source control, and the process for vendoring something is: check in the unmodified source into a clean location, update the deps in the clean location, and then merge the update into your working tree. If you have modifications to the library then you would keep a “dirty” tree with your modifications and introduce that between the clean third party source and your project.
timewarp33@reddit
Versioning
arcanin@reddit
In JS we can enable something like this with Yarn. It has some drawbacks when you keep the same high-velocity repository for more than 5-6 years, but it holds surprisingly well until then.
Blueson@reddit
They do fulfill a need, as proven by their usage.
But managing them is a pain in the ass and would need a full revamp.
edgmnt_net@reddit
I suppose they could be improved. However, what we really need is dependency management, and that's better done separately. The only thing this has going for it is that submodule support is present if Git is present.
lottspot@reddit
They are neither half baked nor poorly thought out. They simply weren't designed with your problem in mind, so you should probably stop trying to shoehorn them.
BlueGoliath@reddit
POV: you have the most common sense use case for modules
Reddit: wAsNt DeSiGnEd FoR yOuR uSe CaSe.
lottspot@reddit
Yes, from your own POV I'm sure you believe whatever your unexplained use case is constitutes "common sense" (whatever the hell that means). That still doesn't mean submodules were built with it in mind (some features are actually built for niche use cases, believe it or not!).
I should probably work on coming around to your perspective though, because I'm sure the way more reasonable explanation is that the people who build git are a bunch of bumbling morons who don't know how to design software!
alohashalom@reddit
> Without that, people can make changes in places they shouldn’t. CODEOWNERS files help define who reviews which parts of the repository.
This doesn't prevent jack shit
guygizmo@reddit
I've bounced around between all of these methods and every time I find them lacking because of the compromises involved. This article does a good job of laying it all out. I feel like there's an approach out there that would be better than all of them.
What I wish we had was submodules, but with them not sucking. I think the problem with them is largely that of their UI, because it puts too much burden on the user to remember niggling details, and makes it far, far too easy to make mistakes.
Make them a mandatory part of the workflow. You always pull them with the repo, they always update with checkouts, unless you explicitly say not to. Remove them as something the user has to keep in their consciousness except in those instances where they are explicitly working with it. And in those moments, have an actual good interface for working with them, so that when you change a file in a submodule, it's easy to make a commit in it, making it very clear what you're doing and which repo it's in, and have the parent repo track it. Don't let the user simply forget any important step. And for the love of god, introduce something so that a submodule is tracking more than a single commit hash, divorced from any context.
It's basically a matter of figuring out how to make them always "do the right thing", which of course is easier said than done. But clearly right now they aren't even close to doing the right thing, and it ought to be fixed.
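Some of that can at least be approximated today with config, although it's opt-in rather than the default, which is exactly the problem:
git config --global submodule.recurse true         # checkout, pull, etc. recurse into submodules by default
git config --global fetch.recurseSubmodules on-demand
git config --global push.recurseSubmodules check   # refuse to push if a referenced submodule commit isn't pushed
git config --global status.submoduleSummary true   # show submodule changes in git status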
alohashalom@reddit
Why git developers after a million years still don't want to fix this is beyond me
bazookatroopa@reddit
The best way to use Git submodules is not using the versioning and having everything always on head of main branch… which is basically a mono repo. The only problem with mono repos is that Git sucks at performing at scale since it was designed for open source projects so they do these workarounds.
iga666@reddit
The problem with submodules is that they are broken and nobody is going to fix them. You cannot revert to any previous commit and expect to have the repository in a consistent state. Also, git tools work badly with submodules; committing changes in all of them is a hell of a ride. I use submodules because I have to, otherwise maybe a monorepo would be better.
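Even just getting back to an older recorded state means remembering an extra step every single time, something like:
git checkout <old-commit>
git submodule update --init --recursive   # without this, the submodule working trees stay on the wrong commits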
Digitalunicon@reddit
Monorepo = unity, Multirepo = independence, Submodules = pain.
lolwutpear@reddit
What does multirepo look like if you're not doing submodules?
Maxatar@reddit
It works the same as using a third party dependency.
loup-vaillant@reddit
Which by the way is probably how companies should work in the first place.
darthwalsh@reddit
rysama@reddit
Your company’s custom implementation of submodules.
opello@reddit
I imagine independent packaging and then resolving dependencies with a package manager.
NetflixIsGr8@reddit
Chaos and version mismatches everywhere if you have no automation for knowing what matches with what
RecklesslyAbandoned@reddit
Google Repo feels like it does a moderately good job of packaging up submodules and generating a full project tree.
Shame it still suffers from issues with it being oddly configured by vendors that lead to headaches.
Sebbean@reddit
I love submodules
Digitalunicon@reddit
Strange!!
beaverfingers@reddit
Lmao
disperso@reddit
Same. I learnt how to use submodules by tracking up to 100 vim plugins in my config. I ended up automating some details with some aliases, and I've never had a problem. I rarely need to alter those repositories, but sometimes I do (as some of those plugins are my own, or I have to switch to my own fork for a PR or some other reason), so I think I've used them in a pretty standard way.
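For anyone curious, the whole workflow is only a couple of commands, assuming your vim config directory is itself a git repo (the plugin and paths are just examples, using Vim 8's native package layout):
cd ~/.vim
git submodule add https://github.com/tpope/vim-surround pack/plugins/start/vim-surround
git commit -m "Track vim-surround as a submodule"
# later, bump all plugins to their upstream HEADs:
git submodule update --remote --merge
git commit -am "Update plugins"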
I still have not seen anything better than submodules. Perhaps some day, but so far, I don't see any alternative. I like git-subtree, but for other, perhaps more niche, cases.
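That plugin-tracking workflow is roughly the following (the plugin and paths are just examples):

```
# Vendor a vim plugin as a submodule (plugin and paths are just examples)
cd ~/dotfiles
git submodule add https://github.com/tpope/vim-fugitive .vim/pack/plugins/start/vim-fugitive

# Pull the latest upstream commit for every plugin
git submodule update --remote --init --recursive
git commit -am "Update vim plugins"
```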
OnkelDon@reddit
svn externals is the perfect blueprint: you get the handling of a monorepo, but the independent repos can still be referenced from other projects.
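For anyone who hasn't used it, the svn:externals setup is roughly this (the URL is illustrative):

```
# Reference another repo's path inside this working copy (URL is illustrative)
svn propset svn:externals "https://svn.example.com/libs/shared/trunk shared" .
svn commit -m "Add shared library as an external"
svn update    # checks out ./shared from the other repo on every update
```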
pt-guzzardo@reddit
The problem with submodules is that you have to convince your coworkers to learn like two new commands and that's like pulling teeth.
Flashy_Current9455@reddit
No kinkshaming here
BlueGoliath@reddit
Not to their face anyway.
Firstevertrex@reddit
Unless that's their kink
twigboy@reddit
I tried out submodules years ago for a personal android dev project
I get what it's trying to achieve, but even at that scale I decided it wasn't worth the pain.
timwoj@reddit
I've learned to not mind submodules in a project I work on. The only thing I don't like about them is that they break `git worktree`.
nerooooooo@reddit
you mean they get duplicated across worktrees?
bladeofwill@reddit
Submodules can be a pain, but sometimes the alternatives are worse and they should be set up in a way that most developers don't need to think about them too much.
I helped set up a project where we had GitHub Actions keep everything in sync on the develop branch automatically, and each developer only needed to worry about a 'core' submodule plus their project-specific submodule for day-to-day work. Multirepo was impractical due to licensing costs for the environments, because it would make reuse from the core module a manual copy-and-paste process, and because of general overhead problems. Monorepo was fine most of the time when one or two teams were working on projects, but it quickly became problematic when Team A needs to release a feature on a specific date for legal reasons, Team B has changes in develop that haven't been fully tested yet, and Team C has introduced a UI bug in existing functionality that might not be a blocker, at which point stakeholders from A & C have to fight it out over whether they'd rather delay the release or ship with a minor but very visible UI bug. Submodules gave us the flexibility to push everything to the dev/QA environment for testing while more easily fine-tuning which modules actually got updated when we released to production.
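A minimal sketch of the kind of sync job described above, shown as the shell a scheduled action might run (the branch name and commit message are assumptions, not the actual setup):

```
# Sketch of a scheduled job that bumps submodule pointers on develop
# (branch name and message are assumptions, not the commenter's actual setup)
git checkout develop
git submodule update --remote --recursive      # move each submodule to its branch tip
if ! git diff --quiet; then
  git commit -am "chore: bump submodule pointers"
  git push origin develop
fi
```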
ILikeCutePuppies@reddit
"Monorepos work best for small to medium teams where services are closely connected."
The largest companies use monorepos for very good reasons. Merging lots of individual branches becomes a nightmare when a team gets too large.
Google had issues where merges were taking six months to get in, because as soon as a commit went in everyone else would have to update, and the further down the stack you were, the worse the changes would get - and no, you can't be 9 branches away with 200 developers between you and mainline and keep all the branches in sync (in most cases).
You should read How Google Tests Software.
jathanism@reddit
Monorepos are the most surefire way to ensure that cross-app dependencies proliferate and tightly couple everything to everything. No thank you.
Difficult-Court9522@reddit
Global monorepo = a single search through the history takes at least 10 minutes. Multirepo = undocumented dependencies. Submodules = documented dependencies.
OlivierTwist@reddit
What about vcpkg?
Messy-Recipe@reddit
I worked at a place that had separate repos for all our applications & shared libraries, BUT we also had custom perl scripts we were supposed to run that would check out an identically-named branch in each repo & file PRs for any project that had changes on your branch.
Also if we changed those shared libraries & then did a local build, it would publish that artifact, so that people working on a different branch ended up automatically pulling your local edited version! What fun!
Also the PRs would only be generated AFTER hours-long browser automation tests passed. If they passed. They were more like 10-40 minutes each, but with a dozen or so apps, multiple people working on their own branches, & not enough runners for even one branch to run them all at the same time, it was hours. Then our CTO would 'review' them (rubber stamp). Good luck if multiple changes went in that passed individually but not when combined! Good luck if you had merge conflicts!
We eventually basically rebelled: versioned the libraries, stopped doing the pretend-monorepo stuff (since it wasn't one...), and stopped using any of those scripts at all. We also moved to more normal testing setups, with actual devs doing code reviews. But it took a long time to get it all in place.
Anyway, all that aside, don't use git sub-anything; almost nobody will understand it. I say that as the person who's usually the best at git on the team.
Sify007@reddit
Probably worth mentioning git subrepo as a third way to bring in dependencies from outside, and maybe compare it to submodule and subtree?
more_exercise@reddit
It's worth noting that subrepo is an external, high-quality tool. It's an extension beyond what bare native git clients can do.
ElectrSheep@reddit
Subrepo is basically submodule/subtree done right. This is what I would be reaching for in scenarios where mono/multi repo isn't the best option. The biggest issue I've encountered is performance due to the usage of filter branch, but it's not like that can't be patched.
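For reference, git-subrepo's day-to-day commands look roughly like this (it's an external tool installed separately; the URL and path are illustrative):

```
# git-subrepo workflow (external tool, installed separately; URL/path are illustrative)
git subrepo clone https://github.com/example/shared-lib libs/shared   # vendor it in
git subrepo pull libs/shared    # pull upstream changes into the subdirectory
git subrepo push libs/shared    # push local subdirectory changes back upstream
```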
RelevantEmergency707@reddit
Submodules suck TBH. We tried managing separate config in different repos with submodules and it often broke in unexpected ways
bazookatroopa@reddit
Monorepos are the best for large orgs… except Git sucks at them because it's not performant at scale. Multi-repos become a spaghetti hellscape of versioning and splitting. Submodules barely alleviate that problem.
kilobrew@reddit
To all developers who started in the last 10 years and proclaim that monorepos are the only way to go:
I hate to break it to you, but you will eventually learn a hard lesson that has been learned multiple times over the last 50 years of software development, through tools like SVN, Mercurial, SourceForge, CVS, etc…
A single repo has its benefits, and its horrible, horrible problems. Just like every other technique.
ZZartin@reddit
Dependencies should all be in one repo.
Git makes it relatively painful to switch between repos and branches so yeah you need everything in one place.
Chance-Plantain8314@reddit
Like absolutely everything in software engineering, the answer is: it depends
r_de_einheimischer@reddit
There is no right answer to this, and I am a bit annoyed by how, in articles and at conferences, people pander a specific approach as general advice.
Look at your team structure, what you are actually developing, how you are staffed, etc., and decide. If an approach doesn't work for you, change it.
captain_zavec@reddit
I didn't know about subtrees, that seems like a good feature to be aware of. Thanks!
AWildMonomAppears@reddit
Mono repos ftw. If you have a lot of people you might want to split it per team.