Our main project these days is a microservice-based system. Since it is built from microservices, we naturally assumed that the best way to manage its code was in separate repositories, with cross-project libraries consumed as NuGet packages. Everything works automatically: we have CI pipelines for the NuGet packages and for integration deployment and testing, and we even have a template for starting a new microservice.
Everything was working well.
Two changes arose that needed our attention:
- We decided to change all the microservices to work under Docker.
- We decided to change all logs to JSON format.
Both of these changes are examples of cross-repo changes that touch the lower infrastructure level rather than the service logic itself. We would want to make each change in one branch, and we would want the ability to revert all the changes in “one click”. This is a symptom of the multi-repo approach’s drawbacks, and that is the topic for today.
Multi-Repo vs Mono-Repo
Let’s start with terminology. Multi-Repo is when you have a different code repository for each package/project/library (we’ll call it a “unit”). Each repository is self-managed: it has its own branches, commits, branch policies, and more. Mono-Repo is when you have one code repository for all your units that are related in some way. The connection might be a business relationship, the same system, or anything else.
Important: don’t confuse Mono-Repo with Mono-Solution. I’m not talking about having all the code in one solution; that’s a bad approach in my opinion. A Mono-Repo will have, in its root directory, a folder for each project. It can also contain metadata such as a .gitignore file.
Let’s look at the differences between the two approaches from several aspects.
Where is the code?
With the Mono-Repo approach, when a new employee joins the team, they clone one repository and can see the whole system. With the Multi-Repo approach, the new employee first needs to learn where all the repositories are and then clone them one by one.
When using the Mono-Repo approach, all references are in place and one can use IDE shortcuts to navigate across the logic without caring about the infrastructure.
When using the Multi-Repo approach, shared libraries need to be distributed through a package manager (NuGet, npm, pip, and the like). This adds another layer of CI/CD: you publish the packages to a common package repository, and then you need to update those packages in all the consumers.
In a Mono-Repo, those libraries are just a directory away and can be imported directly. This not only simplifies integration, but also helps with debugging and with communication between developers.
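To make the contrast concrete, here is a small, self-contained Python sketch of the mono-repo side: a hypothetical `shared` directory holding a library that a service imports straight from disk, with no package registry in between (the layout and names are made up for illustration):

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical mono-repo layout, created in a temp dir for the demo:
#   repo/
#     shared/logging_lib.py   <- shared library, "just a directory away"
#     service_a/...           <- a consumer of that library
repo = Path(tempfile.mkdtemp())
shared = repo / "shared"
shared.mkdir()
(shared / "logging_lib.py").write_text("def fmt(msg):\n    return '[svc] ' + msg\n")

# In a mono-repo a service imports the shared code directly from disk,
# instead of waiting for it to be published to a package repository.
sys.path.insert(0, str(shared))
import logging_lib

print(logging_lib.fmt("hello"))  # -> [svc] hello
```

In a Multi-Repo, that same `logging_lib` would first have to be packaged and published, and then upgraded in every consuming service.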
Cross unit changes
Let’s say you need to make a cross-service change, for whatever reason (look at the two reasons we found above). Would you rather have your team waste time creating a branch in each repo, making the change (however small it is), and then opening dozens or even hundreds of PRs? This has two drawbacks. First, you’ll need several developers instead of only one, since moving from repo to repo is difficult for a single developer. Second, and more importantly, there is no “one-click rollback” for all the changes.
Let’s say the change was moving all logs to JSON format. That’s pretty straightforward in some languages. But what if we forgot to set up the encoding? Then I’d need to go back to each and every PR and update it.
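As an illustration of how small yet cross-cutting such a change is, here is a minimal sketch of a JSON log formatter for a Python service (the logger name is a placeholder). Note the `ensure_ascii` flag: exactly the kind of encoding detail that is easy to forget in one of dozens of PRs:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # ensure_ascii=False keeps non-ASCII text readable in the output --
        # the encoding detail that is easy to miss across many repos.
        return json.dumps(payload, ensure_ascii=False)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order received: café #42")
```

In a Mono-Repo, fixing a forgotten flag like this is one commit on one branch instead of a second round of PRs.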
When using a Mono-Repo, you’ll have a single branch opened for these changes and everything will be managed there.
Repository size
When using the Multi-Repo approach, each repository is relatively small. It contains exactly what that unit needs, and there is no need to clone the whole system to make a small change in one unit.
But with the Mono-Repo approach, each clone (or pull) brings all the changes in all the units. If there are many changes daily, or you have been on vacation for a week, this can be pretty slow.
CI should be taken into account
In one project that I moved from Multi-Repo to Mono-Repo, I created a simple script that walks through all the directories (each one representing a unit) and runs the Docker build for that unit. This gives me, in a very simple and fast way, a CI pipeline. The Dockerfile for each unit is written specifically for that unit (.NET Core vs Python vs NodeJS): it runs the unit’s tests, then builds and pushes the image.
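A minimal sketch of such a traversal script in Python, assuming each unit’s directory contains its own Dockerfile (the registry name is a placeholder, and my real script also runs the tests and pushes the image):

```python
import subprocess
from pathlib import Path

def build_all(repo_root, dry_run=False):
    """Run `docker build` for every unit directory in the mono-repo.

    Each unit keeps its own Dockerfile (.NET Core vs Python vs NodeJS),
    so this loop stays language-agnostic. Returns the commands it ran.
    """
    commands = []
    for unit in sorted(Path(repo_root).iterdir()):
        if not (unit / "Dockerfile").is_file():
            continue  # skip metadata entries such as .gitignore
        cmd = ["docker", "build", "-t", f"myregistry/{unit.name}:latest", str(unit)]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands

if __name__ == "__main__":
    # Dry run: print the build commands without invoking Docker.
    for cmd in build_all(".", dry_run=True):
        print(" ".join(cmd))
```

The same loop can later grow per-unit steps (tests, pushes) without the pipeline itself caring which language each unit uses.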
When working under the Multi-Repo approach, this process is different. We can argue about which approach is easier to implement, and I don’t have a clear answer, but this is an important topic that needs to be evaluated before moving from one approach to the other.
Another question is how to version the units. All at once? Each unit with its own version? This depends on the type of system, your CI/CD infrastructure, and more. I can’t point to an ultimately better way, but I find this a very important aspect of the difference between the two approaches.
So what is better? Both and neither. It depends on your system, your developers, and your company. For us, handling Multi-Repo for one project isn’t the best approach, and we are in the midst of converting the Multi-Repo to a Mono-Repo. We are using this manual, which explains how to merge git repositories while preserving the full history of each repo. I do recommend merging all inner branches into master before starting, declaring a code freeze, and spending a day or two doing the merge, updating the pipelines, and running tests. Obviously, don’t remove the original repositories until everything is OK!
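The manual covers the details; the core of the technique, as I understand it, boils down to a handful of git commands, sketched here as a small Python helper (the repo URL, name, and branch are placeholders, and it assumes each source repo has already moved its files into a sub-folder named after itself so the histories don’t collide):

```python
import subprocess

def merge_into_monorepo(repo_url, name, branch="master", dry_run=False):
    """Merge one external repository into the current mono-repo,
    keeping its full commit history. Returns the git commands used."""
    steps = [
        ["git", "remote", "add", name, repo_url],
        ["git", "fetch", name],
        # The two repos share no common ancestor, so git needs this flag.
        ["git", "merge", "--allow-unrelated-histories", f"{name}/{branch}"],
        ["git", "remote", "remove", name],
    ]
    for cmd in steps:
        if not dry_run:
            subprocess.run(cmd, check=True)
    return steps

if __name__ == "__main__":
    # Dry run: show the commands without touching any repository.
    for cmd in merge_into_monorepo("../svc-a.git", "svc-a", dry_run=True):
        print(" ".join(cmd))
```

Running this once per source repository, during the code freeze, leaves each old repo’s commits intact under the new mono-repo.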