If you're working at a large company and downtime is extremely expensive, this checklist is a good guide. Otherwise, if you have good test coverage, you can get by with something simpler. It's super rare to have a breaking change in Go.
We do quarterly upgrades of all services in a monorepo (about 20-30). The steps are basically this:
- Upgrade all dependencies to their latest versions, fixing build and test breaks (I read release notes for Go, but not for dependencies)
- Look for deprecated packages and replace them
- Upgrade all toolchains, including CI/CD containers, go.mod, etc.
- Run all tests
- Deploy to the test environment and make sure everything is green
- Deploy to staging and do some sanity checks
- Deploy to prod, keeping an eye on metrics for an hour or two
We're on k8s and the state of all clusters (i.e. which images are running) is tracked in git, so a rollback is just git revert + apply.
In practice, after about four years of this, we've seen maybe a dozen build breaks, and I can only remember one regression caused by a breaking change in a library[1].
Quarterly upgrades over four years makes 16 upgrades. A dozen build breaks means that 75% of upgrades hit a build break.
Spread over the total number of builds for 20-30 services, that's not so bad; and presumably some upgrades of everything were completely uneventful!
You’re right. Going back over some notes (including go.mod’s history), we’ve been at this for six years, not four. And a dozen build breaks is probably an overestimate — it’s more like one every 12-18 months. Most upgrades are uneventful, everything just continues working.
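For anyone following the numbers, here is the arithmetic from both comments written out. It assumes at most one build break per quarterly upgrade, which is a simplification and not something stated in the thread.

```latex
\begin{align*}
\text{original estimate:}\quad & 4\ \text{years} \times 4\ \text{upgrades/year} = 16\ \text{upgrades}, \qquad 12 / 16 = 75\% \\
\text{revised estimate:}\quad & 6 \times 4 = 24\ \text{upgrades}, \qquad 72\ \text{months} / (12\text{ to }18\ \text{months}) = 4\text{ to }6\ \text{breaks} \\
& 4 / 24 \approx 17\%, \qquad 6 / 24 = 25\%
\end{align*}
```

So under the revised figures, roughly one upgrade in four to six hits a build break.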
Out of curiosity, were you dealing with microservices defined within a monorepo, or microservices each in their own repo? The steps here:
> Build your binaries with the new version. Go through the build errors if any.
> Run all the unit tests with the new version. Go through the test failures.
are a lot easier in a monorepo.
Separately, I've experienced frequent breaking changes in the golangci-lint configuration file. I can't point to a specific instance of this happening but one thing I'd suggest is pinning your version of golangci-lint in development and in CI rather than using "latest".
Golang's backwards compatibility and simplified toolchain are two of my favorite things about it. Bumping go.mod and downloading the new version of Go is usually all it takes!
Not Hakan, but I was working closely with him at the time. Lyft was on a microservice many-repo setup, and we did pin the version of golangci-lint.
I've found it's actually not so bad to do this kind of work across many repos, as long as you have the tooling to apply the same change to any number of codebases all at once. Our strategy was typically:
- Write an idempotent codemod to do an upgrade. This is easy as long as your configuration is in a declarative language.
- Regularly apply it or update it on all of the applicable repos.
- Merge upgrades incrementally until you've upgraded 100%.
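To make "idempotent codemod" concrete, here is a minimal sketch that sets the `go` directive in a go.mod file. It is not the tooling we actually used; `bumpGoDirective` and the sample module path are made up for illustration, and a real codemod would more likely parse the file (for example with golang.org/x/mod/modfile) than use a regex.

```go
package main

import (
	"fmt"
	"regexp"
)

// goDirective matches a line such as "go 1.21" or "go 1.21.3".
// It deliberately does not match "toolchain go1.21.0" lines.
var goDirective = regexp.MustCompile(`(?m)^go[ \t]+\d+(\.\d+)*[ \t]*$`)

// bumpGoDirective (hypothetical helper) returns the go.mod content with the
// go directive set to version. Running it on its own output changes nothing,
// which is what makes the codemod safe to re-apply across many repos.
func bumpGoDirective(gomod, version string) string {
	return goDirective.ReplaceAllLiteralString(gomod, "go "+version)
}

func main() {
	// Sample go.mod content; a real run would read and write files.
	in := "module example.com/svc\n\ngo 1.21\n\nrequire example.com/lib v1.0.0\n"
	once := bumpGoDirective(in, "1.24")
	twice := bumpGoDirective(once, "1.24")
	fmt.Print(once)
	fmt.Println("idempotent:", once == twice)
}
```

Because the second application is a no-op, the same job can run on a schedule against every repo and only open a change where something is actually out of date.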
I’ll add an item that is not yet on our checklist but has already bitten us several times: check your code generation. Since code generation is so popular in the Go ecosystem, we’ve got 5 or 6 different codegen tools that update on various timelines. Twice now we’ve gone through a checklist similar to this article, patted ourselves on the back, and a week later found out no one can regenerate any code.
This is one reason why code generation should run as part of the build process, every time. Even if you decide to check in the generated code for visibility.
> Even if you decide to check in the generated code for visibility.
I prefer checking it in by default (generated and checked in by users; CI failing if re-generation during the build generates diff).
It enables much simpler debugging collaboration ("do you have diff in the gen/ dir when you try to repro this bug? I don't."), mistaken-checkin prevention ("did you accidentally run protoc on the wrong version before adding files to this commit? CI's failing because it sees changes in gen/ without changes to requirements or .proto files") and easier verification of upgrades just like this one.
With the ability to use .gitattributes to suppress generated diff visibility by default widely supported (if not well-standardized) across Git repo management platforms, the drawbacks of checking in generated sources are minimal.
The internal tools I've used regenerate the code as part of the CI process, and will fail the pipeline if the regenerated code has dirtied the git tree.
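A rough sketch of that kind of CI check, not the internal tool itself: it assumes every generator is wired up through `//go:generate` directives and that the step runs inside a git checkout. `dirtyPaths` is a made-up helper name.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// dirtyPaths (hypothetical helper) extracts paths from `git status --porcelain`
// output. Each line is two status characters, a space, then the path; rename
// entries keep their "old -> new" form.
func dirtyPaths(porcelain string) []string {
	var paths []string
	for _, line := range strings.Split(porcelain, "\n") {
		if len(line) > 3 {
			paths = append(paths, line[3:])
		}
	}
	return paths
}

func main() {
	// Regenerate everything with the toolchain under test.
	gen := exec.Command("go", "generate", "./...")
	gen.Stdout, gen.Stderr = os.Stdout, os.Stderr
	if err := gen.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "go generate failed:", err)
		os.Exit(1)
	}

	// Any change in the working tree means the checked-in code is stale.
	out, err := exec.Command("git", "status", "--porcelain").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "git status failed:", err)
		os.Exit(1)
	}
	if paths := dirtyPaths(string(out)); len(paths) > 0 {
		fmt.Fprintln(os.Stderr, "regeneration changed the tree:")
		for _, p := range paths {
			fmt.Fprintln(os.Stderr, "  "+p)
		}
		os.Exit(1)
	}
}
```

Run as its own pipeline step, this also catches the "nobody can regenerate anymore" case mentioned above, since a generator that no longer runs fails the first command.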
Another suggestion: if your monorepo's service packaging is sufficiently uniform, build every service against both Go versions, package both binaries into the deploy artifact, and install a feature flag that lets you select which binary to boot when the service starts. This also lets you canary an arbitrary percentage of the fleet with the new Go version, and you can execute a version rollback by redeploying (without needing to revert any commits).
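Roughly, the launcher could look like the sketch below. Everything named here is an assumption for illustration: the `NEW_GO_CANARY_PERCENT` environment variable stands in for whatever feature flag system you have, the `/app/service-old` and `/app/service-new` paths are placeholders, and `syscall.Exec` limits it to Unix-like hosts.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"os"
	"strconv"
	"syscall"
)

// inCanary reports whether host hashes into the first `percent` of 100
// buckets, so the same host always makes the same choice.
func inCanary(host string, percent int) bool {
	h := fnv.New32a()
	h.Write([]byte(host))
	return int(h.Sum32()%100) < percent
}

// pickBinary returns the path of the build to boot (placeholder paths).
func pickBinary(host string, percent int) string {
	if inCanary(host, percent) {
		return "/app/service-new" // built with the new Go version
	}
	return "/app/service-old" // built with the current Go version
}

func main() {
	// An unset or malformed flag parses as 0, which keeps everyone on the old build.
	percent, _ := strconv.Atoi(os.Getenv("NEW_GO_CANARY_PERCENT"))
	host, _ := os.Hostname()
	bin := pickBinary(host, percent)

	// Replace the launcher process with the chosen binary.
	argv := append([]string{bin}, os.Args[1:]...)
	if err := syscall.Exec(bin, argv, os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, "exec failed:", err)
		os.Exit(1)
	}
}
```

Setting the percentage back to 0 and restarting is the rollback; no commit needs reverting.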
Currently (Go 1.24), the Go team has not published a tool that identifies all of the cases broken by this change, so you might need to review the code by eye.
I'm not sure you can actually break code with the new for-loop semantics (I mean, in real life situations). It can probably fix some buggy code in the wild, but I have a hard time believing anyone would voluntarily write code relying on the old semantics of loop variables being reassigned instead of reinitialized.
> I'm not sure you can actually break code with the new for-loop semantics (I mean, in real life situations).
The linked issue thread provides real life cases.
> It can probably fix some buggy code in the wild, but I have a hard time believing anyone would voluntarily write code relying on the old semantics of loop variables being reassigned instead of reinitialized.
For "for-range" loops, you might be correct. However, for 3-clause-for loops, your opinions lack substantial evidence to support them.
> Honestly I think this is a non-issue.
You can think anything. But the facts are there. There are at least 3 important facts of the semantic change on 3-clause-for loops:
1. The new semantics of 3-clause-for loops are more error prone for concurrency programming.
2. The new semantics of 3-clause-for loops may silently downgrade Go code performance.
3. The expected benefits of the change are actually so minuscule that they can be disregarded. On the other hand, the drawbacks of the change are huge.
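For readers who haven't looked at the change itself, here is a small sketch of the semantic difference on a 3-clause loop. It only shows that behavior changes; it doesn't demonstrate the concurrency or performance claims above. Which result you get depends on the Go version declared in go.mod: per-iteration variables apply from Go 1.22 on.

```go
package main

import "fmt"

// collect captures the loop variable in a closure on every iteration and
// calls the closures only after the loop has finished.
func collect() []int {
	var fns []func() int
	for i := 0; i < 3; i++ {
		fns = append(fns, func() int { return i })
	}
	out := make([]int, 0, len(fns))
	for _, f := range fns {
		out = append(out, f())
	}
	return out
}

func main() {
	// Old semantics (one shared i): every closure sees the final value 3.
	// New semantics (a fresh i per iteration): the closures see 0, 1, 2.
	fmt.Println(collect())
}
```

Code that relied on the shared variable, intentionally or not, is what changes behavior silently after the upgrade.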
> With the introduction of generics at 1.18, many linters lacked support for generics for months. We delayed the upgrade due to this issue.
I wouldn't plan on using a new feature in production in the release that introduced it. Why would you plan to be using generics on day one?
> There was talk of trying to solve this issue in the upstream ourselves.
Was there a genuine business case that would make Lyft more profit if they used generics? If not then why would you even consider this?
> Fortunately, by the time we seriously started exploring this option, linter support was added and go 1.19 was also released. We eventually upgraded directly to 1.19 from 1.17 but we were around 10 months late.
You weren't late. You were precisely on time. This is some odd project mentality.
[1] https://github.com/golang/go/issues/24211