What Happens When You Go Get — A Closer Look At The Internals Of Go Modules
- Author: Tobi Okewole
Whenever you need an external module while working on a Go project, you go get it by running the command go get <pkg-name>. Unlike many other languages, Go does not have a central software registry or package manager (like NPM for JavaScript or Maven for Java) that modules are published to and downloaded from. Whether you are just getting started with Go and running your first go get command or are a veteran, understanding what happens behind the scenes when you run the go command is important.
In this post, we will discuss the steps that take place when you download an external module into your project. Let’s get started 🚀🚀
The GOPROXY
Package management in Go is [semi-]decentralized. There is no single server that hosts the source code for all the modules that exist. Any Git repository can serve as the home of a Go module. There are various arguments for and against the decentralized nature of Go’s module system. Personally, I think most of the arguments against it are weak, especially since the introduction of module mirrors & the checksum database, which we will discuss in the coming sections.
Although a centralized module system tends to be simpler and faster, the central server might not be reachable everywhere (for example, in countries with national firewalls), and it requires developers to trust a central entity.
[As I write this, I am reminded of the story of Azer Koçulu, who deleted an NPM package & “broke the internet”.]
A decentralized module system avoids those big problems, but you may have a lot of smaller trust and availability problems with individual servers. Go tries to strike a balance between the two with proxies.
Generally speaking, when you run a go get command, Go downloads the module to your computer if it is not already present in your local module cache. Where the go command downloads external modules from depends on the value of the GOPROXY environment variable. By default, GOPROXY is set to https://proxy.golang.org,direct (a comma-separated list with no spaces). This value tells the go command to try https://proxy.golang.org first and, if the proxy reports that it does not have the module (an HTTP 404 or 410 response), to fall back to direct, which means downloading straight from the module’s origin repository.
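To make the fallback order concrete, here is a minimal sketch of how a GOPROXY value could be split into an ordered list of sources. The proxySource type and parseGOPROXY function are names I made up for illustration; they are not the go command's own code. The sketch assumes the documented separators: a comma means "fall back only on 404 or 410", and a pipe means "fall back on any error".

```go
package main

import (
	"fmt"
	"strings"
)

// proxySource is one entry in a GOPROXY list.
// (Hypothetical type for this sketch; not part of the go command's API.)
type proxySource struct {
	URL string // a proxy URL, or one of the keywords "direct" / "off"
	// FallbackOnAnyError is true when the entry is followed by "|" and
	// false when it is followed by "," (fall back only on HTTP 404 or 410).
	FallbackOnAnyError bool
}

// parseGOPROXY splits a GOPROXY value into its sources, in the order tried.
func parseGOPROXY(value string) []proxySource {
	var sources []proxySource
	for value != "" {
		var entry string
		anyError := false
		if i := strings.IndexAny(value, ",|"); i >= 0 {
			entry = value[:i]
			anyError = value[i] == '|'
			value = value[i+1:]
		} else {
			entry, value = value, ""
		}
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue // skip empty entries such as a trailing comma
		}
		sources = append(sources, proxySource{URL: entry, FallbackOnAnyError: anyError})
	}
	return sources
}

func main() {
	for i, s := range parseGOPROXY("https://proxy.golang.org,direct") {
		fmt.Printf("%d: %s (fall back on any error: %v)\n", i+1, s.URL, s.FallbackOnAnyError)
	}
}
```

With the default value, the proxy is tried first and direct is only reached when the proxy answers that the module is not there.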
Module Mirrors
In this section, we will answer two questions: what is https://proxy.golang.org, and why does the go command try to download the module I asked for from there instead of from the URL I passed to go get?
proxy.golang.org is a module mirror run by Google. A module mirror is a type of proxy that fetches modules from the origin servers (Git repositories) and caches them in its own storage for use in future requests. Module mirrors ensure that changes to a module’s source or downtime in the origin servers do not affect your builds. Downloading modules through a proxy is also faster and needs less storage than downloading them directly.
Aside from downloading modules, the go command is also tasked with resolving the dependencies of these newly downloaded modules. Using the direct download method, the go command typically has to clone a dependency’s repository, history included, whether or not it ends up being used in the build. Using a proxy, the go command downloads a zip file, which is a snapshot of the module at a specific version. The snapshot contains everything in the module’s root directory (the directory containing its go.mod file) but excludes everything in nested modules (subdirectories containing their own go.mod files). That includes the source code of all the module’s packages (regardless of whether they’re actually needed for a build), and it may also include files in directories that aren’t Go packages. To resolve dependencies, the go command also fetches the .mod & .info files of each dependency version by making HTTP requests to endpoints on the module proxy server.
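The nested-module rule is easier to see with a small example. Below is a minimal sketch that, given the file paths in a repository, keeps only the files that belong to the module at the repository root. rootModuleFiles and insideNested are hypothetical helpers written for this post, and the sketch models only the rule described above; the real rules for building module zips cover more cases.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// rootModuleFiles returns the files that belong to the module rooted at the
// top of the repository: everything except files inside a subdirectory that
// has its own go.mod (a nested module). Paths are slash-separated and
// relative to the repository root.
func rootModuleFiles(repoFiles []string) []string {
	// Collect the directories that start a nested module.
	nested := map[string]bool{}
	for _, f := range repoFiles {
		if path.Base(f) == "go.mod" && path.Dir(f) != "." {
			nested[path.Dir(f)] = true
		}
	}
	var kept []string
	for _, f := range repoFiles {
		if !insideNested(f, nested) {
			kept = append(kept, f)
		}
	}
	sort.Strings(kept)
	return kept
}

// insideNested reports whether f lives in, or below, a nested module directory.
func insideNested(f string, nested map[string]bool) bool {
	for dir := path.Dir(f); dir != "." && dir != "/"; dir = path.Dir(dir) {
		if nested[dir] {
			return true
		}
	}
	return false
}

func main() {
	files := []string{
		"go.mod",
		"client.go",
		"docs/README.md",    // not a Go package, but still part of the root module
		"tools/go.mod",      // starts a nested module
		"tools/gen/main.go", // belongs to the nested module, so it is excluded
	}
	fmt.Println(rootModuleFiles(files)) // prints [client.go docs/README.md go.mod]
}
```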
Although it is the most popular, https://proxy.golang.org is not the only module proxy available. Module proxies are not sacred; in fact, you can run your own, and there are open-source projects that let you host a Go proxy yourself. Every module proxy must implement the GOPROXY protocol. Once your HTTP server implements everything in the GOPROXY protocol, you have yourself a module proxy. The spec includes a list of endpoints that all module proxies must serve.
I would also like to mention that the GOPROXY protocol exists to ensure uniformity across all the proxies out there. The go command is not interested in which proxy you have set up; it just needs to be able to reach the endpoints defined in the spec. When you run go get github.com/aws/aws-sdk-go-v2, the go command makes an HTTP request to $GOPROXY/github.com/aws/aws-sdk-go-v2/@v/list. The $GOPROXY part of that URL is whatever proxy you have configured; Go doesn’t care.
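Here is a minimal sketch of how the URLs for those endpoints are put together for one module version. proxyURLs and escapeCase are names I made up, the version v1.0.0 is only a placeholder, and escapeCase is a simplified take on the protocol's case-escaping rule (an uppercase letter becomes "!" plus its lowercase form); the official implementation lives in golang.org/x/mod/module. As far as I know, the @latest endpoint is optional for proxies.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCase replaces every uppercase ASCII letter with "!" followed by its
// lowercase form, so paths stay unambiguous on case-insensitive file systems.
// (Simplified sketch; it does not validate the path.)
func escapeCase(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// proxyURLs returns the GOPROXY protocol URLs for one module version.
func proxyURLs(base, module, version string) map[string]string {
	prefix := strings.TrimSuffix(base, "/") + "/" + escapeCase(module)
	v := escapeCase(version)
	return map[string]string{
		"list":   prefix + "/@v/list",           // known versions, one per line
		"latest": prefix + "/@latest",           // metadata about the latest version
		"info":   prefix + "/@v/" + v + ".info", // JSON metadata for this version
		"mod":    prefix + "/@v/" + v + ".mod",  // the go.mod file for this version
		"zip":    prefix + "/@v/" + v + ".zip",  // the module snapshot
	}
}

func main() {
	urls := proxyURLs("https://proxy.golang.org", "github.com/aws/aws-sdk-go-v2", "v1.0.0")
	for _, name := range []string{"list", "latest", "info", "mod", "zip"} {
		fmt.Printf("%-6s %s\n", name, urls[name])
	}
}
```

Swap the first argument for any other proxy and the rest of each URL stays the same, which is the whole point of having a protocol.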
Let’s Talk Security
At this point, you are probably wondering about the security of the module proxy. In this section, we will discuss the steps the go command takes to ensure that the modules it downloads are secure & have not been tampered with.
Ever wondered what the go.sum file that sits right beside your go.mod file does? Tried to open it but realised it was incomprehensible? The first time an external module (dependency) is used by your project, cryptographic hashes of the .mod & .zip files of that dependency & of all its transitive dependencies are generated and added to your go.sum file. Subsequently, when that dependency is [re-]downloaded, the go command checks that the hashes of the downloaded .mod & .zip files match the corresponding entries in go.sum.
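For the curious, here is a minimal sketch of how an "h1:" hash like the ones in go.sum can be computed. It follows the scheme as I understand it from the golang.org/x/mod/sumdb/dirhash package: hash every file with SHA-256, write one "hash, two spaces, file name" line per file in sorted order, then hash that summary and base64-encode it. hashH1 is my own hypothetical helper and the module example.com/hello is made up; the real tool reads the files out of the downloaded zip rather than from a map.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hashH1 computes an "h1:" style hash over a set of files (name -> content).
// In a module zip, the names look like "module@version/path/to/file".
func hashH1(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // the order must not depend on map iteration

	summary := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		// One line per file: hex SHA-256, two spaces, file name.
		fmt.Fprintf(summary, "%x  %s\n", sum[:], name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

func main() {
	files := map[string][]byte{
		"example.com/hello@v1.0.0/go.mod":   []byte("module example.com/hello\n"),
		"example.com/hello@v1.0.0/hello.go": []byte("package hello\n"),
	}
	recorded := hashH1(files) // what would have been written to go.sum

	// Simulate a later download in which one file was changed.
	files["example.com/hello@v1.0.0/hello.go"] = []byte("package hello // tampered\n")
	fmt.Println("hash still matches go.sum:", hashH1(files) == recorded)
}
```

Changing a single byte in any file changes the final hash, which is why a mismatch with go.sum makes the go command stop and complain.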
Although the go.sum file makes sure hashes match, which keeps builds reproducible, it does nothing to make sure that a dependency is secure and untampered with the first time it is added. The checksum database exists as a global source of truth for all publicly available module versions. Using the checksum database, a module is verified on the first download and compared with the go.sum file on subsequent downloads. sum.golang.org is an auditable checksum database run by Google. The go command consults sum.golang.org by default when it downloads a module version that does not yet have an entry in the existing go.sum file.
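That decision can be sketched in a few lines. The helpers below are hypothetical (hasGoSumEntry and sumDBLookupURL are my own names), the go.sum content uses placeholder hashes, and the sketch skips details such as case-escaping of module paths and the verification of the database's signed responses. It assumes the checksum database exposes a /lookup/<module>@<version> endpoint.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasGoSumEntry reports whether goSum already records a hash for the zip of
// module@version. go.sum lines have three fields:
//   <module> <version> <hash>          (hash of the module zip)
//   <module> <version>/go.mod <hash>   (hash of the go.mod file only)
func hasGoSumEntry(goSum, module, version string) bool {
	sc := bufio.NewScanner(strings.NewReader(goSum))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 3 {
			continue // skip blank or malformed lines
		}
		if fields[0] == module && fields[1] == version {
			return true
		}
	}
	return false
}

// sumDBLookupURL builds the checksum database lookup URL for module@version.
func sumDBLookupURL(db, module, version string) string {
	return "https://" + db + "/lookup/" + module + "@" + version
}

func main() {
	// Placeholder go.sum content; the hashes are not real.
	goSum := "example.com/hello v1.0.0 h1:AAAA=\n" +
		"example.com/hello v1.0.0/go.mod h1:BBBB=\n"

	for _, version := range []string{"v1.0.0", "v1.1.0"} {
		if hasGoSumEntry(goSum, "example.com/hello", version) {
			fmt.Println(version, "-> compare the download against go.sum")
		} else {
			fmt.Println(version, "-> ask", sumDBLookupURL("sum.golang.org", "example.com/hello", version))
		}
	}
}
```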
In Conclusion
We have discussed Go Modules, Module Mirrors & the checksum database! I hope you got a little more clarity about Go modules and all the stuff happening behind the scenes. If you are interested in learning more, I suggest you check out the following materials.
- https://www.youtube.com/watch?v=KqTySYYhPUE
- https://proxy.golang.org/
- https://jayconrod.com/posts/118/life-of-a-go-module
If you have any questions or feedback, please feel free to share them with me on Twitter: @oluwatvbi or via Email: tobade02@gmail.com
Big thanks to Chidi Williams & Jay Conrod (Jay worked on Go modules at Google🤯) for taking the time to review the original draft of this article & suggest improvements!