GitImageRegistry: Using Git as a Docker image registry

Another software startup has been experiencing growing pains recently: Docker, along with DockerHub. Like most modern software startups, it grew big on what is effectively "unlimited" VC money. That money lets a company pretend to be your friend for a while. They shower you with gifts and buy your loyalty, and you think, "Wow, they're so nice!" But when that money runs out, or starts to run out, they have to stop pretending to be your friend and start running a business.

That's when reality hits: when it comes down to it, no company is your friend. They have their interests and you have yours. Maybe there can still be a beneficial relationship, but one thing should be clear: things are not the same, and they never will be.

Now, I won't say I fully blame them for this state of affairs. Any reasonable person should've been able to tell from the start that this was an unsustainable friendship. However, at the same time modern tech companies have taken a poison pill with VC money and they're all too willing to craft and maintain the illusion that your friendship with them is built upon.

What sucks is that you've invested a lot into this friendship and now you're in a position where you may not be able to easily walk away. What sucks more is there are plenty of vultures waiting in the wings to get you on a rebound with promises like "5TB of free bandwidth."

But if you're reading this it's very likely that you realize the cornerstone of the "friendship" you've had with these companies has been removed. And maybe, hopefully, you've decided you're not willing to take on another "friendship" like this so fast and can recognize the signs of one of these "friendships" before you put yourself into a bad situation again.

Luckily there is good news and that good news is an old friend who doesn't need anything from you except your friendship. Git.

Using Git, a local Docker registry, and FUSE, we can replace what we lost in our friendship with DockerHub and also avoid a future loss of a theoretical friendship with AWS. You can pull as many images as you want at no cost, and the setup is portable.

For this proof of concept we'll use GitHub as a host for our docker images. Before I dig into a ton of details though I'll just give you the quickstart so you can use this right now.

Using GitImageRegistry

Quickstart

$ git clone https://github.com/ethanwillis/GitImageRegistry
$ cd GitImageRegistry
$ chmod +x startRegistry.sh
$ ./startRegistry.sh

So how does this work?

At its core, GitImageRegistry is just a Git repo that contains a single file mapping Docker images to other Git repos that contain the images themselves. For simplicity, the former will be referred to as the Index and the latter as the Blob.

Aside from this the other major component is a FUSE wrapper that treats the Index and the Blob as a regular old filesystem.

Let's take a look at how the Index is structured first.

The Index

The Index is a set of image registry entries, each addressed either by its set of tags or, uniquely, by the image's sha256 digest. Together these form a tuple of the format ( [tag1, ..., tagN], digest ), which we'll refer to as an Index Key.

Index Key

The Index Key identifies a set of unique Git remotes that host the associated Blob. We'll refer to these Git remotes as Blob Hosts.

Blob Hosts

Blob Hosts are simply pointers to publicly accessible (or maybe also privately accessible) Git repos that contain the Blob for an image.

The Blob

The Blob is a set of files in a Git repo that together form the actual binary image data. This set of files is a chunked version of the image. Chunking keeps the implementation portable across lots of Blob Hosts, regardless of how they host Git repos. More importantly, chunking the binary data allows for a future implementation of GitImageRegistry where remotes are associated with subsets of all chunks, enabling parallel reads and writes for faster image pulls. On top of that, it lets Blob Hosts contribute without having to store even one full image.
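To make the chunking idea concrete, here is a minimal sketch in Python. It is not the project's actual implementation: the function names and the tiny chunk size are assumptions made for illustration, and only the chunk1.bin, chunk2.bin, ... naming is borrowed from the example Index later in this post.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut raw image bytes into pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def write_chunks(data: bytes, directory: Path, chunk_size: int) -> list[str]:
    """Write each chunk to its own file and return the file names in order.

    The chunkN.bin naming is an assumption based on the example Index.
    """
    names = []
    for n, chunk in enumerate(split_into_chunks(data, chunk_size), start=1):
        name = f"chunk{n}.bin"
        (directory / name).write_bytes(chunk)
        names.append(name)
    return names


def read_chunks(directory: Path, names: list[str]) -> bytes:
    """Reassemble the original bytes by concatenating chunks in listed order."""
    return b"".join((directory / name).read_bytes() for name in names)


if __name__ == "__main__":
    image_bytes = b"pretend this is a docker image tarball"
    with tempfile.TemporaryDirectory() as tmp:
        blob_dir = Path(tmp)
        # A real chunk size would be far larger; 8 bytes keeps the demo readable.
        chunk_names = write_chunks(image_bytes, blob_dir, chunk_size=8)
        print(chunk_names)
        print(read_chunks(blob_dir, chunk_names) == image_bytes)
```

Because the chunk order lives in a plain list of names, nothing stops different names from being fetched from different Blob Hosts, which is the property the parallel-pull idea relies on.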

Putting it all together. The Index Key, Blob Hosts, and The Blob

This all comes together to form a single image registry entry.
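One way to picture a single entry is as a small record type. The sketch below is only an illustration in Python: the class names, field names, and the find_entry helper are hypothetical and not taken from the GitImageRegistry code. The tags, digest, remotes, and chunk names are the ones from the example Index in this post.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexKey:
    """( [tag1, ..., tagN], digest )"""
    tags: tuple[str, ...]
    digest: str


@dataclass(frozen=True)
class RegistryEntry:
    """One image: its Index Key, its Blob Hosts, and its chunk file names."""
    key: IndexKey
    blob_hosts: tuple[str, ...]
    chunks: tuple[str, ...]


def find_entry(index: list[RegistryEntry], reference: str) -> Optional[RegistryEntry]:
    """Look an image up by any of its tags or by its sha256 digest."""
    for entry in index:
        if reference == entry.key.digest or reference in entry.key.tags:
            return entry
    return None


# The single-image Index used as the example in this post.
INDEX = [
    RegistryEntry(
        key=IndexKey(
            tags=("registry:2.7.1", "registry:2.7", "registry:2", "registry:latest"),
            digest="sha256:e09ed8c6c837d366a501f15dcb47939bbbb6242bf3886270834e2a0fa1555234",
        ),
        blob_hosts=(
            "https://github.com/ethanwillis/registry",
            "https://github.com/ethanwillis/registry-backup",
        ),
        chunks=("chunk1.bin", "chunk2.bin", "chunk3.bin", "chunk4.bin"),
    )
]

if __name__ == "__main__":
    entry = find_entry(INDEX, "registry:latest")
    if entry is not None:
        print(entry.blob_hosts)
        print(entry.chunks)
```

Looking up by tag and by digest returns the same entry, which matches the idea that tags are convenient aliases while the digest is the unique address.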

The full Index composed of a set of image registry entries will look like the following.

So what does the Index really look like?

It's just plaintext. Here's an example Index that contains a single image, formatted more nicely for mortal consumption.

(
  ;;; Image
  (
    ;;; Index Key
    (
      ("registry:2.7.1", "registry:2.7", "registry:2", "registry:latest"),
      "sha256:e09ed8c6c837d366a501f15dcb47939bbbb6242bf3886270834e2a0fa1555234"
    )
    ;;; Remotes
    (
      "https://github.com/ethanwillis/registry"
      "https://github.com/ethanwillis/registry-backup"
    )
    ;;; Chunks
    (
      "chunk1.bin"
      "chunk2.bin"
      "chunk3.bin"
      "chunk4.bin"
    )
  )
)
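Since the Index is plaintext, reading it takes very little code. The following Python sketch parses the example above into nested lists. It is a guess at a workable reader, not the project's parser: the parse_index name is hypothetical, and it assumes that ;;; starts a comment running to the end of the line, that commas are optional separators, and that strings never contain a double quote or the ;;; sequence.

```python
import re

# Tokens are an opening paren, a closing paren, or a double-quoted string.
# Whitespace and commas fall between tokens and are simply skipped.
TOKEN = re.compile(r'\(|\)|"[^"]*"')

EXAMPLE_INDEX = '''
(
  ;;; Image
  (
    ;;; Index Key
    (
      ("registry:2.7.1", "registry:2.7", "registry:2", "registry:latest"),
      "sha256:e09ed8c6c837d366a501f15dcb47939bbbb6242bf3886270834e2a0fa1555234"
    )
    ;;; Remotes
    (
      "https://github.com/ethanwillis/registry"
      "https://github.com/ethanwillis/registry-backup"
    )
    ;;; Chunks
    (
      "chunk1.bin"
      "chunk2.bin"
      "chunk3.bin"
      "chunk4.bin"
    )
  )
)
'''


def parse_index(text: str) -> list:
    """Parse the plaintext Index into nested Python lists of strings."""
    # Drop everything from ';;;' to the end of each line (assumed comment syntax).
    stripped = "\n".join(line.split(";;;", 1)[0] for line in text.splitlines())
    stack = [[]]  # stack[0] collects top-level forms
    for token in TOKEN.findall(stripped):
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(token[1:-1])  # strip the surrounding quotes
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    if len(stack[0]) != 1:
        raise ValueError("expected exactly one top-level list")
    return stack[0][0]


if __name__ == "__main__":
    for (tags, digest), remotes, chunks in parse_index(EXAMPLE_INDEX):
        print(tags)
        print(digest)
        print(remotes)
        print(chunks)
```

Each parsed image comes out as three parts in the same order as the file: the Index Key (tags plus digest), the remotes, and the chunk names.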

FUSE Wrapper around the Index

Writing Files

Reading Files