Stargz snapshotter is low-key magic
Sometimes you run into tech that makes you go “Heh?”, which is the positive variant of its cousin, “Wtf?”. My newest addition, Stargz snapshotter, is exactly that kind of tech.
Without immediately spoiling how the magic works: Stargz snapshotter is something you install on your Docker host that causes the daemon to download only the files your workload actually uses, just in time. Heh? I know, right? Instead of gobbling down a 9 GB CUDA container during docker pull only to use a fraction of it, it fetches precisely what’s needed, when it’s needed.
To pull off that trick, it has to solve a few problems:
- Know which files are being accessed by your workload or processes inside the container
- Know where to fetch those files from
- Perform some additional sleight of hand
So how does it do that? It’s a combination of things working together:
- A FUSE layer that observes file access and fetches content on demand
- A seekable tar.gz layout with a table of contents (TOC) that records exactly where each file lives in the image, so it can be downloaded on its own
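To make the TOC idea a bit more concrete, here is a minimal sketch in Python. It is a toy, not the real eStargz format or the snapshotter's code: `TocEntry`, `FakeRegistry`, `LazyLayer` and `build_layer` are all made-up names, the "registry" is just bytes in memory, and there is no gzip or FUSE involved. It only shows the core trick: look up a file's offset in an index, then fetch just that byte range on first access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TocEntry:
    offset: int  # where the file's bytes start inside the layer blob
    size: int    # how many bytes belong to the file


class FakeRegistry:
    """Hypothetical stand-in for a registry that supports range requests."""

    def __init__(self, blob: bytes):
        self._blob = blob
        self.bytes_served = 0

    def get_range(self, start: int, end: int) -> bytes:
        # Mirrors an HTTP "Range: bytes=start-end" header (end is inclusive).
        chunk = self._blob[start:end + 1]
        self.bytes_served += len(chunk)
        return chunk


class LazyLayer:
    """Fetches a file on first access, then serves it from a local cache."""

    def __init__(self, registry: FakeRegistry, toc: dict):
        self._registry = registry
        self._toc = toc
        self._cache = {}

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            entry = self._toc[path]
            self._cache[path] = self._registry.get_range(
                entry.offset, entry.offset + entry.size - 1
            )
        return self._cache[path]


def build_layer(files: dict):
    """Concatenate files into one blob and record where each one landed."""
    blob, toc = b"", {}
    for path, data in files.items():
        toc[path] = TocEntry(offset=len(blob), size=len(data))
        blob += data
    return blob, toc


files = {
    "/usr/lib/libcuda.so": b"C" * 9000,  # big, never touched below
    "/app/main.py": b"print('hi')",
    "/etc/config": b"debug=false",
}
blob, toc = build_layer(files)
registry = FakeRegistry(blob)
layer = LazyLayer(registry, toc)

layer.read("/app/main.py")
layer.read("/app/main.py")  # second read comes from the cache
print(registry.bytes_served, "of", len(blob), "bytes downloaded")
```

The big library is in the image but never crosses the wire, because nothing asked for it. In the real thing, the role of `LazyLayer.read` is played by the FUSE layer, which is what lets unmodified processes trigger the fetch just by opening a file.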
Now, this is already cool, but fetching every file on demand means your containers would suffer from slow starts and unpredictable run times. To deal with that, you can also choose to optimize your images.
Stargz then profiles your workload (by running it), puts the most frequently requested and most used files first, and bundles them up so that the first fetch is a quick one using range requests.
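A rough sketch of that reordering idea, again in Python and again a toy: `optimize_order` and `prefetch_range` are hypothetical helpers, the access log is invented, and the real optimizer may well rank files differently. The point is only that once the "hot" files sit together at the front of the layer, one byte range covers all of them.

```python
from collections import Counter


def optimize_order(all_files, access_log):
    """Put files seen during the profiling run first, most accessed first."""
    hits = Counter(access_log)
    hot = sorted(
        (f for f in all_files if f in hits),
        key=lambda f: (-hits[f], f),  # ties broken by name for a stable order
    )
    cold = [f for f in all_files if f not in hits]
    return hot, cold


def prefetch_range(hot, sizes):
    """Hot files are laid out first, so one range from byte 0 covers them."""
    total = sum(sizes[f] for f in hot)
    return (0, total - 1) if total else None


# Hypothetical file sizes (bytes) and a made-up access log from a profiling run.
sizes = {
    "/usr/lib/libcuda.so": 9000,
    "/app/main.py": 11,
    "/etc/config": 11,
    "/bin/sh": 120,
}
access_log = ["/bin/sh", "/app/main.py", "/etc/config", "/app/main.py"]

hot, cold = optimize_order(list(sizes), access_log)
print("hot: ", hot)
print("cold:", cold)
print("prefetch bytes:", prefetch_range(hot, sizes))
```

Everything in `cold` still works, it just gets fetched lazily if and when something touches it.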
Now isn’t that awesome?
Its internals are really worth reading.
Also, it seems that Nydus ran with the concept and put fancier things on top, like chunk dedup.