Nvix: implementing a Tvix store with NATS
Ok, before we get started, the name isn’t great. I get it. But in my defence, naming things is hard, and I wanted something short for the eventual cli command.
So let’s take it as read that I have heard your groans and accept your position that Nvix as a name is lame and should be changed at my earliest convenience 😉.
With that out of the way, let’s get to it.
What is Nvix?
Nvix is an implementation of a Tvix store using my favourite hammer, NATS, which came into this world as part of the hack day at NixCon back at the beginning of September.
Since then, I have taken what was a hastily implemented and hacky version of the Blob service and, during the recent Numtide Retreat, expanded it with hastily implemented and hacky versions of the Directory and Path Info services, rounding out the trilogy of gRPC services required to consider Nvix an implementation of a Tvix store.
At this point, you might be forgiven for wondering what Tvix is. So, let me explain.
What is Tvix?
Simply put, it’s “a new implementation of the Nix language and package manager”, essentially a rewrite of Nix from the ground up. It started during the pandemic and has been making steady progress since then. I was first introduced to it by @flokli at the Numtide retreat in 2022.
At that time, I was still somewhat of a baby Nix developer, and much of what @flokli was talking about went over my head. I nodded along and offered general advice for the store implementation he was hacking on at the time and not much else.
Since then, as I have progressed into my Nix adolescence, I have begun to grasp the underlying problems that Tvix is trying to address. In particular, I was curious about the Tvix store and, given my recent experiments, interested in how it can help provide highly available, multi-tenant Nix stores, a.k.a. binary caches.
How it might help will become clearer once I explain the three gRPC services/interfaces that comprise what is commonly referred to as the Tvix store.
The Blob Service
At the most basic level, any implementation of a Nix store needs to be able to, well, store things. In Tvix, this begins with the Blob Service.
service BlobService {
// In the future, Stat will expose more metadata about a given blob,
// such as more granular chunking, baos.
// For now, it's only used to check for the existence of a blob, as asking
// this for a non-existing Blob will return a Status::not_found gRPC error.
rpc Stat(StatBlobRequest) returns (BlobMeta);
// Read returns a stream of BlobChunk, which is just a stream of bytes with
// the digest specified in ReadBlobRequest.
//
// The server may decide on whatever chunking it may seem fit as a size for
// the individual BlobChunk sent in the response stream.
rpc Read(ReadBlobRequest) returns (stream BlobChunk);
// Put uploads a Blob, by reading a stream of bytes.
//
// The way the data is chunked up in individual BlobChunk messages sent in
// the stream has no effect on how the server ends up chunking blobs up.
rpc Put(stream BlobChunk) returns (PutBlobResponse);
}
When you want to add a file to the store, you must first send a stream of BlobChunks to the BlobService. These are just arrays of bytes representing parts of the content you want to store, and, in return, you will receive a PutBlobResponse.
This is where things already start to get interesting. First, you will notice that there are no fields for metadata in the BlobChunk message or the Put method:
// This represents some bytes of a blob.
// Blobs are sent in smaller chunks to keep message sizes manageable.
message BlobChunk {
bytes data = 1;
}
And if we look at the PutBlobResponse, what we receive after uploading a Blob is just a byte array called digest:
message PutBlobResponse {
// The blake3 digest of the data that was sent.
bytes digest = 1;
}
As the comment above alludes to, when you upload something to the Blob Service, what you get in response is a Blake3 digest of the byte stream you provided. This digest is a unique key used to retrieve this Blob later.
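To make that flow concrete, here is a rough Go sketch of a client-side upload. It is illustrative only: the castorev1 import path, the generated stub names and the localhost address are assumptions rather than something lifted from Nvix or Tvix, but the shape (stream BlobChunks, receive a digest, verify it locally) follows directly from the proto above:
package main

import (
    "bytes"
    "context"
    "fmt"
    "io"
    "log"
    "os"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    "lukechampine.com/blake3"

    castorev1 "code.tvl.fyi/tvix/castore-go" // assumed import path for the generated stubs
)

// chunkSize is arbitrary; the server is free to re-chunk however it sees fit.
const chunkSize = 256 * 1024

func putBlob(ctx context.Context, client castorev1.BlobServiceClient, r io.Reader) ([]byte, error) {
    stream, err := client.Put(ctx)
    if err != nil {
        return nil, err
    }

    // Hash locally as we go so we can verify the digest the server returns.
    hasher := blake3.New(32, nil)
    buf := make([]byte, chunkSize)
    for {
        n, readErr := r.Read(buf)
        if n > 0 {
            hasher.Write(buf[:n])
            if err := stream.Send(&castorev1.BlobChunk{Data: buf[:n]}); err != nil {
                return nil, err
            }
        }
        if readErr == io.EOF {
            break
        }
        if readErr != nil {
            return nil, readErr
        }
    }

    resp, err := stream.CloseAndRecv()
    if err != nil {
        return nil, err
    }

    // The digest is intrinsic to the content, so the client can check it too.
    if want := hasher.Sum(nil); !bytes.Equal(resp.Digest, want) {
        return nil, fmt.Errorf("digest mismatch: got %x, want %x", resp.Digest, want)
    }
    return resp.Digest, nil
}

func main() {
    conn, err := grpc.NewClient("localhost:9000", grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    f, err := os.Open("hello.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    digest, err := putBlob(context.Background(), castorev1.NewBlobServiceClient(conn), f)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("stored blob %x", digest)
}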
What this means is that the Blob Service is, in fact, a content-addressed store. You do not retrieve entries by some arbitrary key provided during a put operation but instead by a property intrinsic to the content you store.
This has enormous implications for storage, namely de-duplication, with an implementation of the Blob Service able to leverage the digest to ensure it stores only one copy of a given piece of content, regardless of how many times it might be uploaded.
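This also means a client can cheaply check whether a blob is already present before bothering to upload it, using the Stat method from the proto above. Again, this sketch assumes the same hypothetical generated stubs:
package nvixclient

import (
    "context"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"

    castorev1 "code.tvl.fyi/tvix/castore-go" // assumed import path, as before
)

// blobExists reports whether the store already has a blob with this digest.
// A NotFound status means it does not; any other error is a real failure.
func blobExists(ctx context.Context, client castorev1.BlobServiceClient, digest []byte) (bool, error) {
    _, err := client.Stat(ctx, &castorev1.StatBlobRequest{Digest: digest})
    if err == nil {
        return true, nil
    }
    if status.Code(err) == codes.NotFound {
        return false, nil
    }
    return false, err
}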
Whilst cloud providers worldwide might lament the loss of revenue due to the reduced storage costs this brings, you might be wondering how we can refer to these blobs in some standard fashion, such as by filename, when we want to actually use them…
The Directory Service
Since we refer to our blobs of bytes by their digest, we need some way to incorporate them into a directory structure. Enter the Directory Service.
service DirectoryService {
// Get retrieves a stream of Directory messages, by using the lookup
// parameters in GetDirectoryRequest.
// Keep in mind multiple DirectoryNodes in different parts of the graph might
// have the same digest if they have the same underlying contents,
// so sending subsequent ones can be omitted.
rpc Get(GetDirectoryRequest) returns (stream Directory);
// Put uploads a graph of Directory messages.
// Individual Directory messages need to be sent in an order walking up
// from the leaves to the root - a Directory message can only refer to
// Directory messages previously sent in the same stream.
// Keep in mind multiple DirectoryNodes in different parts of the graph might
// have the same digest if they have the same underlying contents,
// so sending subsequent ones can be omitted.
// We might add a separate method, allowing to send partial graphs at a later
// time, if requiring to send the full graph turns out to be a problem.
rpc Put(stream Directory) returns (PutDirectoryResponse);
}
You send a stream of Directory messages to the Directory Service to upload a directory tree. Each Directory message contains a list of subdirectories, a list of files, and a list of symlinks.
message Directory {
repeated DirectoryNode directories = 1;
repeated FileNode files = 2;
repeated SymlinkNode symlinks = 3;
}
In the case of a file, we refer to the file contents using its digest, which, you guessed it, allows us to retrieve said contents from the Blob Service. In the case of a symlink, its target is a relative or absolute path. And in the case of a directory, we refer to it by its digest. That’s right, a Directory message also has a Blake3 digest.
To calculate it, we marshal the Directory protobuf message using the Deterministic marshalling option and then compute the Blake3 digest of the resultant bytes. To further ensure there’s only one possible representation for a directory, all subdirectories, files and symlinks within a Directory message must be ordered lexicographically.
In addition, when uploading a directory tree, the stream of Directory messages must be in a particular order, starting at the leaves and working towards the root, i.e. a depth-first, post-order traversal.
With these constraints in place, we are actually generating a Merkle Tree when uploading a directory tree to the Directory Service. This is why the PutDirectoryResponse message contains a root_digest.
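Here is a hedged sketch of that leaf-to-root ordering, using a hypothetical in-memory tree type and the conventional protoc-generated stream client name (assumed, not checked against the real bindings):
// dirTree is a hypothetical in-memory representation of a directory tree;
// it is not a Tvix type.
type dirTree struct {
    dir      *castorev1.Directory
    children []*dirTree
}

// putTree sends children before the Directory that refers to them, so every
// message only references digests the server has already seen.
func putTree(stream castorev1.DirectoryService_PutClient, t *dirTree) error {
    for _, child := range t.children {
        if err := putTree(stream, child); err != nil {
            return err
        }
    }
    return stream.Send(t.dir)
}
Once the root has been sent, closing the stream yields the PutDirectoryResponse and, with it, the root_digest for the whole tree.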
From an implementation standpoint, with each Directory having its own digest, we can de-duplicate the directory metadata in much the same way as we can de-duplicate blobs. And, when we want to retrieve a given directory tree later, we look it up using a digest that is intrinsic to the tree contents themselves.
CA Store
Having implemented a Blob Service and a Directory Service, we already have an effective means of storing directory trees in a content-addressed fashion, with the associated storage reductions and security improvements that this brings.
When receiving a directory tree, we can verify the metadata as we receive it by calculating the intermediate Blake3 hashes for each Directory message and ensuring they match the digests by which they are referenced.
And, in the not-so-distant future, tvix-castore will introduce a verified streaming approach for large blobs, allowing each chunk to be verified as it is received. This has enormous implications for untrusted environments and for facilitating the sharing of store content in a peer-to-peer fashion.
That said, one service is left to implement if we want a Nix store.
The Path Info Service
service PathInfoService {
// Return a PathInfo message matching the criteria specified in the
// GetPathInfoRequest message.
rpc Get(GetPathInfoRequest) returns (PathInfo);
// Upload a PathInfo object to the remote end. It MUST not return until the
// PathInfo object has been written on the remote end.
//
// The remote end MAY check if a potential DirectoryNode has already been
// uploaded.
//
// Uploading clients SHOULD obviously not steer other machines to try to
// substitute from the remote end before having finished uploading the
// PathInfo, Directories and Blobs.
// The returned PathInfo object MAY contain additional narinfo signatures,
// but is otherwise left untouched.
rpc Put(PathInfo) returns (PathInfo);
// Calculate the NAR representation of the contents specified by the
// root_node. The calculation SHOULD be cached server-side for subsequent
// requests.
//
// All references (to blobs or Directory messages) MUST already exist in
// the store.
//
// The method can be used to produce a Nix fixed-output path, which
// contains the (compressed) sha256 of the NAR content representation in
// the root_node name (suffixed with the name).
//
// It can also be used to calculate arbitrary NAR hashes of output paths,
// in case a legacy Nix Binary Cache frontend is provided.
rpc CalculateNAR(tvix.castore.v1.Node) returns (CalculateNARResponse);
// Return a stream of PathInfo messages matching the criteria specified in
// ListPathInfoRequest.
rpc List(ListPathInfoRequest) returns (stream PathInfo);
}
This adds another layer of Nix-specific metadata, allowing us to track things such as references to other paths, the deriver used to build a given store path, and signatures.
Much like the rest of the services, the PathInfo object uses a Node reference, which can be a directory, a file, or a symlink; directories and files are referenced by their Blake3 digests, and symlinks by their targets.
In addition to tracking store paths, the Path Info Service allows us to take a Nix store path and generate a NAR representation. This is useful when bridging between Tvix land and a traditional binary cache.
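As a rough illustration of that bridge (the package names and response field names are assumptions based on the proto above, not checked against the real bindings):
package nvixclient

import (
    "context"
    "log"

    castorev1 "code.tvl.fyi/tvix/castore-go" // assumed import paths for the
    storev1 "code.tvl.fyi/tvix/store-go"     // generated castore and store stubs
)

// logNARInfo asks the Path Info Service for the NAR representation of a root
// node, the kind of information a legacy binary cache frontend needs in order
// to serve a .narinfo file.
func logNARInfo(ctx context.Context, client storev1.PathInfoServiceClient, root *castorev1.Node) error {
    // The proto notes the server SHOULD cache this calculation, so repeated
    // calls for the same root node ought to be cheap.
    resp, err := client.CalculateNAR(ctx, root)
    if err != nil {
        return err
    }
    log.Printf("nar size: %d bytes, nar sha256: %x", resp.NarSize, resp.NarSha256)
    return nil
}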
Why NATS?
At this stage, you might think, “Tvix looks cool, but why did you have to bring NATS into the equation?” And you can be forgiven for thinking I’m just seeing nails everywhere for my new shiny hammer.
However, there are good reasons why having an implementation of the Tvix store backed by NATS is a valuable addition. And the first reason lies in that sentence: an implementation.
Layering Stores
You see, Tvix stores are intended to be layered. There are supposed to be many implementations. You could have, for example, an in-memory store, which defers to an on-disk implementation, which defers to a remote S3 or, in this case, NATS implementation.
Or there could be a peer-to-peer-based store implementation that attempts to source store paths over your local network. That could reach out to a NATS-based store, which acts as a fallback if there are no local peers.
The combinations are endless and can be tailored to a specific use case.
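To give a flavour of what layering could look like, here is a deliberately simplified sketch; the BlobReader interface is illustrative and is not the actual Tvix trait or interface:
package nvixclient

import (
    "context"
    "io"
)

// BlobReader is a simplified stand-in for one layer of a blob store.
type BlobReader interface {
    Read(ctx context.Context, digest []byte) (io.ReadCloser, error)
}

// layeredBlobs consults a fast local layer first and falls back to a remote
// layer, which might be backed by S3, NATS, or anything else.
type layeredBlobs struct {
    local, remote BlobReader
}

func (l *layeredBlobs) Read(ctx context.Context, digest []byte) (io.ReadCloser, error) {
    if rc, err := l.local.Read(ctx, digest); err == nil {
        return rc, nil
    }
    // Local miss (or failure): try the next layer down.
    return l.remote.Read(ctx, digest)
}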
Storage and Distribution
My main reason for choosing NATS is that, as a subset of its functionality, it gives me a rich key-value store with fine-grained storage and replication options in one place.
On a local level, a NATS cluster provides resilience and flexibility, which, in and of itself, is not unique. But when you consider some of the super-cluster architectures that NATS can support, it doesn’t take long before you arrive at a place that resembles a combination of the current S3 and Fastly setup that cache.nixos.org relies upon.
Combined with its first-class multi-tenancy, I see the potential for a globally distributed and permissioned binary cache in which both public and private paths can be hosted safely together, benefiting from a high level of de-duplication behind the scenes.
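To give a concrete, if simplified, taste of the primitive NATS provides: a replicated key-value bucket in which a chunk of content could be stored under its own digest. This is purely illustrative and not how Nvix is actually structured; the bucket name and replica count are made up:
package main

import (
    "fmt"
    "log"

    "github.com/nats-io/nats.go"
    "lukechampine.com/blake3"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatal(err)
    }

    // Replicas > 1 needs a clustered JetStream deployment; drop it when
    // running against a single local server.
    kv, err := js.CreateKeyValue(&nats.KeyValueConfig{
        Bucket:   "blob-chunks",
        Replicas: 3,
    })
    if err != nil {
        log.Fatal(err)
    }

    chunk := []byte("hello, nix store")
    digest := blake3.Sum256(chunk)

    // Content-addressed: the key is derived from the value itself.
    key := fmt.Sprintf("%x", digest)
    if _, err := kv.Put(key, chunk); err != nil {
        log.Fatal(err)
    }

    entry, err := kv.Get(key)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("retrieved %d bytes for %s", len(entry.Value()), key)
}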
Summary
In this post, I have introduced the Tvix store and covered its basic architecture and the various services you need to implement. I have also touched on why I built Nvix (https://github.com/brianmcgee/nvix) and why I feel NATS is a good fit for Tvix.
You might have noticed I spent little time on the actual implementation of Nvix. That’s because this post is already quite long, and I try to keep the reading time to a max of 10 minutes or so. But another reason is that I’m not doing anything exceptional NATS-wise in Nvix so far.
There are some caching patterns I’m experimenting with that use the Re-Publish feature. Besides that, the exciting stuff only happens when deploying Nvix to a clustered or super-cluster architecture, which I’ve yet to set up.
Rest assured that it is on the roadmap, and when I get around to it, I’ll be sure to post a follow-up!