Expand description

The data that we will serialize and deserialize.

Notionally, the dep-graph is a sequence of NodeInfo with the dependencies specified inline. The total number of nodes and edges are stored as the last 16 bytes of the file, so we can find them easily at decoding time.

The serialisation is performed on-demand when each node is emitted. Using this scheme, we do not need to keep the current graph in memory.

The deserialization is performed manually, in order to convert from the stored sequence of NodeInfos to the different arrays in SerializedDepGraph. Since the node and edge count are stored at the end of the file, all the arrays can be pre-allocated with the right length.

The encoding of the de-pgraph is generally designed around the fact that fixed-size reads of encoded data are generally faster than variable-sized reads. Ergo we adopt essentially the same varint encoding scheme used in the rmeta format; the edge lists for each node on the graph store a 2-bit integer which is the number of bytes per edge index in that node’s edge list. We effectively ignore that an edge index of 0 could be encoded with 0 bytes in order to not require 3 bits to store the byte width of the edges. The overhead of calculating the correct byte width for each edge is mitigated by building edge lists with EdgesVec which keeps a running max of the edges in a node.

When we decode this data, we do not immediately create SerializedDepNodeIndex and instead keep the data in its denser serialized form which lets us turn our on-disk size efficiency directly into a peak memory reduction. When we convert these encoded-in-memory values into their fully-deserialized type, we use a fixed-size read of the encoded array then mask off any errant bytes we read. The array of edge index bytes is padded to permit this.

We also encode and decode the entire rest of each node using SerializedNodeHeader to let this encoding and decoding be done in one fixed-size operation. These headers contain two Fingerprints along with the serialized DepKind, and the number of edge indices in the node and the number of bytes used to encode the edge indices for this node. The DepKind, number of edges, and bytes per edge are all bit-packed together, if they fit. If the number of edges in this node does not fit in the bits available in the header, we store it directly after the header with leb128.



  • Amount of padding we need to add to the edge list data so that we can retrieve every SerializedDepNodeIndex with a fixed-size read then mask.
  • Number of bits we need to store the number of used bytes in a SerializedDepNodeIndex. Note that wherever we encode byte widths like this we actually store the number of bytes used minus 1; for a 4-byte value we technically would have 5 widths to store, but using one byte to store zeroes (which are relatively rare) is a decent tradeoff to save a bit in our bitfields.