Expand description
The data that we will serialize and deserialize.
Notionally, the dep-graph is a sequence of NodeInfo with the dependencies specified inline. The total number of nodes and edges are stored as the last 16 bytes of the file, so we can find them easily at decoding time.
The serialisation is performed on-demand when each node is emitted. Using this scheme, we do not need to keep the current graph in memory.
The deserialization is performed manually, in order to convert from the stored sequence of NodeInfos to the different arrays in SerializedDepGraph. Since the node and edge count are stored at the end of the file, all the arrays can be pre-allocated with the right length.
The encoding of the de-pgraph is generally designed around the fact that fixed-size
reads of encoded data are generally faster than variable-sized reads. Ergo we adopt
essentially the same varint encoding scheme used in the rmeta format; the edge lists
for each node on the graph store a 2-bit integer which is the number of bytes per edge
index in that nodeβs edge list. We effectively ignore that an edge index of 0 could be
encoded with 0 bytes in order to not require 3 bits to store the byte width of the edges.
The overhead of calculating the correct byte width for each edge is mitigated by
building edge lists with EdgesVec
which keeps a running max of the edges in a node.
When we decode this data, we do not immediately create SerializedDepNodeIndex
and
instead keep the data in its denser serialized form which lets us turn our on-disk size
efficiency directly into a peak memory reduction. When we convert these encoded-in-memory
values into their fully-deserialized type, we use a fixed-size read of the encoded array
then mask off any errant bytes we read. The array of edge index bytes is padded to permit this.
We also encode and decode the entire rest of each node using SerializedNodeHeader
to let this encoding and decoding be done in one fixed-size operation. These headers contain
two Fingerprint
s along with the serialized DepKind
, and the number of edge indices
in the node and the number of bytes used to encode the edge indices for this node. The
DepKind
, number of edges, and bytes per edge are all bit-packed together, if they fit.
If the number of edges in this node does not fit in the bits available in the header, we
store it directly after the header with leb128.
Structs§
- Edge
Header πA packed representation of an edgeβs start index and byte width. - Encoder
State π - Graph
Encoder π - Node
Info π - Data for use when recompiling the current crate.
- Serialized
Node πHeader A packed representation of all the fixed-size fields in aNodeInfo
. - Stat π
- Unpacked π
Constants§
- DEP_
NODE_ πPAD Amount of padding we need to add to the edge list data so that we can retrieve every SerializedDepNodeIndex with a fixed-size read then mask. - DEP_
NODE_ πSIZE - DEP_
NODE_ πWIDTH_ BITS Number of bits we need to store the number of used bytes in a SerializedDepNodeIndex. Note that wherever we encode byte widths like this we actually store the number of bytes used minus 1; for a 4-byte value we technically would have 5 widths to store, but using one byte to store zeroes (which are relatively rare) is a decent tradeoff to save a bit in our bitfields.
Functions§
- mask π