cargo::core::compiler

Module fingerprint


Tracks changes to determine if something needs to be recompiled.

This module implements change-tracking so that Cargo can know whether or not something needs to be recompiled. A Cargo Unit can be either “dirty” (needs to be recompiled) or “fresh” (it does not need to be recompiled).

§Mechanisms affecting freshness

There are several mechanisms that influence a Unit’s freshness:

  • The Fingerprint is a hash, saved to the filesystem in the .fingerprint directory, that tracks information about the Unit. If the fingerprint is missing (such as the first time the unit is being compiled), then the unit is dirty. If any of the fingerprint fields change (like the name of the source file), then the Unit is considered dirty.

    The Fingerprint also tracks the fingerprints of all its dependencies, so a change in a dependency will propagate the “dirty” status up.

  • Filesystem mtime tracking is also used to check if a unit is dirty. See the section below on “Mtime comparison” for more details. There are essentially two parts to mtime tracking:

    1. The mtime of a Unit’s output files is compared to the mtime of all its dependencies’ output file mtimes (see check_filesystem). If any output is missing, or is older than a dependency’s output, then the unit is dirty.
    2. The mtime of a Unit’s source files is compared to the mtime of its dep-info file in the fingerprint directory (see find_stale_file). The dep-info file is used as an anchor to know when the last build of the unit was done. See the “dep-info files” section below for more details. If any input files are missing, or are newer than the dep-info, then the unit is dirty.
  • Alternatively, if you’re using the unstable checksum-freshness feature, mtimes are ignored entirely in favor of comparing first the file size and then the checksum against a known prior value emitted by rustc. At the time of writing, only nightly rustc emits the needed metadata. This depends on the unstable feature -Z checksum-hash-algorithm.
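
The size-then-checksum comparison from the last bullet can be sketched roughly as follows. This is a minimal illustration, not Cargo's implementation: `FileRecord`, `fnv1a64`, `record`, and `is_stale` are hypothetical names, and FNV-1a only stands in for whichever algorithm -Z checksum-hash-algorithm selects.

```rust
/// Hypothetical record of what was known about an input file at the last build.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileRecord {
    pub size: u64,
    pub checksum: u64,
}

/// FNV-1a (64-bit), used here only as a stand-in checksum.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Captures the size and checksum of a file's contents.
pub fn record(contents: &[u8]) -> FileRecord {
    FileRecord {
        size: contents.len() as u64,
        checksum: fnv1a64(contents),
    }
}

/// Returns true if the contents differ from the prior record.
/// The cheap size check runs first; the checksum is only computed
/// when the sizes match.
pub fn is_stale(prior: &FileRecord, contents: &[u8]) -> bool {
    if prior.size != contents.len() as u64 {
        return true;
    }
    prior.checksum != fnv1a64(contents)
}
```

Checking the size first means most edits are caught without reading and hashing the whole file, which is presumably why the two comparisons are ordered that way.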

Note: Fingerprinting is not a perfect solution. Filesystem mtime tracking is notoriously imprecise and problematic. Only a small part of the environment is captured. This is a balance of performance, simplicity, and completeness. Sandboxing, hashing file contents, tracking every file access, environment variable, and network operation would ensure more reliable and reproducible builds at the cost of being complex, slow, and platform-dependent.

§Fingerprints and UnitHashs

Metadata tracks several UnitHashs, including Metadata::unit_id, Metadata::c_metadata, and Metadata::c_extra_filename. See its documentation for more details.

NOTE: Not all output files are isolated via filename hashes (like dylibs). The fingerprint directory uses a hash, but sometimes units share the same fingerprint directory (when they don’t have Metadata) so care should be taken to handle this!

Fingerprints and UnitHashs are similar, and track some of the same things. UnitHashs contain the information that is required to keep Units separate. The Fingerprint includes additional information that should cause a recompile while still reusing the same filenames. A comparison of what is tracked:

Value | Fingerprint | Metadata::unit_id | Metadata::c_metadata | Metadata::c_extra_filename
rustc
Profile
cargo rustc extra args [1] [1]
CompileMode
Target Name
TargetKind (bin/lib/etc.)
Enabled Features
Declared Features
Immediate dependency’s hashes [2]
CompileKind (host/target)
__CARGO_DEFAULT_LIB_METADATA [3]
package_id
authors, description, homepage, repo
Target src path relative to ws
Target flags (test/bench/for_host/edition)
-C incremental=… flag
mtime of sources [4]
RUSTFLAGS/RUSTDOCFLAGS [1] [1]
Lto flags
config settings [5]
is_std
[lints] table [6]
[lints.rust.unexpected_cfgs.check-cfg]

When deciding what should go in the Metadata vs the Fingerprint, consider that some files (like dylibs) do not have a hash in their filename. Thus, if a value changes, only the fingerprint will detect the change (consider, for example, swapping between different features). Fields that are only in Metadata generally aren’t relevant to the fingerprint because they fundamentally change the output (like target vs host changes the directory where it is emitted).

§Fingerprint files

Fingerprint information is stored in the target/{debug,release}/.fingerprint/ directory. Each Unit is stored in a separate directory. Each Unit directory contains:

  • A file with a 16 hex-digit hash. This is the Fingerprint hash, used for quick loading and comparison.
  • A .json file that contains details about the Fingerprint. This is only used to log details about why a fingerprint is considered dirty. CARGO_LOG=cargo::core::compiler::fingerprint=trace cargo build can be used to display this log information.
  • A “dep-info” file which is a translation of rustc’s *.d dep-info files to a Cargo-specific format that tweaks file names and is optimized for reading quickly.
  • An invoked.timestamp file whose filesystem mtime is updated every time the Unit is built. This is used for capturing the time when the build starts, to detect if files are changed in the middle of the build. See below for more details.

Note that some units are a little different. A Unit for running a build script or for rustdoc does not have a dep-info file (it’s not applicable). Build script invoked.timestamp files are in the build output directory.
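
The quick comparison against the hash file can be sketched as follows. This is only an illustration under stated assumptions: `Freshness`, `hash_file_contents`, and `compare` are hypothetical names, and the zero-padded hex encoding is assumed rather than taken from Cargo's source.

```rust
/// Hypothetical result of comparing a stored fingerprint hash with a new one.
#[derive(Debug, PartialEq, Eq)]
pub enum Freshness {
    Fresh,
    Dirty(&'static str),
}

/// Formats a 64-bit hash as 16 lowercase hex digits (an assumed encoding).
pub fn hash_file_contents(hash: u64) -> String {
    format!("{hash:016x}")
}

/// `stored` is the hash file's contents, or `None` if the file does not exist
/// (for example, the first time the unit is compiled).
pub fn compare(stored: Option<&str>, new_hash: u64) -> Freshness {
    match stored {
        None => Freshness::Dirty("fingerprint file missing"),
        Some(s) if s.trim() == hash_file_contents(new_hash) => Freshness::Fresh,
        Some(_) => Freshness::Dirty("fingerprint hash changed"),
    }
}
```

Only when the hashes differ is there any reason to load the .json file, which matches its role above as a source of log detail rather than of the freshness decision itself.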

§Fingerprint calculation

After the list of Units has been calculated, the Units are added to the JobQueue. As each one is added, the fingerprint is calculated, and the dirty/fresh status is recorded. A closure is used to update the fingerprint on-disk when the Unit successfully finishes. The closure will recompute the Fingerprint based on the updated information. If the Unit fails to compile, the fingerprint is not updated.

Fingerprints are cached in the BuildRunner. This makes computing Fingerprints faster, but is also necessary for properly updating dependency information. Because a Fingerprint holds the Fingerprints of all its dependencies as Arc clones, it automatically picks up the updates to its dependencies when it is updated.
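
Why Arc clones make dependency updates visible can be shown with a heavily simplified sketch. `Node` and its fields are hypothetical stand-ins, not Cargo's `Fingerprint` type, and unlike the real code this sketch does not memoize the hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};

/// Hypothetical, heavily simplified fingerprint node.
pub struct Node {
    /// Stand-in for the unit's own tracked state.
    pub local: Mutex<String>,
    /// Shared handles to the dependencies' fingerprints.
    pub deps: Vec<Arc<Node>>,
}

impl Node {
    pub fn new(local: &str, deps: Vec<Arc<Node>>) -> Arc<Node> {
        Arc::new(Node {
            local: Mutex::new(local.to_string()),
            deps,
        })
    }

    /// Hashes this node's state together with the hashes of all dependencies.
    pub fn hash_u64(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.local.lock().unwrap().hash(&mut hasher);
        for dep in &self.deps {
            dep.hash_u64().hash(&mut hasher);
        }
        hasher.finish()
    }
}
```

Because the parent holds an `Arc` to the same allocation as the cache, changing a dependency's state changes the parent's recomputed hash without copying anything back into the parent.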

§dep-info files

Cargo has several kinds of “dep info” files:

  • dep-info files generated by rustc.
  • Fingerprint dep-info files translated from the first one.
  • dep-info for external build system integration.
  • Unstable -Zbinary-dep-depinfo.

§rustc dep-info files

Cargo passes the --emit=dep-info flag to rustc so that rustc will generate a “dep info” file (with the .d extension). This is a Makefile-like syntax that includes all of the source files used to build the crate. This file is used by Cargo to know which files to check to see if the crate will need to be rebuilt. Example:

/path/to/target/debug/deps/cargo-b6219d178925203d: src/bin/main.rs src/bin/cargo/cli.rs # … etc.
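
A rule in this Makefile-like syntax can be split into its target and inputs along the following lines. This is a minimal sketch, not Cargo's parser: `parse_dep_info_line` is a hypothetical name, and the handling of backslash-escaped spaces is an assumption about how paths containing spaces are written.

```rust
/// Parses one Makefile-style rule of the form `target: dep1 dep2 ...`.
/// Returns the target and its dependencies, or `None` for comment lines
/// and lines that do not contain a `": "` separator.
pub fn parse_dep_info_line(line: &str) -> Option<(String, Vec<String>)> {
    if line.trim_start().starts_with('#') {
        return None;
    }
    let (target, rest) = line.split_once(": ")?;
    let mut deps = Vec::new();
    let mut words = rest.split_whitespace();
    while let Some(word) = words.next() {
        let mut path = word.to_string();
        // A trailing backslash means the space that followed was escaped,
        // so the next word belongs to the same path.
        while path.ends_with('\\') {
            path.pop();
            path.push(' ');
            path.push_str(words.next()?);
        }
        deps.push(path);
    }
    Some((target.to_string(), deps))
}
```

Splitting on `": "` rather than on a bare colon keeps paths that themselves contain a colon (such as Windows drive prefixes) intact.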

§Fingerprint dep-info files

After rustc exits successfully, Cargo will read the first kind of dep info file and translate it into a binary format that is stored in the fingerprint directory (translate_dep_info).

These are used to quickly scan for any changed files. The mtime of the fingerprint dep-info file itself is used as the reference for comparing the source files to determine if any of the source files have been modified (see below for more detail).

Note that Cargo parses the special # env-var:... comments in dep-info files to learn about environment variables that the rustc compile depends on. Cargo then later uses this to trigger a recompile if a referenced env var changes (even if the source didn’t change).
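
The environment-variable check can be pictured as comparing recorded values against current ones. The sketch below is illustrative only: `changed_env_vars` and the `(name, value)` record layout are hypothetical, and the lookup is passed in as a closure so the logic does not depend on the real process environment.

```rust
/// Returns the names of tracked variables whose current value differs
/// from the value recorded at the last build (`None` means "was unset").
pub fn changed_env_vars<'a>(
    recorded: &'a [(String, Option<String>)],
    current: impl Fn(&str) -> Option<String>,
) -> Vec<&'a str> {
    recorded
        .iter()
        .filter(|(name, old)| current(name.as_str()) != *old)
        .map(|(name, _)| name.as_str())
        .collect()
}
```

In real use the closure could be `|name| std::env::var(name).ok()`. A non-empty result would mark the unit dirty even though no source file changed.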

§dep-info files for build system integration.

There is also a third dep-info file. Cargo extends the file created by rustc with some additional information and saves the result into the output directory. This is intended for build system integration. See the output_depinfo function for more detail.

§-Zbinary-dep-depinfo

rustc has an experimental flag -Zbinary-dep-depinfo. This causes rustc to include binary files (like rlibs) in the dep-info file. This is primarily to support rustc development, so that Cargo can check the implicit dependency to the standard library (which lives in the sysroot). We want Cargo to recompile whenever the standard library rlib/dylibs change, and this is a generic mechanism to make that work.

§Mtime comparison

The use of modification timestamps is the most common way a unit will be determined to be dirty or fresh between builds. There are many subtle issues and edge cases with mtime comparisons. This gives a high-level overview, but you’ll need to read the code for the gritty details. Mtime handling is different for different unit kinds. The different styles are driven by the Fingerprint::local field, which is set based on the unit kind.

The status of whether or not the mtime is “stale” or “up-to-date” is stored in Fingerprint::fs_status.

Every unit compares the mtime of its newest output file with the mtimes of the outputs of all its dependencies. If any output file is missing, the unit is stale. If any dependency’s output is newer, the unit is stale.
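
The output-versus-dependency comparison can be sketched as a pure function over timestamps. `OutputStatus` and `check_outputs` are hypothetical names chosen for this illustration, and treating a unit with no outputs at all as stale is an assumption of the sketch.

```rust
use std::time::SystemTime;

/// Hypothetical outcome of the output/dependency mtime check.
#[derive(Debug, PartialEq, Eq)]
pub enum OutputStatus {
    UpToDate,
    MissingOutput,
    DependencyNewer,
}

/// `outputs` holds the mtime of each expected output file (`None` if the
/// file is missing); `dep_outputs` holds the mtimes of dependency outputs.
pub fn check_outputs(
    outputs: &[Option<SystemTime>],
    dep_outputs: &[SystemTime],
) -> OutputStatus {
    let mut newest: Option<SystemTime> = None;
    for mtime in outputs {
        match mtime {
            None => return OutputStatus::MissingOutput,
            Some(t) => newest = Some(newest.map_or(*t, |n| n.max(*t))),
        }
    }
    // Assumption of this sketch: no outputs at all counts as stale.
    let Some(newest) = newest else {
        return OutputStatus::MissingOutput;
    };
    if dep_outputs.iter().any(|dep| *dep > newest) {
        OutputStatus::DependencyNewer
    } else {
        OutputStatus::UpToDate
    }
}
```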

§Normal package mtime handling

LocalFingerprint::CheckDepInfo is used for checking the mtime of packages. It compares the mtime of the input files (the source files) to the mtime of the dep-info file (which is written last after a build is finished). If the dep-info is missing, the unit is stale (it has never been built). The list of input files comes from the dep-info file. See the section above for details on dep-info files.
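
The comparison of input files against the dep-info anchor might look roughly like this. It is a sketch, not the body of Cargo's find_stale_file: `Stale` and `find_stale_input` are hypothetical names, and mtimes are passed in directly instead of being read from the filesystem.

```rust
use std::time::SystemTime;

/// Hypothetical reason why a unit's inputs make it stale.
#[derive(Debug, PartialEq, Eq)]
pub enum Stale<'a> {
    MissingDepInfo,
    MissingInput(&'a str),
    NewerInput(&'a str),
}

/// Compares each input's mtime (`None` if the file is missing) against the
/// dep-info file's mtime (`None` if the unit has never been built).
/// Returns the first reason found, or `None` if everything is up to date.
pub fn find_stale_input<'a>(
    dep_info_mtime: Option<SystemTime>,
    inputs: &[(&'a str, Option<SystemTime>)],
) -> Option<Stale<'a>> {
    let Some(reference) = dep_info_mtime else {
        return Some(Stale::MissingDepInfo);
    };
    for &(path, mtime) in inputs {
        match mtime {
            None => return Some(Stale::MissingInput(path)),
            Some(t) if t > reference => return Some(Stale::NewerInput(path)),
            Some(_) => {}
        }
    }
    None
}
```

Returning the first offending path, rather than a bare boolean, mirrors the idea that the reason for a rebuild should be available for logging.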

Also note that although registry and git packages use CheckDepInfo, none of their source files are included in the dep-info (see translate_dep_info), so for those kinds no mtime checking is done (unless -Zbinary-dep-depinfo is used). Registry and git packages are static, so there is no need to check anything.

When a build is complete, the mtime of the dep-info file in the fingerprint directory is modified to rewind it to the time when the build started. This is done by creating an invoked.timestamp file when the build starts to capture the start time. The mtime is rewound to the start to handle the case where the user modifies a source file while a build is running. Cargo can’t know whether or not the file was included in the build, so it takes a conservative approach of assuming the file was not included, and it should be rebuilt during the next build.
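
The effect of rewinding can be seen by comparing the two possible anchors for a file edited while the build was running. `Anchor` and `detects_edit` are hypothetical names used only for this illustration.

```rust
use std::time::SystemTime;

/// Which point in time the dep-info file's mtime represents after a build.
pub enum Anchor {
    /// The mtime is left at the moment the file was written (end of build).
    BuildEnd,
    /// The mtime is rewound to the moment the build started.
    BuildStart,
}

/// Returns true if the next build would see a source file edited at
/// `edit_time` as newer than the dep-info anchor.
pub fn detects_edit(
    anchor: Anchor,
    build_start: SystemTime,
    build_end: SystemTime,
    edit_time: SystemTime,
) -> bool {
    let reference = match anchor {
        Anchor::BuildStart => build_start,
        Anchor::BuildEnd => build_end,
    };
    edit_time > reference
}
```

With a build running from t=100 to t=110 and an edit at t=105, only the `BuildStart` anchor reports the file as modified, which is the conservative behavior described above.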

§Rustdoc mtime handling

Rustdoc does not emit a dep-info file, so Cargo currently has a relatively simple system for detecting rebuilds. LocalFingerprint::Precalculated is used for rustdoc units. For registry packages, this is the package version. For git packages, it is the git hash. For path packages, it is a string of the mtime of the newest file in the package.

There are some known bugs with how this works, so it should be improved at some point.
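
The three cases for the precalculated value can be written as a single match. This is a sketch under stated assumptions: `SourceKind` and `precalculated` are hypothetical names, and representing the newest mtime as decimal seconds is an assumption, not Cargo's actual string format.

```rust
/// Hypothetical description of where a package comes from.
pub enum SourceKind<'a> {
    Registry { version: &'a str },
    Git { rev: &'a str },
    /// Newest mtime among the package's files, in seconds since the epoch.
    Path { newest_mtime_secs: u64 },
}

/// Produces the string that stands in for the package's contents.
pub fn precalculated(kind: &SourceKind<'_>) -> String {
    match kind {
        SourceKind::Registry { version } => version.to_string(),
        SourceKind::Git { rev } => rev.to_string(),
        SourceKind::Path { newest_mtime_secs } => newest_mtime_secs.to_string(),
    }
}
```

Note that for path packages any file touched anywhere in the package changes the value, which is coarser than the per-file dep-info check used for normal units.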

§Build script mtime handling

Build script mtime handling runs in different modes. There is the “old style” where the build script does not emit any rerun-if directives. In this mode, Cargo will use LocalFingerprint::Precalculated. See the “Rustdoc mtime handling” section above for how it works.

In the new-style, each rerun-if directive is translated to the corresponding LocalFingerprint variant. The RerunIfChanged variant compares the mtime of the given filenames against the mtime of the “output” file.

Similar to normal units, the build script “output” file mtime is rewound to the time just before the build script is executed to handle mid-build modifications.

§Considerations for inclusion in a fingerprint

Over time we’ve realized a few items which historically were included in fingerprint hashings should not actually be included. Examples are:

  • Modification time values. We strive to never include a modification time inside a Fingerprint to get hashed into an actual value. While theoretically fine to do, in practice this causes issues with common applications like Docker. Docker, after a layer is built, will zero out the nanosecond part of all filesystem modification times. This means that the actual modification time is different for all build artifacts, which if we tracked the actual values of modification times would cause unnecessary recompiles. To fix this we instead only track paths which are relevant. These paths are checked dynamically to see if they’re up to date, and the modification time doesn’t make its way into the fingerprint hash.

  • Absolute path names. We strive to maintain a property where if you rename a project directory Cargo will continue to preserve all build artifacts and reuse the cache. This means that we can’t ever hash an absolute path name. Instead we always hash relative path names and the “root” is passed in at runtime dynamically. Some of this is best effort, but the general idea is that we assume all accesses within a crate stay within that crate.
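
The relative-path property can be illustrated by hashing a file path only after stripping the project root. `relative_path_hash` is a hypothetical helper, and `DefaultHasher` is used purely for demonstration; it is not the hasher Cargo uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hashes `file` relative to `root`, so the result does not depend on
/// where the project directory lives. Returns `None` if `file` is not
/// located under `root`.
pub fn relative_path_hash(root: &Path, file: &Path) -> Option<u64> {
    let relative = file.strip_prefix(root).ok()?;
    let mut hasher = DefaultHasher::new();
    relative.hash(&mut hasher);
    Some(hasher.finish())
}
```

Two checkouts of the same project in different directories produce the same hash for `src/lib.rs`, so renaming the project directory would not invalidate anything keyed on this value.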

These are pretty tricky to test for unfortunately, but we should have a good test suite nowadays and lord knows Cargo gets enough testing in the wild!

§Build scripts

The running of a build script (CompileMode::RunCustomBuild) is treated significantly differently from all other Unit kinds. It has its own function for calculating the Fingerprint (calculate_run_custom_build) and has some unique considerations. It does not track the same information as a normal Unit. The information tracked depends on the rerun-if-changed and rerun-if-env-changed statements produced by the build script. If the script does not emit either of these statements, the Fingerprint runs in “old style” mode where an mtime change of any file in the package will cause the build script to be re-run. Otherwise, the fingerprint only tracks the individual “rerun-if” items listed by the build script.

The “rerun-if” statements from a previous build are stored in the build output directory in a file called output. Cargo parses this file when the Unit for that build script is prepared for the JobQueue. The Fingerprint code can then use that information to compute the Fingerprint and compare against the old fingerprint hash.
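
Extracting the “rerun-if” items from saved build script output can be sketched as below. This assumes the file holds the script's directive lines verbatim; `Rerun`, `parse_rerun_directives`, and `is_old_style` are hypothetical names and not Cargo's internal types.

```rust
/// Hypothetical representation of a "rerun-if" item.
#[derive(Debug, PartialEq, Eq)]
pub enum Rerun {
    IfChanged(String),
    IfEnvChanged(String),
}

/// Collects the rerun-if directives from build script output,
/// ignoring every other line.
pub fn parse_rerun_directives(output: &str) -> Vec<Rerun> {
    let mut found = Vec::new();
    for line in output.lines() {
        // Accept both the `cargo::` prefix and the older `cargo:` prefix.
        let Some(rest) = line
            .strip_prefix("cargo::")
            .or_else(|| line.strip_prefix("cargo:"))
        else {
            continue;
        };
        if let Some(path) = rest.strip_prefix("rerun-if-changed=") {
            found.push(Rerun::IfChanged(path.to_string()));
        } else if let Some(var) = rest.strip_prefix("rerun-if-env-changed=") {
            found.push(Rerun::IfEnvChanged(var.to_string()));
        }
    }
    found
}

/// With no rerun-if directives, the script is handled in "old style" mode.
pub fn is_old_style(directives: &[Rerun]) -> bool {
    directives.is_empty()
}
```

Because the list is only known after the script has run, it can differ from one build to the next, which is why the fingerprint's local value may change after the build script runs.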

Care must be taken with build script Fingerprints because the Fingerprint::local value may be changed after the build script runs (such as if the build script adds or removes “rerun-if” items).

Another complication is if a build script is overridden. In that case, the fingerprint is the hash of the output of the override.

§Special considerations

Registry dependencies do not track the mtime of files. This is because registry dependencies are not expected to change (if a new version is used, the Package ID will change, causing a rebuild).

Cargo currently partially works with Docker caching. When a Docker image is built, it has normal mtime information. However, when a step is cached, the nanosecond portion of every file’s mtime is zeroed out. Currently this works, but care must be taken for situations like these.

HFS on macOS only supports 1 second timestamps. This causes a significant number of problems, particularly with Cargo’s testsuite which does rapid builds in succession. Other filesystems have various degrees of resolution.

Various weird filesystems (such as network filesystems) also can cause complications. Network filesystems may track the time on the server (except when the time is set manually such as with filetime::set_file_times). Not all filesystems support modifying the mtime.

See the A-rebuild-detection label on the issue tracker for more.


  1. extra-flags and RUSTFLAGS are conditionally excluded when --remap-path-prefix is present to avoid breaking build reproducibility while we wait for trim-paths.

  2. Build script and bin dependencies are not included. 

  3. __CARGO_DEFAULT_LIB_METADATA is set by rustbuild to embed the release channel (bootstrap/stable/beta/nightly) in libstd. 

  4. See below for details on mtime tracking. 

  5. Config settings that are not otherwise captured anywhere else. Currently, this is only doc.extern-map.

  6. Via Manifest::lint_rustflags 

Structs§

  • Dependency edge information for fingerprints. This is generated for each dependency and is stored in a Fingerprint.
  • A fingerprint can be considered to be a “short string” representing the state of a world for a package.
