Module cargo::core::global_cache_tracker


Support for tracking the last time files were used to assist with cleaning up those files if they haven’t been used in a while.

Tracking of cache files is stored in a sqlite database which contains a timestamp of the last time the file was used, as well as the size of the file.

While cargo is running, whenever it detects a use of a cache file, it adds a timestamp to DeferredGlobalLastUse. This batches up a set of changes that are then flushed to the database all at once (via DeferredGlobalLastUse::save). Ideally, saving would happen only once for performance reasons, but that is not really possible given how cargo works: cargo generate-lockfile, cargo fetch, and cargo build, for example, all exercise this code in very different ways.
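A minimal sketch of the batching idea described above, using only the standard library. The names `DeferredLastUse` and `mark_used` are hypothetical, keys are plain strings rather than the module's key structs, and the closure passed to `save` stands in for the single database write.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a cache key (the real module has one key struct
/// per kind of cache entry).
type CacheKey = String;

/// Assumed representation: whole seconds since the Unix epoch.
type Timestamp = u64;

/// Accumulates "last used" timestamps in memory so they can be written out
/// in one batch instead of one database write per use.
#[derive(Default)]
struct DeferredLastUse {
    pending: HashMap<CacheKey, Timestamp>,
}

impl DeferredLastUse {
    /// Records a use; repeated uses of the same key keep only the newest time.
    fn mark_used(&mut self, key: &str, now: Timestamp) {
        let entry = self.pending.entry(key.to_string()).or_insert(now);
        if *entry < now {
            *entry = now;
        }
    }

    /// Hands every pending update to `write_batch` in one call, then clears
    /// the buffer. Does nothing if there is nothing to save.
    fn save<F>(&mut self, mut write_batch: F)
    where
        F: FnMut(&[(CacheKey, Timestamp)]),
    {
        if self.pending.is_empty() {
            return;
        }
        let mut batch: Vec<(CacheKey, Timestamp)> = self.pending.drain().collect();
        // Sort so the write order is deterministic.
        batch.sort();
        write_batch(&batch);
    }
}
```

Keeping only the newest timestamp per key means a build that touches the same file many times still produces a single row update when the batch is saved.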

All of the database interaction is done through the GlobalCacheTracker type.

There is a single global GlobalCacheTracker and DeferredGlobalLastUse stored in GlobalContext.
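One way to hold exactly one tracker and one deferred buffer per process is to keep them in a context object and create each on first access. This sketch assumes lazy initialization with `std::cell::OnceCell`; `Context`, `Tracker`, and `Deferred` are hypothetical stand-ins for GlobalContext, GlobalCacheTracker, and DeferredGlobalLastUse, not their real definitions.

```rust
use std::cell::{OnceCell, RefCell};

/// Hypothetical stand-in for the database tracker.
#[derive(Default)]
struct Tracker {
    saves: usize,
}

/// Hypothetical stand-in for the deferred last-use buffer.
#[derive(Default)]
struct Deferred {
    pending: Vec<String>,
}

/// Owns a single instance of each, created the first time it is requested.
#[derive(Default)]
struct Context {
    tracker: OnceCell<RefCell<Tracker>>,
    deferred: OnceCell<RefCell<Deferred>>,
}

impl Context {
    fn tracker(&self) -> &RefCell<Tracker> {
        self.tracker.get_or_init(|| RefCell::new(Tracker::default()))
    }

    fn deferred(&self) -> &RefCell<Deferred> {
        self.deferred.get_or_init(|| RefCell::new(Deferred::default()))
    }
}
```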

The high-level interface for performing garbage collection is defined in the crate::core::gc module. The functions there are responsible for interacting with the GlobalCacheTracker to handle cleaning of global cache data.

Automatic gc

Some commands (primarily the build commands) will trigger an automatic deletion of files that haven’t been used in a while. The high-level interface for this is the crate::core::gc::auto_gc function.

The GlobalCacheTracker database tracks the last time an automatic gc was performed so that it is only done once per day for performance reasons.
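The once-per-day rule amounts to comparing the current time against the last recorded automatic gc. This is a hypothetical helper (`auto_gc_due`), not cargo's actual code, and it assumes timestamps are whole seconds since the Unix epoch.

```rust
/// Seconds in one day, matching the "once per day" rule described above.
const ONE_DAY_SECS: u64 = 24 * 60 * 60;

/// Returns true if an automatic gc should run now. `last_auto_gc` is the time
/// recorded for the previous automatic gc, or `None` if none was recorded.
fn auto_gc_due(last_auto_gc: Option<u64>, now: u64, frequency_secs: u64) -> bool {
    match last_auto_gc {
        None => true,
        // saturating_sub: if the clock moved backwards, treat it as "not due".
        Some(last) => now.saturating_sub(last) >= frequency_secs,
    }
}
```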

Manual gc

The user can perform a manual garbage collection with the cargo clean command. That command has a variety of options to specify what to delete. Manual gc supports deleting based on age, size, or both. At a high level, this is done by the crate::core::gc::Gc::gc method, which calls into GlobalCacheTracker to handle all the cleaning.
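To illustrate how age and size limits can be combined, here is a hypothetical selection function. It assumes the age limit is applied first and that, for the size limit, the least recently used entries are deleted first; `Entry` and `select_for_deletion` are made up for this sketch and do not describe cargo's exact policy.

```rust
/// One tracked cache entry (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    name: String,
    last_use: u64, // seconds since the Unix epoch
    size: u64,     // bytes
}

/// Picks entries to delete: first everything last used before `now - max_age`,
/// then, if the survivors still exceed `max_size`, the least recently used
/// ones until the total fits.
fn select_for_deletion(
    mut entries: Vec<Entry>,
    now: u64,
    max_age: Option<u64>,
    max_size: Option<u64>,
) -> Vec<Entry> {
    let mut doomed = Vec::new();
    if let Some(max_age) = max_age {
        let cutoff = now.saturating_sub(max_age);
        let (old, keep): (Vec<Entry>, Vec<Entry>) =
            entries.into_iter().partition(|e| e.last_use < cutoff);
        doomed.extend(old);
        entries = keep;
    }
    if let Some(max_size) = max_size {
        // Oldest first, so the least recently used entries are removed first.
        entries.sort_by_key(|e| e.last_use);
        let mut total: u64 = entries.iter().map(|e| e.size).sum();
        for e in entries {
            if total <= max_size {
                break;
            }
            total -= e.size;
            doomed.push(e);
        }
    }
    doomed
}
```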

Locking

Usage of the database requires that the package cache is locked to prevent concurrent access. Although sqlite has built-in locking support, we want to use cargo's locking so that the "Blocking" message gets displayed, and so that locks can block indefinitely for long-running build commands. rusqlite has a default timeout of 5 seconds, though that is configurable.

When garbage collection is being performed, the package cache lock must be in CacheLockMode::MutateExclusive to ensure no other cargo process is running. See crate::util::cache_lock for more detail on locking.

When performing automatic gc, crate::core::gc::auto_gc will skip the GC if the package cache lock is already held by anything else. Automatic GC is intended to be opportunistic, and should impose as little disruption to the user as possible.
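The "skip if busy" behavior can be modeled with a non-blocking lock attempt. This sketch uses `std::sync::Mutex::try_lock` purely as an in-process stand-in for cargo's package cache lock, which coordinates between processes; `try_auto_gc` is a hypothetical name.

```rust
use std::sync::{Mutex, TryLockError};

/// Runs `gc` only if the lock can be taken immediately; never waits.
/// Returns true if gc ran.
fn try_auto_gc<F: FnOnce()>(cache_lock: &Mutex<()>, gc: F) -> bool {
    match cache_lock.try_lock() {
        Ok(_guard) => {
            gc();
            true // `_guard` is dropped at the end of this arm, releasing the lock
        }
        // Someone else holds the lock: skip instead of blocking the user.
        Err(TryLockError::WouldBlock) => false,
        // A previous holder panicked; this sketch also just skips.
        Err(TryLockError::Poisoned(_)) => false,
    }
}
```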

Compatibility

The database must retain both forwards and backwards compatibility between different versions of cargo. For the most part, this shouldn't be too difficult to maintain. Generally, sqlite doesn't change its on-disk format between versions (the introduction of WAL is one of the few examples where version 3 had a format change, but we wouldn't use WAL anyway, since its shared-memory requirements are something cargo can't depend on due to things like network mounts).

Schema changes must be managed through migrations by adding new entries that make a change to the database. Changes must not break older versions of cargo. Generally, adding columns should be fine (either with a default value, or NULL). Adding tables should also be fine. Just don’t do destructive things like removing a column, or changing the semantics of an existing column.
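A sketch of an ordered, additive-only migration list. It assumes the number of applied migrations is recorded in the database (SQLite's `user_version` pragma is one common place for this) and that `exec` runs a single SQL statement; the table and column names are invented for illustration.

```rust
/// Hypothetical migrations, applied in order, each exactly once. Later
/// entries only add things (a nullable column, a new table).
const MIGRATIONS: &[&str] = &[
    "CREATE TABLE example_entry (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, timestamp INTEGER NOT NULL)",
    "ALTER TABLE example_entry ADD COLUMN size INTEGER",
    "CREATE TABLE example_child (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL, timestamp INTEGER NOT NULL)",
];

/// Runs the migrations that have not been applied yet and returns the new
/// applied count. If the database was migrated further by a newer version
/// (`applied` exceeds what this code knows), nothing is run.
fn migrate<F: FnMut(&str)>(applied: usize, migrations: &[&str], mut exec: F) -> usize {
    if applied >= migrations.len() {
        return applied;
    }
    for &sql in &migrations[applied..] {
        exec(sql);
    }
    migrations.len()
}
```

In this sketch, the early return is what lets older code open a database that newer code has already extended: because the newer migrations were purely additive, the older code can ignore them.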

Since users may run older versions of cargo that do not do cache tracking, the GlobalCacheTracker::sync_db_with_files method helps keep the database in sync when those older versions touch the cache directories.
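At its core, bringing the database back in line with the cache directories is a set difference in both directions. This hypothetical `plan_sync` compares entry names only; how the real method assigns timestamps and sizes to newly discovered entries is not covered here.

```rust
use std::collections::BTreeSet;

/// What needs to change to make the database match the disk.
#[derive(Debug, PartialEq)]
struct SyncPlan {
    /// On disk but not tracked (e.g. written by an older cargo).
    to_insert: Vec<String>,
    /// Tracked but no longer on disk (e.g. deleted by hand).
    to_remove: Vec<String>,
}

fn plan_sync(in_db: &BTreeSet<String>, on_disk: &BTreeSet<String>) -> SyncPlan {
    SyncPlan {
        to_insert: on_disk.difference(in_db).cloned().collect(),
        to_remove: in_db.difference(on_disk).cloned().collect(),
    }
}
```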

Performance

Much of the design of this system focuses on minimizing the performance impact. Every build command needs to save updates, and we try to keep that from noticeably affecting build times. Systems like Windows, particularly with a magnetic hard disk, can see a fairly large impact from cargo's overhead. Cargo's benchsuite has some benchmarks to help compare different environments, or changes to the code here. Please keep performance in mind when making any major changes.

Performance of cargo clean is not quite as important since it is not expected to be run often. However, it is still courteous to the user to avoid slowing it down too much. One performance concern is that the clean command will synchronize the database with whatever is on disk if needed (in case files were added by older versions of cargo that don't do cache tracking, or if the user manually deleted some files). This can potentially be very slow, especially if the two are very out of sync.

Filesystems

Everything here is sensitive to the kind of filesystem it is running on. People tend to run cargo in all sorts of strange environments that have limited capabilities, or on things like read-only mounts. The code here needs to gracefully handle as many situations as possible.

See also the information in the Performance and Locking sections when considering different filesystems and their impact on performance and locking.

There are checks for read-only filesystems; errors arising from them are generally ignored.

Macros

Structs

  • BasePaths 🔒
    Filesystem paths in the global cache.
  • DeferredGlobalLastUse
    This is a cache of modifications that will be saved to disk all at once via the DeferredGlobalLastUse::save method.
  • The key for a git checkout entry stored in the database.
  • The key for a git db entry stored in the database.
  • GlobalCacheTracker
    Tracking for the global shared cache (registry files, etc.).
  • ParentId 🔒
    Type for SQL columns that refer to the primary key of their parent table.
  • The key for a registry .crate entry stored in the database.
  • The key for a registry index entry stored in the database.
  • The key for a registry src directory entry stored in the database.

Constants

Functions

  • du 🔒
  • Returns the disk usage for a git checkout directory.
  • Returns whether or not the given error should cause a warning to be displayed to the user.
  • migrations 🔒
    Migrations which initialize the database, and can be used to evolve it over time.
  • now 🔒
    Returns the current time.
  • Converts a SystemTime to a Timestamp which can be stored in the database.

Type Aliases

  • Timestamp 🔒
    Type for timestamps as stored in the database.
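A sketch of converting a `SystemTime` into a stored timestamp, assuming the timestamp is whole seconds since the Unix epoch and that times before the epoch clamp to zero. The `Timestamp` alias and `to_timestamp` function here are hypothetical stand-ins for the items listed above, not their real definitions.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Assumed representation: whole seconds since the Unix epoch.
type Timestamp = u64;

/// Converts a SystemTime to a Timestamp, dropping sub-second precision.
/// Times before the epoch map to 0.
fn to_timestamp(t: &SystemTime) -> Timestamp {
    t.duration_since(UNIX_EPOCH)
        .map(|d: Duration| d.as_secs())
        .unwrap_or(0)
}
```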