Support for tracking the last time files were used to assist with cleaning up those files if they haven’t been used in a while.
Tracking of cache files is stored in a sqlite database that records, for each file, the timestamp of the last time it was used as well as its size.
While cargo is running, whenever it detects a use of a cache file, it adds a timestamp to DeferredGlobalLastUse. This batches up a set of changes that are then flushed to the database all at once (via DeferredGlobalLastUse::save). Ideally, saving would happen only once for performance reasons, but that is not really possible because cargo is used in very different ways: cargo generate-lockfile, cargo fetch, and cargo build, for example, each exercise this code differently.
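A minimal sketch of the batching idea, using only the standard library. The names CacheKey, Db, DeferredLastUse, and mark_used are hypothetical, and a HashMap stands in for the sqlite database; this is not the actual DeferredGlobalLastUse API.

```rust
use std::collections::HashMap;

/// Hypothetical key for one cache entry (for example, a registry .crate file).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct CacheKey(pub String);

/// Hypothetical stand-in for the database: key -> last-use timestamp.
pub type Db = HashMap<CacheKey, u64>;

/// Collects last-use timestamps in memory so they can be written in one batch.
#[derive(Default)]
pub struct DeferredLastUse {
    pending: HashMap<CacheKey, u64>,
}

impl DeferredLastUse {
    /// Records a use of `key` at time `now` (seconds since the Unix epoch).
    /// Repeated uses of the same key collapse into one entry that keeps the
    /// newest timestamp.
    pub fn mark_used(&mut self, key: CacheKey, now: u64) {
        let ts = self.pending.entry(key).or_insert(now);
        *ts = (*ts).max(now);
    }

    /// Flushes all pending updates to `db` at once and empties the batch.
    /// Returns the number of entries written.
    pub fn save(&mut self, db: &mut Db) -> usize {
        let written = self.pending.len();
        for (key, ts) in self.pending.drain() {
            db.insert(key, ts);
        }
        written
    }
}
```

Because repeated uses of the same file collapse into one pending entry, a build that touches the same .crate file many times still produces a single write when save is called.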
All of the database interaction is done through the GlobalCacheTracker type.
There is a single global GlobalCacheTracker and a single DeferredGlobalLastUse, both stored in GlobalContext.
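The single shared instance can be sketched as a context struct holding one lazily created tracker and one batch of deferred updates. Context, Tracker, and Deferred are hypothetical placeholders, and holding them in std::cell::OnceCell and RefCell is an assumption for illustration, not a description of GlobalContext's real fields.

```rust
use std::cell::{OnceCell, RefCell, RefMut};

/// Hypothetical placeholder for the database-backed tracker.
#[derive(Default)]
pub struct Tracker {
    /// Counts recorded operations, just to show the instance persists.
    pub operations: u32,
}

/// Hypothetical placeholder for the batch of pending timestamp updates.
#[derive(Default)]
pub struct Deferred {
    pub pending: Vec<String>,
}

/// Holds exactly one tracker (created on first use) and one batch.
#[derive(Default)]
pub struct Context {
    tracker: OnceCell<RefCell<Tracker>>,
    deferred: RefCell<Deferred>,
}

impl Context {
    /// Returns the single tracker, initializing it lazily on first access.
    pub fn tracker(&self) -> RefMut<'_, Tracker> {
        self.tracker
            .get_or_init(|| RefCell::new(Tracker::default()))
            .borrow_mut()
    }

    /// Returns the shared batch of deferred updates.
    pub fn deferred(&self) -> RefMut<'_, Deferred> {
        self.deferred.borrow_mut()
    }

    /// Reports whether the tracker has been created yet.
    pub fn tracker_initialized(&self) -> bool {
        self.tracker.get().is_some()
    }
}
```

Lazy creation means commands that never touch the cache never pay the cost of opening the database.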
The high-level interface for performing garbage collection is defined in the crate::core::gc module. The functions there are responsible for interacting with the GlobalCacheTracker to handle cleaning of global cache data.
§Automatic gc
Some commands (primarily the build commands) will trigger an automatic deletion of files that haven’t been used in a while. The high-level interface for this is the crate::core::gc::auto_gc function.
The GlobalCacheTracker database tracks the last time an automatic gc was performed so that, for performance reasons, it is only done once per day.
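The once-per-day rule amounts to comparing the stored timestamp of the last automatic gc with the current time. The function name should_auto_gc, the one-day constant, and the use of seconds since the Unix epoch are assumptions for illustration.

```rust
use std::time::Duration;

/// Hypothetical minimum interval between automatic gc runs (one day).
pub const AUTO_GC_INTERVAL: Duration = Duration::from_secs(24 * 60 * 60);

/// Decides whether an automatic gc should run, given the stored timestamp
/// (seconds since the Unix epoch) of the last automatic gc, if any.
pub fn should_auto_gc(last_auto_gc: Option<u64>, now: u64) -> bool {
    match last_auto_gc {
        // No automatic gc has ever been recorded: run now.
        None => true,
        // saturating_sub guards against a clock that moved backwards.
        Some(last) => now.saturating_sub(last) >= AUTO_GC_INTERVAL.as_secs(),
    }
}
```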
§Manual gc
The user can perform a manual garbage collection with the cargo clean command. That command has a variety of options to specify what to delete. Manual gc supports deleting based on age, size, or both. At a high level, this is done by the crate::core::gc::Gc::gc method, which calls into GlobalCacheTracker to handle all the cleaning.
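One way to combine the age and size criteria: first delete everything older than the age limit, then evict least-recently-used entries until the total fits under the size limit. Entry and select_for_deletion are hypothetical, and the actual Gc::gc logic may order or group the work differently (for example, per kind of cache data).

```rust
/// Hypothetical cache entry: a name, its size in bytes, and last-use time.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub name: String,
    pub size: u64,
    pub last_use: u64,
}

/// Returns the names to delete. Entries older than `max_age` seconds go first;
/// then, if the remaining total still exceeds `max_size` bytes, the
/// least-recently-used entries are removed until it fits.
pub fn select_for_deletion(
    mut entries: Vec<Entry>,
    now: u64,
    max_age: Option<u64>,
    max_size: Option<u64>,
) -> Vec<String> {
    let mut delete = Vec::new();
    // Oldest first, so LRU eviction can simply walk from the front.
    entries.sort_by_key(|e| e.last_use);
    if let Some(age) = max_age {
        entries.retain(|e| {
            let too_old = now.saturating_sub(e.last_use) > age;
            if too_old {
                delete.push(e.name.clone());
            }
            !too_old
        });
    }
    if let Some(limit) = max_size {
        let mut total: u64 = entries.iter().map(|e| e.size).sum();
        for e in &entries {
            if total <= limit {
                break;
            }
            total -= e.size;
            delete.push(e.name.clone());
        }
    }
    delete
}
```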
§Locking
Usage of the database requires that the package cache is locked to prevent concurrent access. Although sqlite has built-in locking support, we want to use cargo’s locking so that the “Blocking” message gets displayed, and so that locks can block indefinitely for long-running build commands. By contrast, rusqlite has a default busy timeout of 5 seconds, though that is configurable.
When garbage collection is being performed, the package cache lock must be in CacheLockMode::MutateExclusive to ensure no other cargo process is running. See crate::util::cache_lock for more detail on locking.
When performing automatic gc, crate::core::gc::auto_gc will skip the GC if the package cache lock is already held by anything else. Automatic GC is intended to be opportunistic and should cause as little disruption to the user as possible.
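The opportunistic behavior can be sketched with a non-blocking lock attempt. Here a std::sync::Mutex<()> stands in for cargo’s file-based package cache lock (an assumption; the real lock is described in crate::util::cache_lock), and try_auto_gc and AutoGcOutcome are hypothetical names.

```rust
use std::sync::{Mutex, TryLockError};

/// Outcome of an opportunistic gc attempt.
#[derive(Debug, PartialEq)]
pub enum AutoGcOutcome {
    Ran,
    SkippedLockHeld,
}

/// Runs `gc` only if `cache_lock` can be acquired without waiting.
/// The `Mutex<()>` is a stand-in for cargo's file-based package cache lock.
pub fn try_auto_gc(cache_lock: &Mutex<()>, gc: impl FnOnce()) -> AutoGcOutcome {
    match cache_lock.try_lock() {
        Ok(_guard) => {
            // The guard stays alive for the whole gc, playing the role of
            // the exclusive lock mode.
            gc();
            AutoGcOutcome::Ran
        }
        // Someone else holds the lock: skip instead of blocking the user.
        Err(TryLockError::WouldBlock) => AutoGcOutcome::SkippedLockHeld,
        // A poisoned lock is also treated as "skip" in this sketch.
        Err(TryLockError::Poisoned(_)) => AutoGcOutcome::SkippedLockHeld,
    }
}
```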
§Compatibility
The database must retain both forwards and backwards compatibility between different versions of cargo. For the most part, this shouldn’t be too difficult to maintain. Generally sqlite doesn’t change on-disk formats between versions (the introduction of WAL is one of the few examples where version 3 had a format change, but we wouldn’t use it anyway since it has shared-memory requirements cargo can’t depend on due to things like network mounts).
Schema changes must be managed through migrations by adding new entries that make a change to the database. Changes must not break older versions of cargo. Generally, adding columns should be fine (either with a default value, or NULL). Adding tables should also be fine. Just don’t do destructive things like removing a column or changing the semantics of an existing column.
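A minimal sketch of an append-only migration list, assuming the applied schema version is kept in a single integer, as SQLite’s PRAGMA user_version allows. FakeDb and migrate are hypothetical; a real implementation would execute the SQL through rusqlite, typically inside a transaction.

```rust
/// Hypothetical in-memory stand-in for a database connection. It records the
/// SQL it was asked to run plus a schema version, mirroring how SQLite's
/// `PRAGMA user_version` can hold a single integer.
#[derive(Default)]
pub struct FakeDb {
    pub user_version: usize,
    pub executed: Vec<&'static str>,
}

/// Applies, in order, every migration the database has not seen yet.
/// The list is append-only: entries are never edited, reordered, or removed,
/// because their index doubles as the schema version.
pub fn migrate(db: &mut FakeDb, migrations: &[&'static str]) {
    // If a newer cargo already applied more migrations than this list has,
    // `skip` yields nothing and the database is left untouched.
    for (i, sql) in migrations.iter().copied().enumerate().skip(db.user_version) {
        db.executed.push(sql);
        db.user_version = i + 1;
    }
}
```

A database already at a higher version than the binary knows about is left alone, which is what lets an older cargo keep working against a database that a newer cargo has migrated, provided the newer migrations were non-destructive.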
Since users may run older versions of cargo that do not do cache tracking, the GlobalCacheTracker::sync_db_with_files method helps keep the database in sync when those older versions touch the cache directories.
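The reconciliation can be pictured as a set difference between what the database records and what exists on disk. This sketch models both as in-memory collections; the reconcile function is hypothetical, and stamping newly discovered entries with the current time is an assumption rather than a description of sync_db_with_files.

```rust
use std::collections::{HashMap, HashSet};

/// Reconciles `db` (name -> last-use timestamp) with the names that actually
/// exist on disk. Entries added by a cargo that did no tracking get `now` as
/// their timestamp; entries whose files were deleted are dropped.
/// Returns (added, removed) counts.
pub fn reconcile(
    db: &mut HashMap<String, u64>,
    on_disk: &HashSet<String>,
    now: u64,
) -> (usize, usize) {
    let before = db.len();
    // Drop rows for files that no longer exist.
    db.retain(|name, _| on_disk.contains(name));
    let removed = before - db.len();
    // Add rows for files the database has never seen.
    let mut added = 0;
    for name in on_disk {
        if !db.contains_key(name) {
            db.insert(name.clone(), now);
            added += 1;
        }
    }
    (added, removed)
}
```

Both passes touch every entry, which is why the work grows with the size of the cache and the amount of drift between database and disk.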
§Performance
A lot of the focus in the design of this system is on minimizing the performance impact. Every build command needs to save updates, and we try to keep that from having a noticeable impact on build times. Systems like Windows, particularly with a magnetic hard disk, can be noticeably slowed by cargo’s overhead. Cargo’s benchsuite has some benchmarks to help compare different environments, or changes to the code here. Please try to keep performance in mind if making any major changes.
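One common way to cut write traffic, and the idea behind the constant listed under Constants as “How often timestamps will be updated”, is to skip rewriting a timestamp that is already recent. The five-minute value and the needs_update name are assumptions for this sketch.

```rust
/// Hypothetical resolution: a stored timestamp is only rewritten once it is
/// at least this many seconds old.
pub const UPDATE_RESOLUTION_SECS: u64 = 5 * 60;

/// Returns true if a use at `now` should cause a database write.
pub fn needs_update(stored: Option<u64>, now: u64) -> bool {
    match stored {
        // Never recorded: always write.
        None => true,
        // Recently recorded: the existing value is close enough for gc.
        Some(ts) => now.saturating_sub(ts) >= UPDATE_RESOLUTION_SECS,
    }
}
```

The trade-off is that last-use times are only accurate to within the resolution, which is harmless when gc thresholds are measured in days or months.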
Performance of cargo clean is not quite as important since it is not expected to be run often. However, it is still courteous to the user to try not to impact it too much. One part that has a performance concern is that the clean command will synchronize the database with whatever is on disk if needed (in case files were added by older versions of cargo that don’t do cache tracking, or if the user manually deleted some files). This can potentially be very slow, especially if the two are very out of sync.
§Filesystems
Everything here is sensitive to the kind of filesystem it is running on. People tend to run cargo in all sorts of strange environments that have limited capabilities, or on things like read-only mounts. The code here needs to gracefully handle as many situations as possible.
See also the information in the Performance and Locking sections when considering different filesystems and their impact on performance and locking.
There are checks for read-only filesystems, and errors caused by them are generally ignored.
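A sketch of a filter deciding which tracking errors deserve a warning. The specific error kinds treated as silent here are assumptions, and io::ErrorKind::ReadOnlyFilesystem requires Rust 1.83 or newer.

```rust
use std::io;

/// Returns whether a cache-tracking error should produce a user-visible
/// warning. Tracking is best-effort, so in this sketch errors that typically
/// mean "this location cannot be written" are silently ignored.
pub fn should_warn(err: &io::Error) -> bool {
    !matches!(
        err.kind(),
        io::ErrorKind::PermissionDenied | io::ErrorKind::ReadOnlyFilesystem
    )
}
```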
Macros§
- Helper to generate the upsert for the parent tables.
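The upsert helper for parent tables could look roughly like the macro below. The macro name, the table name registry_index, and the (id, name, timestamp) column layout are hypothetical; the SQL uses SQLite’s ON CONFLICT ... DO UPDATE upsert syntax, and RETURNING requires SQLite 3.35 or newer.

```rust
/// Hypothetical macro that builds an SQLite upsert for a parent table with a
/// unique `name` column and a `timestamp` column, returning the row id.
macro_rules! upsert_parent_sql {
    ($table:literal) => {
        concat!(
            "INSERT INTO ", $table, " (name, timestamp) VALUES (?1, ?2) ",
            "ON CONFLICT (name) DO UPDATE SET timestamp = excluded.timestamp ",
            "RETURNING id"
        )
    };
}

/// The statement for one hypothetical parent table, built at compile time.
pub const REGISTRY_INDEX_UPSERT: &str = upsert_parent_sql!("registry_index");
```

Generating the text with a macro keeps every parent table’s upsert identical in shape while still producing a plain string constant.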
Structs§
- Filesystem paths in the global cache.
- This is a cache of modifications that will be saved to disk all at once via the DeferredGlobalLastUse::save method.
- The key for a git checkout entry stored in the database.
- The key for a git db entry stored in the database.
- Tracking for the global shared cache (registry files, etc.).
- ParentId 🔒 Type for SQL columns that refer to the primary key of their parent table.
- The key for a registry .crate entry stored in the database.
- The key for a registry index entry stored in the database.
- The key for a registry src directory entry stored in the database.
Constants§
- The filename of the database.
- How often timestamps will be updated.
Functions§
- du 🔒
- Returns the disk usage for a git checkout directory.
- Returns whether or not the given error should cause a warning to be displayed to the user.
- Migrations which initialize the database, and can be used to evolve it over time.
- now 🔒 Returns the current time.
- Converts a SystemTime to a Timestamp which can be stored in the database.
Type Aliases§
- Type for timestamps as stored in the database.
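The conversion from SystemTime to a stored timestamp can be sketched as seconds since the Unix epoch. Treating Timestamp as a u64 count of seconds and clamping pre-epoch times to 0 are assumptions for this example.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical alias: seconds since the Unix epoch, as stored in sqlite.
pub type Timestamp = u64;

/// Converts a `SystemTime` to a `Timestamp`. Times before the epoch clamp to 0.
pub fn to_timestamp(t: SystemTime) -> Timestamp {
    t.duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```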