cargo/core/resolver/
encode.rs

//! Definition of how to encode a `Resolve` into a TOML `Cargo.lock` file
//!
//! This module contains all machinery necessary to parse a `Resolve` from a
//! `Cargo.lock` as well as serialize a `Resolve` to a `Cargo.lock`.
//!
//! ## Changing `Cargo.lock`
//!
//! In general Cargo is quite conservative about changing the format of
//! `Cargo.lock`. Usage of new features in Cargo can change `Cargo.lock` at any
//! time, but otherwise changing the serialization of `Cargo.lock` is a
//! difficult operation that we typically avoid.
//!
//! The main problem with changing the format of `Cargo.lock` is that it can
//! cause quite a bad experience for end users who use different versions of
//! Cargo. If every PR to a project oscillates between the stable channel's
//! encoding of `Cargo.lock` and the nightly channel's encoding then that's a
//! pretty bad experience.
//!
//! We do, however, want to change `Cargo.lock` over time (and we have!). To do
//! this, the rules that we currently have are:
//!
//! * Add support for the new format to Cargo. This involves code changes in
//!   Cargo itself, likely by adding a new variant of `ResolveVersion` and
//!   branching on that where necessary. This is accompanied by tests in the
//!   `lockfile_compat` module.
//!
//!   * Do not update `ResolveVersion::default()`. The new lockfile format will
//!     not be used yet.
//!
//!   * Preserve the new format if found. This means that if Cargo finds the new
//!     version it'll keep using it, but otherwise it continues to use whatever
//!     format it previously found.
//!
//! * Wait a "long time". This is at least until the changes here hit stable
//!   Rust. Often though we wait a little longer to let the changes percolate
//!   into one or two older stable releases.
//!
//! * Change the return value of `ResolveVersion::default()` to the new format.
//!   This will cause new lock files to use the latest encoding as well as
//!   causing any operation which updates the lock file to update to the new
//!   format.
//!
//! This migration scheme in general means that we'll get *support* for a
//! new format into Cargo ASAP, but it won't be exercised yet (except in Cargo's
//! own tests). Eventually when stable/beta/nightly all have support for the new
//! format (and maybe a few previous stable versions) we flip the switch.
//! Projects on nightly will quickly start seeing changes, but
//! stable/beta/nightly will all understand this new format and will preserve
//! it.
//!
//! While this does mean that projects' `Cargo.lock` changes over time, it's
//! typically a pretty minimal effort change that's just "check in what's
//! there".
//!
//! ## Historical changes to `Cargo.lock`
//!
//! Listed from most recent to oldest, these are some of the changes we've made
//! to `Cargo.lock`'s serialization format:
//!
//! * A `version` marker is now at the top of the lock file which is a way for
//!   super-old Cargos (at least since this was implemented) to give a formal
//!   error if they see a lock file from a super-future Cargo. Additionally as
//!   part of this change the encoding of `git` dependencies in lock files
//!   changed where `branch = "master"` is now encoded with `branch=master`
//!   instead of with nothing at all.
//!
//! * The entries in `dependencies` arrays have been shortened and the
//!   `checksum` field now shows up directly in `[[package]]` instead of always
//!   at the end of the file. The goal of this change was to ideally reduce
//!   merge conflicts being generated on `Cargo.lock`. Updating a version of a
//!   package now only updates two lines in the file, the checksum and the
//!   version number, most of the time. Dependency edges are specified in a
//!   compact form where possible where just the name is listed. The
//!   version/source on dependency edges are only listed if necessary to
//!   disambiguate which version or which source is in use.
//!
//! * A comment at the top of the file indicates that the file is a generated
//!   file and contains the special symbol `@generated` to indicate to common
//!   review tools that it's a generated file.
//!
//! * A `[root]` entry for the "root crate" has been removed and is instead now
//!   included in `[[package]]` like everything else.
//!
//! * All packages from registries contain a `checksum` which is a sha256
//!   checksum of the tarball the package is associated with. This is all stored
//!   in the `[metadata]` table of `Cargo.lock` which all versions of Cargo
//!   since 1.0 have preserved. The goal of this was to start recording
//!   checksums so mirror sources can be verified.
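//!
//! As a rough illustration of the changes listed above, a lock file in the
//! newer style might look like the sketch below. The package names `foo` and
//! `bar` and the checksum placeholder are made up for this example; only the
//! overall shape (the `version` marker, the `@generated` comment, the inline
//! `checksum`, and the name-only dependency edge) is meant to be
//! representative.
//!
//! ```toml
//! # This file is automatically @generated by Cargo.
//! # It is not intended for manual editing.
//! version = 3
//!
//! [[package]]
//! name = "bar"
//! version = "1.0.0"
//! source = "registry+https://github.com/rust-lang/crates.io-index"
//! checksum = "<sha256 of the bar 1.0.0 tarball>"
//!
//! [[package]]
//! name = "foo"
//! version = "0.1.0"
//! dependencies = [
//!  "bar",
//! ]
//! ```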
//!
//! ## Other oddities about `Cargo.lock`
//!
//! There are a few other miscellaneous weird things about `Cargo.lock` that you
//! may want to be aware of when reading this file:
//!
//! * All packages have a `source` listed to indicate where they come from. For
//!   `path` dependencies, however, no `source` is listed. There's no way we
//!   could emit a filesystem path name and have that be portable across
//!   systems, so all packages from a `path` are not listed with a `source`.
//!   Note that this also means that all packages with `path` sources must have
//!   unique names.
//!
//! * The `[metadata]` table in `Cargo.lock` is intended to be a generic mapping
//!   of strings to strings that's simply preserved by Cargo. This was a very
//!   early effort to be forward compatible against changes to `Cargo.lock`'s
//!   format. This is nowadays sort of deemed a bad idea though and we don't
//!   really use it that much except for `checksum`s historically. It's not
//!   really recommended to use this.
//!
//! * The actual literal on-disk serialization is found in
//!   `src/cargo/ops/lockfile.rs` which basically renders a `toml::Value` in a
//!   special fashion to make sure we have strict control over the on-disk
//!   format.
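//!
//! To make the first two points more concrete, here is a hedged sketch of how
//! a `path` package and a legacy `[metadata]` checksum entry might appear in
//! an old-style lock file without a `version` marker. The names
//! `my-local-crate` and `bar` and the checksum placeholder are hypothetical,
//! and the `#` comment is an annotation for this example, not something Cargo
//! writes.
//!
//! ```toml
//! [[package]]
//! name = "my-local-crate"
//! version = "0.1.0"
//! # No `source` key here: this package comes from a `path`.
//! dependencies = [
//!  "bar 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
//! ]
//!
//! [[package]]
//! name = "bar"
//! version = "1.0.0"
//! source = "registry+https://github.com/rust-lang/crates.io-index"
//!
//! [metadata]
//! "checksum bar 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "<sha256 of the bar 1.0.0 tarball>"
//! ```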

use super::{Resolve, ResolveVersion};
use crate::core::{Dependency, GitReference, Package, PackageId, SourceId, Workspace};
use crate::util::errors::CargoResult;
use crate::util::interning::InternedString;
use crate::util::{Graph, internal};
use anyhow::{Context as _, bail};
use cargo_util_schemas::lockfile::{
    TomlLockfile, TomlLockfileDependency, TomlLockfilePackageId, TomlLockfilePatch,
    TomlLockfileSourceId,
};
use serde::ser;
use std::collections::{HashMap, HashSet};
use tracing::debug;

/// Convert a `Cargo.lock` to a Resolve.
///
/// Note that this `Resolve` is not "complete". For example, the
/// dependencies do not know the difference between regular/dev/build
/// dependencies, so they are not filled in. It also does not include
/// `features`. Care should be taken when using this Resolve. One of the
/// primary uses is to be used with `resolve_with_previous` to guide the
/// resolver to create a complete Resolve.
pub fn into_resolve(
    resolve: TomlLockfile,
    original: &str,
    ws: &Workspace<'_>,
) -> CargoResult<Resolve> {
    let path_deps: HashMap<String, HashMap<semver::Version, SourceId>> = build_path_deps(ws)?;
    let mut checksums = HashMap::new();

    let mut version = match resolve.version {
        Some(n @ 5) if ws.gctx().nightly_features_allowed => {
            if ws.gctx().cli_unstable().next_lockfile_bump {
                ResolveVersion::V5
            } else {
                anyhow::bail!("lock file version `{n}` requires `-Znext-lockfile-bump`");
            }
        }
        Some(4) => ResolveVersion::V4,
        Some(3) => ResolveVersion::V3,
        Some(n) => bail!(
            "lock file version `{}` was found, but this version of Cargo \
             does not understand this lock file, perhaps Cargo needs \
             to be updated?",
            n,
        ),
        // Historically Cargo did not have a version indicator in lock
        // files, so this could either be the V1 or V2 encoding. We assume
        // an older format is being parsed until we see otherwise.
        None => ResolveVersion::V1,
    };

    let packages = {
        let mut packages = resolve.package.unwrap_or_default();
        if let Some(root) = resolve.root {
            packages.insert(0, root);
        }
        packages
    };

    // `PackageId`s in the lock file don't include the `source` part
    // for workspace members, so we reconstruct proper IDs.
    let live_pkgs = {
        let mut live_pkgs = HashMap::new();
        let mut all_pkgs = HashSet::new();
        for pkg in packages.iter() {
            let enc_id = TomlLockfilePackageId {
                name: pkg.name.clone(),
                version: Some(pkg.version.clone()),
                source: pkg.source.clone(),
            };

            if !all_pkgs.insert(enc_id.clone()) {
                anyhow::bail!("package `{}` is specified twice in the lockfile", pkg.name);
            }
            let id = match pkg
                .source
                .as_ref()
                .map(|source| SourceId::from_url(&source.source_str()))
                .transpose()?
                .or_else(|| get_source_id(&path_deps, &pkg).copied())
            {
                // We failed to find a local package in the workspace.
                // It must have been removed and should be ignored.
                None => {
                    debug!("path dependency now missing {} v{}", pkg.name, pkg.version);
                    continue;
                }
                Some(source) => PackageId::try_new(&pkg.name, &pkg.version, source)?,
            };

            // If a package has a checksum listed directly on it then record
            // that here, and we also bump our version up to 2 since V1
            // didn't ever encode this field.
            if let Some(cksum) = &pkg.checksum {
                version = version.max(ResolveVersion::V2);
                checksums.insert(id, Some(cksum.clone()));
            }

            assert!(live_pkgs.insert(enc_id, (id, pkg)).is_none())
        }
        live_pkgs
    };

    // When decoding a V2 version the edges in `dependencies` aren't
    // guaranteed to have either version or source information. This `map`
    // is used to find package ids even if dependencies have missing
    // information. This map is from name to version to source to actual
    // package ID. (various levels to drill down step by step)
    let mut map = HashMap::new();
    for (id, _) in live_pkgs.values() {
        map.entry(id.name().as_str())
            .or_insert_with(HashMap::new)
            .entry(id.version().to_string())
            .or_insert_with(HashMap::new)
            .insert(id.source_id(), *id);
    }

    let mut lookup_id = |enc_id: &TomlLockfilePackageId| -> Option<PackageId> {
        // The name of this package should always be in the larger list of
        // all packages.
        let by_version = map.get(enc_id.name.as_str())?;

        // If the version is provided, look that up. Otherwise if the
        // version isn't provided this is a V2 lock file and we should only
        // have one version for this name. If we have more than one version
        // for the name then it's ambiguous which one we'd use. That
        // shouldn't ever actually happen but in theory bad git merges could
        // produce invalid lock files, so silently ignore these cases.
        let by_source = match &enc_id.version {
            Some(version) => by_version.get(version)?,
            None => {
                version = version.max(ResolveVersion::V2);
                if by_version.len() == 1 {
                    by_version.values().next().unwrap()
                } else {
                    return None;
                }
            }
        };

        // This is basically the same as above. Note though that `source` is
        // always missing for path dependencies regardless of serialization
        // format. That means we have to handle the `None` case a bit more
        // carefully.
        match &enc_id.source {
            Some(source) => by_source
                .get(&SourceId::from_url(&source.source_str()).unwrap())
                .cloned(),
            None => {
                // Look through all possible packages ids for this
                // name/version. If there's only one `path` dependency then
                // we are hardcoded to use that since `path` dependencies
                // can't have a source listed.
                let mut path_packages = by_source.values().filter(|p| p.source_id().is_path());
                if let Some(path) = path_packages.next() {
                    if path_packages.next().is_some() {
                        return None;
                    }
                    Some(*path)

                // ... otherwise if there's only one then we must be
                // implicitly using that one due to a V2 serialization of
                // the lock file
                } else if by_source.len() == 1 {
                    let id = by_source.values().next().unwrap();
                    version = version.max(ResolveVersion::V2);
                    Some(*id)

                // ... and failing that we probably had a bad git merge of
                // `Cargo.lock` or something like that, so just ignore this.
                } else {
                    None
                }
            }
        }
    };

    let mut g = Graph::new();

    for (id, _) in live_pkgs.values() {
        g.add(*id);
    }

    for &(ref id, pkg) in live_pkgs.values() {
        let Some(ref deps) = pkg.dependencies else {
            continue;
        };

        for edge in deps.iter() {
            if let Some(to_depend_on) = lookup_id(edge) {
                g.link(*id, to_depend_on);
            }
        }
    }

    let replacements = {
        let mut replacements = HashMap::new();
        for &(ref id, pkg) in live_pkgs.values() {
            if let Some(ref replace) = pkg.replace {
                assert!(pkg.dependencies.is_none());
                if let Some(replace_id) = lookup_id(replace) {
                    replacements.insert(*id, replace_id);
                }
            }
        }
        replacements
    };

    let mut metadata = resolve.metadata.unwrap_or_default();

    // In the V1 serialization formats all checksums were listed in the lock
    // file in the `[metadata]` section, so if we're still V1 then look for
    // that here.
    let prefix = "checksum ";
    let mut to_remove = Vec::new();
    for (k, v) in metadata.iter().filter(|p| p.0.starts_with(prefix)) {
        to_remove.push(k.to_string());
        let k = k.strip_prefix(prefix).unwrap();
        let enc_id: TomlLockfilePackageId = k
            .parse()
            .with_context(|| internal("invalid encoding of checksum in lockfile"))?;
        let Some(id) = lookup_id(&enc_id) else {
            continue;
        };

        let v = if v == "<none>" {
            None
        } else {
            Some(v.to_string())
        };
        checksums.insert(id, v);
    }
    // If `checksum` was listed in `[metadata]` but we were previously
    // listed as `V2` then assume some sort of bad git merge happened, so
    // discard all checksums and let's regenerate them later.
    if !to_remove.is_empty() && version >= ResolveVersion::V2 {
        checksums.drain();
    }
    for k in to_remove {
        metadata.remove(&k);
    }

    let mut unused_patches = Vec::new();
    for pkg in resolve.patch.unused {
        let id = match pkg
            .source
            .as_ref()
            .map(|source| SourceId::from_url(&source.source_str()))
            .transpose()?
            .or_else(|| get_source_id(&path_deps, &pkg).copied())
        {
            Some(src) => PackageId::try_new(&pkg.name, &pkg.version, src)?,
            None => continue,
        };
        unused_patches.push(id);
    }

    // We have a curious issue where in the "v1 format" we buggily had a
    // trailing blank line at the end of lock files under some specific
    // conditions.
    //
    // Cargo is trying to write new lockfiles in the "v2 format" but if you
    // have no dependencies, for example, then the lockfile encoded won't
    // really have any indicator that it's in the new format (no
    // dependencies or checksums listed). This means that if you type `cargo
    // new` followed by `cargo build` it will generate a "v2 format" lock
    // file since none previously existed. When reading this on the next
    // `cargo build`, however, it generates a new lock file because when
    // reading in that lockfile we think it's the v1 format.
    //
    // To help fix this issue we special case here. If our lockfile only has
    // one trailing newline, not two, *and* it only has one package, then
    // this is actually the v2 format.
    if original.ends_with('\n')
        && !original.ends_with("\n\n")
        && version == ResolveVersion::V1
        && g.iter().count() == 1
    {
        version = ResolveVersion::V2;
    }

    return Ok(Resolve::new(
        g,
        replacements,
        HashMap::new(),
        checksums,
        metadata,
        unused_patches,
        version,
        HashMap::new(),
    ));

    fn get_source_id<'a>(
        path_deps: &'a HashMap<String, HashMap<semver::Version, SourceId>>,
        pkg: &'a TomlLockfileDependency,
    ) -> Option<&'a SourceId> {
        path_deps.iter().find_map(|(name, version_source)| {
            if name != &pkg.name || version_source.len() == 0 {
                return None;
            }
            if version_source.len() == 1 {
                return Some(version_source.values().next().unwrap());
            }
            // If there are multiple candidates for the same name, it needs to be determined by combining versions (See #13405).
            if let Ok(pkg_version) = pkg.version.parse::<semver::Version>() {
                if let Some(source_id) = version_source.get(&pkg_version) {
                    return Some(source_id);
                }
            }

            None
        })
    }
}

fn build_path_deps(
    ws: &Workspace<'_>,
) -> CargoResult<HashMap<String, HashMap<semver::Version, SourceId>>> {
    // If a crate is **not** a path source, then we're probably in a situation
    // such as `cargo install` with a lock file from a remote dependency. In
    // that case we don't need to fixup any path dependencies (as they're not
    // actually path dependencies any more), so we ignore them.
    let members = ws
        .members()
        .filter(|p| p.package_id().source_id().is_path())
        .collect::<Vec<_>>();

    let mut ret: HashMap<String, HashMap<semver::Version, SourceId>> = HashMap::new();
    let mut visited = HashSet::new();
    for member in members.iter() {
        ret.entry(member.package_id().name().to_string())
            .or_insert_with(HashMap::new)
            .insert(
                member.package_id().version().clone(),
                member.package_id().source_id(),
            );
        visited.insert(member.package_id().source_id());
    }
    for member in members.iter() {
        build_pkg(member, ws, &mut ret, &mut visited);
    }
    for deps in ws.root_patch()?.values() {
        for dep in deps {
            build_dep(dep, ws, &mut ret, &mut visited);
        }
    }
    for (_, dep) in ws.root_replace() {
        build_dep(dep, ws, &mut ret, &mut visited);
    }

    return Ok(ret);

    fn build_pkg(
        pkg: &Package,
        ws: &Workspace<'_>,
        ret: &mut HashMap<String, HashMap<semver::Version, SourceId>>,
        visited: &mut HashSet<SourceId>,
    ) {
        for dep in pkg.dependencies() {
            build_dep(dep, ws, ret, visited);
        }
    }

    fn build_dep(
        dep: &Dependency,
        ws: &Workspace<'_>,
        ret: &mut HashMap<String, HashMap<semver::Version, SourceId>>,
        visited: &mut HashSet<SourceId>,
    ) {
        let id = dep.source_id();
        if visited.contains(&id) || !id.is_path() {
            return;
        }
        let path = match id.url().to_file_path() {
            Ok(p) => p.join("Cargo.toml"),
            Err(_) => return,
        };
        let Ok(pkg) = ws.load(&path) else { return };
        ret.entry(pkg.package_id().name().to_string())
            .or_insert_with(HashMap::new)
            .insert(
                pkg.package_id().version().clone(),
                pkg.package_id().source_id(),
            );
        visited.insert(pkg.package_id().source_id());
        build_pkg(&pkg, ws, ret, visited);
    }
}

impl ser::Serialize for Resolve {
    #[tracing::instrument(skip_all)]
    fn serialize<S>(&self, s: S) -> Result<S::Ok, S::Error>
    where
        S: ser::Serializer,
    {
        let mut ids: Vec<_> = self.iter().collect();
        ids.sort();

        let state = EncodeState::new(self);

        let encodable = ids
            .iter()
            .map(|&id| encodable_resolve_node(id, self, &state))
            .collect::<Vec<_>>();

        let mut metadata = self.metadata().clone();

        if self.version() == ResolveVersion::V1 {
            for &id in ids.iter().filter(|id| !id.source_id().is_path()) {
                let checksum = match self.checksums()[&id] {
                    Some(ref s) => &s[..],
                    None => "<none>",
                };
                let id = encodable_package_id(id, &state, self.version());
                metadata.insert(format!("checksum {}", id), checksum.to_string());
            }
        }

        let metadata = if metadata.is_empty() {
            None
        } else {
            Some(metadata)
        };

        let patch = TomlLockfilePatch {
            unused: self
                .unused_patches()
                .iter()
                .map(|id| TomlLockfileDependency {
                    name: id.name().to_string(),
                    version: id.version().to_string(),
                    source: encodable_source_id(id.source_id(), self.version()),
                    dependencies: None,
                    replace: None,
                    checksum: if self.version() >= ResolveVersion::V2 {
                        self.checksums().get(id).and_then(|x| x.clone())
                    } else {
                        None
                    },
                })
                .collect(),
        };
        TomlLockfile {
            package: Some(encodable),
            root: None,
            metadata,
            patch,
            version: match self.version() {
                ResolveVersion::V5 => Some(5),
                ResolveVersion::V4 => Some(4),
                ResolveVersion::V3 => Some(3),
                ResolveVersion::V2 | ResolveVersion::V1 => None,
            },
        }
        .serialize(s)
    }
}

pub struct EncodeState<'a> {
    counts: Option<HashMap<InternedString, HashMap<&'a semver::Version, usize>>>,
}

impl<'a> EncodeState<'a> {
    pub fn new(resolve: &'a Resolve) -> EncodeState<'a> {
        let counts = if resolve.version() >= ResolveVersion::V2 {
            let mut map = HashMap::new();
            for id in resolve.iter() {
                let slot = map
                    .entry(id.name())
                    .or_insert_with(HashMap::new)
                    .entry(id.version())
                    .or_insert(0);
                *slot += 1;
            }
            Some(map)
        } else {
            None
        };
        EncodeState { counts }
    }
}

fn encodable_resolve_node(
    id: PackageId,
    resolve: &Resolve,
    state: &EncodeState<'_>,
) -> TomlLockfileDependency {
    let (replace, deps) = match resolve.replacement(id) {
        Some(id) => (
            Some(encodable_package_id(id, state, resolve.version())),
            None,
        ),
        None => {
            let mut deps = resolve
                .deps_not_replaced(id)
                .map(|(id, _)| encodable_package_id(id, state, resolve.version()))
                .collect::<Vec<_>>();
            deps.sort();
            (None, Some(deps))
        }
    };

    TomlLockfileDependency {
        name: id.name().to_string(),
        version: id.version().to_string(),
        source: encodable_source_id(id.source_id(), resolve.version()),
        dependencies: deps,
        replace,
        checksum: if resolve.version() >= ResolveVersion::V2 {
            resolve.checksums().get(&id).and_then(|s| s.clone())
        } else {
            None
        },
    }
}

pub fn encodable_package_id(
    id: PackageId,
    state: &EncodeState<'_>,
    resolve_version: ResolveVersion,
) -> TomlLockfilePackageId {
    let mut version = Some(id.version().to_string());
    let mut id_to_encode = id.source_id();
    if resolve_version <= ResolveVersion::V2 {
        if let Some(GitReference::Branch(b)) = id_to_encode.git_reference() {
            if b == "master" {
                id_to_encode =
                    SourceId::for_git(id_to_encode.url(), GitReference::DefaultBranch).unwrap();
            }
        }
    }
    let mut source = encodable_source_id(id_to_encode.without_precise(), resolve_version);
    if let Some(counts) = &state.counts {
        let version_counts = &counts[&id.name()];
        if version_counts[&id.version()] == 1 {
            source = None;
            if version_counts.len() == 1 {
                version = None;
            }
        }
    }
    TomlLockfilePackageId {
        name: id.name().to_string(),
        version,
        source,
    }
}

fn encodable_source_id(id: SourceId, version: ResolveVersion) -> Option<TomlLockfileSourceId> {
    if id.is_path() {
        None
    } else {
        Some(
            if version >= ResolveVersion::V4 {
                TomlLockfileSourceId::new(id.as_encoded_url().to_string())
            } else {
                TomlLockfileSourceId::new(id.as_url().to_string())
            }
            .expect("source ID should have valid URLs"),
        )
    }
}
676}