cargo/core/resolver/encode.rs
//! Definition of how to encode a `Resolve` into a TOML `Cargo.lock` file
//!
//! This module contains all machinery necessary to parse a `Resolve` from a
//! `Cargo.lock` as well as serialize a `Resolve` to a `Cargo.lock`.
//!
//! ## Changing `Cargo.lock`
//!
//! In general Cargo is quite conservative about changing the format of
//! `Cargo.lock`. Usage of new features in Cargo can change `Cargo.lock` at any
//! time, but otherwise changing the serialization of `Cargo.lock` is a
//! difficult operation that we typically avoid.
//!
//! The main problem with changing the format of `Cargo.lock` is that it can
//! cause quite a bad experience for end users who use different versions of
//! Cargo. If every PR to a project oscillates between the stable channel's
//! encoding of `Cargo.lock` and the nightly channel's encoding then that's a
//! pretty bad experience.
//!
//! We do, however, want to change `Cargo.lock` over time (and we have!). To do
//! this the rules that we currently have are:
//!
//! * Add support for the new format to Cargo. This involves code changes in
//!   Cargo itself, likely by adding a new variant of `ResolveVersion` and
//!   branching on that where necessary. This is accompanied by tests in the
//!   `lockfile_compat` module.
//!
//! * Do not update `ResolveVersion::default()`. The new lockfile format will
//!   not be used yet.
//!
//! * Preserve the new format if found. This means that if Cargo finds the new
//!   version it'll keep using it, but otherwise it continues to use whatever
//!   format it previously found.
//!
//! * Wait a "long time". This is at least until the changes here hit stable
//!   Rust. Often though we wait a little longer to let the changes percolate
//!   into one or two older stable releases.
//!
//! * Change the return value of `ResolveVersion::default()` to the new format.
//!   This will cause new lock files to use the latest encoding as well as
//!   causing any operation which updates the lock file to update to the new
//!   format.
//!
//! This migration scheme in general means that we'll get *support* for a new
//! format into Cargo ASAP, but it won't be exercised yet (except in Cargo's
//! own tests). Eventually when stable/beta/nightly all have support for the new
//! format (and maybe a few previous stable versions) we flip the switch.
//! Projects on nightly will quickly start seeing changes, but
//! stable/beta/nightly will all understand this new format and will preserve
//! it.
//!
//! While this does mean that projects' `Cargo.lock` changes over time, it's
//! typically a pretty minimal effort change that's just "check in what's
//! there".
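/*
The rollout rules above come down to one decision each time a lock file is
written. The following is a minimal sketch of that decision, not Cargo's
implementation: `LockVersion`, `version_to_write`, and the `updating` flag are
hypothetical names invented for illustration, and the choice of `V4` as the
default is an assumption.

```rust
/// Hypothetical stand-in for `ResolveVersion`; variants are ordered from
/// oldest to newest so that `Ord` can compare formats.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum LockVersion {
    V3,
    V4,
    V5,
}

impl Default for LockVersion {
    /// Assumed default: it lags behind the newest supported format until that
    /// format has been readable by stable releases for a while.
    fn default() -> Self {
        LockVersion::V4
    }
}

/// Picks the format to write.
///
/// * No lock file on disk: use the default.
/// * Lock file found, nothing updated: preserve the format that was found.
/// * Lock file found and being updated: move up to the default if it is newer,
///   but never downgrade a newer format.
pub fn version_to_write(found: Option<LockVersion>, updating: bool) -> LockVersion {
    match found {
        None => LockVersion::default(),
        Some(v) if updating => v.max(LockVersion::default()),
        Some(v) => v,
    }
}
```

Under these assumptions, "flipping the switch" is only a change to the value
returned by `default()`; a lock file already on `V5` keeps `V5` in every case.
*/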
//!
//! ## Historical changes to `Cargo.lock`
//!
//! Listed from most recent to oldest, these are some of the changes we've made
//! to `Cargo.lock`'s serialization format:
//!
//! * A `version` marker is now at the top of the lock file which is a way for
//!   super-old Cargos (at least since this was implemented) to give a formal
//!   error if they see a lock file from a super-future Cargo. Additionally as
//!   part of this change the encoding of `git` dependencies in lock files
//!   changed where `branch = "master"` is now encoded with `branch=master`
//!   instead of with nothing at all.
//!
//! * The entries in `dependencies` arrays have been shortened and the
//!   `checksum` field now shows up directly in `[[package]]` instead of always
//!   at the end of the file. The goal of this change was to reduce the merge
//!   conflicts generated on `Cargo.lock`. Updating a version of a package now
//!   only updates two lines in the file, the checksum and the version number,
//!   most of the time. Dependency edges are specified in a compact form where
//!   possible, with just the name listed. The version/source on dependency
//!   edges are only listed if necessary to disambiguate which version or
//!   which source is in use.
//!
//! * A comment at the top of the file indicates that the file is a generated
//!   file and contains the special symbol `@generated` to indicate to common
//!   review tools that it's a generated file.
//!
//! * A `[root]` entry for the "root crate" has been removed and is instead now
//!   included in `[[package]]` like everything else.
//!
//! * All packages from registries contain a `checksum` which is a sha256
//!   checksum of the tarball the package is associated with. This is all stored
//!   in the `[metadata]` table of `Cargo.lock` which all versions of Cargo
//!   since 1.0 have preserved. The goal of this was to start recording
//!   checksums so mirror sources can be verified.
//!
//! ## Other oddities about `Cargo.lock`
//!
//! There are a few other miscellaneous weird things about `Cargo.lock` that
//! you may want to be aware of when reading this file:
//!
//! * All packages have a `source` listed to indicate where they come from. For
//!   `path` dependencies, however, no `source` is listed. There's no way we
//!   could emit a filesystem path name and have that be portable across
//!   systems, so all packages from a `path` are not listed with a `source`.
//!   Note that this also means that all packages with `path` sources must have
//!   unique names.
//!
//! * The `[metadata]` table in `Cargo.lock` is intended to be a generic mapping
//!   of strings to strings that's simply preserved by Cargo. This was a very
//!   early effort to be forward compatible against changes to `Cargo.lock`'s
//!   format. This is nowadays sort of deemed a bad idea though and we don't
//!   really use it that much except for `checksum`s historically. It's not
//!   really recommended to use this.
//!
//! * The actual literal on-disk serialization is found in
//!   `src/cargo/ops/lockfile.rs` which basically renders a `toml::Value` in a
//!   special fashion to make sure we have strict control over the on-disk
//!   format.

use super::{Resolve, ResolveVersion};
use crate::core::{Dependency, GitReference, Package, PackageId, SourceId, Workspace};
use crate::util::errors::CargoResult;
use crate::util::interning::InternedString;
use crate::util::{Graph, internal};
use anyhow::{Context as _, bail};
use cargo_util_schemas::lockfile::{
    TomlLockfile, TomlLockfileDependency, TomlLockfilePackageId, TomlLockfilePatch,
    TomlLockfileSourceId,
};
use serde::ser;
use std::collections::{HashMap, HashSet};
use tracing::debug;

/// Convert a `Cargo.lock` to a `Resolve`.
///
/// Note that this `Resolve` is not "complete". For example, the
/// dependencies do not know the difference between regular/dev/build
/// dependencies, so they are not filled in. It also does not include
/// `features`. Care should be taken when using this `Resolve`. One of its
/// primary uses is with `resolve_with_previous`, to guide the resolver
/// toward creating a complete `Resolve`.
pub fn into_resolve(
    resolve: TomlLockfile,
    original: &str,
    ws: &Workspace<'_>,
) -> CargoResult<Resolve> {
    let path_deps: HashMap<String, HashMap<semver::Version, SourceId>> = build_path_deps(ws)?;
    let mut checksums = HashMap::new();

    let mut version = match resolve.version {
        Some(n @ 5) if ws.gctx().nightly_features_allowed => {
            if ws.gctx().cli_unstable().next_lockfile_bump {
                ResolveVersion::V5
            } else {
                anyhow::bail!("lock file version `{n}` requires `-Znext-lockfile-bump`");
            }
        }
        Some(4) => ResolveVersion::V4,
        Some(3) => ResolveVersion::V3,
        Some(n) => bail!(
            "lock file version `{}` was found, but this version of Cargo \
             does not understand this lock file, perhaps Cargo needs \
             to be updated?",
            n,
        ),
        // Historically Cargo did not have a version indicator in lock
        // files, so this could either be the V1 or V2 encoding. We assume
        // an older format is being parsed until we see otherwise.
        None => ResolveVersion::V1,
    };

    let packages = {
        let mut packages = resolve.package.unwrap_or_default();
        if let Some(root) = resolve.root {
            packages.insert(0, root);
        }
        packages
    };

    // `PackageId`s in the lock file don't include the `source` part
    // for workspace members, so we reconstruct proper IDs.
    let live_pkgs = {
        let mut live_pkgs = HashMap::new();
        let mut all_pkgs = HashSet::new();
        for pkg in packages.iter() {
            let enc_id = TomlLockfilePackageId {
                name: pkg.name.clone(),
                version: Some(pkg.version.clone()),
                source: pkg.source.clone(),
            };

            if !all_pkgs.insert(enc_id.clone()) {
                anyhow::bail!("package `{}` is specified twice in the lockfile", pkg.name);
            }
            let id = match pkg
                .source
                .as_ref()
                .map(|source| SourceId::from_url(&source.source_str()))
                .transpose()?
                .or_else(|| get_source_id(&path_deps, &pkg).copied())
            {
                // We failed to find a local package in the workspace.
                // It must have been removed and should be ignored.
                None => {
                    debug!("path dependency now missing {} v{}", pkg.name, pkg.version);
                    continue;
                }
                Some(source) => PackageId::try_new(&pkg.name, &pkg.version, source)?,
            };

            // If a package has a checksum listed directly on it then record
            // that here, and we also bump our version up to 2 since V1
            // didn't ever encode this field.
            if let Some(cksum) = &pkg.checksum {
                version = version.max(ResolveVersion::V2);
                checksums.insert(id, Some(cksum.clone()));
            }

            assert!(live_pkgs.insert(enc_id, (id, pkg)).is_none())
        }
        live_pkgs
    };

    // When decoding the V2 format the edges in `dependencies` aren't
    // guaranteed to have either version or source information. This `map`
    // is used to find package ids even if dependencies have missing
    // information. This map is from name to version to source to actual
    // package ID. (various levels to drill down step by step)
    let mut map = HashMap::new();
    for (id, _) in live_pkgs.values() {
        map.entry(id.name().as_str())
            .or_insert_with(HashMap::new)
            .entry(id.version().to_string())
            .or_insert_with(HashMap::new)
            .insert(id.source_id(), *id);
    }

    let mut lookup_id = |enc_id: &TomlLockfilePackageId| -> Option<PackageId> {
        // The name of this package should always be in the larger list of
        // all packages.
        let by_version = map.get(enc_id.name.as_str())?;

        // If the version is provided, look that up. Otherwise if the
        // version isn't provided this is a V2 lock file and we should only
        // have one version for this name. If we have more than one version
        // for the name then it's ambiguous which one we'd use. That
        // shouldn't ever actually happen but in theory bad git merges could
        // produce invalid lock files, so silently ignore these cases.
        let by_source = match &enc_id.version {
            Some(version) => by_version.get(version)?,
            None => {
                version = version.max(ResolveVersion::V2);
                if by_version.len() == 1 {
                    by_version.values().next().unwrap()
                } else {
                    return None;
                }
            }
        };

        // This is basically the same as above. Note though that `source` is
        // always missing for path dependencies regardless of serialization
        // format. That means we have to handle the `None` case a bit more
        // carefully.
        match &enc_id.source {
            Some(source) => by_source
                .get(&SourceId::from_url(&source.source_str()).unwrap())
                .cloned(),
            None => {
                // Look through all possible package ids for this
                // name/version. If there's only one `path` dependency then
                // we are hardcoded to use that since `path` dependencies
                // can't have a source listed.
                let mut path_packages = by_source.values().filter(|p| p.source_id().is_path());
                if let Some(path) = path_packages.next() {
                    if path_packages.next().is_some() {
                        return None;
                    }
                    Some(*path)

                // ... otherwise if there's only one then we must be
                // implicitly using that one due to a V2 serialization of
                // the lock file
                } else if by_source.len() == 1 {
                    let id = by_source.values().next().unwrap();
                    version = version.max(ResolveVersion::V2);
                    Some(*id)

                // ... and failing that we probably had a bad git merge of
                // `Cargo.lock` or something like that, so just ignore this.
                } else {
                    None
                }
            }
        }
    };

    let mut g = Graph::new();

    for (id, _) in live_pkgs.values() {
        g.add(*id);
    }

    for &(ref id, pkg) in live_pkgs.values() {
        let Some(ref deps) = pkg.dependencies else {
            continue;
        };

        for edge in deps.iter() {
            if let Some(to_depend_on) = lookup_id(edge) {
                g.link(*id, to_depend_on);
            }
        }
    }

    let replacements = {
        let mut replacements = HashMap::new();
        for &(ref id, pkg) in live_pkgs.values() {
            if let Some(ref replace) = pkg.replace {
                assert!(pkg.dependencies.is_none());
                if let Some(replace_id) = lookup_id(replace) {
                    replacements.insert(*id, replace_id);
                }
            }
        }
        replacements
    };

    let mut metadata = resolve.metadata.unwrap_or_default();

    // In the V1 serialization formats all checksums were listed in the lock
    // file in the `[metadata]` section, so if we're still V1 then look for
    // that here.
    let prefix = "checksum ";
    let mut to_remove = Vec::new();
    for (k, v) in metadata.iter().filter(|p| p.0.starts_with(prefix)) {
        to_remove.push(k.to_string());
        let k = k.strip_prefix(prefix).unwrap();
        let enc_id: TomlLockfilePackageId = k
            .parse()
            .with_context(|| internal("invalid encoding of checksum in lockfile"))?;
        let Some(id) = lookup_id(&enc_id) else {
            continue;
        };

        let v = if v == "<none>" {
            None
        } else {
            Some(v.to_string())
        };
        checksums.insert(id, v);
    }
    // If `checksum` was listed in `[metadata]` but we were previously
    // listed as `V2` then assume some sort of bad git merge happened, so
    // discard all checksums and let's regenerate them later.
    if !to_remove.is_empty() && version >= ResolveVersion::V2 {
        checksums.drain();
    }
    for k in to_remove {
        metadata.remove(&k);
    }

    let mut unused_patches = Vec::new();
    for pkg in resolve.patch.unused {
        let id = match pkg
            .source
            .as_ref()
            .map(|source| SourceId::from_url(&source.source_str()))
            .transpose()?
            .or_else(|| get_source_id(&path_deps, &pkg).copied())
        {
            Some(src) => PackageId::try_new(&pkg.name, &pkg.version, src)?,
            None => continue,
        };
        unused_patches.push(id);
    }

    // We have a curious issue where in the "v1 format" we buggily had a
    // trailing blank line at the end of lock files under some specific
    // conditions.
    //
    // Cargo is trying to write new lockfiles in the "v2 format" but if you
    // have no dependencies, for example, then the lockfile encoded won't
    // really have any indicator that it's in the new format (no
    // dependencies or checksums listed). This means that if you type `cargo
    // new` followed by `cargo build` it will generate a "v2 format" lock
    // file since none previously existed. When reading this on the next
    // `cargo build`, however, it generates a new lock file because when
    // reading in that lockfile we think it's the v1 format.
    //
    // To help fix this issue we special case here. If our lockfile only has
    // one trailing newline, not two, *and* it only has one package, then
    // this is actually the v2 format.
    if original.ends_with('\n')
        && !original.ends_with("\n\n")
        && version == ResolveVersion::V1
        && g.iter().count() == 1
    {
        version = ResolveVersion::V2;
    }

    return Ok(Resolve::new(
        g,
        replacements,
        HashMap::new(),
        checksums,
        metadata,
        unused_patches,
        version,
        HashMap::new(),
    ));

    fn get_source_id<'a>(
        path_deps: &'a HashMap<String, HashMap<semver::Version, SourceId>>,
        pkg: &'a TomlLockfileDependency,
    ) -> Option<&'a SourceId> {
        path_deps.iter().find_map(|(name, version_source)| {
            if name != &pkg.name || version_source.len() == 0 {
                return None;
            }
            if version_source.len() == 1 {
                return Some(version_source.values().next().unwrap());
            }
            // If there are multiple candidates for the same name, it needs to be determined by combining versions (See #13405).
            if let Ok(pkg_version) = pkg.version.parse::<semver::Version>() {
                if let Some(source_id) = version_source.get(&pkg_version) {
                    return Some(source_id);
                }
            }

            None
        })
    }
}
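/*
The name -> version -> source drill-down that `lookup_id` performs can be shown
in isolation. This is a minimal sketch under simplifying assumptions: package
ids are plain string tuples, `Id`, `Index`, `build_index`, and `lookup` are
hypothetical names, and the special handling of `path` dependencies is left
out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical simplified package id: (name, version, source).
pub type Id = (String, String, String);

/// name -> version -> source -> full id
pub type Index = HashMap<String, HashMap<String, HashMap<String, Id>>>;

/// Builds the three-level index from a flat list of ids.
pub fn build_index(ids: &[Id]) -> Index {
    let mut index = Index::new();
    for id in ids {
        index
            .entry(id.0.clone())
            .or_default()
            .entry(id.1.clone())
            .or_default()
            .insert(id.2.clone(), id.clone());
    }
    index
}

/// Resolves a possibly partial reference to a full id. Returns `None` when the
/// reference is unknown or when a missing field leaves more than one candidate.
pub fn lookup(
    index: &Index,
    name: &str,
    version: Option<&str>,
    source: Option<&str>,
) -> Option<Id> {
    let by_version = index.get(name)?;
    let by_source = match version {
        Some(v) => by_version.get(v)?,
        // No version given: only acceptable if the name has a single version.
        None if by_version.len() == 1 => by_version.values().next()?,
        None => return None,
    };
    match source {
        Some(s) => by_source.get(s).cloned(),
        // No source given: only acceptable if there is a single source.
        None if by_source.len() == 1 => by_source.values().next().cloned(),
        None => None,
    }
}
```

Returning `None` for ambiguous references mirrors the choice above to silently
ignore edges that a bad merge may have left unresolvable.
*/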

fn build_path_deps(
    ws: &Workspace<'_>,
) -> CargoResult<HashMap<String, HashMap<semver::Version, SourceId>>> {
    // If a crate is **not** a path source, then we're probably in a situation
    // such as `cargo install` with a lock file from a remote dependency. In
    // that case we don't need to fixup any path dependencies (as they're not
    // actually path dependencies any more), so we ignore them.
    let members = ws
        .members()
        .filter(|p| p.package_id().source_id().is_path())
        .collect::<Vec<_>>();

    let mut ret: HashMap<String, HashMap<semver::Version, SourceId>> = HashMap::new();
    let mut visited = HashSet::new();
    for member in members.iter() {
        ret.entry(member.package_id().name().to_string())
            .or_insert_with(HashMap::new)
            .insert(
                member.package_id().version().clone(),
                member.package_id().source_id(),
            );
        visited.insert(member.package_id().source_id());
    }
    for member in members.iter() {
        build_pkg(member, ws, &mut ret, &mut visited);
    }
    for deps in ws.root_patch()?.values() {
        for dep in deps {
            build_dep(dep, ws, &mut ret, &mut visited);
        }
    }
    for (_, dep) in ws.root_replace() {
        build_dep(dep, ws, &mut ret, &mut visited);
    }

    return Ok(ret);

    fn build_pkg(
        pkg: &Package,
        ws: &Workspace<'_>,
        ret: &mut HashMap<String, HashMap<semver::Version, SourceId>>,
        visited: &mut HashSet<SourceId>,
    ) {
        for dep in pkg.dependencies() {
            build_dep(dep, ws, ret, visited);
        }
    }

    fn build_dep(
        dep: &Dependency,
        ws: &Workspace<'_>,
        ret: &mut HashMap<String, HashMap<semver::Version, SourceId>>,
        visited: &mut HashSet<SourceId>,
    ) {
        let id = dep.source_id();
        if visited.contains(&id) || !id.is_path() {
            return;
        }
        let path = match id.url().to_file_path() {
            Ok(p) => p.join("Cargo.toml"),
            Err(_) => return,
        };
        let Ok(pkg) = ws.load(&path) else { return };
        ret.entry(pkg.package_id().name().to_string())
            .or_insert_with(HashMap::new)
            .insert(
                pkg.package_id().version().clone(),
                pkg.package_id().source_id(),
            );
        visited.insert(pkg.package_id().source_id());
        build_pkg(&pkg, ws, ret, visited);
    }
}
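/*
The map built by `build_path_deps` is what lets a lock file entry with no
`source` be matched back to a path package. A minimal sketch of that matching
rule follows. It assumes versions and source locations are plain strings
rather than `semver::Version` and `SourceId`, and `PathDeps` and
`find_path_source` are hypothetical names.

```rust
use std::collections::HashMap;

/// name -> version -> source location (a plain string in this sketch)
pub type PathDeps = HashMap<String, HashMap<String, String>>;

/// Finds the path source for a lock file entry identified by name and version.
pub fn find_path_source<'a>(
    deps: &'a PathDeps,
    name: &str,
    version: &str,
) -> Option<&'a String> {
    let candidates = deps.get(name)?;
    if candidates.len() == 1 {
        // Only one path package has this name, so it is used even when the
        // version recorded in the lock file no longer matches.
        return candidates.values().next();
    }
    // Several path packages share the name, so the version has to pick one.
    candidates.get(version)
}
```

The single-candidate shortcut means a version bump of a uniquely named path
package still resolves, while same-named packages need an exact version match.
*/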

impl ser::Serialize for Resolve {
    #[tracing::instrument(skip_all)]
    fn serialize<S>(&self, s: S) -> Result<S::Ok, S::Error>
    where
        S: ser::Serializer,
    {
        let mut ids: Vec<_> = self.iter().collect();
        ids.sort();

        let state = EncodeState::new(self);

        let encodable = ids
            .iter()
            .map(|&id| encodable_resolve_node(id, self, &state))
            .collect::<Vec<_>>();

        let mut metadata = self.metadata().clone();

        if self.version() == ResolveVersion::V1 {
            for &id in ids.iter().filter(|id| !id.source_id().is_path()) {
                let checksum = match self.checksums()[&id] {
                    Some(ref s) => &s[..],
                    None => "<none>",
                };
                let id = encodable_package_id(id, &state, self.version());
                metadata.insert(format!("checksum {}", id), checksum.to_string());
            }
        }

        let metadata = if metadata.is_empty() {
            None
        } else {
            Some(metadata)
        };

        let patch = TomlLockfilePatch {
            unused: self
                .unused_patches()
                .iter()
                .map(|id| TomlLockfileDependency {
                    name: id.name().to_string(),
                    version: id.version().to_string(),
                    source: encodable_source_id(id.source_id(), self.version()),
                    dependencies: None,
                    replace: None,
                    checksum: if self.version() >= ResolveVersion::V2 {
                        self.checksums().get(id).and_then(|x| x.clone())
                    } else {
                        None
                    },
                })
                .collect(),
        };
        TomlLockfile {
            package: Some(encodable),
            root: None,
            metadata,
            patch,
            version: match self.version() {
                ResolveVersion::V5 => Some(5),
                ResolveVersion::V4 => Some(4),
                ResolveVersion::V3 => Some(3),
                ResolveVersion::V2 | ResolveVersion::V1 => None,
            },
        }
        .serialize(s)
    }
}
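/*
In the V1 encoding, checksums travel through `[metadata]` as string pairs: the
key is `checksum ` followed by the encoded package id, and the value is either
the checksum or the placeholder `<none>`. The round trip can be sketched as
below. This is an illustration only: `put_checksum` and `take_checksums` are
hypothetical helpers, package ids are opaque strings, and a `BTreeMap` is
assumed for the metadata table so that iteration order is deterministic.

```rust
use std::collections::BTreeMap;

const PREFIX: &str = "checksum ";
const NONE: &str = "<none>";

/// Stores one checksum (or its absence) under a V1-style metadata key.
pub fn put_checksum(metadata: &mut BTreeMap<String, String>, id: &str, checksum: Option<&str>) {
    metadata.insert(
        format!("{}{}", PREFIX, id),
        checksum.unwrap_or(NONE).to_string(),
    );
}

/// Removes every checksum entry from the metadata table and returns the
/// decoded `(id, checksum)` pairs. Unrelated metadata entries are left alone.
pub fn take_checksums(
    metadata: &mut BTreeMap<String, String>,
) -> Vec<(String, Option<String>)> {
    let keys: Vec<String> = metadata
        .keys()
        .filter(|k| k.starts_with(PREFIX))
        .cloned()
        .collect();
    keys.into_iter()
        .map(|key| {
            let value = metadata.remove(&key).expect("key was just listed");
            let id = key[PREFIX.len()..].to_string();
            // The placeholder decodes back to "no checksum recorded".
            (id, (value != NONE).then_some(value))
        })
        .collect()
}
```

Stripping the checksum keys while decoding matches what `into_resolve` does
with `to_remove`, so that only genuinely foreign metadata is preserved.
*/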

pub struct EncodeState<'a> {
    counts: Option<HashMap<InternedString, HashMap<&'a semver::Version, usize>>>,
}

impl<'a> EncodeState<'a> {
    pub fn new(resolve: &'a Resolve) -> EncodeState<'a> {
        let counts = if resolve.version() >= ResolveVersion::V2 {
            let mut map = HashMap::new();
            for id in resolve.iter() {
                let slot = map
                    .entry(id.name())
                    .or_insert_with(HashMap::new)
                    .entry(id.version())
                    .or_insert(0);
                *slot += 1;
            }
            Some(map)
        } else {
            None
        };
        EncodeState { counts }
    }
}

fn encodable_resolve_node(
    id: PackageId,
    resolve: &Resolve,
    state: &EncodeState<'_>,
) -> TomlLockfileDependency {
    let (replace, deps) = match resolve.replacement(id) {
        Some(id) => (
            Some(encodable_package_id(id, state, resolve.version())),
            None,
        ),
        None => {
            let mut deps = resolve
                .deps_not_replaced(id)
                .map(|(id, _)| encodable_package_id(id, state, resolve.version()))
                .collect::<Vec<_>>();
            deps.sort();
            (None, Some(deps))
        }
    };

    TomlLockfileDependency {
        name: id.name().to_string(),
        version: id.version().to_string(),
        source: encodable_source_id(id.source_id(), resolve.version()),
        dependencies: deps,
        replace,
        checksum: if resolve.version() >= ResolveVersion::V2 {
            resolve.checksums().get(&id).and_then(|s| s.clone())
        } else {
            None
        },
    }
}

pub fn encodable_package_id(
    id: PackageId,
    state: &EncodeState<'_>,
    resolve_version: ResolveVersion,
) -> TomlLockfilePackageId {
    let mut version = Some(id.version().to_string());
    let mut id_to_encode = id.source_id();
    if resolve_version <= ResolveVersion::V2 {
        if let Some(GitReference::Branch(b)) = id_to_encode.git_reference() {
            if b == "master" {
                id_to_encode =
                    SourceId::for_git(id_to_encode.url(), GitReference::DefaultBranch).unwrap();
            }
        }
    }
    let mut source = encodable_source_id(id_to_encode.without_precise(), resolve_version);
    if let Some(counts) = &state.counts {
        let version_counts = &counts[&id.name()];
        if version_counts[&id.version()] == 1 {
            source = None;
            if version_counts.len() == 1 {
                version = None;
            }
        }
    }
    TomlLockfilePackageId {
        name: id.name().to_string(),
        version,
        source,
    }
}
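/*
The compact dependency encoding depends only on how many packages share a name
and a version, which is what `EncodeState` counts. The rule can be sketched on
its own as follows, assuming ids are plain `(name, version, source)` string
tuples; `Id`, `Counts`, `count_ids`, and `compact` are hypothetical names and
not part of Cargo.

```rust
use std::collections::HashMap;

/// Hypothetical simplified package id: (name, version, source).
pub type Id = (&'static str, &'static str, &'static str);

/// name -> version -> number of packages with that name and version
pub type Counts = HashMap<&'static str, HashMap<&'static str, usize>>;

/// Counts how often each (name, version) pair occurs.
pub fn count_ids(ids: &[Id]) -> Counts {
    let mut counts = Counts::new();
    for (name, version, _source) in ids {
        *counts.entry(*name).or_default().entry(*version).or_default() += 1;
    }
    counts
}

/// Returns `(name, version, source)` with the optional parts dropped whenever
/// they are not needed to identify the package. Panics if `id` was not part of
/// the slice that `counts` was built from.
pub fn compact(id: Id, counts: &Counts) -> (&'static str, Option<&'static str>, Option<&'static str>) {
    let (name, version, source) = id;
    let by_version = &counts[name];
    if by_version[version] != 1 {
        // The same name and version exists in several sources: keep everything.
        return (name, Some(version), Some(source));
    }
    if by_version.len() == 1 {
        // The name alone is unique.
        (name, None, None)
    } else {
        // Several versions of this name exist, but each has a single source.
        (name, Some(version), None)
    }
}
```

Dropping fields only when the name or name and version are unique is what
keeps most dependency edges down to a bare package name.
*/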

fn encodable_source_id(id: SourceId, version: ResolveVersion) -> Option<TomlLockfileSourceId> {
    if id.is_path() {
        None
    } else {
        Some(
            if version >= ResolveVersion::V4 {
                TomlLockfileSourceId::new(id.as_encoded_url().to_string())
            } else {
                TomlLockfileSourceId::new(id.as_url().to_string())
            }
            .expect("source ID should have valid URLs"),
        )
    }
}
676}