//! A `Source` for registry-based packages.
//!
//! # What's a Registry?
//!
//! [Registries] are central locations where packages can be uploaded to,
//! discovered, and searched for. The purpose of a registry is to have a
//! location that serves as permanent storage for versions of a crate over time.
//!
//! Compared to git sources (see [`GitSource`]), a registry provides many
//! packages as well as many versions simultaneously. Git sources can also
//! have commits deleted through rebasing, whereas registries cannot have their
//! versions deleted.
//!
//! In Cargo, [`RegistryData`] is an abstraction over each kind of actual
//! registry, and [`RegistrySource`] connects those implementations to the
//! [`Source`] trait. Two prominent features these abstractions provide are:
//!
//! * A way to query the metadata of a package from a registry. The metadata
//!   comes from the index.
//! * A way to download package contents (a.k.a. source files) that are required
//!   when building the package itself.
//!
//! We'll cover each of these later.
//!
//! [Registries]: https://doc.rust-lang.org/nightly/cargo/reference/registries.html
//! [`GitSource`]: super::GitSource
//!
//! # Different Kinds of Registries
//!
//! Cargo provides multiple kinds of registries. Each of them serves the index
//! and package contents in a slightly different way. Namely:
//!
//! * [`LocalRegistry`] --- Serves the index and package contents entirely on
//!   a local filesystem.
//! * [`RemoteRegistry`] --- Serves the index ahead of time from a Git
//!   repository, and package contents are downloaded as needed.
//! * [`HttpRegistry`] --- Serves both the index and package contents on demand
//!   over an HTTP-based registry API. This is the default for crates.io
//!   starting from 1.70.0.
//!
//! Each registry has its own [`RegistryData`] implementation, and can be
//! created from either [`RegistrySource::local`] or [`RegistrySource::remote`].
//!
//! [`LocalRegistry`]: local::LocalRegistry
//! [`RemoteRegistry`]: remote::RemoteRegistry
//! [`HttpRegistry`]: http_remote::HttpRegistry
//!
//! # The Index of a Registry
//!
//! One of the major difficulties with a registry is that hosting so many
//! packages may quickly run into performance problems when dealing with
//! dependency graphs. It's infeasible for cargo to download the entire contents
//! of the registry just to resolve one package's dependencies, for example. As
//! a result, cargo needs some efficient method of querying what packages are
//! available on a registry, what versions are available, and what the
//! dependencies for each version are.
//!
//! To solve the problem, a registry must provide an index of package metadata.
//! The index of a registry is essentially an easily queryable version of the
//! registry's database for a list of versions of a package as well as a list
//! of dependencies for each version. The exact format of the index is
//! described later.
//!
//! See the [`index`] module for topics about the management, parsing, caching,
//! and versioning for the on-disk index.
//!
//! ## The Format of The Index
//!
//! The index is a store for the list of versions for all packages known, so its
//! format on disk is optimized slightly to ensure that `ls registry` doesn't
//! produce a list of all packages ever known. The index also wants to ensure
//! that there aren't a million files in a single directory, which may actually
//! end up hitting filesystem limits at some point. To this end, a few decisions
//! were made about the format of the registry:
//!
//! 1. Each crate will have one file corresponding to it. Each version for a
//!    crate will just be a line in this file (see [`IndexPackage`] for its
//!    representation).
//! 2. There will be two tiers of directories for crate names, under which
//!    crates corresponding to those tiers will be located.
//!    (See [`cargo_util::registry::make_dep_path`] for the implementation of
//!    this layout hierarchy.)
//!
//! For example, this is what the hierarchy of an index may look like:
//!
//! ```notrust
//! .
//! ├── 3
//! │   └── u
//! │       └── url
//! ├── bz
//! │   └── ip
//! │       └── bzip2
//! ├── config.json
//! ├── en
//! │   └── co
//! │       └── encoding
//! └── li
//!     ├── bg
//!     │   └── libgit2
//!     └── nk
//!         └── link-config
//! ```
//!
//! The root of the index contains a `config.json` file with a few entries
//! corresponding to the registry (see [`RegistryConfig`] below).
//!
//! Besides `config.json`, there are three numbered directories (1, 2, 3) for
//! crates with names 1, 2, and 3 characters in length. The 1/2 directories
//! simply have the crate files underneath them, while the 3 directory is
//! sharded by the first letter of the crate name.
//!
//! All other crates, whose names are four or more characters long, live under
//! two tiers of two-letter directories: the first tier is named after the
//! first two letters of the crate name, and the second after the next two
//! (for example, `li/bg/libgit2`). At the end of all these are the actual
//! crate files themselves.
//!
//! The purpose of this layout is to keep `ls` output small while still
//! allowing efficient lookup based on the crate name itself.
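//!
//! As an illustration only, the layout described above can be expressed as a
//! small function. This is a hypothetical sketch (`index_path_for` is not a
//! Cargo API). It assumes a non-empty ASCII crate name and does no case
//! folding. See [`cargo_util::registry::make_dep_path`] for the real
//! implementation.
//!
//! ```rust
//! // Hypothetical helper: maps a crate name to its file path in the index,
//! // following the 1/2/3/two-tier layout described above.
//! fn index_path_for(name: &str) -> String {
//!     // Crate names are ASCII, so the byte slicing below stays on char boundaries.
//!     assert!(!name.is_empty() && name.is_ascii());
//!     match name.len() {
//!         1 => format!("1/{name}"),
//!         2 => format!("2/{name}"),
//!         // 3-character names are sharded by their first letter.
//!         3 => format!("3/{}/{name}", &name[..1]),
//!         // Everything else uses two tiers of two-letter directories.
//!         _ => format!("{}/{}/{name}", &name[..2], &name[2..4]),
//!     }
//! }
//! ```
//!
//! With this sketch, `url` maps to `3/u/url` and `link-config` maps to
//! `li/nk/link-config`, matching the example tree.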
//!
//! See [The Cargo Book: Registry Index][registry-index] for the public
//! interface of the index format.
//!
//! [registry-index]: https://doc.rust-lang.org/nightly/cargo/reference/registry-index.html
//!
//! ## The Index Files
//!
//! Each file in the index is the history of one crate over time. Each line in
//! the file corresponds to one version of a crate, stored in JSON format (see
//! the [`IndexPackage`] structure).
//!
//! As new versions are published, new lines are appended to this file. **The
//! only modifications to this file that should happen over time are yanks of a
//! particular version.**
//!
//! # Downloading Packages
//!
//! The purpose of the index is to provide an efficient method to resolve the
//! dependency graph for a package. After resolution has been performed, we need
//! to download the contents of packages so we can read the full manifest and
//! build the source code.
//!
//! To accomplish this, [`RegistryData::download`] prepares an HTTP request for
//! each requested package (the transfer itself is driven by
//! [`crate::core::package::Downloads`]) to download tarballs into a local
//! cache. These tarballs will then be unpacked into a destination folder.
//!
//! Because versions uploaded to the registry are frozen forever, the HTTP
//! download and unpacking can both be skipped if the version has already been
//! downloaded and unpacked. This caching allows us to only download a package
//! when absolutely necessary.
//!
//! # Filesystem Hierarchy
//!
//! Overall, `$HOME/.cargo` looks like this when talking about the registry
//! (remote registries, specifically):
//!
//! ```notrust
//! # A folder under which all registry metadata is hosted (similar to
//! # $HOME/.cargo/git)
//! $HOME/.cargo/registry/
//!
//!     # For each registry that cargo knows about (keyed by hostname + hash)
//!     # there is a folder which is the checked out version of the index for
//!     # the registry in this location. Note that this is done so cargo can
//!     # support multiple registries simultaneously
//!     index/
//!         registry1-<hash>/
//!         registry2-<hash>/
//!         ...
//!
//!     # This folder is a cache for all downloaded tarballs (`.crate` files)
//!     # from a registry. Once downloaded and verified, a tarball never changes.
//!     cache/
//!         registry1-<hash>/<pkg>-<version>.crate
//!         ...
//!
//!     # Location in which all tarballs are unpacked. Each tarball is known to
//!     # be frozen after downloading, so transitively this folder is also
//!     # frozen once it's unpacked (it's never unpacked again)
//!     # CAVEAT: They are not read-only. See rust-lang/cargo#9455.
//!     src/
//!         registry1-<hash>/<pkg>-<version>/...
//!         ...
//! ```
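//!
//! As a rough sketch, the `cache/` and `src/` locations of a single package
//! can be derived from the Cargo home directory, the registry's directory name
//! (`registry1-<hash>` above), and the package name and version. The function
//! below is hypothetical and only mirrors the layout shown. In this module,
//! the `src/` root comes from `GlobalContext::registry_source_path`.
//!
//! ```rust
//! use std::path::{Path, PathBuf};
//!
//! // Hypothetical helper: returns the `.crate` tarball path in `cache/` and
//! // the unpacked source directory in `src/` for one package version.
//! fn registry_paths(
//!     cargo_home: &Path,
//!     registry_dir: &str,
//!     name: &str,
//!     version: &str,
//! ) -> (PathBuf, PathBuf) {
//!     let registry = cargo_home.join("registry");
//!     // `cache/<registry>/<pkg>-<version>.crate`
//!     let tarball = registry
//!         .join("cache")
//!         .join(registry_dir)
//!         .join(format!("{name}-{version}.crate"));
//!     // `src/<registry>/<pkg>-<version>/`
//!     let unpacked = registry
//!         .join("src")
//!         .join(registry_dir)
//!         .join(format!("{name}-{version}"));
//!     (tarball, unpacked)
//! }
//! ```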
//!
//! [`IndexPackage`]: index::IndexPackage

use std::collections::HashSet;
use std::fs;
use std::fs::{File, OpenOptions};
use std::io;
use std::io::Read;
use std::io::Write;
use std::path::{Path, PathBuf};
use std::task::{ready, Poll};

use anyhow::Context as _;
use cargo_util::paths::{self, exclude_from_backups_and_indexing};
use flate2::read::GzDecoder;
use serde::Deserialize;
use serde::Serialize;
use tar::Archive;
use tracing::debug;

use crate::core::dependency::Dependency;
use crate::core::global_cache_tracker;
use crate::core::{Package, PackageId, SourceId};
use crate::sources::source::MaybePackage;
use crate::sources::source::QueryKind;
use crate::sources::source::Source;
use crate::sources::PathSource;
use crate::util::cache_lock::CacheLockMode;
use crate::util::interning::InternedString;
use crate::util::network::PollExt;
use crate::util::{hex, VersionExt};
use crate::util::{restricted_names, CargoResult, Filesystem, GlobalContext, LimitErrorReader};

/// The `.cargo-ok` file is used to track if the source is already unpacked.
/// See [`RegistrySource::unpack_package`] for more.
///
/// Not to be confused with the `.cargo-ok` file in git sources.
const PACKAGE_SOURCE_LOCK: &str = ".cargo-ok";

pub const CRATES_IO_INDEX: &str = "https://github.com/rust-lang/crates.io-index";
pub const CRATES_IO_HTTP_INDEX: &str = "sparse+https://index.crates.io/";
pub const CRATES_IO_REGISTRY: &str = "crates-io";
pub const CRATES_IO_DOMAIN: &str = "crates.io";

/// The content inside `.cargo-ok`.
/// See [`RegistrySource::unpack_package`] for more.
#[derive(Deserialize, Serialize)]
#[serde(rename_all = "kebab-case")]
struct LockMetadata {
    /// The version of the `.cargo-ok` file format.
    v: u32,
}

/// A [`Source`] implementation for a local or a remote registry.
///
/// This contains common functionality that is shared between each registry
/// kind, with the registry-specific logic implemented as part of the
/// [`RegistryData`] trait referenced via the `ops` field.
///
/// For general concepts of registries, see the [module-level documentation](crate::sources::registry).
pub struct RegistrySource<'gctx> {
    /// A unique name of the source (typically used as the directory name
    /// where its cached content is stored).
    name: InternedString,
    /// The unique identifier of this source.
    source_id: SourceId,
    /// The path where crate files are extracted (`$CARGO_HOME/registry/src/$REG-HASH`).
    src_path: Filesystem,
    /// Local reference to [`GlobalContext`] for convenience.
    gctx: &'gctx GlobalContext,
    /// Abstraction for interfacing to the different registry kinds.
    ops: Box<dyn RegistryData + 'gctx>,
    /// Interface for managing the on-disk index.
    index: index::RegistryIndex<'gctx>,
    /// A set of packages that should be allowed to be used, even if they are
    /// yanked.
    ///
    /// This is populated from the entries in `Cargo.lock` to ensure that
    /// `cargo update somepkg` won't unlock yanked entries in `Cargo.lock`.
    /// Otherwise, the resolver would think that those entries no longer
    /// exist, and it would trigger updates to unrelated packages.
    yanked_whitelist: HashSet<PackageId>,
    /// Yanked versions that have already been selected during queries.
    ///
    /// As of this writing, this is used to avoid emitting the `--precise <yanked>`
    /// warning twice, assuming that (`dep.package_name()` + `--precise`
    /// version) is sufficient to uniquely identify the same query result.
    selected_precise_yanked: HashSet<(InternedString, semver::Version)>,
}

/// The [`config.json`] file stored in the index.
///
/// The config file may look like:
///
/// ```json
/// {
///     "dl": "https://example.com/api/{crate}/{version}/download",
///     "api": "https://example.com/api",
///     "auth-required": false
/// }
/// ```
///
/// The `auth-required` key comes from [RFC 3139](https://rust-lang.github.io/rfcs/3139-cargo-alternative-registry-auth.html).
///
/// [`config.json`]: https://doc.rust-lang.org/nightly/cargo/reference/registry-index.html#index-configuration
#[derive(Deserialize, Debug, Clone)]
#[serde(rename_all = "kebab-case")]
pub struct RegistryConfig {
    /// Download endpoint for all crates.
    ///
    /// The string is a template which will generate the download URL for the
    /// tarball of a specific version of a crate. The substrings `{crate}` and
    /// `{version}` will be replaced with the crate's name and version
    /// respectively. The substring `{prefix}` will be replaced with the
    /// crate's prefix directory name, and the substring `{lowerprefix}` will
    /// be replaced with the crate's prefix directory name converted to
    /// lowercase. The substring `{sha256-checksum}` will be replaced with the
    /// crate's sha256 checksum.
    ///
    /// For backwards compatibility, if the string does not contain any
    /// markers (`{crate}`, `{version}`, `{prefix}`, or `{lowerprefix}`), it
    /// will be extended with `/{crate}/{version}/download` to
    /// support registries like crates.io which were created before the
    /// templating setup was created.
    ///
    /// For more on the template of the download URL, see [Index Configuration](
    /// https://doc.rust-lang.org/nightly/cargo/reference/registry-index.html#index-configuration).
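    ///
    /// The following is a rough sketch of the substitution rules above, for illustration only.
    /// `prefix_dir` and `expand_dl` are hypothetical helpers, not Cargo APIs.
    /// The sketch assumes a non-empty ASCII crate name. It also assumes that the
    /// "prefix directory name" is the directory part of the crate's index
    /// path (for example, `se/rd` for `serde` and `3/u` for `url`).
    ///
    /// ```rust
    /// // Hypothetical: the directory part of a crate's index path.
    /// // Assumes a non-empty ASCII crate name.
    /// fn prefix_dir(name: &str) -> String {
    ///     match name.len() {
    ///         1 => "1".to_string(),
    ///         2 => "2".to_string(),
    ///         3 => format!("3/{}", &name[..1]),
    ///         _ => format!("{}/{}", &name[..2], &name[2..4]),
    ///     }
    /// }
    ///
    /// // Hypothetical: expands a `dl` template for one crate version.
    /// fn expand_dl(dl: &str, name: &str, version: &str, cksum: &str) -> String {
    ///     let mut template = dl.to_string();
    ///     // Backwards compatibility: no markers means "append the default path".
    ///     let markers = ["{crate}", "{version}", "{prefix}", "{lowerprefix}"];
    ///     if !markers.iter().any(|m| template.contains(m)) {
    ///         template.push_str("/{crate}/{version}/download");
    ///     }
    ///     template
    ///         .replace("{crate}", name)
    ///         .replace("{version}", version)
    ///         .replace("{prefix}", &prefix_dir(name))
    ///         .replace("{lowerprefix}", &prefix_dir(&name.to_lowercase()))
    ///         .replace("{sha256-checksum}", cksum)
    /// }
    /// ```
    ///
    /// Under these assumptions, a `dl` of `https://crates.io/api/v1/crates`
    /// (no markers) expands to
    /// `https://crates.io/api/v1/crates/serde/1.0.0/download` for `serde` 1.0.0.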
    pub dl: String,

    /// API endpoint for the registry. This is what's actually hit to perform
    /// operations like yanks, owner modifications, publishing new crates, etc.
    /// If this is `None`, the registry does not support API commands.
    pub api: Option<String>,

    /// Whether all operations require authentication. See [RFC 3139].
    ///
    /// [RFC 3139]: https://rust-lang.github.io/rfcs/3139-cargo-alternative-registry-auth.html
    #[serde(default)]
    pub auth_required: bool,
}

/// Result from loading data from a registry.
pub enum LoadResponse {
    /// The cache is valid. The cached data should be used.
    CacheValid,

    /// The cache is out of date. Returned data should be used.
    Data {
        raw_data: Vec<u8>,
        /// Version of this data to determine whether it is out of date.
        index_version: Option<String>,
    },

    /// The requested crate was not found.
    NotFound,
}

/// An abstract interface to handle both a local and remote registry.
///
/// This allows [`RegistrySource`] to abstractly handle each registry kind.
///
/// For general concepts of registries, see the [module-level documentation](crate::sources::registry).
pub trait RegistryData {
    /// Performs initialization for the registry.
    ///
    /// This should be safe to call multiple times; the implementation is
    /// expected to not do any work if it is already prepared.
    fn prepare(&self) -> CargoResult<()>;

    /// Returns the path to the index.
    ///
    /// Note that different registries store the index in different formats
    /// (remote = git, http & local = files).
    fn index_path(&self) -> &Filesystem;

    /// Loads the JSON for a specific named package from the index.
    ///
    /// * `root` is the root path to the index.
    /// * `path` is the relative path to the package to load (like `ca/rg/cargo`).
    /// * `index_version` is the version of the requested crate data currently
    ///    in cache. This is useful for checking if a local cache is outdated.
    fn load(
        &mut self,
        root: &Path,
        path: &Path,
        index_version: Option<&str>,
    ) -> Poll<CargoResult<LoadResponse>>;

    /// Loads the `config.json` file and returns it.
    ///
    /// Local registries don't have a config, and return `None`.
    fn config(&mut self) -> Poll<CargoResult<Option<RegistryConfig>>>;

    /// Invalidates locally cached data.
    fn invalidate_cache(&mut self);

    /// If quiet, the source should not display any progress or status messages.
    fn set_quiet(&mut self, quiet: bool);

    /// Is the local cached data up-to-date?
    fn is_updated(&self) -> bool;

    /// Prepares to start downloading a `.crate` file.
    ///
    /// Despite the name, this doesn't actually download anything. If the
    /// `.crate` file is already downloaded, then it returns [`MaybeLock::Ready`].
    /// If it hasn't been downloaded, then it returns [`MaybeLock::Download`]
    /// which contains the URL to download. The [`crate::core::package::Downloads`]
    /// system handles the actual download process. After downloading, it
    /// calls [`Self::finish_download`] to save the downloaded file.
    ///
    /// `checksum` is currently only used by local registries to verify the
    /// file contents (because local registries never actually download
    /// anything). Remote registries will validate the checksum in
    /// `finish_download`. For already downloaded `.crate` files, it does not
    /// validate the checksum, assuming the filesystem does not suffer from
    /// corruption or manipulation.
    fn download(&mut self, pkg: PackageId, checksum: &str) -> CargoResult<MaybeLock>;

    /// Finishes a download by saving a `.crate` file to disk.
    ///
    /// After [`crate::core::package::Downloads`] has finished a download,
    /// it will call this to save the `.crate` file. This is only relevant
    /// for remote registries. This should validate the checksum and save
    /// the given data to the on-disk cache.
    ///
    /// Returns a [`File`] handle to the `.crate` file, positioned at the start.
    fn finish_download(&mut self, pkg: PackageId, checksum: &str, data: &[u8])
        -> CargoResult<File>;

    /// Returns whether or not the `.crate` file is already downloaded.
    fn is_crate_downloaded(&self, _pkg: PackageId) -> bool {
        true
    }

    /// Validates that the global package cache lock is held.
    ///
    /// Given the [`Filesystem`], this will make sure that the package cache
    /// lock is held. If not, it will panic. See
    /// [`GlobalContext::acquire_package_cache_lock`] for acquiring the global lock.
    ///
    /// Returns the [`Path`] to the [`Filesystem`].
    fn assert_index_locked<'a>(&self, path: &'a Filesystem) -> &'a Path;

    /// Blocks until all outstanding `Poll::Pending` requests are `Poll::Ready`.
    fn block_until_ready(&mut self) -> CargoResult<()>;
}

/// The status of [`RegistryData::download`] which indicates if a `.crate`
/// file has already been downloaded, or if not then the URL to download.
pub enum MaybeLock {
    /// The `.crate` file is already downloaded. [`File`] is a handle to the
    /// opened `.crate` file on the filesystem.
    Ready(File),
    /// The `.crate` file is not downloaded; here's the URL to download it from.
    ///
    /// `descriptor` is a human-readable description of what is being
    /// downloaded, for display to the user.
    Download {
        url: String,
        descriptor: String,
        authorization: Option<String>,
    },
}

mod download;
mod http_remote;
pub(crate) mod index;
pub use index::IndexSummary;
mod local;
mod remote;

/// Generates a unique name for a [`SourceId`], giving each source a unique
/// path under which to put its index files.
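///
/// The name has the form `<host>-<hash>`, with a `-shallow` suffix for
/// shallow Git indexes. The sketch below mirrors only that shape. It is
/// hypothetical and uses the standard library's `DefaultHasher` over the
/// source URL as a stand-in for Cargo's own `hex::short_hash` of the
/// [`SourceId`], so it does not reproduce the names Cargo actually
/// computes. `DefaultHasher`'s output is also not guaranteed to stay stable
/// across Rust releases, which is exactly the property the CAUTION comment
/// in this function asks to preserve.
///
/// ```rust
/// use std::collections::hash_map::DefaultHasher;
/// use std::hash::{Hash, Hasher};
///
/// // Hypothetical stand-in for `short_name`: hashes the source URL and
/// // prefixes the result with the URL's host.
/// fn short_name_sketch(host: &str, source_url: &str, is_shallow: bool) -> String {
///     let mut hasher = DefaultHasher::new();
///     source_url.hash(&mut hasher);
///     // 16 lowercase hex digits from the 64-bit hash.
///     let mut name = format!("{host}-{:016x}", hasher.finish());
///     if is_shallow {
///         name.push_str("-shallow");
///     }
///     name
/// }
/// ```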
fn short_name(id: SourceId, is_shallow: bool) -> String {
    // CAUTION: This should not change between versions. If you change how
    // this is computed, it will orphan previously cached data, forcing the
    // cache to be rebuilt and potentially wasting significant disk space. If
    // you change it, be cautious of the impact. See `test_cratesio_hash` for
    // a similar discussion.
    let hash = hex::short_hash(&id);
    let ident = id.url().host_str().unwrap_or("").to_string();
    let mut name = format!("{}-{}", ident, hash);
    if is_shallow {
        name.push_str("-shallow");
    }
    name
}

impl<'gctx> RegistrySource<'gctx> {
    /// Creates a [`Source`] of a "remote" registry.
    /// It could be either an HTTP-based [`http_remote::HttpRegistry`] or
    /// a Git-based [`remote::RemoteRegistry`].
    ///
    /// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked.
    pub fn remote(
        source_id: SourceId,
        yanked_whitelist: &HashSet<PackageId>,
        gctx: &'gctx GlobalContext,
    ) -> CargoResult<RegistrySource<'gctx>> {
        assert!(source_id.is_remote_registry());
        let name = short_name(
            source_id,
            gctx.cli_unstable()
                .git
                .map_or(false, |features| features.shallow_index)
                && !source_id.is_sparse(),
        );
        let ops = if source_id.is_sparse() {
            Box::new(http_remote::HttpRegistry::new(source_id, gctx, &name)?) as Box<_>
        } else {
            Box::new(remote::RemoteRegistry::new(source_id, gctx, &name)) as Box<_>
        };

        Ok(RegistrySource::new(
            source_id,
            gctx,
            &name,
            ops,
            yanked_whitelist,
        ))
    }

    /// Creates a [`Source`] of a local registry, with [`local::LocalRegistry`] under the hood.
    ///
    /// * `path` --- The root path of a local registry on the file system.
    /// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked.
    pub fn local(
        source_id: SourceId,
        path: &Path,
        yanked_whitelist: &HashSet<PackageId>,
        gctx: &'gctx GlobalContext,
    ) -> RegistrySource<'gctx> {
        let name = short_name(source_id, false);
        let ops = local::LocalRegistry::new(path, gctx, &name);
        RegistrySource::new(source_id, gctx, &name, Box::new(ops), yanked_whitelist)
    }

    /// Creates a source of a registry. This is an inner helper function.
    ///
    /// * `name` --- Name of a path segment which may affect where `.crate`
    ///   tarballs, the registry index and cache are stored. Expected to be unique.
    /// * `ops` --- The underlying [`RegistryData`] type.
    /// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked.
    fn new(
        source_id: SourceId,
        gctx: &'gctx GlobalContext,
        name: &str,
        ops: Box<dyn RegistryData + 'gctx>,
        yanked_whitelist: &HashSet<PackageId>,
    ) -> RegistrySource<'gctx> {
        RegistrySource {
            name: name.into(),
            src_path: gctx.registry_source_path().join(name),
            gctx,
            source_id,
            index: index::RegistryIndex::new(source_id, ops.index_path(), gctx),
            yanked_whitelist: yanked_whitelist.clone(),
            ops,
            selected_precise_yanked: HashSet::new(),
        }
    }

    /// Decodes the [configuration](RegistryConfig) stored within the registry.
    ///
    /// This requires that the index has been at least checked out.
    pub fn config(&mut self) -> Poll<CargoResult<Option<RegistryConfig>>> {
        self.ops.config()
    }

    /// Unpacks a downloaded package into a location where it's ready to be
    /// compiled.
    ///
    /// No action is taken if the source looks like it's already unpacked.
    ///
    /// # History of interruption detection with `.cargo-ok` file
    ///
    /// Cargo has always included a `.cargo-ok` file ([`PACKAGE_SOURCE_LOCK`])
    /// to detect if extraction was interrupted, but it was originally empty.
    ///
    /// In 1.34, Cargo was changed to create the `.cargo-ok` file before it
    /// started extraction to implement fine-grained locking. After it was
    /// finished extracting, it wrote two bytes to indicate it was complete.
    /// It would use the length check to detect if it was possibly interrupted.
    ///
    /// In 1.36, Cargo changed to not use fine-grained locking, and instead used
    /// a global lock. The use of `.cargo-ok` was no longer needed for locking
    /// purposes, but was kept to detect when extraction was interrupted.
    ///
    /// In 1.49, Cargo changed to not create the `.cargo-ok` file before it
    /// started extraction to deal with `.crate` files that inexplicably had
    /// a `.cargo-ok` file in them.
    ///
    /// In 1.64, Cargo changed to detect `.crate` files with `.cargo-ok` files
    /// in them in response to [CVE-2022-36113], which dealt with malicious
    /// `.crate` files making `.cargo-ok` a symlink causing cargo to write "ok"
    /// to any arbitrary file on the filesystem it has permission to.
    ///
    /// In 1.71, `.cargo-ok` changed to contain the JSON `{"v":1}` to indicate
    /// its version. A parsing failure results in a heavy-hammer
    /// approach that unpacks the `.crate` file again. This is in response to a
    /// security issue in which unpacking didn't respect the umask on Unix systems.
    ///
    /// This is all a long-winded way of explaining the circumstances that might
    /// cause a directory to contain a `.cargo-ok` file that is empty or
    /// otherwise corrupted. It may have been extracted by a version of Rust
    /// before 1.34, in which case everything should be fine. However, an empty
    /// file created by versions 1.36 to 1.49 indicates that the extraction was
    /// interrupted and that we need to start again.
    ///
    /// Another possibility is that the filesystem is simply corrupted, in
    /// which case deleting the directory might be the safe thing to do. That
    /// is probably unlikely, though.
    ///
    /// To be safe, we delete the directory and start over again unless the
    /// `.cargo-ok` file contains the expected version (`{"v":1}`); for
    /// example, an empty file or the legacy `ok` content triggers this.
    ///
    /// [CVE-2022-36113]: https://blog.rust-lang.org/2022/09/14/cargo-cves.html#arbitrary-file-corruption-cve-2022-36113
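    ///
    /// The decision made from the `.cargo-ok` content can be summarized by the
    /// sketch below. It is illustrative only: `UnpackAction` and `decide` are
    /// hypothetical names, and for simplicity the sketch compares against the
    /// exact string `{"v":1}` instead of parsing JSON as the real code does.
    /// It also ignores read errors other than "file not found", which the real
    /// code reports as errors.
    ///
    /// ```rust
    /// #[derive(Debug, PartialEq)]
    /// enum UnpackAction {
    ///     // `.cargo-ok` is missing: never unpacked, or unpacking did not finish.
    ///     Unpack,
    ///     // `.cargo-ok` holds the current version marker: reuse the directory.
    ///     Reuse,
    ///     // Anything else (empty, legacy "ok", corrupted): wipe and unpack again.
    ///     ClearAndUnpack,
    /// }
    ///
    /// // `cargo_ok` is the file's content, or `None` if the file does not exist.
    /// fn decide(cargo_ok: Option<&str>) -> UnpackAction {
    ///     match cargo_ok {
    ///         None => UnpackAction::Unpack,
    ///         Some(r#"{"v":1}"#) => UnpackAction::Reuse,
    ///         Some(_) => UnpackAction::ClearAndUnpack,
    ///     }
    /// }
    /// ```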
    fn unpack_package(&self, pkg: PackageId, tarball: &File) -> CargoResult<PathBuf> {
        let package_dir = format!("{}-{}", pkg.name(), pkg.version());
        let dst = self.src_path.join(&package_dir);
        let path = dst.join(PACKAGE_SOURCE_LOCK);
        let path = self
            .gctx
            .assert_package_cache_locked(CacheLockMode::DownloadExclusive, &path);
        let unpack_dir = path.parent().unwrap();
        match fs::read_to_string(path) {
            Ok(ok) => match serde_json::from_str::<LockMetadata>(&ok) {
                Ok(lock_meta) if lock_meta.v == 1 => {
                    self.gctx
                        .deferred_global_last_use()?
                        .mark_registry_src_used(global_cache_tracker::RegistrySrc {
                            encoded_registry_name: self.name,
                            package_dir: package_dir.into(),
                            size: None,
                        });
                    return Ok(unpack_dir.to_path_buf());
                }
                _ => {
                    if ok == "ok" {
                        tracing::debug!("old `ok` content found, clearing cache");
                    } else {
                        tracing::warn!("unrecognized .cargo-ok content, clearing cache: {ok}");
                    }
                    // See the doc comment of `unpack_package` for why everything is removed.
                    paths::remove_dir_all(dst.as_path_unlocked())?;
                }
            },
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => anyhow::bail!("unable to read .cargo-ok file at {path:?}: {e}"),
        }
        dst.create_dir()?;
        let mut tar = {
            let size_limit = max_unpack_size(self.gctx, tarball.metadata()?.len());
            let gz = GzDecoder::new(tarball);
            let gz = LimitErrorReader::new(gz, size_limit);
            let mut tar = Archive::new(gz);
            set_mask(&mut tar);
            tar
        };
        let mut bytes_written = 0;
        let prefix = unpack_dir.file_name().unwrap();
        let parent = unpack_dir.parent().unwrap();
        for entry in tar.entries()? {
            let mut entry = entry.context("failed to iterate over archive")?;
            let entry_path = entry
                .path()
                .context("failed to read entry path")?
                .into_owned();

            // We're going to unpack this tarball into the global source
            // directory, but we want to make sure that it doesn't accidentally
            // (or maliciously) overwrite source code from other crates. Cargo
            // itself should never generate a tarball that hits this error, and
            // crates.io should also block uploads with these sorts of tarballs,
            // but be extra sure by adding a check here as well.
            if !entry_path.starts_with(prefix) {
                anyhow::bail!(
                    "invalid tarball downloaded, contains \
                     a file at {:?} which isn't under {:?}",
                    entry_path,
                    prefix
                )
            }
            // Prevent unpacking the lockfile from the crate itself.
            if entry_path
                .file_name()
                .map_or(false, |p| p == PACKAGE_SOURCE_LOCK)
            {
                continue;
            }
            // Track the total unpacked size, then unpack the entry under `parent`.
            bytes_written += entry.size();
            let mut result = entry.unpack_in(parent).map_err(anyhow::Error::from);
            if cfg!(windows) && restricted_names::is_windows_reserved_path(&entry_path) {
                result = result.with_context(|| {
                    format!(
                        "`{}` appears to contain a reserved Windows path, \
                        it cannot be extracted on Windows",
                        entry_path.display()
                    )
                });
            }
            result
                .with_context(|| format!("failed to unpack entry at `{}`", entry_path.display()))?;
        }

        // Now that we've finished unpacking, create and write to the lock file to indicate that
        // unpacking was successful.
        let mut ok = OpenOptions::new()
            .create_new(true)
            .read(true)
            .write(true)
            .open(&path)
            .with_context(|| format!("failed to open `{}`", path.display()))?;

        let lock_meta = LockMetadata { v: 1 };
        write!(ok, "{}", serde_json::to_string(&lock_meta).unwrap())?;

        self.gctx
            .deferred_global_last_use()?
            .mark_registry_src_used(global_cache_tracker::RegistrySrc {
                encoded_registry_name: self.name,
                package_dir: package_dir.into(),
                size: Some(bytes_written),
            });

        Ok(unpack_dir.to_path_buf())
    }

    /// Turns the downloaded `.crate` tarball file into a [`Package`].
    ///
    /// This unconditionally sets the checksum for the returned package, so it
    /// should only be called after the integrity check has been done. That is,
    /// you need to call either [`RegistryData::download`] or
    /// [`RegistryData::finish_download`] before calling this method.
    fn get_pkg(&mut self, package: PackageId, path: &File) -> CargoResult<Package> {
        let path = self
            .unpack_package(package, path)
            .with_context(|| format!("failed to unpack package `{}`", package))?;
        let mut src = PathSource::new(&path, self.source_id, self.gctx);
        src.load()?;
        let mut pkg = match src.download(package)? {
            MaybePackage::Ready(pkg) => pkg,
            MaybePackage::Download { .. } => unreachable!(),
        };

        // After we've loaded the package, set its summary's `checksum`
        // field to the checksum we know for this `PackageId`.
        let cksum = self
            .index
            .hash(package, &mut *self.ops)
            .expect("a downloaded dep now pending!?")
            .expect("summary not found");
        pkg.manifest_mut()
            .summary_mut()
            .set_checksum(cksum.to_string());

        Ok(pkg)
    }
}

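// A minimal sketch (not Cargo's code) of the "unpacking finished" marker that
// `unpack_package` writes once every entry has been extracted. It assumes only
// `std`: the marker is opened with `create_new(true)`, so creating it a second
// time fails (typically with `ErrorKind::AlreadyExists`) instead of silently
// overwriting it. The module name, `write_unpack_marker`, and the hand-written
// JSON body are hypothetical stand-ins for Cargo's `LockMetadata { v: 1 }`
// serialized with `serde_json`.
#[allow(dead_code)]
mod unpack_marker_sketch {
    use std::fs::OpenOptions;
    use std::io::{self, Write};
    use std::path::Path;

    /// Creates the marker file at `path` and writes a versioned JSON body.
    /// Fails if a file already exists at `path`.
    pub fn write_unpack_marker(path: &Path) -> io::Result<()> {
        let mut ok = OpenOptions::new()
            .create_new(true) // never reuse or overwrite an existing marker
            .read(true)
            .write(true)
            .open(path)?;
        // Hypothetical body mirroring `LockMetadata { v: 1 }`.
        write!(ok, "{}", r#"{"v":1}"#)?;
        Ok(())
    }
}
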
impl<'gctx> Source for RegistrySource<'gctx> {
    fn query(
        &mut self,
        dep: &Dependency,
        kind: QueryKind,
        f: &mut dyn FnMut(IndexSummary),
    ) -> Poll<CargoResult<()>> {
        let mut req = dep.version_req().clone();

        // Handle `cargo update --precise` here.
        if let Some((_, requested)) = self
            .source_id
            .precise_registry_version(dep.package_name().as_str())
            .filter(|(c, to)| {
                if to.is_prerelease() && self.gctx.cli_unstable().unstable_options {
                    req.matches_prerelease(c)
                } else {
                    req.matches(c)
                }
            })
        {
            req.precise_to(&requested);
        }

        let mut called = false;
        let callback = &mut |s| {
            called = true;
            f(s);
        };

        // If this is a locked dependency, then it came from a lock file and in
        // theory the registry is known to contain this version. If, however, we
        // come back with no summaries, then our registry may need to be
        // updated, so we fall back to performing a lazy update.
        if kind == QueryKind::Exact && req.is_locked() && !self.ops.is_updated() {
            debug!("attempting query without update");
            ready!(self
                .index
                .query_inner(dep.package_name(), &req, &mut *self.ops, &mut |s| {
                    if matches!(s, IndexSummary::Candidate(_) | IndexSummary::Yanked(_))
                        && dep.matches(s.as_summary())
                    {
                        // We are looking for a package from a lock file, so we do not
                        // care whether it has been yanked.
                        callback(s)
                    }
                }))?;
            if called {
                Poll::Ready(Ok(()))
            } else {
                debug!("falling back to an update");
                self.invalidate_cache();
                Poll::Pending
            }
        } else {
            let mut precise_yanked_in_use = false;
            ready!(self
                .index
                .query_inner(dep.package_name(), &req, &mut *self.ops, &mut |s| {
                    let matched = match kind {
                        QueryKind::Exact | QueryKind::RejectedVersions => {
                            if req.is_precise() && self.gctx.cli_unstable().unstable_options {
                                dep.matches_prerelease(s.as_summary())
                            } else {
                                dep.matches(s.as_summary())
                            }
                        }
                        QueryKind::AlternativeNames => true,
                        QueryKind::Normalized => true,
                    };
                    if !matched {
                        return;
                    }
                    // Next, filter out all yanked packages. Some yanked packages may
                    // leak through if they're in a whitelist (i.e. if they were
                    // previously in `Cargo.lock`) or if they were explicitly
                    // requested via `--precise`.
                    match s {
                        s if kind == QueryKind::RejectedVersions => callback(s),
                        s @ IndexSummary::Candidate(_) => callback(s),
                        s @ IndexSummary::Yanked(_) => {
                            if self.yanked_whitelist.contains(&s.package_id()) {
                                callback(s);
                            } else if req.is_precise() {
                                precise_yanked_in_use = true;
                                callback(s);
                            }
                        }
                        IndexSummary::Unsupported(summary, v) => {
                            tracing::debug!(
                                "unsupported schema version {} ({} {})",
                                v,
                                summary.name(),
                                summary.version()
                            );
                        }
                        IndexSummary::Invalid(summary) => {
                            tracing::debug!("invalid ({} {})", summary.name(), summary.version());
                        }
                        IndexSummary::Offline(summary) => {
                            tracing::debug!("offline ({} {})", summary.name(), summary.version());
                        }
                    }
                }))?;
            if precise_yanked_in_use {
                let name = dep.package_name();
                let version = req
                    .precise_version()
                    .expect("--precise <yanked-version> in use");
                if self.selected_precise_yanked.insert((name, version.clone())) {
                    let mut shell = self.gctx.shell();
                    shell.warn(format_args!(
                        "selected package `{name}@{version}` was yanked by the author"
                    ))?;
                    shell.note("if possible, try a compatible non-yanked version")?;
                }
            }
            if called {
                return Poll::Ready(Ok(()));
            }
            let mut any_pending = false;
            if kind == QueryKind::AlternativeNames || kind == QueryKind::Normalized {
                // Attempt to handle misspellings by searching for a chain of related
                // names to the original name. The resolver will later reject any
                // candidates that have the wrong name, and in doing so it will
                // produce helpful "did you mean?" suggestions.
                // For now we only try the canonical swap of `-` to `_` and vice versa.
                // More advanced fuzzy searching may come in the future.
                for name_permutation in [
                    dep.package_name().replace('-', "_"),
                    dep.package_name().replace('_', "-"),
                ] {
                    let name_permutation = InternedString::new(&name_permutation);
                    if name_permutation == dep.package_name() {
                        continue;
                    }
                    any_pending |= self
                        .index
                        .query_inner(name_permutation, &req, &mut *self.ops, &mut |s| {
                            // Yanked versions are only reported for `AlternativeNames`.
                            if !s.is_yanked() || kind == QueryKind::AlternativeNames {
                                f(s);
                            }
                        })?
                        .is_pending();
                }
            }
            if any_pending {
                Poll::Pending
            } else {
                Poll::Ready(Ok(()))
            }
        }
    }

    fn supports_checksums(&self) -> bool {
        true
    }

    fn requires_precise(&self) -> bool {
        false
    }

    fn source_id(&self) -> SourceId {
        self.source_id
    }

    fn invalidate_cache(&mut self) {
        self.index.clear_summaries_cache();
        self.ops.invalidate_cache();
    }

    fn set_quiet(&mut self, quiet: bool) {
        self.ops.set_quiet(quiet);
    }

    fn download(&mut self, package: PackageId) -> CargoResult<MaybePackage> {
        let hash = loop {
            match self.index.hash(package, &mut *self.ops)? {
                Poll::Pending => self.block_until_ready()?,
                Poll::Ready(hash) => break hash,
            }
        };
        match self.ops.download(package, hash)? {
            MaybeLock::Ready(file) => self.get_pkg(package, &file).map(MaybePackage::Ready),
            MaybeLock::Download {
                url,
                descriptor,
                authorization,
            } => Ok(MaybePackage::Download {
                url,
                descriptor,
                authorization,
            }),
        }
    }

    fn finish_download(&mut self, package: PackageId, data: Vec<u8>) -> CargoResult<Package> {
        let hash = loop {
            match self.index.hash(package, &mut *self.ops)? {
                Poll::Pending => self.block_until_ready()?,
                Poll::Ready(hash) => break hash,
            }
        };
        let file = self.ops.finish_download(package, hash, &data)?;
        self.get_pkg(package, &file)
    }

    fn fingerprint(&self, pkg: &Package) -> CargoResult<String> {
        Ok(pkg.package_id().version().to_string())
    }

    fn describe(&self) -> String {
        self.source_id.display_index()
    }

    fn add_to_yanked_whitelist(&mut self, pkgs: &[PackageId]) {
        self.yanked_whitelist.extend(pkgs);
    }

    fn is_yanked(&mut self, pkg: PackageId) -> Poll<CargoResult<bool>> {
        self.index.is_yanked(pkg, &mut *self.ops)
    }

    fn block_until_ready(&mut self) -> CargoResult<()> {
        // Before starting to work on the registry, make sure that
        // `<cargo_home>/registry` is marked as excluded from indexing and
        // backups. Older versions of Cargo didn't do this, so we do it here
        // regardless of whether `<cargo_home>` exists.
        //
        // This does not use `create_dir_all_excluded_from_backups_atomic` for
        // the same reason: we want to exclude it even if the directory already
        // exists.
        //
        // IO errors in creating and marking it are ignored, e.g. in case we're on a
        // read-only filesystem.
        let registry_base = self.gctx.registry_base_path();
        let _ = registry_base.create_dir();
        exclude_from_backups_and_indexing(&registry_base.into_path_unlocked());

        self.ops.block_until_ready()
    }
}

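// A simplified, hypothetical model (not Cargo's types) of the yanked-package
// filtering in `RegistrySource::query`: candidates always pass, yanked entries
// pass only when whitelisted (previously in `Cargo.lock`) or when the request
// is a `--precise` one, and a `RejectedVersions` query passes everything.
// `Entry`, `Kind`, and `filter_yanked` are illustrative names; package IDs are
// plain strings here instead of `PackageId`.
#[allow(dead_code)]
mod yanked_filter_sketch {
    use std::collections::HashSet;

    /// A stand-in for the `Candidate` and `Yanked` variants of `IndexSummary`.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Entry {
        Candidate(String),
        Yanked(String),
    }

    /// A stand-in for the two `QueryKind`s that matter for this filter.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum Kind {
        Exact,
        RejectedVersions,
    }

    /// Returns the entries that would reach the callback, plus whether a
    /// yanked version was kept only because the request was `--precise`.
    pub fn filter_yanked(
        entries: Vec<Entry>,
        kind: Kind,
        whitelist: &HashSet<String>,
        is_precise: bool,
    ) -> (Vec<Entry>, bool) {
        let mut kept = Vec::new();
        let mut precise_yanked_in_use = false;
        for entry in entries {
            let keep = match &entry {
                // Rejected-versions queries see everything, yanked or not.
                _ if kind == Kind::RejectedVersions => true,
                Entry::Candidate(_) => true,
                // Yanked but previously locked: keep it without a warning.
                Entry::Yanked(id) if whitelist.contains(id) => true,
                // Yanked but explicitly requested: keep it and flag a warning.
                Entry::Yanked(_) if is_precise => {
                    precise_yanked_in_use = true;
                    true
                }
                Entry::Yanked(_) => false,
            };
            if keep {
                kept.push(entry);
            }
        }
        (kept, precise_yanked_in_use)
    }
}
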
impl RegistryConfig {
    /// File name of [`RegistryConfig`].
    const NAME: &'static str = "config.json";
}

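// A minimal sketch (not Cargo's code) of the alternative names that
// `RegistrySource::query` tries for `QueryKind::AlternativeNames` and
// `QueryKind::Normalized`: swap every `-` for `_` and vice versa, skipping any
// permutation identical to the original name. It uses plain `String`s instead
// of `InternedString`; the module and function names are hypothetical.
#[allow(dead_code)]
mod name_permutation_sketch {
    /// Returns the `-`/`_` permutations of `name` that differ from it.
    pub fn alternative_names(name: &str) -> Vec<String> {
        let mut out = Vec::new();
        for candidate in [name.replace('-', "_"), name.replace('_', "-")] {
            // Mirrors the `continue` in `query`: an unchanged name is skipped.
            if candidate != name {
                out.push(candidate);
            }
        }
        out
    }
}

// For example, `serde-json` yields only `serde_json`, a mixed name such as
// `foo-bar_baz` yields both `foo_bar_baz` and `foo-bar-baz`, and a name with
// neither separator yields nothing.
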
/// Returns the maximum unpacked size that Cargo permits
/// for a compressed file of the given `size` in bytes.
///
/// The result is the larger of `size * max compression ratio`
/// and a fixed maximum unpacked size.
///
/// In practice, the compression ratio usually falls in the range of 2:1 to 10:1.
/// We choose 20:1, which should cover almost all legitimate cases.
/// Any higher ratio is treated as a zip bomb.
///
/// In the future we might want to make this size configurable.
///
/// Some real-world data on common compression algorithms:
///
/// * <https://www.zlib.net/zlib_tech.html>
/// * <https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf>
/// * <https://blog.cloudflare.com/results-experimenting-brotli/>
/// * <https://tukaani.org/lzma/benchmarks.html>
fn max_unpack_size(gctx: &GlobalContext, size: u64) -> u64 {
    const SIZE_VAR: &str = "__CARGO_TEST_MAX_UNPACK_SIZE";
    const RATIO_VAR: &str = "__CARGO_TEST_MAX_UNPACK_RATIO";
    const MAX_UNPACK_SIZE: u64 = 512 * 1024 * 1024; // 512 MiB
    const MAX_COMPRESSION_RATIO: usize = 20; // 20:1

    let max_unpack_size = if cfg!(debug_assertions) && gctx.get_env(SIZE_VAR).is_ok() {
        // For integration tests only.
        gctx.get_env(SIZE_VAR)
            .unwrap()
            .parse()
            .expect("a max unpack size in bytes")
    } else {
        MAX_UNPACK_SIZE
    };
    let max_compression_ratio = if cfg!(debug_assertions) && gctx.get_env(RATIO_VAR).is_ok() {
        // For integration tests only.
        gctx.get_env(RATIO_VAR)
            .unwrap()
            .parse()
            .expect("a max compression ratio")
    } else {
        MAX_COMPRESSION_RATIO
    };

    u64::max(max_unpack_size, size * max_compression_ratio as u64)
}

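// A worked sketch of the limit computed by `max_unpack_size`, leaving out the
// test-only environment overrides: the larger of a fixed 512 MiB cap and 20
// times the compressed size. The constants are copied from the function above;
// the module and function names are hypothetical.
#[allow(dead_code)]
mod unpack_limit_sketch {
    pub const MAX_UNPACK_SIZE: u64 = 512 * 1024 * 1024; // 512 MiB
    pub const MAX_COMPRESSION_RATIO: u64 = 20; // 20:1

    /// Maximum number of bytes that may be unpacked from a `.crate` file of
    /// `compressed_size` bytes.
    pub fn limit(compressed_size: u64) -> u64 {
        u64::max(MAX_UNPACK_SIZE, compressed_size * MAX_COMPRESSION_RATIO)
    }
}

// With these constants, a 1 MiB crate may unpack to 512 MiB, while a 100 MiB
// crate may unpack to 2,000 MiB; the ratio term only dominates once the
// compressed size exceeds 512 / 20 = 25.6 MiB.
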
/// Sets the current [`umask`] value for the given tarball. No-op on non-Unix
/// platforms.
///
/// On Windows, tar only looks at user permissions and tries to set the "read
/// only" attribute, so this is a no-op there as well.
///
/// [`umask`]: https://man7.org/linux/man-pages/man2/umask.2.html
#[allow(unused_variables)]
fn set_mask<R: Read>(tar: &mut Archive<R>) {
    #[cfg(unix)]
    tar.set_mask(crate::util::get_umask());
}
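
// A minimal sketch of the umask arithmetic that `set_mask` relies on, assuming
// the mask is applied to each unpacked entry's mode the way a Unix umask is:
// every bit set in the mask is cleared from the mode (`mode & !mask`).
// `apply_umask` is a hypothetical helper, not part of Cargo or the `tar` crate.
#[allow(dead_code)]
mod umask_sketch {
    /// Clears from `mode` every permission bit that is set in `umask`.
    pub fn apply_umask(mode: u32, umask: u32) -> u32 {
        mode & !umask
    }
}

// With the common umask `0o022`, an archived mode of `0o777` becomes `0o755`
// and `0o666` becomes `0o644`.
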