cargo/sources/registry/mod.rs
1//! A `Source` for registry-based packages.
2//!
3//! # What's a Registry?
4//!
5//! [Registries] are central locations where packages can be uploaded to,
6//! discovered, and searched for. The purpose of a registry is to have a
7//! location that serves as permanent storage for versions of a crate over time.
8//!
9//! Compared to git sources (see [`GitSource`]), a registry provides many
10//! packages as well as many versions simultaneously. Git sources can also
//! have commits deleted through rebases, whereas registries cannot have their
//! versions deleted.
13//!
14//! In Cargo, [`RegistryData`] is an abstraction over each kind of actual
15//! registry, and [`RegistrySource`] connects those implementations to
//! the [`Source`] trait. Two prominent features these abstractions provide are:
17//!
//! * A way to query the metadata of a package from a registry. The metadata
//!   comes from the index.
//! * A way to download package contents (a.k.a. source files) that are required
//!   when building the package itself.
22//!
23//! We'll cover each functionality later.
24//!
25//! [Registries]: https://doc.rust-lang.org/nightly/cargo/reference/registries.html
26//! [`GitSource`]: super::GitSource
27//!
28//! # Different Kinds of Registries
29//!
30//! Cargo provides multiple kinds of registries. Each of them serves the index
31//! and package contents in a slightly different way. Namely,
32//!
//! * [`LocalRegistry`] --- Serves the index and package contents entirely on
//!   a local filesystem.
//! * [`RemoteRegistry`] --- Serves the index ahead of time from a Git
//!   repository, and package contents are downloaded as needed.
//! * [`HttpRegistry`] --- Serves both the index and package contents on demand
//!   over an HTTP-based registry API. This is the default since Cargo 1.70.0.
39//!
40//! Each registry has its own [`RegistryData`] implementation, and can be
41//! created from either [`RegistrySource::local`] or [`RegistrySource::remote`].
42//!
43//! [`LocalRegistry`]: local::LocalRegistry
44//! [`RemoteRegistry`]: remote::RemoteRegistry
45//! [`HttpRegistry`]: http_remote::HttpRegistry
46//!
47//! # The Index of a Registry
48//!
49//! One of the major difficulties with a registry is that hosting so many
50//! packages may quickly run into performance problems when dealing with
51//! dependency graphs. It's infeasible for cargo to download the entire contents
52//! of the registry just to resolve one package's dependencies, for example. As
53//! a result, cargo needs some efficient method of querying what packages are
54//! available on a registry, what versions are available, and what the
//! dependencies for each version are.
56//!
57//! To solve the problem, a registry must provide an index of package metadata.
58//! The index of a registry is essentially an easily query-able version of the
59//! registry's database for a list of versions of a package as well as a list
60//! of dependencies for each version. The exact format of the index is
61//! described later.
62//!
63//! See the [`index`] module for topics about the management, parsing, caching,
64//! and versioning for the on-disk index.
65//!
66//! ## The Format of The Index
67//!
68//! The index is a store for the list of versions for all packages known, so its
69//! format on disk is optimized slightly to ensure that `ls registry` doesn't
70//! produce a list of all packages ever known. The index also wants to ensure
71//! that there's not a million files which may actually end up hitting
72//! filesystem limits at some point. To this end, a few decisions were made
73//! about the format of the registry:
74//!
//! 1. Each crate will have one file corresponding to it. Each version for a
//!    crate will just be a line in this file (see
//!    [`cargo_util_schemas::index::IndexPackage`] for its representation).
//! 2. There will be two tiers of directories for crate names, under which
//!    crates corresponding to those tiers will be located.
//!    (See [`cargo_util::registry::make_dep_path`] for the implementation of
//!    this layout hierarchy.)
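//!
//! For example, under this scheme (roughly what [`cargo_util::registry::make_dep_path`]
//! computes) a few crate names map to index paths like:
//!
//! ```notrust
//! a      -> 1/a
//! url    -> 3/u/url
//! cargo  -> ca/rg/cargo
//! ```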
82//!
//! As an example, here is a hierarchy of an index:
84//!
//! ```notrust
//! .
//! ├── 3
//! │   └── u
//! │       └── url
//! ├── bz
//! │   └── ip
//! │       └── bzip2
//! ├── config.json
//! ├── en
//! │   └── co
//! │       └── encoding
//! └── li
//!     ├── bg
//!     │   └── libgit2
//!     └── nk
//!         └── link-config
//! ```
103//!
104//! The root of the index contains a `config.json` file with a few entries
105//! corresponding to the registry (see [`RegistryConfig`] below).
106//!
107//! Otherwise, there are three numbered directories (1, 2, 3) for crates with
108//! names 1, 2, and 3 characters in length. The 1/2 directories simply have the
109//! crate files underneath them, while the 3 directory is sharded by the first
110//! letter of the crate name.
111//!
112//! Otherwise the top-level directory contains many two-letter directory names,
113//! each of which has many sub-folders with two letters. At the end of all these
114//! are the actual crate files themselves.
115//!
//! The purpose of this layout is to hopefully cut down on `ls` sizes as well
//! as to allow efficient lookup based on the crate name itself.
118//!
119//! See [The Cargo Book: Registry Index][registry-index] for the public
120//! interface on the index format.
121//!
122//! [registry-index]: https://doc.rust-lang.org/nightly/cargo/reference/registry-index.html
123//!
124//! ## The Index Files
125//!
126//! Each file in the index is the history of one crate over time. Each line in
127//! the file corresponds to one version of a crate, stored in JSON format (see
128//! the [`cargo_util_schemas::index::IndexPackage`] structure).
129//!
130//! As new versions are published, new lines are appended to this file. **The
131//! only modifications to this file that should happen over time are yanks of a
132//! particular version.**
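//!
//! For illustration only, a single line of an index file looks roughly like
//! this (the field names follow the public index format; the values are made
//! up):
//!
//! ```json
//! {"name":"foo","vers":"0.1.0","deps":[{"name":"rand","req":"^0.8","features":[],"optional":false,"default_features":true,"target":null,"kind":"normal"}],"cksum":"<sha256 of the .crate file>","features":{},"yanked":false}
//! ```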
133//!
134//! # Downloading Packages
135//!
136//! The purpose of the index was to provide an efficient method to resolve the
137//! dependency graph for a package. After resolution has been performed, we need
138//! to download the contents of packages so we can read the full manifest and
139//! build the source code.
140//!
141//! To accomplish this, [`RegistryData::download`] will "make" an HTTP request
142//! per-package requested to download tarballs into a local cache. These
143//! tarballs will then be unpacked into a destination folder.
144//!
//! Note that because versions uploaded to the registry are frozen forever, the
//! HTTP download and unpacking can all be skipped if the version has already
//! been downloaded and unpacked.
148//! download a package when absolutely necessary.
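//!
//! As a rough illustration of that flow (paths are placeholders; the next
//! section describes the actual layout):
//!
//! ```notrust
//! download URL ──fetch──▶ $CARGO_HOME/registry/cache/<registry>/<pkg>-<version>.crate
//!                                        │
//!                                   unpack once
//!                                        ▼
//!                         $CARGO_HOME/registry/src/<registry>/<pkg>-<version>/
//! ```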
149//!
150//! # Filesystem Hierarchy
151//!
//! Overall, the `$HOME/.cargo` directory looks like this when talking about the
//! registry (remote registries, specifically):
//!
//! ```notrust
//! # A folder under which all registry metadata is hosted (similar to
//! # $HOME/.cargo/git)
//! $HOME/.cargo/registry/
//!
//!     # For each registry that cargo knows about (keyed by hostname + hash)
//!     # there is a folder which is the checked out version of the index for
//!     # the registry in this location. Note that this is done so cargo can
//!     # support multiple registries simultaneously
//!     index/
//!         registry1-<hash>/
//!         registry2-<hash>/
//!         ...
//!
//!     # This folder is a cache for all downloaded tarballs (`.crate` file)
//!     # from a registry. Once downloaded and verified, a tarball never changes.
//!     cache/
//!         registry1-<hash>/<pkg>-<version>.crate
//!         ...
//!
//!     # Location in which all tarballs are unpacked. Each tarball is known to
//!     # be frozen after downloading, so transitively this folder is also
//!     # frozen once it's unpacked (it's never unpacked again)
//!     # CAVEAT: They are not read-only. See rust-lang/cargo#9455.
//!     src/
//!         registry1-<hash>/<pkg>-<version>/...
//!         ...
//! ```
183//!
184
185use std::collections::HashSet;
186use std::fs;
187use std::fs::{File, OpenOptions};
188use std::io;
189use std::io::Read;
190use std::io::Write;
191use std::path::{Path, PathBuf};
192use std::task::{Poll, ready};
193
194use anyhow::Context as _;
195use cargo_util::paths::{self, exclude_from_backups_and_indexing};
196use flate2::read::GzDecoder;
197use serde::Deserialize;
198use serde::Serialize;
199use tar::Archive;
200use tracing::debug;
201
202use crate::core::dependency::Dependency;
203use crate::core::global_cache_tracker;
204use crate::core::{Package, PackageId, SourceId};
205use crate::sources::PathSource;
206use crate::sources::source::MaybePackage;
207use crate::sources::source::QueryKind;
208use crate::sources::source::Source;
209use crate::util::cache_lock::CacheLockMode;
210use crate::util::interning::InternedString;
211use crate::util::network::PollExt;
212use crate::util::{CargoResult, Filesystem, GlobalContext, LimitErrorReader, restricted_names};
213use crate::util::{VersionExt, hex};
214
215/// The `.cargo-ok` file is used to track if the source is already unpacked.
216/// See [`RegistrySource::unpack_package`] for more.
217///
218/// Not to be confused with `.cargo-ok` file in git sources.
219const PACKAGE_SOURCE_LOCK: &str = ".cargo-ok";
220
221pub const CRATES_IO_INDEX: &str = "https://github.com/rust-lang/crates.io-index";
222pub const CRATES_IO_HTTP_INDEX: &str = "sparse+https://index.crates.io/";
223pub const CRATES_IO_REGISTRY: &str = "crates-io";
224pub const CRATES_IO_DOMAIN: &str = "crates.io";
225
226/// The content inside `.cargo-ok`.
227/// See [`RegistrySource::unpack_package`] for more.
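///
/// As a sketch, the serialized content of a current (`v: 1`) `.cargo-ok` file
/// should be a tiny JSON document along these lines:
///
/// ```json
/// {"v":1}
/// ```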
228#[derive(Deserialize, Serialize)]
229#[serde(rename_all = "kebab-case")]
230struct LockMetadata {
231 /// The version of `.cargo-ok` file
232 v: u32,
233}
234
235/// A [`Source`] implementation for a local or a remote registry.
236///
237/// This contains common functionality that is shared between each registry
238/// kind, with the registry-specific logic implemented as part of the
239/// [`RegistryData`] trait referenced via the `ops` field.
240///
241/// For general concepts of registries, see the [module-level documentation](crate::sources::registry).
242pub struct RegistrySource<'gctx> {
243 /// A unique name of the source (typically used as the directory name
244 /// where its cached content is stored).
245 name: InternedString,
246 /// The unique identifier of this source.
247 source_id: SourceId,
248 /// The path where crate files are extracted (`$CARGO_HOME/registry/src/$REG-HASH`).
249 src_path: Filesystem,
250 /// Path to the cache of `.crate` files (`$CARGO_HOME/registry/cache/$REG-HASH`).
251 cache_path: Filesystem,
252 /// Local reference to [`GlobalContext`] for convenience.
253 gctx: &'gctx GlobalContext,
254 /// Abstraction for interfacing to the different registry kinds.
255 ops: Box<dyn RegistryData + 'gctx>,
256 /// Interface for managing the on-disk index.
257 index: index::RegistryIndex<'gctx>,
258 /// A set of packages that should be allowed to be used, even if they are
259 /// yanked.
260 ///
261 /// This is populated from the entries in `Cargo.lock` to ensure that
262 /// `cargo update somepkg` won't unlock yanked entries in `Cargo.lock`.
263 /// Otherwise, the resolver would think that those entries no longer
264 /// exist, and it would trigger updates to unrelated packages.
265 yanked_whitelist: HashSet<PackageId>,
266 /// Yanked versions that have already been selected during queries.
267 ///
268 /// As of this writing, this is for not emitting the `--precise <yanked>`
269 /// warning twice, with the assumption of (`dep.package_name()` + `--precise`
270 /// version) being sufficient to uniquely identify the same query result.
271 selected_precise_yanked: HashSet<(InternedString, semver::Version)>,
272}
273
274/// The [`config.json`] file stored in the index.
275///
276/// The config file may look like:
277///
278/// ```json
279/// {
280/// "dl": "https://example.com/api/{crate}/{version}/download",
281/// "api": "https://example.com/api",
282/// "auth-required": false # unstable feature (RFC 3139)
283/// }
284/// ```
285///
286/// [`config.json`]: https://doc.rust-lang.org/nightly/cargo/reference/registry-index.html#index-configuration
287#[derive(Deserialize, Debug, Clone)]
288#[serde(rename_all = "kebab-case")]
289pub struct RegistryConfig {
290 /// Download endpoint for all crates.
291 ///
292 /// The string is a template which will generate the download URL for the
293 /// tarball of a specific version of a crate. The substrings `{crate}` and
294 /// `{version}` will be replaced with the crate's name and version
295 /// respectively. The substring `{prefix}` will be replaced with the
296 /// crate's prefix directory name, and the substring `{lowerprefix}` will
297 /// be replaced with the crate's prefix directory name converted to
298 /// lowercase. The substring `{sha256-checksum}` will be replaced with the
299 /// crate's sha256 checksum.
300 ///
301 /// For backwards compatibility, if the string does not contain any
302 /// markers (`{crate}`, `{version}`, `{prefix}`, or `{lowerprefix}`), it
303 /// will be extended with `/{crate}/{version}/download` to
304 /// support registries like crates.io which were created before the
305 /// templating setup was created.
306 ///
307 /// For more on the template of the download URL, see [Index Configuration](
308 /// https://doc.rust-lang.org/nightly/cargo/reference/registry-index.html#index-configuration).
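    ///
    /// As an illustration (reusing the example endpoint above, which is not a
    /// real registry), downloading `foo v1.0.0` would resolve the template to:
    ///
    /// ```text
    /// https://example.com/api/foo/1.0.0/download
    /// ```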
309 pub dl: String,
310
311 /// API endpoint for the registry. This is what's actually hit to perform
312 /// operations like yanks, owner modifications, publish new crates, etc.
313 /// If this is None, the registry does not support API commands.
314 pub api: Option<String>,
315
316 /// Whether all operations require authentication. See [RFC 3139].
317 ///
318 /// [RFC 3139]: https://rust-lang.github.io/rfcs/3139-cargo-alternative-registry-auth.html
319 #[serde(default)]
320 pub auth_required: bool,
321}
322
323/// Result from loading data from a registry.
324pub enum LoadResponse {
325 /// The cache is valid. The cached data should be used.
326 CacheValid,
327
328 /// The cache is out of date. Returned data should be used.
329 Data {
330 raw_data: Vec<u8>,
331 /// Version of this data to determine whether it is out of date.
332 index_version: Option<String>,
333 },
334
    /// The requested crate was not found.
336 NotFound,
337}
338
339/// An abstract interface to handle both a local and remote registry.
340///
341/// This allows [`RegistrySource`] to abstractly handle each registry kind.
342///
343/// For general concepts of registries, see the [module-level documentation](crate::sources::registry).
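///
/// As a rough sketch of how [`RegistrySource`] drives an implementation (the
/// exact order of calls varies by registry kind and operation):
///
/// ```text
/// prepare()
///   -> load() / config()      // index queries; may return Poll::Pending
///   -> block_until_ready()    // drive outstanding index requests to completion
///   -> download()             // Ready(File) if cached, or a URL to fetch
///   -> finish_download()      // verify and store the fetched `.crate` file
/// ```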
344pub trait RegistryData {
345 /// Performs initialization for the registry.
346 ///
    /// This should be safe to call multiple times; the implementation is
    /// expected to not do any work if it is already prepared.
349 fn prepare(&self) -> CargoResult<()>;
350
351 /// Returns the path to the index.
352 ///
353 /// Note that different registries store the index in different formats
354 /// (remote = git, http & local = files).
355 fn index_path(&self) -> &Filesystem;
356
357 /// Loads the JSON for a specific named package from the index.
358 ///
359 /// * `root` is the root path to the index.
360 /// * `path` is the relative path to the package to load (like `ca/rg/cargo`).
361 /// * `index_version` is the version of the requested crate data currently
362 /// in cache. This is useful for checking if a local cache is outdated.
363 fn load(
364 &mut self,
365 root: &Path,
366 path: &Path,
367 index_version: Option<&str>,
368 ) -> Poll<CargoResult<LoadResponse>>;
369
370 /// Loads the `config.json` file and returns it.
371 ///
372 /// Local registries don't have a config, and return `None`.
373 fn config(&mut self) -> Poll<CargoResult<Option<RegistryConfig>>>;
374
375 /// Invalidates locally cached data.
376 fn invalidate_cache(&mut self);
377
378 /// If quiet, the source should not display any progress or status messages.
379 fn set_quiet(&mut self, quiet: bool);
380
381 /// Is the local cached data up-to-date?
382 fn is_updated(&self) -> bool;
383
384 /// Prepare to start downloading a `.crate` file.
385 ///
386 /// Despite the name, this doesn't actually download anything. If the
387 /// `.crate` is already downloaded, then it returns [`MaybeLock::Ready`].
388 /// If it hasn't been downloaded, then it returns [`MaybeLock::Download`]
389 /// which contains the URL to download. The [`crate::core::package::Downloads`]
390 /// system handles the actual download process. After downloading, it
391 /// calls [`Self::finish_download`] to save the downloaded file.
392 ///
393 /// `checksum` is currently only used by local registries to verify the
394 /// file contents (because local registries never actually download
395 /// anything). Remote registries will validate the checksum in
396 /// `finish_download`. For already downloaded `.crate` files, it does not
397 /// validate the checksum, assuming the filesystem does not suffer from
398 /// corruption or manipulation.
399 fn download(&mut self, pkg: PackageId, checksum: &str) -> CargoResult<MaybeLock>;
400
401 /// Finish a download by saving a `.crate` file to disk.
402 ///
403 /// After [`crate::core::package::Downloads`] has finished a download,
404 /// it will call this to save the `.crate` file. This is only relevant
405 /// for remote registries. This should validate the checksum and save
406 /// the given data to the on-disk cache.
407 ///
408 /// Returns a [`File`] handle to the `.crate` file, positioned at the start.
409 fn finish_download(&mut self, pkg: PackageId, checksum: &str, data: &[u8])
410 -> CargoResult<File>;
411
412 /// Returns whether or not the `.crate` file is already downloaded.
413 fn is_crate_downloaded(&self, _pkg: PackageId) -> bool {
414 true
415 }
416
417 /// Validates that the global package cache lock is held.
418 ///
419 /// Given the [`Filesystem`], this will make sure that the package cache
420 /// lock is held. If not, it will panic. See
421 /// [`GlobalContext::acquire_package_cache_lock`] for acquiring the global lock.
422 ///
423 /// Returns the [`Path`] to the [`Filesystem`].
424 fn assert_index_locked<'a>(&self, path: &'a Filesystem) -> &'a Path;
425
426 /// Block until all outstanding `Poll::Pending` requests are `Poll::Ready`.
427 fn block_until_ready(&mut self) -> CargoResult<()>;
428}
429
430/// The status of [`RegistryData::download`] which indicates if a `.crate`
431/// file has already been downloaded, or if not then the URL to download.
432pub enum MaybeLock {
433 /// The `.crate` file is already downloaded. [`File`] is a handle to the
434 /// opened `.crate` file on the filesystem.
435 Ready(File),
436 /// The `.crate` file is not downloaded, here's the URL to download it from.
437 ///
    /// `descriptor` is just a text string to display to the user describing
    /// what is being downloaded.
440 Download {
441 url: String,
442 descriptor: String,
443 authorization: Option<String>,
444 },
445}
446
447mod download;
448mod http_remote;
449pub(crate) mod index;
450pub use index::IndexSummary;
451mod local;
452mod remote;
453
/// Generates a unique name for a [`SourceId`] so that it has a unique path to
/// put its index files.
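///
/// For example, the default crates.io registry ends up with a directory name
/// along the lines of `index.crates.io-<hash>`, where the hash comes from
/// hashing the [`SourceId`].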
456fn short_name(id: SourceId, is_shallow: bool) -> String {
457 // CAUTION: This should not change between versions. If you change how
458 // this is computed, it will orphan previously cached data, forcing the
459 // cache to be rebuilt and potentially wasting significant disk space. If
460 // you change it, be cautious of the impact. See `test_cratesio_hash` for
461 // a similar discussion.
462 let hash = hex::short_hash(&id);
463 let ident = id.url().host_str().unwrap_or("").to_string();
464 let mut name = format!("{}-{}", ident, hash);
465 if is_shallow {
466 name.push_str("-shallow");
467 }
468 name
469}
470
471impl<'gctx> RegistrySource<'gctx> {
472 /// Creates a [`Source`] of a "remote" registry.
473 /// It could be either an HTTP-based [`http_remote::HttpRegistry`] or
474 /// a Git-based [`remote::RemoteRegistry`].
475 ///
476 /// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked.
477 pub fn remote(
478 source_id: SourceId,
479 yanked_whitelist: &HashSet<PackageId>,
480 gctx: &'gctx GlobalContext,
481 ) -> CargoResult<RegistrySource<'gctx>> {
482 assert!(source_id.is_remote_registry());
483 let name = short_name(
484 source_id,
485 gctx.cli_unstable()
486 .git
487 .map_or(false, |features| features.shallow_index)
488 && !source_id.is_sparse(),
489 );
490 let ops = if source_id.is_sparse() {
491 Box::new(http_remote::HttpRegistry::new(source_id, gctx, &name)?) as Box<_>
492 } else {
493 Box::new(remote::RemoteRegistry::new(source_id, gctx, &name)) as Box<_>
494 };
495
496 Ok(RegistrySource::new(
497 source_id,
498 gctx,
499 &name,
500 ops,
501 yanked_whitelist,
502 ))
503 }
504
505 /// Creates a [`Source`] of a local registry, with [`local::LocalRegistry`] under the hood.
506 ///
507 /// * `path` --- The root path of a local registry on the file system.
508 /// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked.
509 pub fn local(
510 source_id: SourceId,
511 path: &Path,
512 yanked_whitelist: &HashSet<PackageId>,
513 gctx: &'gctx GlobalContext,
514 ) -> RegistrySource<'gctx> {
515 let name = short_name(source_id, false);
516 let ops = local::LocalRegistry::new(path, gctx, &name);
517 RegistrySource::new(source_id, gctx, &name, Box::new(ops), yanked_whitelist)
518 }
519
    /// Creates a source of a registry. This is an inner helper function.
    ///
    /// * `name` --- Name of a path segment which may affect where `.crate`
    ///   tarballs, the registry index and cache are stored. Expected to be unique.
524 /// * `ops` --- The underlying [`RegistryData`] type.
525 /// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked.
526 fn new(
527 source_id: SourceId,
528 gctx: &'gctx GlobalContext,
529 name: &str,
530 ops: Box<dyn RegistryData + 'gctx>,
531 yanked_whitelist: &HashSet<PackageId>,
532 ) -> RegistrySource<'gctx> {
533 RegistrySource {
534 name: name.into(),
535 src_path: gctx.registry_source_path().join(name),
536 cache_path: gctx.registry_cache_path().join(name),
537 gctx,
538 source_id,
539 index: index::RegistryIndex::new(source_id, ops.index_path(), gctx),
540 yanked_whitelist: yanked_whitelist.clone(),
541 ops,
542 selected_precise_yanked: HashSet::new(),
543 }
544 }
545
546 /// Decode the [configuration](RegistryConfig) stored within the registry.
547 ///
548 /// This requires that the index has been at least checked out.
549 pub fn config(&mut self) -> Poll<CargoResult<Option<RegistryConfig>>> {
550 self.ops.config()
551 }
552
553 /// Unpacks a downloaded package into a location where it's ready to be
554 /// compiled.
555 ///
556 /// No action is taken if the source looks like it's already unpacked.
557 ///
558 /// # History of interruption detection with `.cargo-ok` file
559 ///
560 /// Cargo has always included a `.cargo-ok` file ([`PACKAGE_SOURCE_LOCK`])
561 /// to detect if extraction was interrupted, but it was originally empty.
562 ///
563 /// In 1.34, Cargo was changed to create the `.cargo-ok` file before it
564 /// started extraction to implement fine-grained locking. After it was
565 /// finished extracting, it wrote two bytes to indicate it was complete.
566 /// It would use the length check to detect if it was possibly interrupted.
567 ///
568 /// In 1.36, Cargo changed to not use fine-grained locking, and instead used
569 /// a global lock. The use of `.cargo-ok` was no longer needed for locking
570 /// purposes, but was kept to detect when extraction was interrupted.
571 ///
572 /// In 1.49, Cargo changed to not create the `.cargo-ok` file before it
573 /// started extraction to deal with `.crate` files that inexplicably had
574 /// a `.cargo-ok` file in them.
575 ///
576 /// In 1.64, Cargo changed to detect `.crate` files with `.cargo-ok` files
577 /// in them in response to [CVE-2022-36113], which dealt with malicious
578 /// `.crate` files making `.cargo-ok` a symlink causing cargo to write "ok"
579 /// to any arbitrary file on the filesystem it has permission to.
580 ///
581 /// In 1.71, `.cargo-ok` changed to contain a JSON `{ v: 1 }` to indicate
582 /// the version of it. A failure of parsing will result in a heavy-hammer
583 /// approach that unpacks the `.crate` file again. This is in response to a
584 /// security issue that the unpacking didn't respect umask on Unix systems.
585 ///
586 /// This is all a long-winded way of explaining the circumstances that might
587 /// cause a directory to contain a `.cargo-ok` file that is empty or
    /// otherwise corrupted. It may have been extracted by a version of Rust
    /// before 1.34, in which case everything should be fine. However, an empty
    /// file created by versions 1.36 to 1.49 indicates that the extraction was
    /// interrupted and that we need to start again.
592 ///
593 /// Another possibility is that the filesystem is simply corrupted, in
594 /// which case deleting the directory might be the safe thing to do. That
595 /// is probably unlikely, though.
596 ///
597 /// To be safe, we delete the directory and start over again if an empty
598 /// `.cargo-ok` file is found.
599 ///
600 /// [CVE-2022-36113]: https://blog.rust-lang.org/2022/09/14/cargo-cves.html#arbitrary-file-corruption-cve-2022-36113
601 fn unpack_package(&self, pkg: PackageId, tarball: &File) -> CargoResult<PathBuf> {
602 let package_dir = format!("{}-{}", pkg.name(), pkg.version());
603 let dst = self.src_path.join(&package_dir);
604 let path = dst.join(PACKAGE_SOURCE_LOCK);
605 let path = self
606 .gctx
607 .assert_package_cache_locked(CacheLockMode::DownloadExclusive, &path);
608 let unpack_dir = path.parent().unwrap();
609 match fs::read_to_string(path) {
610 Ok(ok) => match serde_json::from_str::<LockMetadata>(&ok) {
611 Ok(lock_meta) if lock_meta.v == 1 => {
612 self.gctx
613 .deferred_global_last_use()?
614 .mark_registry_src_used(global_cache_tracker::RegistrySrc {
615 encoded_registry_name: self.name,
616 package_dir: package_dir.into(),
617 size: None,
618 });
619 return Ok(unpack_dir.to_path_buf());
620 }
621 _ => {
622 if ok == "ok" {
623 tracing::debug!("old `ok` content found, clearing cache");
624 } else {
625 tracing::warn!("unrecognized .cargo-ok content, clearing cache: {ok}");
626 }
                    // See the doc comment of `unpack_package` for why we remove everything here.
628 paths::remove_dir_all(dst.as_path_unlocked())?;
629 }
630 },
631 Err(e) if e.kind() == io::ErrorKind::NotFound => {}
632 Err(e) => anyhow::bail!("unable to read .cargo-ok file at {path:?}: {e}"),
633 }
634 dst.create_dir()?;
635
636 let bytes_written = unpack(self.gctx, tarball, unpack_dir, &|_| true)?;
637
638 // Now that we've finished unpacking, create and write to the lock file to indicate that
639 // unpacking was successful.
640 let mut ok = OpenOptions::new()
641 .create_new(true)
642 .read(true)
643 .write(true)
644 .open(&path)
645 .with_context(|| format!("failed to open `{}`", path.display()))?;
646
647 let lock_meta = LockMetadata { v: 1 };
648 write!(ok, "{}", serde_json::to_string(&lock_meta).unwrap())?;
649
650 self.gctx
651 .deferred_global_last_use()?
652 .mark_registry_src_used(global_cache_tracker::RegistrySrc {
653 encoded_registry_name: self.name,
654 package_dir: package_dir.into(),
655 size: Some(bytes_written),
656 });
657
658 Ok(unpack_dir.to_path_buf())
659 }
660
661 /// Unpacks the `.crate` tarball of the package in a given directory.
662 ///
663 /// Returns the path to the crate tarball directory,
664 /// which is always `<unpack_dir>/<pkg>-<version>`.
665 ///
    /// This assumes that the associated `.crate` tarball already exists.
667 pub fn unpack_package_in(
668 &self,
669 pkg: &PackageId,
670 unpack_dir: &Path,
671 include: &dyn Fn(&Path) -> bool,
672 ) -> CargoResult<PathBuf> {
673 let path = self.cache_path.join(pkg.tarball_name());
674 let path = self
675 .gctx
676 .assert_package_cache_locked(CacheLockMode::DownloadExclusive, &path);
677 let dst = unpack_dir.join(format!("{}-{}", pkg.name(), pkg.version()));
678 let tarball =
679 File::open(path).with_context(|| format!("failed to open {}", path.display()))?;
680 unpack(self.gctx, &tarball, &dst, include)?;
681 Ok(dst)
682 }
683
684 /// Turns the downloaded `.crate` tarball file into a [`Package`].
685 ///
    /// This unconditionally sets the checksum for the returned package, so it
    /// should only be called after doing an integrity check. That is to say,
688 /// you need to call either [`RegistryData::download`] or
689 /// [`RegistryData::finish_download`] before calling this method.
690 fn get_pkg(&mut self, package: PackageId, path: &File) -> CargoResult<Package> {
691 let path = self
692 .unpack_package(package, path)
693 .with_context(|| format!("failed to unpack package `{}`", package))?;
694 let mut src = PathSource::new(&path, self.source_id, self.gctx);
695 src.load()?;
696 let mut pkg = match src.download(package)? {
697 MaybePackage::Ready(pkg) => pkg,
698 MaybePackage::Download { .. } => unreachable!(),
699 };
700
        // After we've loaded the package, configure its summary's `checksum`
        // field with the checksum we know for this `PackageId`.
703 let cksum = self
704 .index
705 .hash(package, &mut *self.ops)
706 .expect("a downloaded dep now pending!?")
707 .expect("summary not found");
708 pkg.manifest_mut()
709 .summary_mut()
710 .set_checksum(cksum.to_string());
711
712 Ok(pkg)
713 }
714}
715
716impl<'gctx> Source for RegistrySource<'gctx> {
717 fn query(
718 &mut self,
719 dep: &Dependency,
720 kind: QueryKind,
721 f: &mut dyn FnMut(IndexSummary),
722 ) -> Poll<CargoResult<()>> {
723 let mut req = dep.version_req().clone();
724
725 // Handle `cargo update --precise` here.
726 if let Some((_, requested)) = self
727 .source_id
728 .precise_registry_version(dep.package_name().as_str())
729 .filter(|(c, to)| {
730 if to.is_prerelease() && self.gctx.cli_unstable().unstable_options {
731 req.matches_prerelease(c)
732 } else {
733 req.matches(c)
734 }
735 })
736 {
737 req.precise_to(&requested);
738 }
739
740 let mut called = false;
741 let callback = &mut |s| {
742 called = true;
743 f(s);
744 };
745
746 // If this is a locked dependency, then it came from a lock file and in
747 // theory the registry is known to contain this version. If, however, we
748 // come back with no summaries, then our registry may need to be
749 // updated, so we fall back to performing a lazy update.
750 if kind == QueryKind::Exact && req.is_locked() && !self.ops.is_updated() {
751 debug!("attempting query without update");
752 ready!(
753 self.index
754 .query_inner(dep.package_name(), &req, &mut *self.ops, &mut |s| {
755 if matches!(s, IndexSummary::Candidate(_) | IndexSummary::Yanked(_))
756 && dep.matches(s.as_summary())
757 {
758 // We are looking for a package from a lock file so we do not care about yank
759 callback(s)
760 }
761 },)
762 )?;
763 if called {
764 Poll::Ready(Ok(()))
765 } else {
766 debug!("falling back to an update");
767 self.invalidate_cache();
768 Poll::Pending
769 }
770 } else {
771 let mut precise_yanked_in_use = false;
772 ready!(
773 self.index
774 .query_inner(dep.package_name(), &req, &mut *self.ops, &mut |s| {
775 let matched = match kind {
776 QueryKind::Exact | QueryKind::RejectedVersions => {
777 if req.is_precise() && self.gctx.cli_unstable().unstable_options {
778 dep.matches_prerelease(s.as_summary())
779 } else {
780 dep.matches(s.as_summary())
781 }
782 }
783 QueryKind::AlternativeNames => true,
784 QueryKind::Normalized => true,
785 };
786 if !matched {
787 return;
788 }
                        // Next filter out all yanked packages. Some yanked packages may
                        // leak through if they're in a whitelist (aka if they were
                        // previously in `Cargo.lock`).
792 match s {
793 s @ _ if kind == QueryKind::RejectedVersions => callback(s),
794 s @ IndexSummary::Candidate(_) => callback(s),
795 s @ IndexSummary::Yanked(_) => {
796 if self.yanked_whitelist.contains(&s.package_id()) {
797 callback(s);
798 } else if req.is_precise() {
799 precise_yanked_in_use = true;
800 callback(s);
801 }
802 }
803 IndexSummary::Unsupported(summary, v) => {
804 tracing::debug!(
805 "unsupported schema version {} ({} {})",
806 v,
807 summary.name(),
808 summary.version()
809 );
810 }
811 IndexSummary::Invalid(summary) => {
812 tracing::debug!(
813 "invalid ({} {})",
814 summary.name(),
815 summary.version()
816 );
817 }
818 IndexSummary::Offline(summary) => {
819 tracing::debug!(
820 "offline ({} {})",
821 summary.name(),
822 summary.version()
823 );
824 }
825 }
826 })
827 )?;
828 if precise_yanked_in_use {
829 let name = dep.package_name();
830 let version = req
831 .precise_version()
832 .expect("--precise <yanked-version> in use");
833 if self.selected_precise_yanked.insert((name, version.clone())) {
834 let mut shell = self.gctx.shell();
835 shell.warn(format_args!(
836 "selected package `{name}@{version}` was yanked by the author"
837 ))?;
838 shell.note("if possible, try a compatible non-yanked version")?;
839 }
840 }
841 if called {
842 return Poll::Ready(Ok(()));
843 }
844 let mut any_pending = false;
845 if kind == QueryKind::AlternativeNames || kind == QueryKind::Normalized {
                // Attempt to handle misspellings by searching for a chain of related
                // names to the original name. The resolver will later reject any
                // candidates that have the wrong name, and along the way this will
                // produce helpful "did you mean?" suggestions.
                // For now we only try canonicalizing `-` to `_` and vice versa.
                // More advanced fuzzy searching may come in the future.
852 for name_permutation in [
853 dep.package_name().replace('-', "_"),
854 dep.package_name().replace('_', "-"),
855 ] {
856 let name_permutation = name_permutation.into();
857 if name_permutation == dep.package_name() {
858 continue;
859 }
860 any_pending |= self
861 .index
862 .query_inner(name_permutation, &req, &mut *self.ops, &mut |s| {
863 if !s.is_yanked() {
864 f(s);
865 } else if kind == QueryKind::AlternativeNames {
866 f(s);
867 }
868 })?
869 .is_pending();
870 }
871 }
872 if any_pending {
873 Poll::Pending
874 } else {
875 Poll::Ready(Ok(()))
876 }
877 }
878 }
879
880 fn supports_checksums(&self) -> bool {
881 true
882 }
883
884 fn requires_precise(&self) -> bool {
885 false
886 }
887
888 fn source_id(&self) -> SourceId {
889 self.source_id
890 }
891
892 fn invalidate_cache(&mut self) {
893 self.index.clear_summaries_cache();
894 self.ops.invalidate_cache();
895 }
896
897 fn set_quiet(&mut self, quiet: bool) {
898 self.ops.set_quiet(quiet);
899 }
900
901 fn download(&mut self, package: PackageId) -> CargoResult<MaybePackage> {
902 let hash = loop {
903 match self.index.hash(package, &mut *self.ops)? {
904 Poll::Pending => self.block_until_ready()?,
905 Poll::Ready(hash) => break hash,
906 }
907 };
908 match self.ops.download(package, hash)? {
909 MaybeLock::Ready(file) => self.get_pkg(package, &file).map(MaybePackage::Ready),
910 MaybeLock::Download {
911 url,
912 descriptor,
913 authorization,
914 } => Ok(MaybePackage::Download {
915 url,
916 descriptor,
917 authorization,
918 }),
919 }
920 }
921
922 fn finish_download(&mut self, package: PackageId, data: Vec<u8>) -> CargoResult<Package> {
923 let hash = loop {
924 match self.index.hash(package, &mut *self.ops)? {
925 Poll::Pending => self.block_until_ready()?,
926 Poll::Ready(hash) => break hash,
927 }
928 };
929 let file = self.ops.finish_download(package, hash, &data)?;
930 self.get_pkg(package, &file)
931 }
932
933 fn fingerprint(&self, pkg: &Package) -> CargoResult<String> {
934 Ok(pkg.package_id().version().to_string())
935 }
936
937 fn describe(&self) -> String {
938 self.source_id.display_index()
939 }
940
941 fn add_to_yanked_whitelist(&mut self, pkgs: &[PackageId]) {
942 self.yanked_whitelist.extend(pkgs);
943 }
944
945 fn is_yanked(&mut self, pkg: PackageId) -> Poll<CargoResult<bool>> {
946 self.index.is_yanked(pkg, &mut *self.ops)
947 }
948
949 fn block_until_ready(&mut self) -> CargoResult<()> {
950 // Before starting to work on the registry, make sure that
951 // `<cargo_home>/registry` is marked as excluded from indexing and
952 // backups. Older versions of Cargo didn't do this, so we do it here
953 // regardless of whether `<cargo_home>` exists.
954 //
955 // This does not use `create_dir_all_excluded_from_backups_atomic` for
956 // the same reason: we want to exclude it even if the directory already
957 // exists.
958 //
959 // IO errors in creating and marking it are ignored, e.g. in case we're on a
960 // read-only filesystem.
961 let registry_base = self.gctx.registry_base_path();
962 let _ = registry_base.create_dir();
        exclude_from_backups_and_indexing(&registry_base.into_path_unlocked());
964
965 self.ops.block_until_ready()
966 }
967}
968
969impl RegistryConfig {
970 /// File name of [`RegistryConfig`].
971 const NAME: &'static str = "config.json";
972}
973
974/// Get the maximum unpack size that Cargo permits
975/// based on a given `size` of your compressed file.
976///
977/// Returns the larger one between `size * max compression ratio`
978/// and a fixed max unpacked size.
979///
980/// In reality, the compression ratio usually falls in the range of 2:1 to 10:1.
/// We choose 20:1 to hopefully cover almost all possible cases.
/// Any ratio higher than this is considered a zip bomb.
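///
/// As a worked example under the current defaults (20:1 ratio, 512 MiB floor):
/// a 1 MiB `.crate` may unpack up to 512 MiB (the fixed floor dominates), while
/// a 100 MiB `.crate` may unpack up to 2000 MiB (100 MiB * 20).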
983///
984/// In the future we might want to introduce a configurable size.
985///
986/// Some of the real world data from common compression algorithms:
987///
988/// * <https://www.zlib.net/zlib_tech.html>
989/// * <https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf>
990/// * <https://blog.cloudflare.com/results-experimenting-brotli/>
991/// * <https://tukaani.org/lzma/benchmarks.html>
992fn max_unpack_size(gctx: &GlobalContext, size: u64) -> u64 {
993 const SIZE_VAR: &str = "__CARGO_TEST_MAX_UNPACK_SIZE";
994 const RATIO_VAR: &str = "__CARGO_TEST_MAX_UNPACK_RATIO";
995 const MAX_UNPACK_SIZE: u64 = 512 * 1024 * 1024; // 512 MiB
996 const MAX_COMPRESSION_RATIO: usize = 20; // 20:1
997
998 let max_unpack_size = if cfg!(debug_assertions) && gctx.get_env(SIZE_VAR).is_ok() {
999 // For integration test only.
1000 gctx.get_env(SIZE_VAR)
1001 .unwrap()
1002 .parse()
1003 .expect("a max unpack size in bytes")
1004 } else {
1005 MAX_UNPACK_SIZE
1006 };
1007 let max_compression_ratio = if cfg!(debug_assertions) && gctx.get_env(RATIO_VAR).is_ok() {
1008 // For integration test only.
1009 gctx.get_env(RATIO_VAR)
1010 .unwrap()
1011 .parse()
            .expect("a max compression ratio")
1013 } else {
1014 MAX_COMPRESSION_RATIO
1015 };
1016
1017 u64::max(max_unpack_size, size * max_compression_ratio as u64)
1018}
1019
1020/// Set the current [`umask`] value for the given tarball. No-op on non-Unix
1021/// platforms.
1022///
1023/// On Windows, tar only looks at user permissions and tries to set the "read
1024/// only" attribute, so no-op as well.
1025///
1026/// [`umask`]: https://man7.org/linux/man-pages/man2/umask.2.html
1027#[allow(unused_variables)]
1028fn set_mask<R: Read>(tar: &mut Archive<R>) {
1029 #[cfg(unix)]
1030 tar.set_mask(crate::util::get_umask());
1031}
1032
1033/// Unpack a tarball with zip bomb and overwrite protections.
1034fn unpack(
1035 gctx: &GlobalContext,
1036 tarball: &File,
1037 unpack_dir: &Path,
1038 include: &dyn Fn(&Path) -> bool,
1039) -> CargoResult<u64> {
1040 let mut tar = {
1041 let size_limit = max_unpack_size(gctx, tarball.metadata()?.len());
1042 let gz = GzDecoder::new(tarball);
1043 let gz = LimitErrorReader::new(gz, size_limit);
1044 let mut tar = Archive::new(gz);
1045 set_mask(&mut tar);
1046 tar
1047 };
1048 let mut bytes_written = 0;
1049 let prefix = unpack_dir.file_name().unwrap();
1050 let parent = unpack_dir.parent().unwrap();
1051 for entry in tar.entries()? {
1052 let mut entry = entry.context("failed to iterate over archive")?;
1053 let entry_path = entry
1054 .path()
1055 .context("failed to read entry path")?
1056 .into_owned();
1057
1058 if let Ok(path) = entry_path.strip_prefix(prefix) {
1059 if !include(path) {
1060 continue;
1061 }
1062 } else {
1063 // We're going to unpack this tarball into the global source
1064 // directory, but we want to make sure that it doesn't accidentally
1065 // (or maliciously) overwrite source code from other crates. Cargo
1066 // itself should never generate a tarball that hits this error, and
1067 // crates.io should also block uploads with these sorts of tarballs,
1068 // but be extra sure by adding a check here as well.
1069 anyhow::bail!(
1070 "invalid tarball downloaded, contains \
1071 a file at {entry_path:?} which isn't under {prefix:?}",
1072 )
1073 }
1074
1075 // Prevent unpacking the lockfile from the crate itself.
1076 if entry_path
1077 .file_name()
1078 .map_or(false, |p| p == PACKAGE_SOURCE_LOCK)
1079 {
1080 continue;
1081 }
        // Track how many bytes we unpack; the total is returned to the caller
        // (e.g. for the global cache tracker).
        bytes_written += entry.size();
1084 let mut result = entry.unpack_in(parent).map_err(anyhow::Error::from);
1085 if cfg!(windows) && restricted_names::is_windows_reserved_path(&entry_path) {
1086 result = result.with_context(|| {
1087 format!(
1088 "`{}` appears to contain a reserved Windows path, \
1089 it cannot be extracted on Windows",
1090 entry_path.display()
1091 )
1092 });
1093 }
1094 result.with_context(|| format!("failed to unpack entry at `{}`", entry_path.display()))?;
1095 }
1096
1097 Ok(bytes_written)
1098}