Module cargo::sources::registry

A Source for registry-based packages.

§What’s a Registry?

Registries are central locations where packages can be uploaded to, discovered, and searched for. The purpose of a registry is to have a location that serves as permanent storage for versions of a crate over time.

Compared to git sources (see GitSource), a registry provides many packages as well as many versions simultaneously. Git sources can also have commits deleted through rebasing, whereas registries cannot have their versions deleted.

In Cargo, RegistryData is an abstraction over each kind of actual registry, and RegistrySource connects those implementations to the Source trait. Two prominent features these abstractions provide are:

  • A way to query the metadata of a package from a registry. The metadata comes from the index.
  • A way to download package contents (a.k.a. source files) that are required when building the package itself.

We’ll cover each functionality later.
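As a rough illustration of those two capabilities, the sketch below defines a hypothetical, heavily simplified trait and an in-memory implementation. The names RegistryBackend, InMemoryRegistry, and last_listed_contents are invented for this example and are not Cargo's API; the real RegistryData trait is considerably richer.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the two capabilities described above.
/// This is NOT Cargo's `RegistryData` trait.
trait RegistryBackend {
    /// Returns the versions of a package recorded in the index, in index order.
    fn query(&self, name: &str) -> Vec<String>;
    /// Returns the package contents for one version, if the registry has them.
    fn download(&self, name: &str, version: &str) -> Option<Vec<u8>>;
}

/// A toy registry that keeps both the index and the contents in memory.
#[derive(Default)]
struct InMemoryRegistry {
    index: HashMap<String, Vec<String>>,
    contents: HashMap<(String, String), Vec<u8>>,
}

impl RegistryBackend for InMemoryRegistry {
    fn query(&self, name: &str) -> Vec<String> {
        self.index.get(name).cloned().unwrap_or_default()
    }

    fn download(&self, name: &str, version: &str) -> Option<Vec<u8>> {
        self.contents
            .get(&(name.to_string(), version.to_string()))
            .cloned()
    }
}

/// Queries the index first, then fetches the contents of the last listed
/// version. "Last listed" is an assumption of this sketch, not semver ordering.
fn last_listed_contents(reg: &dyn RegistryBackend, name: &str) -> Option<Vec<u8>> {
    let versions = reg.query(name);
    let last = versions.last()?;
    reg.download(name, last)
}
```

The point of the split is that metadata queries never need package contents: resolution can run against the index alone, and contents are fetched only for the versions that were actually selected.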

§Different Kinds of Registries

Cargo provides multiple kinds of registries. Each of them serves the index and package contents in a slightly different way. Namely,

  • LocalRegistry — Serves the index and package contents entirely on a local filesystem.
  • RemoteRegistry — Serves the index ahead of time from a Git repository, and package contents are downloaded as needed.
  • HttpRegistry — Serves both the index and package contents on demand over an HTTP-based registry API. This is the default starting from 1.70.0.

Each registry has its own RegistryData implementation, and can be created from either RegistrySource::local or RegistrySource::remote.

§The Index of a Registry

One of the major difficulties with a registry is that hosting so many packages may quickly run into performance problems when dealing with dependency graphs. It’s infeasible for cargo to download the entire contents of the registry just to resolve one package’s dependencies, for example. As a result, cargo needs some efficient method of querying what packages are available on a registry, what versions are available, and what the dependencies for each version are.

To solve the problem, a registry must provide an index of package metadata. The index of a registry is essentially an easily queryable version of the registry’s database for a list of versions of a package as well as a list of dependencies for each version. The exact format of the index is described later.

See the index module for topics about the management, parsing, caching, and versioning for the on-disk index.

§The Format of The Index

The index is a store for the list of versions for all packages known, so its format on disk is optimized slightly to ensure that ls registry doesn’t produce a list of all packages ever known. The index also wants to ensure that there aren’t a million files, which may end up hitting filesystem limits at some point. To this end, a few decisions were made about the format of the registry:

  1. Each crate will have one file corresponding to it. Each version for a crate will just be a line in this file (see IndexPackage for its representation).
  2. There will be two tiers of directories for crate names, under which crates corresponding to those tiers will be located. (See cargo_util::registry::make_dep_path for the implementation of this layout hierarchy.)

For example, the hierarchy of an index might look like this:

.
├── 3
│   └── u
│       └── url
├── bz
│   └── ip
│       └── bzip2
├── config.json
├── en
│   └── co
│       └── encoding
└── li
    ├── bg
    │   └── libgit2
    └── nk
        └── link-config

The root of the index contains a config.json file with a few entries corresponding to the registry (see RegistryConfig below).

Beyond that, there are three numbered directories (1, 2, 3) for crates with names 1, 2, and 3 characters in length. The 1/2 directories simply have the crate files underneath them, while the 3 directory is sharded by the first letter of the crate name.

For all longer crate names, the top-level directory contains many two-letter directories, each of which has many two-letter sub-folders. At the end of all these are the actual crate files themselves.

The purpose of this layout is to cut down on ls sizes and to allow efficient lookup based on the crate name itself.
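The layout rules above can be sketched as a small function that maps a crate name to its relative path in the index. This is an illustration written from the description in this section, not a copy of cargo_util::registry::make_dep_path; the function name index_path is hypothetical, and the sketch assumes non-empty ASCII names and does no case normalization.

```rust
/// Maps a crate name to the relative path of its index file, following the
/// directory tiers described above. Returns `None` for empty or non-ASCII
/// names, which this sketch does not attempt to handle.
fn index_path(name: &str) -> Option<String> {
    if name.is_empty() || !name.is_ascii() {
        return None;
    }
    let path = match name.len() {
        // One- and two-character names sit directly under `1/` and `2/`.
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        // Three-character names are sharded by their first letter.
        3 => format!("3/{}/{name}", &name[..1]),
        // Everything else uses two tiers of two-letter directories.
        _ => format!("{}/{}/{name}", &name[..2], &name[2..4]),
    };
    Some(path)
}
```

Applied to the crates in the example hierarchy, index_path("url") yields 3/u/url and index_path("link-config") yields li/nk/link-config.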

See The Cargo Book: Registry Index for the public interface on the index format.

§The Index Files

Each file in the index is the history of one crate over time. Each line in the file corresponds to one version of a crate, stored in JSON format (see the IndexPackage structure).

As new versions are published, new lines are appended to this file. The only modifications to this file that should happen over time are yanks of a particular version.
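The append-only behavior can be modeled in a few lines. The sketch below is a simplified in-memory model, not Cargo's IndexPackage: the types IndexLine and IndexFile are hypothetical, and each line keeps only a version string and a yank flag rather than the full JSON record.

```rust
/// One line of a simplified index file: a version and whether it is yanked.
/// Hypothetical type; the real index line carries much more metadata.
#[derive(Debug, Clone, PartialEq)]
struct IndexLine {
    vers: String,
    yanked: bool,
}

/// The history of one crate: one entry per published version, in publish order.
#[derive(Debug, Default)]
struct IndexFile {
    lines: Vec<IndexLine>,
}

impl IndexFile {
    /// Publishing appends a new line and never touches existing ones.
    /// Returns `false` if the version was already published.
    fn publish(&mut self, vers: &str) -> bool {
        if self.lines.iter().any(|l| l.vers == vers) {
            return false;
        }
        self.lines.push(IndexLine {
            vers: vers.to_string(),
            yanked: false,
        });
        true
    }

    /// Yanking is the only in-place modification: it flips the flag on an
    /// existing line. Returns `false` if the version is unknown.
    fn yank(&mut self, vers: &str) -> bool {
        match self.lines.iter_mut().find(|l| l.vers == vers) {
            Some(line) => {
                line.yanked = true;
                true
            }
            None => false,
        }
    }
}
```

Note that yanking keeps the line in place, so the file remains a complete history of the crate.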

§Downloading Packages

The purpose of the index was to provide an efficient method to resolve the dependency graph for a package. After resolution has been performed, we need to download the contents of packages so we can read the full manifest and build the source code.

To accomplish this, RegistryData::download will “make” an HTTP request per-package requested to download tarballs into a local cache. These tarballs will then be unpacked into a destination folder.

Note that because versions uploaded to the registry are frozen forever, the HTTP download and unpacking can both be skipped if the version has already been downloaded and unpacked. This caching allows us to only download a package when absolutely necessary.
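The caching decision can be sketched as a check against the local cache before any network access. Everything named here is an assumption made for illustration: the DownloadStatus enum, the download_status function, and the base_url parameter are hypothetical, and the URL shape ({base_url}/{pkg}/{version}/download) is only one possible scheme. The real logic lives behind RegistryData::download.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical result of asking for a package tarball.
#[derive(Debug, PartialEq)]
enum DownloadStatus {
    /// The `.crate` file is already in the local cache.
    Ready(PathBuf),
    /// The file is missing and would have to be fetched from this URL.
    NeedsDownload { url: String },
}

/// Decides whether a tarball must be downloaded. Because published versions
/// never change, the presence of the cached file is treated as sufficient.
/// (A real implementation would also verify the checksum.)
fn download_status(cache_dir: &Path, base_url: &str, pkg: &str, version: &str) -> DownloadStatus {
    let tarball = cache_dir.join(format!("{pkg}-{version}.crate"));
    if tarball.is_file() {
        DownloadStatus::Ready(tarball)
    } else {
        DownloadStatus::NeedsDownload {
            url: format!("{base_url}/{pkg}/{version}/download"),
        }
    }
}
```

Returning the URL instead of performing the request keeps the decision separate from the transfer itself, so a caller is free to batch or parallelize downloads.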

§Filesystem Hierarchy

Overall, the $HOME/.cargo directory looks like this as far as registries are concerned (remote registries, specifically):

# A folder under which all registry metadata is hosted (similar to
# $HOME/.cargo/git)
$HOME/.cargo/registry/

    # For each registry that cargo knows about (keyed by hostname + hash)
    # there is a folder which is the checked out version of the index for
    # the registry in this location. Note that this is done so cargo can
    # support multiple registries simultaneously
    index/
        registry1-<hash>/
        registry2-<hash>/
        ...

    # This folder is a cache for all downloaded tarballs (`.crate` file)
    # from a registry. Once downloaded and verified, a tarball never changes.
    cache/
        registry1-<hash>/<pkg>-<version>.crate
        ...

    # Location in which all tarballs are unpacked. Each tarball is known to
    # be frozen after downloading, so transitively this folder is also
    # frozen once it's unpacked (it's never unpacked again)
    # CAVEAT: They are not read-only. See rust-lang/cargo#9455.
    src/
        registry1-<hash>/<pkg>-<version>/...
        ...

Modules§

Structs§

Enums§

  • A parsed representation of a summary from the index. This is usually parsed from a line from a raw index file, or a JSON blob from on-disk index cache.
  • Result from loading data from a registry.
  • The status of RegistryData::download which indicates if a .crate file has already been downloaded, or if not then the URL to download.

Constants§

Traits§

  • An abstract interface to handle both a local and remote registry.

Functions§

  • Get the maximum unpack size that Cargo permits based on a given size of your compressed file.
  • set_mask 🔒
    Set the current umask value for the given tarball. No-op on non-Unix platforms.
  • short_name 🔒
    Generates a unique name for SourceId to have a unique path to put their index files.