v0 Symbol Format

The v0 mangling format was introduced in RFC 2603. It has the following properties:

  • It provides an unambiguous string encoding for everything that can end up in a binary's symbol table.
  • It encodes information about generic parameters in a reversible way.
  • The mangled symbols are decodable such that the demangled form should be easily identifiable as some concrete instance of e.g. a polymorphic function.
  • It has a consistent definition that does not rely on pretty-printing certain language constructs.
  • Symbols can be restricted to only consist of the characters A-Z, a-z, 0-9, and _. This helps ensure that it is platform-independent, where other characters might have special meaning in some context (e.g. . for MSVC DEF files). Unicode symbols are optionally supported.
  • It tries to stay efficient, avoiding unnecessarily long names, and avoiding computationally expensive operations to demangle.

The v0 format is not intended to be compatible with other mangling schemes (such as C++).

The v0 format is not presented as a stable ABI for Rust. This format is currently intended to be well-defined enough that a demangler can produce a reasonable human-readable form of the symbol. Several portions are implementation-defined, so it is not possible to fully predict how a given Rust entity will be encoded.

The sections below define the encoding of a v0 symbol. There is no standardized demangled form of the symbols, though suggestions are provided for how to demangle a symbol. Implementers may choose to demangle in different ways.

Extensions

This format may be extended in the future to add new tags as Rust is extended with new language items. To be forward compatible, demanglers should gracefully handle symbols in which they encounter a tag character not described in this document. For example, they may fall back to displaying the mangled symbol. The format may be extended anywhere there is a tag character, such as the type rule. The meaning of existing tags and encodings will not be changed.

Grammar notation

The format of an encoded symbol is illustrated as a context free grammar in an extended BNF-like syntax. A consolidated summary can be found in the Symbol grammar summary.

Name           Syntax      Example          Description
Rule           →           A → B C          A production.
Concatenation  whitespace  A → B C D        Individual elements in sequence left-to-right.
Alternative    |           A → B | C        Matches either one or the other.
Grouping       ()          A → B (C | D) E  Groups multiple elements as one.
Repetition     {}          A → {B}          Repeats the enclosed zero or more times.
Option         opt         A → Bopt C       An optional element.
Literal        monospace   A → G            A terminal matching the exact characters case-sensitive.

Symbol name

symbol-name → _R decimal-numberopt path instantiating-crateopt vendor-specific-suffixopt

A mangled symbol starts with the two characters _R which is a prefix to identify the symbol as a Rust symbol. The prefix can optionally be followed by a decimal-number which specifies the encoding version. This number is currently not used, and is never present in the current encoding. Following that is a path which encodes the path to an entity. The path is followed by an optional instantiating-crate which helps to disambiguate entities which may be instantiated multiple times in separate crates. The final part is an optional vendor-specific-suffix.
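As a rough illustration of the prefix handling described above, the following sketch strips the `_R` prefix and rejects anything it does not recognize. The function name `strip_v0_prefix` is hypothetical, and treating a present version number as unsupported is an assumption of this sketch, not something the format mandates.

```rust
/// Strips the `_R` prefix from a v0 symbol and returns the remainder.
/// Returns `None` for non-v0 symbols, and also when an encoding version
/// follows the prefix (assumption: this sketch knows no versioned format).
fn strip_v0_prefix(symbol: &str) -> Option<&str> {
    let rest = symbol.strip_prefix("_R")?;
    match rest.bytes().next() {
        // A decimal-number here would be an encoding version.
        Some(b'0'..=b'9') => None,
        // Every path rule starts with an uppercase ASCII tag character.
        Some(b'A'..=b'Z') => Some(rest),
        _ => None,
    }
}
```

A demangler that receives `None` here could fall back to printing the mangled symbol unchanged.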

Recommended Demangling

A symbol-name should be displayed as the path. The instantiating-crate and the vendor-specific-suffix usually need not be displayed.

Example:

std::path::PathBuf::new();

The symbol for PathBuf::new in crate mycrate is:

_RNvMsr_NtCs3ssYzQotkvD_3std4pathNtB5_7PathBuf3newCs15kBYyAo9fc_7mycrate
├┘└───────────────────────┬──────────────────────┘└──────────┬─────────┘
│                         │                                  │
│                         │                                  └── instantiating-crate path "mycrate"
│                         └───────────────────────────────────── path to std::path::PathBuf::new
└─────────────────────────────────────────────────────────────── `_R` symbol prefix

Recommended demangling: <std::path::PathBuf>::new

Symbol path

path →
      crate-root
   | inherent-impl
   | trait-impl
   | trait-definition
   | nested-path
   | generic-args
   | backref

A path represents a variant of a Rust path to some entity. In addition to typical Rust path segments using identifiers, it uses extra elements to represent unnameable entities (like an impl) or generic arguments for monomorphized items.

The initial tag character can be used to determine which kind of path it represents:

Tag  Rule              Description
C    crate-root        The root of a crate path.
M    inherent-impl     An inherent implementation.
X    trait-impl        A trait implementation.
Y    trait-definition  A trait definition.
N    nested-path       A nested path.
I    generic-args      Generic arguments.
B    backref           A back reference.
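The tag dispatch can be sketched as a single match on the first byte of a path. The `PathKind` enum and the `path_kind` function are hypothetical names used only for illustration. Unknown tags yield `None`, so a caller can fall back gracefully as suggested under Extensions.

```rust
/// The kinds of path, one per tag character (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum PathKind {
    CrateRoot,
    InherentImpl,
    TraitImpl,
    TraitDefinition,
    NestedPath,
    GenericArgs,
    Backref,
}

/// Maps the initial tag character of a path to its kind.
/// Returns `None` for tags not described in this document.
fn path_kind(tag: u8) -> Option<PathKind> {
    match tag {
        b'C' => Some(PathKind::CrateRoot),
        b'M' => Some(PathKind::InherentImpl),
        b'X' => Some(PathKind::TraitImpl),
        b'Y' => Some(PathKind::TraitDefinition),
        b'N' => Some(PathKind::NestedPath),
        b'I' => Some(PathKind::GenericArgs),
        b'B' => Some(PathKind::Backref),
        _ => None,
    }
}
```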

Path: Crate root

crate-root → C identifier

A crate-root indicates a path referring to the root of a crate's module tree. It consists of the character C followed by the crate name as an identifier.

The crate name is the name as seen from the defining crate. Since Rust supports linking multiple crates with the same name, the disambiguator is used to make the name unique across the crate graph.

Recommended Demangling

A crate-root can be displayed as the identifier such as mycrate.

Usually the disambiguator in the identifier need not be displayed, but as an alternate form the disambiguator can be shown in hex such as mycrate[ca63f166dbe9294].

Example:

fn example() {}

The symbol for example in crate mycrate is:

_RNvCs15kBYyAo9fc_7mycrate7example
    │└────┬─────┘││└──┬──┘
    │     │      ││   │
    │     │      ││   └── crate-root identifier "mycrate"
    │     │      │└────── length 7 of "mycrate"
    │     │      └─────── end of base-62-number
    │     └────────────── disambiguator for crate-root "mycrate" 0xca63f166dbe9293 + 1
    └──────────────────── crate-root

Recommended demangling: mycrate::example

Note: The compiler may re-use the crate-root form to express arbitrary unscoped, undisambiguated identifiers, such as for new basic types that have not been added to the grammar yet. To achieve that, it will emit a crate-root without an explicit disambiguator, relying on the fact that such an undisambiguated crate name cannot occur in practice. For example, the basic type f128 would be encoded as C4f128. For this to have the desired effect, demanglers are expected to never render zero disambiguators of crate roots. I.e. C4f128 is expected to be displayed as f128 and not f128[0].

Path: Inherent impl

inherent-impl → M impl-path type

An inherent-impl indicates a path to an inherent implementation. It consists of the character M followed by an impl-path, which uniquely identifies the impl block the item is defined in. Following that is a type representing the Self type of the impl.

Recommended Demangling

An inherent-impl can be displayed as a qualified path segment to the type within angled brackets. The impl-path usually need not be displayed.

Example:

struct Example;
impl Example {
    fn foo() {}
}

The symbol for foo in the impl for Example is:

_RNvMs_Cs4Cv8Wi1oAIB_7mycrateNtB4_7Example3foo
    │├┘└─────────┬──────────┘└────┬──────┘
    ││           │                │
    ││           │                └── Self type "Example"
    ││           └─────────────────── path to the impl's parent "mycrate"
    │└─────────────────────────────── disambiguator 1
    └──────────────────────────────── inherent-impl

Recommended demangling: <mycrate::Example>::foo

Path: Trait impl

trait-impl → X impl-path type path

A trait-impl indicates a path to a trait implementation. It consists of the character X followed by an impl-path to the impl's parent followed by the type representing the Self type of the impl followed by a path to the trait.

Recommended Demangling

A trait-impl can be displayed as a qualified path segment using the < type as path > syntax. The impl-path usually need not be displayed.

Example:

struct Example;
trait Trait {
    fn foo();
}
impl Trait for Example {
    fn foo() {}
}

The symbol for foo in the trait impl for Example is:

_RNvXCs15kBYyAo9fc_7mycrateNtB2_7ExampleNtB2_5Trait3foo
    │└─────────┬──────────┘└─────┬─────┘└────┬────┘
    │          │                 │           │
    │          │                 │           └── path to the trait "Trait"
    │          │                 └────────────── Self type "Example"
    │          └──────────────────────────────── path to the impl's parent "mycrate"
    └─────────────────────────────────────────── trait-impl

Recommended demangling: <mycrate::Example as mycrate::Trait>::foo

Path: Impl

impl-path → disambiguatoropt path

An impl-path is a path used for inherent-impl and trait-impl to indicate the path to the parent of an implementation. It consists of an optional disambiguator followed by a path. The path is the path to the parent that contains the impl. The disambiguator can be used to distinguish between multiple impls within the same parent.

Recommended Demangling

An impl-path usually need not be displayed (unless the location of the impl is desired).

Example:

struct Example;
impl Example {
    fn foo() {}
}
impl Example {
    fn bar() {}
}

The symbol for foo in the impl for Example is:

_RNvMCs7qp2U7fqm6G_7mycrateNtB2_7Example3foo
     └─────────┬──────────┘
               │
               └── path to the impl's parent crate-root "mycrate"

The symbol for bar is similar, though it has a disambiguator to indicate it is in a different impl block.

_RNvMs_Cs7qp2U7fqm6G_7mycrateNtB4_7Example3bar
     ├┘└─────────┬──────────┘
     │           │
     │           └── path to the impl's parent crate-root "mycrate"
     └────────────── disambiguator 1

Recommended demangling:

  • foo: <mycrate::Example>::foo
  • bar: <mycrate::Example>::bar

Path: Trait definition

trait-definition → Y type path

A trait-definition is a path to a trait definition. It consists of the character Y followed by the type which is the Self type of the referrer, followed by the path to the trait definition.

Recommended Demangling

A trait-definition can be displayed as a qualified path segment using the < type as path > syntax.

Example:

trait Trait {
    fn example() {}
}
struct Example;
impl Trait for Example {}

The symbol for example in the trait Trait implemented for Example is:

_RNvYNtCs15kBYyAo9fc_7mycrate7ExampleNtB4_5Trait7exampleB4_
    │└──────────────┬───────────────┘└────┬────┘
    │               │                     │
    │               │                     └── path to the trait "Trait"
    │               └──────────────────────── path to the implementing type "mycrate::Example"
    └──────────────────────────────────────── trait-definition

Recommended demangling: <mycrate::Example as mycrate::Trait>::example

Path: Nested path

nested-path → N namespace path identifier

A nested-path is a path representing an optionally named entity. It consists of the character N followed by a namespace indicating the namespace of the entity, followed by a path which is a path representing the parent of the entity, followed by an identifier of the entity.

The identifier of the entity may have a length of 0 when the entity is not named. For example, entities like closures, tuple-like struct constructors, and anonymous constants may not have a name. The identifier may still have a disambiguator unless the disambiguator is 0.

Recommended Demangling

A nested-path can be displayed by first displaying the path followed by a :: separator followed by the identifier. If the identifier is empty, then the separating :: should not be displayed.

If a namespace is specified, then extra context may be added such as:
path ::{ namespace (: identifier)opt # disambiguator }
where the disambiguator is printed as a base-10 number.

Here the namespace C may be printed as closure and S as shim. Others may be printed by their character tag. The : name portion may be skipped if the name is empty.

The disambiguator in the identifier may be displayed if a namespace is specified. In other situations, it is usually not necessary to display the disambiguator. If it is displayed, it is recommended to place it in brackets, for example [284a76a8b41a7fd3]. If the disambiguator is not present, then its value is 0 and it can always be omitted from display.
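The display rules above can be sketched as a small formatting function. The name `display_nested_path` and its parameters are hypothetical. The sketch assumes that "a namespace is specified" means an uppercase namespace character, since lowercase namespaces are never shown, and it takes the parent path as an already demangled string.

```rust
/// Formats a nested-path from its already-demangled parts.
/// `disambiguator` is the decoded value (0 when absent).
fn display_nested_path(parent: &str, namespace: char, name: &str, disambiguator: u64) -> String {
    if namespace.is_ascii_uppercase() {
        // Special namespaces get the `::{namespace(:name)#disambiguator}` form.
        let ns = match namespace {
            'C' => "closure".to_string(),
            'S' => "shim".to_string(),
            other => other.to_string(),
        };
        let mut out = format!("{}::{{{}", parent, ns);
        if !name.is_empty() {
            out.push(':');
            out.push_str(name);
        }
        out.push_str(&format!("#{}}}", disambiguator));
        out
    } else if name.is_empty() {
        // No name: the separating `::` is not displayed.
        parent.to_string()
    } else {
        format!("{}::{}", parent, name)
    }
}
```

For instance, `display_nested_path("mycrate::main", 'C', "", 1)` produces the closure form with disambiguator 1.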

Example:

fn main() {
    let x = || {};
    let y = || {};
    x();
    y();
}

The symbol for the closure x in crate mycrate is:

_RNCNvCsgStHSCytQ6I_7mycrate4main0B3_
  ││└─────────────┬─────────────┘│
  ││              │              │
  ││              │              └── identifier with length 0
  ││              └───────────────── path to "mycrate::main"
  │└──────────────────────────────── closure namespace
  └───────────────────────────────── nested-path

The symbol for the closure y is similar, with a disambiguator:

_RNCNvCsgStHSCytQ6I_7mycrate4mains_0B3_
                                 ││
                                 │└── base-62-number 0
                                 └─── disambiguator 1 (base-62-number+1)

Recommended demangling:

  • x: mycrate::main::{closure#0}
  • y: mycrate::main::{closure#1}

Path: Generic arguments

generic-args → I path {generic-arg} E

generic-arg →
      lifetime
   | type
   | K const

A generic-args is a path representing a list of generic arguments. It consists of the character I followed by a path to the defining entity, followed by zero or more generic-arg elements terminated by the character E.

Each generic-arg is either a lifetime (starting with the character L), a type, or the character K followed by a const representing a const argument.

Recommended Demangling

A generic-args may be printed as: path ::opt < comma-separated list of args >. The :: separator may be elided for type paths (similar to Rust's rules).

Example:

fn main() {
    example([123]);
}

fn example<T, const N: usize>(x: [T; N]) {}

The symbol for the function example is:

_RINvCsgStHSCytQ6I_7mycrate7examplelKj1_EB2_
  │└──────────────┬───────────────┘││││││
  │               │                │││││└── end of generic-args
  │               │                ││││└─── end of const-data
  │               │                │││└──── const value `1`
  │               │                ││└───── const type `usize`
  │               │                │└────── const generic
  │               │                └─────── generic type i32
  │               └──────────────────────── path to "mycrate::example"
  └──────────────────────────────────────── generic-args

Recommended demangling: mycrate::example::<i32, 1>

Namespace

namespace → lower | upper

A namespace is used to segregate names into separate logical groups, allowing identical names to otherwise avoid collisions. It consists of a single character of an upper or lowercase ASCII letter. Lowercase letters are reserved for implementation-internal disambiguation categories (and demanglers should never show them). Uppercase letters are used for special namespaces which demanglers may display in a special way.

Uppercase namespaces are:

  • C — A closure.
  • S — A shim. Shims are added by the compiler in some situations where an intermediate is needed. For example, a fn() pointer to a function with the #[track_caller] attribute needs a shim to deal with the implicit caller location.

Recommended Demangling

See nested-path for recommended demangling.

Identifier

identifier → disambiguatoropt undisambiguated-identifier

undisambiguated-identifier → uopt decimal-number _opt bytes

bytes → {UTF-8 bytes}

An identifier is a named label used in a path to refer to an entity. It consists of an optional disambiguator followed by an undisambiguated-identifier.

The disambiguator is used to disambiguate identical identifiers that should not otherwise be considered the same. For example, closures have no name, so the disambiguator is the only differentiating element between two different closures in the same parent path.

The undisambiguated-identifier starts with an optional u character, which indicates that the identifier is encoded in Punycode. The next part is a decimal-number which indicates the length of the bytes.

Following the identifier size is an optional _ character which is used to separate the length value from the identifier itself. The _ is mandatory if the bytes starts with a decimal digit or _ in order to keep it unambiguous where the decimal-number ends and the bytes starts.

bytes is the identifier itself encoded in UTF-8.
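The following is a minimal sketch of parsing an undisambiguated-identifier as described above. The `Ident` struct and the function name are hypothetical, and the sketch does not decode Punycode; it only records whether the `u` marker was present.

```rust
/// A parsed undisambiguated-identifier (hypothetical helper type).
struct Ident<'a> {
    /// True if the `u` marker was present (name is Punycode-encoded).
    punycode: bool,
    /// The raw identifier bytes, not yet Punycode-decoded.
    name: &'a str,
}

/// Parses `uopt decimal-number _opt bytes` from the start of `input`
/// and returns the identifier together with the unparsed remainder.
fn parse_undisambiguated_identifier(input: &str) -> Option<(Ident<'_>, &str)> {
    let (punycode, rest) = match input.strip_prefix('u') {
        Some(r) => (true, r),
        None => (false, input),
    };
    // decimal-number: a lone `0`, or a non-zero digit followed by digits.
    let digits = if rest.starts_with('0') {
        1
    } else {
        rest.bytes().take_while(|b| b.is_ascii_digit()).count()
    };
    if digits == 0 {
        return None;
    }
    let len: usize = rest[..digits].parse().ok()?;
    let rest = &rest[digits..];
    // A `_` directly after the length is always the separator.
    let rest = rest.strip_prefix('_').unwrap_or(rest);
    // `get` fails if the length runs past the end or splits a UTF-8 character.
    let name = rest.get(..len)?;
    Some((Ident { punycode, name }, &rest[len..]))
}
```

Because the separator is mandatory whenever the bytes start with a digit or `_`, treating the first `_` after the length as the separator is unambiguous.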

Recommended Demangling

The display of an identifier can depend on its context. If it is Punycode-encoded, then it may first be decoded before being displayed.

The disambiguator may or may not be displayed; see recommendations for rules that use identifier.

Punycode identifiers

Because some environments are restricted to ASCII alphanumerics and _, Rust's Unicode identifiers may be encoded using a modified version of Punycode.

For example, the function:

mod gödel {
  mod escher {
    fn bach() {}
  }
}

would be mangled as:

_RNvNtNtCsgOH4LzxkuMq_7mycrateu8gdel_5qa6escher4bach
                              ││└───┬──┘
                              ││    │
                              ││    └── gdel_5qa translates to gödel
                              │└─────── 8 is the length
                              └──────── `u` indicates it is a Unicode identifier

Standard Punycode generates strings of the form ([[:ascii:]]+-)?[[:alnum:]]+. This is problematic because the - character (which is used to separate the ASCII part from the base-36 encoding) is not in the supported character set for symbols. For this reason, - characters in the Punycode encoding are replaced with _.

Here are some examples:

Original  Punycode  Punycode + Encoding
føø       f-5gaa    f_5gaa
α_ω       _-ylb7e   __ylb7e
铁锈      n84amf    n84amf
🤦        fq9h      fq9h
ρυστ      2xaedc    2xaedc
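The delimiter substitution can be sketched as a pair of string conversions. Both function names are hypothetical. The sketch assumes that the last `_` in the mangled form is the delimiter, which holds because the base-36 part is purely alphanumeric. It does not perform the actual Punycode decoding, which a standard Punycode implementation can do once the `-` is restored.

```rust
/// Converts the mangled form back to standard Punycode by restoring the
/// `-` delimiter. Without any `_`, there is no ASCII part and no delimiter.
fn to_standard_punycode(mangled: &str) -> String {
    match mangled.rfind('_') {
        Some(pos) => format!("{}-{}", &mangled[..pos], &mangled[pos + 1..]),
        None => mangled.to_string(),
    }
}

/// Converts standard Punycode to the mangled form by replacing `-` with `_`.
fn to_mangled_punycode(standard: &str) -> String {
    standard.replace('-', "_")
}
```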

Note: It is up to the compiler to decide whether to encode identifiers using Punycode. Some platforms may have native support for UTF-8 symbols, and the compiler may decide to use the UTF-8 encoding directly. Demanglers should be prepared to support either form.

Disambiguator

disambiguator → s base-62-number

A disambiguator is used in various parts of a symbol path to uniquely identify path elements that would otherwise be identical but should not be considered the same. It starts with the character s and is followed by a base-62-number.

If the disambiguator is not specified, then its value can be assumed to be zero. Otherwise, when demangling, the value 1 should be added to the base-62-number (thus a base-62-number of zero encoded as _ has a value of 1). This allows disambiguators that are encoded sequentially to use minimal bytes.
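The off-by-one rule can be made concrete with a short sketch. The helper names `parse_base62` and `parse_opt_disambiguator` are hypothetical, and values are assumed to fit in a `u64`, with overflow reported as `None`.

```rust
/// Parses a base-62-number (digits terminated by `_`) and returns its
/// value together with the unparsed remainder.
fn parse_base62(input: &str) -> Option<(u64, &str)> {
    let end = input.find('_')?;
    let mut value: u64 = 0;
    for b in input[..end].bytes() {
        let digit = match b {
            b'0'..=b'9' => b - b'0',
            b'a'..=b'z' => b - b'a' + 10,
            b'A'..=b'Z' => b - b'A' + 36,
            _ => return None,
        };
        value = value.checked_mul(62)?.checked_add(u64::from(digit))?;
    }
    // A bare `_` is 0; otherwise the digits encode (value - 1).
    let value = if end == 0 { 0 } else { value.checked_add(1)? };
    Some((value, &input[end + 1..]))
}

/// Parses an optional disambiguator. An absent disambiguator has the
/// value 0; a present one is the base-62-number plus 1.
fn parse_opt_disambiguator(input: &str) -> Option<(u64, &str)> {
    match input.strip_prefix('s') {
        Some(rest) => {
            let (n, rest) = parse_base62(rest)?;
            Some((n.checked_add(1)?, rest))
        }
        None => Some((0, input)),
    }
}
```

With this scheme the first three disambiguator values are encoded as nothing, `s_`, and `s0_`.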

Recommended Demangling

The disambiguator may or may not be displayed; see recommendations for rules that use disambiguator. Generally, it is recommended that zero disambiguators are never displayed unless their accompanying identifier is empty (like is the case for unnamed items such as closures). When rendering a disambiguator, it can be shortened to a length reasonable for the context, similar to how git commit hashes are rarely displayed in full.

Lifetime

lifetime → L base-62-number

A lifetime is used to encode an anonymous (numbered) lifetime, either erased or higher-ranked. It starts with the character L and is followed by a base-62-number. Index 0 is always erased. Indices starting from 1 refer (as de Bruijn indices) to a higher-ranked lifetime bound by one of the enclosing binders.

Recommended Demangling

A lifetime may be displayed like a Rust lifetime using a single quote.

Index 0 should be displayed as '_. Index 0 should not be displayed for lifetimes in a ref-type, mut-ref-type, or dyn-trait-type.

A lifetime can be displayed by converting the De Bruijn index to a De Bruijn level (level = number of bound lifetimes - index) and selecting a unique name for each level, for example single lowercase letters starting with 'a for level 0. For levels over 25, consider printing the numeric lifetime, as in '_123. See binder for more on lifetime indexes and ordering.
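The index-to-name conversion can be sketched as follows. The function name `lifetime_name` is hypothetical, and `bound_lifetimes` is assumed to be the total number of lifetimes bound by all enclosing binders at the point where the lifetime occurs.

```rust
/// Picks a display name for a decoded lifetime index.
/// Returns `None` if the index refers to a lifetime that is not bound.
fn lifetime_name(index: u64, bound_lifetimes: u64) -> Option<String> {
    if index == 0 {
        // Index 0 is the erased lifetime.
        return Some("'_".to_string());
    }
    // De Bruijn level: counts from the outermost binder instead of the innermost.
    let level = bound_lifetimes.checked_sub(index)?;
    if level < 26 {
        Some(format!("'{}", (b'a' + level as u8) as char))
    } else {
        // Past 'z, fall back to a numeric name.
        Some(format!("'_{}", level))
    }
}
```

With two bound lifetimes, index 2 maps to level 0 and index 1 maps to level 1, so the outermost lifetime gets the first letter.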

Example:

fn main() {
    example::<fn(&u8, &u16)>();
}

pub fn example<T>() {}

The symbol for the function example is:

_RINvCs7qp2U7fqm6G_7mycrate7exampleFG0_RL1_hRL0_tEuEB2_
                                   │└┬┘│└┬┘││└┬┘││
                                   │ │ │ │ ││ │ │└── end of input types
                                   │ │ │ │ ││ │ └─── type u16
                                   │ │ │ │ ││ └───── lifetime #1 'b
                                   │ │ │ │ │└─────── reference type
                                   │ │ │ │ └──────── type u8
                                   │ │ │ └────────── lifetime #2 'a
                                   │ │ └──────────── reference type
                                   │ └────────────── binder with 2 lifetimes
                                   └──────────────── function type

Recommended demangling: mycrate::example::<for<'a, 'b> fn(&'a u8, &'b u16)>

Const

const →
      type const-data
   | p
   | backref

const-data → nopt {hex-digit} _

hex-digit → digit | a | b | c | d | e | f

A const is used to encode a const value used in generics and types. It has the following forms:

  • A constant value encoded as a type which represents the type of the constant and const-data which is the constant value, followed by _ to terminate the const.
  • The character p which represents a placeholder.
  • A backref to a previously encoded const of the same value.

The encoding of the const-data depends on the type:

  • bool — The value false is encoded as 0_, the value true is encoded as 1_.
  • char — The Unicode scalar value of the character is encoded in hexadecimal.
  • Unsigned integers — The value is encoded in hexadecimal.
  • Signed integers — The character n is a prefix to indicate that it is negative, followed by the absolute value encoded in hexadecimal.
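As an illustration of the integer encodings listed above, the following sketch decodes a const-data string into an `i128`. The function name `decode_int_const` is hypothetical. Using `i128` is a simplification (the largest `u128` values do not fit), and treating an empty digit sequence as 0 is an assumption based on the grammar allowing zero hex digits.

```rust
/// Decodes integer const-data such as `12345678_` or `n2a_`.
/// Returns `None` on malformed input or if the value does not fit in an i128.
fn decode_int_const(data: &str) -> Option<i128> {
    // The const-data is terminated by `_`.
    let body = data.strip_suffix('_')?;
    // An `n` prefix marks a negative value.
    let (negative, hex) = match body.strip_prefix('n') {
        Some(h) => (true, h),
        None => (false, body),
    };
    // Only digits and lowercase a-f are valid hex-digits in the grammar.
    if hex.bytes().any(|b| !matches!(b, b'0'..=b'9' | b'a'..=b'f')) {
        return None;
    }
    // Assumption: an empty digit sequence is read as zero.
    let magnitude = if hex.is_empty() {
        0
    } else {
        i128::from_str_radix(hex, 16).ok()?
    };
    Some(if negative { -magnitude } else { magnitude })
}
```

A bool const can reuse the same decoding, with 0 meaning false and 1 meaning true.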

Recommended Demangling

A const may be displayed by the const value depending on the type.

The p placeholder should be displayed as the _ character.

For specific types:

  • b (bool) — Display as true or false.
  • c (char) — Display the character as a Rust character (such as 'A' or '\n').
  • integers — Display the integer (either in decimal or hex).

Example:

fn main() {
    example::<0x12345678>();
}

pub fn example<const N: u64>() {}

The symbol for function example is:

_RINvCs7qp2U7fqm6G_7mycrate7exampleKy12345678_EB2_
                                   ││└───┬───┘
                                   ││    │
                                   ││    └── const-data 0x12345678
                                   │└─────── const type u64
                                   └──────── const generic arg

Recommended demangling: mycrate::example::<305419896>

Placeholders

A placeholder may occur in circumstances where a type or const value is not relevant.

Example:

pub struct Example<T, const N: usize>([T; N]);

impl<T, const N: usize> Example<T, N> {
    pub fn foo() -> &'static () {
        static EXAMPLE_STATIC: () = ();
        &EXAMPLE_STATIC
    }
}

In this example, the static EXAMPLE_STATIC would not be monomorphized by the type or const parameters T and N. Those will use the placeholder for those generic arguments. Its symbol is:

_RNvNvMCsd9PVOYlP1UU_7mycrateINtB4_7ExamplepKpE3foo14EXAMPLE_STATIC
                             │             │││
                             │             ││└── const placeholder
                             │             │└─── const generic argument
                             │             └──── type placeholder
                             └────────────────── generic-args

Recommended demangling: <mycrate::Example<_, _>>::foo::EXAMPLE_STATIC

Type

type →
      basic-type
   | array-type
   | slice-type
   | tuple-type
   | ref-type
   | mut-ref-type
   | const-ptr-type
   | mut-ptr-type
   | fn-type
   | dyn-trait-type
   | path
   | backref

A type represents a Rust type. The initial character can be used to distinguish which type is encoded. The type encodings based on the initial tag character are:

  • A basic-type is encoded as a single character:
    • a — i8
    • b — bool
    • c — char
    • d — f64
    • e — str
    • f — f32
    • h — u8
    • i — isize
    • j — usize
    • l — i32
    • m — u32
    • n — i128
    • o — u128
    • s — i16
    • t — u16
    • u — unit ()
    • v — variadic ...
    • x — i64
    • y — u64
    • z — !
    • p — placeholder _

Remaining primitives are encoded as a crate production, e.g. C4f128.
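The basic-type table can be sketched as a direct lookup. The function name `basic_type` is hypothetical. Any other tag returns `None`, leaving the caller to try the remaining type rules.

```rust
/// Maps a basic-type tag to the Rust syntax used to display it.
/// Returns `None` if the tag is not a basic-type.
fn basic_type(tag: u8) -> Option<&'static str> {
    Some(match tag {
        b'a' => "i8",
        b'b' => "bool",
        b'c' => "char",
        b'd' => "f64",
        b'e' => "str",
        b'f' => "f32",
        b'h' => "u8",
        b'i' => "isize",
        b'j' => "usize",
        b'l' => "i32",
        b'm' => "u32",
        b'n' => "i128",
        b'o' => "u128",
        b's' => "i16",
        b't' => "u16",
        b'u' => "()",
        b'v' => "...",
        b'x' => "i64",
        b'y' => "u64",
        b'z' => "!",
        b'p' => "_",
        _ => return None,
    })
}
```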

  • A — An array [T; N].

    array-type → A type const

    The tag A is followed by the type of the array followed by a const for the array size.

  • S — A slice [T].

    slice-type → S type

    The tag S is followed by the type of the slice.

  • T — A tuple (T1, T2, T3, ...).

    tuple-type → T {type} E

    The tag T is followed by one or more types indicating the type of each field, followed by a terminating E character.

    Note that a zero-length tuple (unit) is encoded with the u basic-type.

  • R — A reference &T.

    ref-type → R lifetimeopt type

    The tag R is followed by an optional lifetime followed by the type of the reference. The lifetime is not included if it has been erased.

  • Q — A mutable reference &mut T.

    mut-ref-type → Q lifetimeopt type

    The tag Q is followed by an optional lifetime followed by the type of the mutable reference. The lifetime is not included if it has been erased.

  • P — A constant raw pointer *const T.

    const-ptr-type → P type

    The tag P is followed by the type of the pointer.

  • O — A mutable raw pointer *mut T.

    mut-ptr-type → O type

    The tag O is followed by the type of the pointer.

  • F — A function pointer fn(…) -> ….

    fn-type → F fn-sig

    fn-sig → binderopt Uopt (K abi)opt {type} E type

    abi →
          C
       | undisambiguated-identifier

    The tag F is followed by a fn-sig of the function signature. A fn-sig is the signature for a function pointer.

    It starts with an optional binder which represents the higher-ranked trait bounds (for<…>).

    Following that is an optional U character which is present for an unsafe function.

    Following that is an optional K character which indicates that an abi is specified. If the ABI is not specified, it is assumed to be the "Rust" ABI.

    The abi can be the letter C to indicate it is the "C" ABI. Otherwise it is an undisambiguated-identifier of the ABI string with dashes converted to underscores.

    Following that is zero or more types which indicate the input parameters of the function.

    Following that is the character E and then the type of the return value.

  • A path to a named type.

  • A backref to refer to a previously encoded type.

Recommended Demangling

A type may be displayed as the type it represents, using typical Rust syntax to represent the type.

Example:

fn main() {
    example::<[u16; 8]>();
}

pub fn example<T>() {}

The symbol for function example is:

_RINvCs7qp2U7fqm6G_7mycrate7exampleAtj8_EB2_
                                   │││├┘│
                                   ││││ └─── end of generic args
                                   │││└───── const data 8
                                   ││└────── const type usize
                                   │└─────── array element type u16
                                   └──────── array type

Recommended demangling: mycrate::example::<[u16; 8]>

Binder

binder → G base-62-number

A binder represents the number of higher-ranked trait bound lifetimes to bind. It consists of the character G followed by a base-62-number. The value 1 should be added to the base-62-number when decoding (such that the base-62-number encoding of _ is interpreted as having 1 binder).

A lifetime rule can then refer to these numbered lifetimes. The lowest indices represent the innermost lifetimes. The number of bound lifetimes is the value of base-62-number plus one.

For example, in for<'a, 'b> fn(for<'c> fn (...)), any lifetimes in ... (but not inside more binders) will observe the indices 1, 2, and 3 to refer to 'c, 'b, and 'a, respectively.

Recommended Demangling

A binder may be printed using for<…> syntax listing the lifetimes as recommended in lifetime. See lifetime for an example.

Backref

backref → B base-62-number

A backref is used to refer to a previous part of the mangled symbol. This provides a simple form of compression to reduce the length of the mangled symbol. This can help reduce the amount of work and resources needed by the compiler, linker, and loader.

It consists of the character B followed by a base-62-number. The number indicates the 0-based offset in bytes starting from just after the _R prefix of the symbol. The backref represents the corresponding element starting at that position.

backrefs always refer to a position before the backref itself.

The backref compression relies on the fact that all substitutable symbol elements have a self-terminating mangled form. Given the start position of the encoded node, the grammar guarantees that it is always unambiguous where the node ends. This is ensured by not allowing optional or repeating elements at the end of substitutable productions.
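The offset computation can be sketched as follows. The function name `backref_target` is hypothetical. It takes the full symbol and the position of a `B` tag, counted from just after the `_R` prefix, and returns the offset the backref points to, rejecting targets that do not lie before the backref itself.

```rust
/// Resolves the backref whose `B` tag sits at byte `pos` (relative to the
/// text after `_R`). Returns the 0-based target offset, or `None` if the
/// input is malformed or the target is not before the backref.
fn backref_target(symbol: &str, pos: usize) -> Option<usize> {
    let body = symbol.strip_prefix("_R")?;
    let rest = body.get(pos..)?.strip_prefix('B')?;
    // Decode the base-62-number that follows the tag.
    let end = rest.find('_')?;
    let mut n: usize = 0;
    for b in rest[..end].bytes() {
        let digit = match b {
            b'0'..=b'9' => b - b'0',
            b'a'..=b'z' => b - b'a' + 10,
            b'A'..=b'Z' => b - b'A' + 36,
            _ => return None,
        };
        n = n.checked_mul(62)?.checked_add(usize::from(digit))?;
    }
    let offset = if end == 0 { 0 } else { n.checked_add(1)? };
    // Backrefs always refer to a position before the backref itself.
    if offset < pos {
        Some(offset)
    } else {
        None
    }
}
```

A demangler would then resume parsing at the returned offset, ideally with a depth limit so that chains of backrefs cannot exhaust the stack.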

Recommended Demangling

A backref should be demangled by rendering the element that it points to. Care should be taken when handling deeply nested backrefs to avoid using too much stack.

Example:

fn main() {
    example::<Example, Example>();
}

struct Example;

pub fn example<T, U>() {}

The symbol for function example is:

_RINvCs7qp2U7fqm6G_7mycrate7exampleNtB2_7ExampleBw_EB2_
                                     │├┘        │├┘ │├┘
                                     ││         ││  ││
                                     ││         ││  │└── backref to offset 3 (crate-root)
                                     ││         ││  └─── backref for instantiating-crate path
                                     ││         │└────── backref to offset 33 (path to Example)
                                     ││         └─────── backref for second generic-arg
                                     │└───────────────── backref to offset 3 (crate-root)
                                     └────────────────── backref for first generic-arg (first segment of Example path)

Recommended demangling: mycrate::example::<mycrate::Example, mycrate::Example>

Instantiating crate

instantiating-crate → path

The instantiating-crate is an optional element of the symbol-name which can be used to indicate which crate is instantiating the symbol. It consists of a single path.

This helps differentiate symbols that would otherwise be identical, for example the monomorphization of a function from an external crate may result in a duplicate if another crate is also instantiating the same generic function with the same types.

In practice, the instantiating crate is also often the crate where the symbol is defined, so it is usually encoded as a backref to the crate-root encoded elsewhere in the symbol.

Recommended Demangling

The instantiating-crate usually need not be displayed.

Example:

std::path::Path::new("example");

The symbol for Path::new::<str> instantiated from the mycrate crate is:

_RINvMsY_NtCseXNvpPnDBDp_3std4pathNtB6_4Path3neweECs7qp2U7fqm6G_7mycrate
                                                                └──┬───┘
                                                                   │
                                                                   └── instantiating crate identifier `mycrate`

Recommended demangling: <std::path::Path>::new::<str>

Vendor-specific suffix

vendor-specific-suffix → (. | $) suffix

suffix → {byte}

The vendor-specific-suffix is an optional element at the end of the symbol-name. It consists of either a . or $ character followed by zero or more bytes. There are no restrictions on the characters following the period or dollar sign.

This suffix is added as needed by the implementation. One example where this can happen is when locally unique names need to become globally unique. LLVM can append a .llvm.<numbers> suffix during LTO to ensure a unique name, and $ can be used for thread-local data on Mach-O. In these situations it's generally fine to ignore the suffix; the suffixed name has the same semantics as the original.
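Separating the suffix before parsing can be sketched as follows. The function name `split_vendor_suffix` is hypothetical. The sketch assumes that neither `.` nor `$` can occur earlier in the symbol, which holds for the restricted character set and for Rust identifiers in general.

```rust
/// Splits a symbol into the part to demangle and an optional
/// vendor-specific-suffix (which includes its leading `.` or `$`).
fn split_vendor_suffix(symbol: &str) -> (&str, Option<&str>) {
    match symbol.find(|c: char| c == '.' || c == '$') {
        Some(pos) => (&symbol[..pos], Some(&symbol[pos..])),
        None => (symbol, None),
    }
}
```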

Recommended Demangling

The vendor-specific-suffix usually need not be displayed.

Example:

use std::cell::RefCell;
thread_local! {
    pub static EXAMPLE: RefCell<u32> = RefCell::new(1);
}

On macOS, the symbol for the thread-local data of EXAMPLE may look like the following:

_RNvNvNvCs7qp2U7fqm6G_7mycrate7EXAMPLE7___getit5___KEY$tlv$init
                                                      └───┬───┘
                                                          │
                                                          └── vendor-specific-suffix

Recommended demangling: mycrate::EXAMPLE::__getit::__KEY
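Since the suffix is usually ignored, a demangler can split it off before parsing the rest of the symbol. The following is a minimal sketch, not a definitive implementation. The function name split_vendor_suffix is hypothetical, and the sketch assumes the symbol is restricted to A-Z, a-z, 0-9, and _ before the suffix, so that the first . or $ marks the start of the vendor-specific-suffix.

```rust
/// Splits a symbol into the part before the vendor-specific-suffix and the
/// suffix itself (including its leading `.` or `$`), if one is present.
///
/// Assumption: the mangled path contains no `.` or `$` bytes of its own.
fn split_vendor_suffix(symbol: &str) -> (&str, Option<&str>) {
    match symbol.find(|c: char| c == '.' || c == '$') {
        // Everything from the first `.` or `$` onwards is the suffix.
        Some(i) => (&symbol[..i], Some(&symbol[i..])),
        // No suffix: the whole string is the symbol proper.
        None => (symbol, None),
    }
}
```

Applied to the macOS example above, this would separate `$tlv$init` from the rest of the symbol, which can then be demangled as usual.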

Common rules

decimal-number →
      0
   | non-zero-digit {digit}

non-zero-digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digit → 0 | non-zero-digit

lower → a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z

upper → A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z

A decimal-number is encoded as one or more digits indicating a numeric value in decimal.

The value zero is encoded as a single byte 0. Beware that there are situations where 0 may be followed by another digit that should not be decoded as part of the decimal-number. For example, a zero-length identifier within a nested-path which is in turn inside another nested-path will result in two identifiers in a row, where the first one only has the encoding of 0.

A digit is an ASCII decimal digit (0 through 9).

A lower and an upper are an ASCII lowercase and uppercase letter, respectively.
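The caveat about 0 can be made concrete with a small parser. This is a sketch under stated assumptions rather than a reference implementation: the function name parse_decimal_number is hypothetical, and values are assumed to fit in a u64.

```rust
/// Parses a decimal-number from the front of `input`, returning the value
/// and the unparsed remainder, or `None` if `input` does not start with one.
fn parse_decimal_number(input: &str) -> Option<(u64, &str)> {
    let bytes = input.as_bytes();
    match *bytes.first()? {
        // A leading 0 is the entire number. Any digit after it belongs to
        // whatever follows (for example, the length of the next identifier).
        b'0' => Some((0, &input[1..])),
        b'1'..=b'9' => {
            // Consume the full run of ASCII digits.
            let len = bytes.iter().take_while(|b| b.is_ascii_digit()).count();
            // `parse` fails on overflow, which is reported as `None` here.
            let value: u64 = input[..len].parse().ok()?;
            Some((value, &input[len..]))
        }
        _ => None,
    }
}
```

With this sketch, parsing `03abc` yields the value 0 and leaves `3abc` unparsed, so the following `3` is read as the start of the next element instead of being folded into the number.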

base-62-number

base-62-number → { digit | lower | upper } _

A base-62-number is an encoding of a numeric value. It uses ASCII numbers and lowercase and uppercase letters. The value is terminated with the _ character. If the value is 0, then the encoding is the _ character without any digits. Otherwise, one is subtracted from the value, and it is encoded with the mapping:

  • 0-9 maps to 0-9
  • a-z maps to 10 to 35
  • A-Z maps to 36 to 61

The number is repeatedly divided by 62 (integer division, rounding towards zero), and the remainder of each division is used in the mapping to choose the next character in the sequence. This is repeated until the number is 0. At least one character is always produced, so the value 1 is encoded as 0_. The final sequence of characters is then reversed.

Decoding is a similar process in reverse.

Examples:

Value   Encoding
0       _
1       0_
11      a_
62      Z_
63      10_
1000    g7_
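The encoding and decoding steps described above can be sketched as follows. This is an illustrative sketch, not the compiler's implementation: the function names encode_base62 and decode_base62 are hypothetical, and values are assumed to fit in a u64.

```rust
/// Encodes `value` as a base-62-number, including the terminating `_`.
fn encode_base62(value: u64) -> String {
    const DIGITS: &[u8; 62] =
        b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    // Zero is the bare terminator with no digits.
    if value == 0 {
        return "_".to_string();
    }
    // Every other value is stored with one subtracted.
    let mut n = value - 1;
    let mut out = Vec::new();
    // Emit the least significant digit first. The loop body runs at least
    // once, so the value 1 (stored as 0) still produces the digit `0`.
    loop {
        out.push(DIGITS[(n % 62) as usize]);
        n /= 62;
        if n == 0 {
            break;
        }
    }
    out.reverse();
    out.push(b'_');
    String::from_utf8(out).expect("all digits are ASCII")
}

/// Decodes a base-62-number from the front of `input`, returning the value
/// and the text after the terminating `_`. Returns `None` if there is no
/// terminator, an invalid digit is found, or the value overflows a u64.
fn decode_base62(input: &str) -> Option<(u64, &str)> {
    let end = input.find('_')?;
    let (digits, rest) = (&input[..end], &input[end + 1..]);
    if digits.is_empty() {
        return Some((0, rest));
    }
    let mut n: u64 = 0;
    for b in digits.bytes() {
        let d = match b {
            b'0'..=b'9' => b - b'0',
            b'a'..=b'z' => b - b'a' + 10,
            b'A'..=b'Z' => b - b'A' + 36,
            _ => return None,
        };
        n = n.checked_mul(62)?.checked_add(u64::from(d))?;
    }
    // Undo the subtraction applied during encoding.
    n.checked_add(1).map(|v| (v, rest))
}
```

The decoder returns the remaining input alongside the value because a base-62-number is always embedded in a larger symbol, so a caller typically continues parsing from where the number ended.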

Symbol grammar summary

The following is a summary of all of the productions of the symbol grammar.

symbol-name → _R decimal-numberₒₚₜ path instantiating-crateₒₚₜ vendor-specific-suffixₒₚₜ

path →
      crate-root
   | inherent-impl
   | trait-impl
   | trait-definition
   | nested-path
   | generic-args
   | backref

crate-root → C identifier
inherent-impl → M impl-path type
trait-impl → X impl-path type path
trait-definition → Y type path
nested-path → N namespace path identifier
generic-args → I path {generic-arg} E

identifier → disambiguatorₒₚₜ undisambiguated-identifier
undisambiguated-identifier → uₒₚₜ decimal-number _ₒₚₜ bytes
bytes → {UTF-8 bytes}

disambiguator → s base-62-number

impl-path → disambiguatorₒₚₜ path

type →
      basic-type
   | array-type
   | slice-type
   | tuple-type
   | ref-type
   | mut-ref-type
   | const-ptr-type
   | mut-ptr-type
   | fn-type
   | dyn-trait-type
   | path
   | backref

basic-type → lower
array-type → A type const
slice-type → S type
tuple-type → T {type} E
ref-type → R lifetimeₒₚₜ type
mut-ref-type → Q lifetimeₒₚₜ type
const-ptr-type → P type
mut-ptr-type → O type
fn-type → F fn-sig
dyn-trait-type → D dyn-bounds lifetime

namespace → lower | upper

generic-arg →
      lifetime
   | type
   | K const

lifetime → L base-62-number

const →
      type const-data
   | p
   | backref

const-data → nₒₚₜ {hex-digit} _

hex-digit → digit | a | b | c | d | e | f

fn-sig → binderₒₚₜ Uₒₚₜ (K abi)ₒₚₜ {type} E type

abi →
      C
   | undisambiguated-identifier

dyn-bounds → binderₒₚₜ {dyn-trait} E
dyn-trait → path {dyn-trait-assoc-binding}
dyn-trait-assoc-binding → p undisambiguated-identifier type

binder → G base-62-number

backref → B base-62-number

instantiating-crate → path

vendor-specific-suffix → (. | $) suffix
suffix → {byte}

decimal-number →
      0
   | non-zero-digit {digit}

base-62-number → { digit | lower | upper } _

non-zero-digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digit → 0 | non-zero-digit
lower → a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
upper → A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
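Because each alternative of the path production starts with a distinct tag character, a demangler can dispatch on a single byte. The following sketch illustrates this for the start of a symbol. The names PathKind, classify_path_tag, and leading_path_kind are hypothetical, and returning None for an unrecognized tag is one possible way to handle tags added by future extensions (for example, by falling back to displaying the mangled symbol).

```rust
/// The alternatives of the `path` production, identified by their tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PathKind {
    CrateRoot,       // C
    InherentImpl,    // M
    TraitImpl,       // X
    TraitDefinition, // Y
    NestedPath,      // N
    GenericArgs,     // I
    Backref,         // B
}

/// Maps a tag byte to the path alternative it introduces.
/// Unknown tags yield `None` so the caller can fall back gracefully.
fn classify_path_tag(tag: u8) -> Option<PathKind> {
    match tag {
        b'C' => Some(PathKind::CrateRoot),
        b'M' => Some(PathKind::InherentImpl),
        b'X' => Some(PathKind::TraitImpl),
        b'Y' => Some(PathKind::TraitDefinition),
        b'N' => Some(PathKind::NestedPath),
        b'I' => Some(PathKind::GenericArgs),
        b'B' => Some(PathKind::Backref),
        _ => None,
    }
}

/// Returns the kind of the outermost path of a symbol-name.
///
/// Simplification: any run of ASCII digits after `_R` is skipped as the
/// optional decimal-number, without validating it against the grammar.
fn leading_path_kind(symbol: &str) -> Option<PathKind> {
    let rest = symbol.strip_prefix("_R")?;
    let rest = rest.trim_start_matches(|c: char| c.is_ascii_digit());
    classify_path_tag(*rest.as_bytes().first()?)
}
```

A full demangler would continue recursively from the classified tag, consuming the components that the corresponding production lists.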

Encoding of Rust entities

The following are guidelines for how Rust entities are encoded in a symbol. The compiler has some latitude in how an entity is encoded as long as the symbol is unambiguous.

  • Named functions, methods, and statics shall be represented by a path production.

  • Paths should be rooted at the inner-most entity that can act as a path root. Roots can be crate-ids, inherent impls, trait impls, and (for items within default methods) trait definitions.

  • The compiler is free to choose disambiguation indices and namespace tags from the reserved ranges as long as the resulting identifiers are unambiguous.

  • To save space, generic arguments that are equal to the default should not be encoded.