Module rustc_data_structures::profiling


§Rust Compiler Self-Profiling

This module implements the basic framework for the compiler’s self-profiling support. It provides the SelfProfiler type, which enables recording “events”. An event is something that starts and ends at given points in time and has an ID and a kind attached to it. This allows for tracing the compiler’s activity.

Internally, this module uses the custom-tailored measureme crate for efficiently recording events to disk in a compact format that can be post-processed and analyzed by the suite of tools in the measureme project. The highest priority for the tracing framework is to incur as little overhead as possible.

§Event Overview

Events have a few properties:

  • The event_kind designates the broad category of an event (e.g. whether it corresponds to the execution of a query provider, to loading something from the incremental compilation on-disk cache, etc.).
  • The event_id designates the query invocation or function call it corresponds to, possibly including the query key or function arguments.
  • Each event stores the ID of the thread it was recorded on.
  • The timestamp stores the beginning and end of the event, or the single point in time at which it occurred for “instant” events.
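The properties above can be pictured as a small record type. The following is only an illustrative sketch: the type names, field widths, and enum variants are assumptions made for this example and are not the layout that measureme or the compiler actually uses.

```rust
/// Broad category of an event (hypothetical subset, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    QueryProvider,
    IncrCacheLoad,
    GenericActivity,
}

/// Either an interval with a beginning and an end, or a single point in time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Timestamp {
    Interval { start_ns: u64, end_ns: u64 },
    Instant { at_ns: u64 },
}

/// One recorded event. All field types are assumptions of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Event {
    event_kind: EventKind,
    /// Numeric handle that is resolved to a string later.
    event_id: u32,
    /// ID of the thread the event was recorded on.
    thread_id: u32,
    timestamp: Timestamp,
}

impl Event {
    /// Duration of an interval event; `None` for instant events.
    fn duration_ns(&self) -> Option<u64> {
        match self.timestamp {
            Timestamp::Interval { start_ns, end_ns } => Some(end_ns.saturating_sub(start_ns)),
            Timestamp::Instant { .. } => None,
        }
    }
}
```

Keeping the record small and free of owned strings is what makes it cheap to write out while the compiler is running.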

§Event Filtering

Event generation can be filtered by event kind. Recording all possible events generates a lot of data, much of which is not needed for most kinds of analysis. So, in order to keep overhead as low as possible for a given use case, the SelfProfiler will only record the kinds of events that pass the filter specified as a command line argument to the compiler.
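A minimal sketch of kind-based filtering, assuming the filter is a bitmask parsed from a comma-separated list. The `EventFilter` and `Profiler` types, the flag names, and the spelling of the event kind names are hypothetical and chosen only for this example.

```rust
/// Bitmask of event kinds to record (hypothetical flags).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EventFilter(u16);

impl EventFilter {
    const NONE: EventFilter = EventFilter(0);
    const GENERIC_ACTIVITIES: EventFilter = EventFilter(1 << 0);
    const QUERY_PROVIDERS: EventFilter = EventFilter(1 << 1);
    const QUERY_CACHE_HITS: EventFilter = EventFilter(1 << 2);

    /// True if every bit of `other` is set in `self`.
    fn contains(self, other: EventFilter) -> bool {
        (self.0 & other.0) == other.0
    }

    fn union(self, other: EventFilter) -> EventFilter {
        EventFilter(self.0 | other.0)
    }

    /// Parses a comma-separated list of kind names (names are assumed).
    fn parse(spec: &str) -> Result<EventFilter, String> {
        let mut mask = EventFilter::NONE;
        for name in spec.split(',').map(str::trim).filter(|s| !s.is_empty()) {
            let flag = match name {
                "generic-activity" => EventFilter::GENERIC_ACTIVITIES,
                "query-provider" => EventFilter::QUERY_PROVIDERS,
                "query-cache-hit" => EventFilter::QUERY_CACHE_HITS,
                other => return Err(format!("unknown event kind: {}", other)),
            };
            mask = mask.union(flag);
        }
        Ok(mask)
    }
}

/// Toy profiler that keeps the labels of the events it accepted.
struct Profiler {
    filter: EventFilter,
    recorded: Vec<&'static str>,
}

impl Profiler {
    /// Records the event only if its kind passes the filter.
    fn record(&mut self, kind: EventFilter, label: &'static str) {
        if self.filter.contains(kind) {
            self.recorded.push(label);
        }
    }
}
```

Because the check is a single mask test, events of a disabled kind cost almost nothing at the recording site.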

§event_id Assignment

As far as measureme is concerned, event_ids are just strings. However, it would incur too much overhead to generate and persist each event_id string at the point where the event is recorded. To make this more efficient, measureme has two features:

  • Strings can share their content, so that re-occurring parts don’t have to be copied over and over again. One allocates a string in measureme and gets back a StringId. This StringId is then used to refer to that string. measureme strings are actually DAGs of string components so that arbitrary sharing of substrings can be done efficiently. This is useful because event_ids contain lots of redundant text like query names or def-path components.

  • StringIds can be “virtual” which means that the client picks a numeric ID according to some application-specific scheme and can later make that ID be mapped to an actual string. This is used to cheaply generate event_ids while the events actually occur, causing little timing distortion, and then later map those StringIds, in bulk, to actual event_id strings. This way the largest part of the tracing overhead is localized to one contiguous chunk of time.
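Both features can be modeled with a toy in-memory table. This is a sketch under stated assumptions, not measureme's real API or on-disk format: `StringTable`, `Component`, `FIRST_REGULAR_ID`, and the method names are all invented for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct StringId(u32);

/// A string is a sequence of components: literal text or a reference
/// to another string, which is what allows substrings to be shared.
#[derive(Debug, Clone)]
enum Component {
    Value(String),
    Ref(StringId),
}

/// IDs below this bound are reserved for client-chosen "virtual" IDs
/// (the bound itself is an assumption of this sketch).
const FIRST_REGULAR_ID: u32 = 1_000_000;

#[derive(Default)]
struct StringTable {
    strings: HashMap<StringId, Vec<Component>>,
    virtual_map: HashMap<StringId, StringId>,
    next_regular: u32,
}

impl StringTable {
    /// Allocates a concrete string and returns its ID.
    fn alloc(&mut self, components: Vec<Component>) -> StringId {
        let id = StringId(FIRST_REGULAR_ID + self.next_regular);
        self.next_regular += 1;
        self.strings.insert(id, components);
        id
    }

    /// Later, possibly in bulk, points a virtual ID at a concrete string.
    fn map_virtual(&mut self, virtual_id: StringId, concrete: StringId) {
        assert!(virtual_id.0 < FIRST_REGULAR_ID, "not a virtual ID");
        self.virtual_map.insert(virtual_id, concrete);
    }

    /// Resolves an ID to its full text; `None` if it is not (yet) mapped.
    fn resolve(&self, id: StringId) -> Option<String> {
        let concrete = self.virtual_map.get(&id).copied().unwrap_or(id);
        let mut out = String::new();
        for component in self.strings.get(&concrete)? {
            match component {
                Component::Value(s) => out.push_str(s),
                Component::Ref(r) => out.push_str(&self.resolve(*r)?),
            }
        }
        Some(out)
    }
}
```

In this model, a query name allocated once can be referenced by every event_id that mentions it, and a virtual ID can be used in recorded events long before `map_virtual` gives it a meaning.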

How are these event_ids generated in the compiler? For things that occur infrequently (e.g. “generic activities”), we just allocate the string the first time it is used and then keep the StringId in a hash table. This is implemented in SelfProfiler::get_or_alloc_cached_string().
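The allocate-once-then-cache pattern might look roughly like the sketch below. It is not the body of SelfProfiler::get_or_alloc_cached_string(); the `StringCache` type, the read-then-write locking strategy, and the `alloc` callback standing in for the real string allocation are assumptions of this example.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct StringId(u32);

/// Cache from event label to the StringId allocated for it.
#[derive(Default)]
struct StringCache {
    cache: RwLock<HashMap<String, StringId>>,
}

impl StringCache {
    /// Returns the cached ID for `s`, calling `alloc` only on first use.
    fn get_or_alloc(&self, s: &str, alloc: impl FnOnce(&str) -> StringId) -> StringId {
        // Fast path: a shared read lock is enough when the string is cached.
        {
            let cache = self.cache.read().unwrap();
            if let Some(&id) = cache.get(s) {
                return id;
            }
        }
        // Slow path: take the write lock. Another thread may have inserted
        // the entry in the meantime, so go through the entry API.
        let mut cache = self.cache.write().unwrap();
        *cache.entry(s.to_owned()).or_insert_with(|| alloc(s))
    }
}
```

For infrequent events the hash lookup is cheap enough, and the string itself is written only once no matter how often the label recurs.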

For queries it gets more interesting: first we need a unique numeric ID for each query invocation (the QueryInvocationId). This ID serves as the virtual StringId that we use as the event_id for a given event. It has to be available both when the query is executed and later, together with the query key, when we allocate the actual event_id strings in bulk.

We could make the compiler generate and keep track of such an ID for each query invocation, but luckily we already have something that fits all the requirements: the query’s DepNodeIndex. So we use the numeric value of the DepNodeIndex as the event_id when recording the event. Then, just before the query context is dropped, we walk the entire query cache (which stores the DepNodeIndex along with the query key for each invocation) and allocate the corresponding strings, together with a mapping from each DepNodeIndex, used as a virtual StringId, to its string.
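The two phases can be sketched as follows: events carry only a number while they are recorded, and the readable names are built in one pass over the cache at the end. `RawEvent`, `QueryCache`, and both functions are hypothetical stand-ins invented for this example; the real compiler types and the real event_id format differ.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct DepNodeIndex(u32);

/// Phase 1: an event recorded during compilation carries only the
/// numeric value of the DepNodeIndex as its event_id.
struct RawEvent {
    event_id: u32,
    start_ns: u64,
    end_ns: u64,
}

/// Stand-in for one query's cache: each entry pairs the query key
/// (rendered as text here) with the DepNodeIndex of that invocation.
struct QueryCache {
    query_name: &'static str,
    entries: Vec<(String, DepNodeIndex)>,
}

/// Phase 2: walk every cache once and build the mapping from numeric
/// event_id to a readable string, all in one contiguous chunk of work.
fn alloc_event_id_strings(caches: &[QueryCache]) -> HashMap<u32, String> {
    let mut names = HashMap::new();
    for cache in caches {
        for (key, index) in &cache.entries {
            names.insert(index.0, format!("{}({})", cache.query_name, key));
        }
    }
    names
}

/// Post-processing: join the raw events with the names built in phase 2.
fn describe(events: &[RawEvent], names: &HashMap<u32, String>) -> Vec<String> {
    events
        .iter()
        .map(|e| {
            let name = names
                .get(&e.event_id)
                .map(String::as_str)
                .unwrap_or("<unmapped>");
            format!("{} took {} ns", name, e.end_ns.saturating_sub(e.start_ns))
        })
        .collect()
}
```

The recording path never formats or hashes a string; that cost is paid once, after the measurements that matter have already been taken.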