fn thin_lto(
cgcx: &CodegenContext<LlvmCodegenBackend>,
dcx: DiagCtxtHandle<'_>,
modules: Vec<(String, ThinBuffer)>,
serialized_modules: Vec<(SerializedModule<ModuleBuffer>, CString)>,
cached_modules: Vec<(SerializedModule<ModuleBuffer>, WorkProduct)>,
symbols_below_threshold: &[*const c_char],
) -> Result<(Vec<LtoModuleCodegen<LlvmCodegenBackend>>, Vec<WorkProduct>), FatalError>
Prepare “thin” LTO to be run on these modules.
The general structure of ThinLTO is quite different from the structure of “fat” LTO above. With “fat” LTO all LLVM modules in question are merged into one giant LLVM module, and then we run more optimization passes over this big module after internalizing most symbols. Thin LTO, on the other hand, avoids this large bottleneck through more targeted optimization.
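To make the contrast concrete, here is a minimal sketch of the fat-LTO shape using toy types. `Module`, `run_passes`, and the exported-symbol list are hypothetical stand-ins, not rustc_codegen_llvm or LLVM APIs:

```rust
struct Module {
    symbols: Vec<(String, bool)>, // (name, is_exported)
}

fn run_passes(_m: &mut Module) { /* stand-in for LLVM's optimization passes */ }

/// Fat LTO: merge everything into one module, internalize, optimize once.
fn fat_lto(inputs: Vec<Module>, exported: &[&str]) -> Module {
    let mut merged = Module { symbols: Vec::new() };
    for m in inputs {
        merged.symbols.extend(m.symbols); // the serial merge is the bottleneck
    }
    // Internalize most symbols: only the exported set stays visible.
    for (name, is_exported) in &mut merged.symbols {
        *is_exported = exported.contains(&name.as_str());
    }
    run_passes(&mut merged); // one big optimization run over everything
    merged
}

fn main() {
    let merged = fat_lto(
        vec![
            Module { symbols: vec![("main".into(), true), ("helper".into(), true)] },
            Module { symbols: vec![("util".into(), true)] },
        ],
        &["main"],
    );
    println!("{} symbols in the merged module", merged.symbols.len());
}
```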
At a high level, Thin LTO looks like this (a sketch follows the list):
- Prepare a “summary” of each LLVM module in question which describes the values inside, cost of the values, etc.
- Merge the summaries of all modules in question into one “index”
- Perform some global analysis on this index
- For each module, use the index and analysis calculated previously to perform local transformations on the module, for example inlining small functions from other modules.
- Run thin-specific optimization passes over each module, and then code generate everything at the end.
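The following is a small illustrative model of those phases, assuming hypothetical `ThinModule`, `Summary`, and `Index` types; the real summaries and combined index live inside LLVM and are managed through FFI:

```rust
use std::collections::HashMap;

struct ThinModule { name: String, ir: Vec<String> }
struct Summary { module: String, symbols: Vec<String> }
struct Index { defining_module: HashMap<String, String> }

/// Phase 1: a cheap per-module summary of the values inside.
fn summarize(m: &ThinModule) -> Summary {
    Summary { module: m.name.clone(), symbols: m.ir.clone() }
}

/// Phase 2: merge every summary into one combined index.
fn combine(summaries: Vec<Summary>) -> Index {
    let mut defining_module = HashMap::new();
    for s in summaries {
        for sym in s.symbols {
            defining_module.insert(sym, s.module.clone());
        }
    }
    Index { defining_module }
}

/// Phases 3-5 (heavily elided): global analysis over the index, then
/// per-module transformations such as cross-module inlining, followed by
/// thin-specific passes and codegen for each module independently.
fn optimize_module(m: &mut ThinModule, index: &Index) {
    m.ir.retain(|sym| index.defining_module.contains_key(sym));
}

fn main() {
    let mut modules = vec![
        ThinModule { name: "a".into(), ir: vec!["f".into()] },
        ThinModule { name: "b".into(), ir: vec!["g".into()] },
    ];
    let index = combine(modules.iter().map(summarize).collect());
    for m in &mut modules {
        optimize_module(m, &index);
    }
    println!("indexed {} symbols", index.defining_module.len());
}
```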
The summary for each module is intended to be quite cheap, and the global index is relatively cheap to create as well. As a result, the goal of ThinLTO is to reduce the bottleneck of LTO and enable it to be used in more situations. For example, one cheap win is that codegen for all modules can proceed in parallel, easily making use of all the cores on a machine.
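As a sketch of that parallelism, with a hypothetical `codegen_one` standing in for the per-module backend work: once each module only depends on the shared index, nothing serializes the remaining work.

```rust
use std::thread;

fn codegen_one(name: &str) -> String {
    format!("{name}.o") // stand-in for running LLVM codegen on one module
}

fn main() {
    let modules = ["a", "b", "c"];
    // With the shared index in hand, each module's backend work is
    // independent, so it can run on its own thread.
    let objects: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = modules
            .iter()
            .map(|m| s.spawn(move || codegen_one(m)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    println!("{objects:?}");
}
```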
With all that in mind, the function here is designed specifically to calculate the index for ThinLTO. This index will then be shared amongst all of the LtoModuleCodegen units returned below and destroyed once they all go out of scope.
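The ownership pattern this describes is essentially reference counting. A minimal sketch, assuming a hypothetical `ThinIndex` and a heavily simplified `LtoModuleCodegen`:

```rust
use std::sync::Arc;

struct ThinIndex; // stand-in for the combined ThinLTO index

struct LtoModuleCodegen {
    name: String,
    shared: Arc<ThinIndex>, // each unit keeps the index alive
}

fn main() {
    let index = Arc::new(ThinIndex);
    let units: Vec<LtoModuleCodegen> = ["a", "b"]
        .iter()
        .map(|n| LtoModuleCodegen { name: n.to_string(), shared: Arc::clone(&index) })
        .collect();
    drop(index); // the units alone now keep the index alive
    assert_eq!(Arc::strong_count(&units[0].shared), 2);
    for u in &units {
        println!("unit {} shares the index", u.name);
    }
    // When `units` goes out of scope here, the last Arc is dropped and
    // the index is destroyed with it.
}
```

Each unit clones the handle rather than the index itself, so the index outlives the function that built it and is freed exactly when the last unit is dropped.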