Implementation of C++11-consistent weak memory emulation using store buffers, based on Dynamic Race Detection for C++11 (“the paper”): https://www.doc.ic.ac.uk/~afd/homepages/papers/pdfs/2017/POPL.pdf
This implementation will never generate weak memory behaviours forbidden by the C++11 model, but it is incapable of producing all possible weak behaviours allowed by the model. There are certain weak behaviours observable on real hardware but not under this emulation.
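For illustration, here is the classic store-buffering litmus test (a standalone example, not taken from Miri's test suite). Under this emulation, as on real hardware, both loads may observe 0 in the same execution, an outcome impossible under sequential consistency:

```rust
use std::sync::atomic::{AtomicU32, Ordering::Relaxed};
use std::thread;

static X: AtomicU32 = AtomicU32::new(0);
static Y: AtomicU32 = AtomicU32::new(0);

fn main() {
    let t1 = thread::spawn(|| {
        X.store(1, Relaxed);
        Y.load(Relaxed) // may read 0, even if t2's store already happened
    });
    let t2 = thread::spawn(|| {
        Y.store(1, Relaxed);
        X.load(Relaxed) // may read 0, even if t1's store already happened
    });
    let (a, b) = (t1.join().unwrap(), t2.join().unwrap());
    // (a, b) == (0, 0) is a legal weak outcome: each store may still
    // sit in its location's store buffer when the other thread loads.
    println!("a = {a}, b = {b}");
}
```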
Note that this implementation does not fully take into account C++20’s memory model revision to SC accesses and fences introduced by P0668 (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html). This implementation is not fully correct under the revised C++20 model and may generate behaviours C++20 disallows (https://github.com/rust-lang/miri/issues/2301).
Modifications are made to the paper’s model to address C++20 changes:
- If an SC load reads from an atomic store of any ordering, then a later SC load cannot read from an earlier store in the location’s modification order. This is to prevent creating a backwards S edge from the second load to the first, as a result of C++20’s coherence-ordered-before rules. (This seems to rule out behaviours that were actually permitted by the RC11 model that C++20 intended to copy (https://plv.mpi-sws.org/scfix/paper.pdf); a change was introduced when translating the math to English. According to Viktor Vafeiadis, this difference is harmless. So we stick to what the standard says, and allow fewer behaviours.)
- SC fences are treated like AcqRel RMWs to a global clock, to ensure they induce enough synchronization with the surrounding accesses. This rules out legal behaviour, but it is really hard to be more precise here.
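To illustrate the second modification: an SC fence can be modelled as an acquire-then-release update of a single global vector clock. The following is a minimal sketch with a hypothetical VClock type, not Miri's actual vector-clock implementation:

```rust
type Timestamp = u64;

/// Hypothetical stand-in for a vector clock: one timestamp per thread.
#[derive(Clone, Default)]
struct VClock(Vec<Timestamp>);

impl VClock {
    /// Pointwise maximum of two clocks.
    fn join(&mut self, other: &VClock) {
        if self.0.len() < other.0.len() {
            self.0.resize(other.0.len(), 0);
        }
        for (i, &t) in other.0.iter().enumerate() {
            self.0[i] = self.0[i].max(t);
        }
    }
}

/// An SC fence acts like an AcqRel RMW on one global clock:
/// acquire - the thread learns everything earlier SC fences published;
/// release - the thread publishes its own knowledge to the global clock.
fn sc_fence(thread_clock: &mut VClock, global_sc_clock: &mut VClock) {
    thread_clock.join(global_sc_clock); // acquire side
    global_sc_clock.join(thread_clock); // release side
}
```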
Rust follows the C++20 memory model (except for the Consume ordering and some operations not performable through C++’s std::atomic<T> API). It is therefore possible for this implementation to generate behaviours never observable when the same program is compiled and run natively. Unfortunately, no literature exists at the time of writing which proposes an implementable and C++20-compatible relaxed memory model that supports all atomic operations existing in Rust. The closest one is A Promising Semantics for Relaxed-Memory Concurrency by Jeehoon Kang et al. (https://www.cs.tau.ac.il/~orilahav/papers/popl17.pdf). However, this model lacks SC accesses and is therefore unusable by Miri (SC accesses are everywhere in library code).
If you find anything that proposes a relaxed memory model that is C++20-consistent, supports all orderings Rust’s atomic accesses and fences accept, and is implementable (with operational semantics), please open a GitHub issue!
One characteristic of this implementation, in contrast to some other notable operational models such as the one proposed in Taming Release-Acquire Consistency by Ori Lahav et al. (https://plv.mpi-sws.org/sra/paper.pdf) or the Promising Semantics noted above, is that this implementation does not require each thread to hold an isolated view of the entire memory. Here, store buffers are per-location and shared across all threads. This is more memory efficient but does require store elements (representing writes to a location) to record information about reads, whereas in the other two models it is the other way round: reads point to the write they got their value from. Additionally, writes in our implementation do not have globally unique timestamps attached. In the other two models, this timestamp is used to make sure a value in a thread’s view is not overwritten by a write that occurred earlier than the one in the existing view. In our implementation, this is detected using the read information attached to store elements, as there is no data structure representing reads.
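Concretely, the shape of the data might look roughly like this. This is a simplified sketch with hypothetical names and field types; Miri's real store elements carry richer vector-clock information:

```rust
use std::collections::VecDeque;

// Hypothetical stand-ins for Miri's internal types.
type ThreadId = usize;
type Timestamp = u64;
/// A vector clock: one timestamp per thread.
type VClock = Vec<Timestamp>;

/// A per-location buffer of stores, shared by all threads.
struct StoreBuffer {
    buffer: VecDeque<StoreElement>,
}

/// One write to the location. Instead of reads pointing at the write
/// they took their value from (as in the other two models), each write
/// records which threads read from it, and when.
struct StoreElement {
    /// The thread that performed this store.
    writer: ThreadId,
    /// The writer's clock at the time of the store; note the absence
    /// of a globally unique timestamp per write.
    store_clock: VClock,
    /// The value written.
    val: u64,
    /// For each thread, the timestamp of its latest read of this
    /// store; consulted so a thread is never handed a value older
    /// than one it has already observed.
    reads: Vec<(ThreadId, Timestamp)>,
}
```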
The C++ memory model is built around the notion of an ‘atomic object’, so it would be natural to attach store buffers to atomic objects. However, Rust follows LLVM in that it only has ‘atomic accesses’. Therefore Miri cannot know when and where atomic ‘objects’ are being created or destroyed, to manage its store buffers. Instead, we lazily create an atomic object on the first atomic write to a given region, and we destroy that object on the next non-atomic or imperfectly overlapping atomic write to that region. These lazy (de)allocations happen in memory_accessed() on non-atomic accesses, and in get_or_create_store_buffer_mut() on atomic writes.
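A minimal sketch of this lazy lifecycle, using hypothetical types and keying buffers by the exact byte range of the first atomic write (only the two function names above come from the actual implementation):

```rust
use std::collections::HashMap;
use std::ops::Range;

type Address = u64;

#[derive(Default)]
struct StoreBuffer {/* stores for this location */}

#[derive(Default)]
struct StoreBuffers {
    /// Buffers keyed by (start, length) of the originating atomic write.
    buffers: HashMap<(Address, u64), StoreBuffer>,
}

impl StoreBuffers {
    /// Called on atomic writes (cf. get_or_create_store_buffer_mut()):
    /// lazily create the "atomic object" for this exact range.
    fn get_or_create(&mut self, range: Range<Address>) -> &mut StoreBuffer {
        let key = (range.start, range.end - range.start);
        self.buffers.entry(key).or_default()
    }

    /// Called on non-atomic accesses (cf. memory_accessed()): destroy
    /// every buffer overlapping the accessed range. An imperfectly
    /// overlapping atomic write would likewise destroy the old buffer
    /// before creating its own.
    fn destroy_overlapping(&mut self, range: Range<Address>) {
        self.buffers
            .retain(|&(start, len), _| start + len <= range.start || range.end <= start);
    }
}
```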
One consequence of this difference is that safe/sound Rust allows for more operations on atomic locations than the C++20 atomic API was intended to allow, such as non-atomically accessing a previously atomically accessed location, or accessing previously atomically accessed locations with a differently sized operation (such as accessing the top 16 bits of an AtomicU32). These scenarios are generally undiscussed in formalizations of the C++ memory model. In Rust, these operations can only be done through a &mut AtomicFoo reference or one derived from it, therefore these operations can only happen after all previous accesses on the same location. This implementation is adapted to allow these operations.
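For instance, the following safe program mixes atomic and non-atomic accesses to the same location through &mut (a standalone example in the spirit of the text above, not from Miri's test suite):

```rust
use std::sync::atomic::{AtomicU32, Ordering::SeqCst};

fn main() {
    let mut a = AtomicU32::new(0);
    a.store(7, SeqCst); // first atomic write: a store buffer is created lazily

    // Holding `&mut AtomicU32` proves all previous accesses have
    // completed, so a non-atomic access to the location is sound;
    // the location's store buffer is discarded at this point.
    *a.get_mut() += 1;

    assert_eq!(a.load(SeqCst), 8); // atomic accesses may then resume
}
```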
A mixed-atomicity read that races with writes, or a mixed-atomicity write that races with reads or writes, is still reported as undefined behaviour (UB). Mixed-size atomic accesses must not race with any other atomic access, whether read or write, otherwise UB is reported.
You can refer to test cases in weak_memory/extra_cpp.rs and weak_memory/extra_cpp_unsafe.rs for examples of these operations.
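A mixed-size race of the kind described above might look like the following standalone sketch (in the spirit of the extra_cpp_unsafe.rs tests, though not copied from them); Miri reports UB here because a 2-byte atomic access races with a 4-byte one:

```rust
use std::sync::atomic::{AtomicU16, AtomicU32, Ordering::SeqCst};
use std::thread;

static A: AtomicU32 = AtomicU32::new(0);

fn main() {
    thread::scope(|s| {
        s.spawn(|| {
            A.store(1, SeqCst); // 4-byte atomic access
        });
        s.spawn(|| {
            // Reinterpret the first two bytes of A as an AtomicU16.
            // This 2-byte atomic access races with the 4-byte store
            // above, so Miri flags the program as UB.
            let half = unsafe { &*(&A as *const AtomicU32 as *const AtomicU16) };
            half.load(SeqCst);
        });
    });
}
```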
Structs§
- LoadInfo 🔒
Enums§
- LoadRecency
  Whether a load returned the latest value or not.