Module std::rt::kill

Facilities related to task failure, killing, and death. Facilities related to task failure, killing, and death. Task death: asynchronous killing, linked failure, exit code propagation.

This file implements two orthogonal building-blocks for communicating failure between tasks. One is 'linked failure' or 'task killing', that is, a failing task causing other tasks to fail promptly (even those that are blocked on pipes or I/O). The other is 'exit code propagation', which affects the result observed by the parent of a task::try task that itself spawns child tasks (such as any #[test] function). In both cases the data structures live in KillHandle.

I. Task killing.

The model for killing involves two atomic flags, the "kill flag" and the "unkillable flag". Operations on the kill flag include:

Operations on the unkillable flag include:

Why do we not need acquire/release barriers on any of the kill flag swaps? This is because barriers establish orderings between accesses on different memory locations, but each kill-related operation is only a swap on a single location, so atomicity is all that matters. The exception is kill(), which does a swap on both flags in sequence. kill() needs no barriers because it does not matter if its two accesses are seen reordered on another CPU: if a killer does perform both writes, it means it saw a KILL_RUNNING in the unkillable flag, which means an unkillable task will see KILL_KILLED and fail immediately (rendering the subsequent write to the kill flag unnecessary).

II. Exit code propagation.

The basic model for exit code propagation, which is used with the "watched" spawn mode (on by default for linked spawns, off for supervised and unlinked spawns), is that a parent will wait for all its watched children to exit before reporting whether it succeeded or failed. A watching parent will only report success if it succeeded and all its children also reported success; otherwise, it will report failure. This is most useful for writing test cases:

#[test]
fn test_something_in_another_task {
    do spawn {
        assert!(collatz_conjecture_is_false());
    }
}

Here, as the child task will certainly outlive the parent task, we might miss the failure of the child when deciding whether or not the test case passed. The watched spawn mode avoids this problem.

In order to propagate exit codes from children to their parents, any 'watching' parent must wait for all of its children to exit before it can report its final exit status. We achieve this by using an UnsafeArc, using the reference counting to track how many children are still alive, and using the unwrap() operation in the parent's exit path to wait for all children to exit. The UnsafeArc referred to here is actually the KillHandle itself.

This also works transitively, as if a "middle" watched child task is itself watching a grandchild task, the "middle" task will do unwrap() on its own KillHandle (thereby waiting for the grandchild to exit) before dropping its reference to its watching parent (which will alert the parent).

While UnsafeArc::unwrap() accomplishes the synchronization, there remains the matter of reporting the exit codes themselves. This is easiest when an exiting watched task has no watched children of its own:

However, if a "middle" watched task with watched children of its own exits before its child exits, we need to ensure that the grandparent task may still see a failure from the grandchild task. While we could achieve this by having each intermediate task block on its handle, this keeps around the other resources the task was using. To be more efficient, this is accomplished via "tombstones".

A tombstone is a closure, ~fn() -> bool, which will perform any waiting necessary to collect the exit code of descendant tasks. In its environment is captured the KillHandle of whichever task created the tombstone, and perhaps also any tombstones that that task itself had, and finally also another tombstone, effectively creating a lazy-list of heap closures.

When a child wishes to exit early and leave tombstones behind for its parent, it must use a LittleLock (pthread mutex) to synchronize with any possible sibling tasks which are trying to do the same thing with the same parent. However, on the other side, when the parent is ready to pull on the tombstones, it need not use this lock, because the unwrap() serves as a barrier that ensures no children will remain with references to the handle.

The main logic for creating and assigning tombstones can be found in the function reparent_children_to() in the impl for KillHandle.

IIA. Issues with exit code propagation.

There are two known issues with the current scheme for exit code propagation.

Structs

Death

Per-task state related to task death, killing, failure, etc.

KillHandle

State shared between tasks used for task killing during linked failure.

Enums

BlockedTask

A handle to a blocked task. Usually this means having the ~Task pointer by ownership, but if the task is killable, a killer can steal it at any time.