core::arch::amdgpu

Function sched_group_barrier

pub unsafe fn sched_group_barrier<const MASK: u32, const SIZE: u32, const SYNC_ID: u32>()

🔬This is a nightly-only experimental API. (stdarch_amdgpu #149988)

Available on target_arch=amdgpu only.


Creates schedule groups with specific properties to create custom scheduling pipelines.

The ordering between groups is enforced by the instruction scheduler. The intrinsic applies to the code that precedes it. It takes three const generic parameters that control the behavior of the schedule groups.

MASK: Classifies the instructions in the group, using the sched_barrier mask values.
SIZE: The number of instructions in the group.
SYNC_ID: Order is enforced only between groups with matching values.

The mask can include multiple instruction types. It is undefined behavior to set values beyond the range of valid masks.
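As a minimal sketch of combining instruction types in one mask, the snippet below ORs together two of the mask values used in this page's example (2 for VALU, 8 for MFMA). The constant names and the `group` function are hypothetical: `group` is a host-side stand-in that only mirrors the const generic shape of sched_group_barrier so the call syntax can be tried on any target. It does not affect scheduling.

```rust
// Hypothetical names for the mask values that appear in this page's example.
const VALU: u32 = 2; // vector ALU instructions
const MFMA: u32 = 8; // matrix fused multiply-add instructions

// A mask that matches either VALU or MFMA instructions.
const VALU_OR_MFMA: u32 = VALU | MFMA;

// Hypothetical stand-in with the same const generic parameters as
// sched_group_barrier. It only returns its parameters for inspection.
fn group<const MASK: u32, const SIZE: u32, const SYNC_ID: u32>() -> (u32, u32, u32) {
    (MASK, SIZE, SYNC_ID)
}

fn main() {
    // On an amdgpu target the equivalent call would be
    // `unsafe { sched_group_barrier::<VALU_OR_MFMA, 4, 0>() }`.
    let (mask, size, sync_id) = group::<VALU_OR_MFMA, 4, 0>();
    println!("mask={mask} size={size} sync_id={sync_id}");
}
```

Naming the bits keeps each call site readable and makes it harder to pass a value outside the valid mask range by accident.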

Combining multiple sched_group_barrier intrinsics enables an ordering of specific instruction types during instruction scheduling. For example, the following enforces a sequence of 1 VMEM read, followed by 1 VALU instruction, followed by 5 MFMA instructions.

// The function is unsafe, so the calls need an unsafe block.
unsafe {
    // 1 VMEM read
    sched_group_barrier::<32, 1, 0>();
    // 1 VALU
    sched_group_barrier::<2, 1, 0>();
    // 5 MFMA
    sched_group_barrier::<8, 5, 0>();
}
