Module amdgpu

🔬This is a nightly-only experimental API. (stdarch_amdgpu #149988)

Available on target_arch=amdgpu only.

Platform-specific intrinsics for the amdgpu platform.

See the module documentation for more details.

Functions

ballot Experimental: Returns a bitfield (u32 or u64) containing the result of its i1 argument in all active lanes, and zero in all inactive lanes.
dispatch_id Experimental: Returns the id of the dispatch that is currently executed.
ds_bpermute ⚠ Experimental: Gather data across all lanes in a wavefront.
ds_permute ⚠ Experimental: Scatter data across all lanes in a wavefront.
endpgm Experimental: Stop execution of the wavefront.
groupstaticsize Experimental: Returns the size of statically allocated shared memory for this program in bytes.
inverse_ballot Experimental: Indexes into the value with the current lane id and returns for each lane if the corresponding bit is set.
mbcnt_hi Experimental: Masked bit count, high 32 lanes.
mbcnt_lo Experimental: Masked bit count, low 32 lanes.
perm ⚠ Experimental: Permute a 64-bit value.
permlane16_swap ⚠ Experimental: Provide direct access to v_permlane16_swap_b32 instruction on supported targets.
permlane16_u32 ⚠ Experimental: Performs arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
permlane16_var ⚠ Experimental: Performs arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
permlane32_swap ⚠ Experimental: Provide direct access to v_permlane32_swap_b32 instruction on supported targets.
permlane64_u32 ⚠ Experimental: Swap value between upper and lower 32 lanes in a wavefront.
permlanex16_u32 ⚠ Experimental: Performs arbitrary gather-style operation across two rows (16 contiguous lanes) of the second input operand.
permlanex16_var ⚠ Experimental: Performs arbitrary gather-style operation across two rows (16 contiguous lanes) of the second input operand.
readfirstlane_u32 Experimental: Get value from the first active lane in the wavefront.
readfirstlane_u64 Experimental: Get value from the first active lane in the wavefront.
readlane_u32 ⚠ Experimental: Get value from the lane at index lane in the wavefront.
readlane_u64 ⚠ Experimental: Get value from the lane at index lane in the wavefront.
s_barrier Experimental: Synchronize all wavefronts in a workgroup.
s_barrier_signal ⚠ Experimental: Signal a specific barrier type.
s_barrier_signal_isfirst ⚠ Experimental: Signal a specific barrier type.
s_barrier_wait ⚠ Experimental: Wait for a specific barrier type.
s_get_barrier_state ⚠ Experimental: Get the state of a specific barrier type.
s_get_waveid_in_workgroup Experimental: Get the index of the current wavefront in the workgroup.
s_getpc Experimental: Returns the current program counter.
s_memrealtime Experimental: Measures time based on a fixed frequency.
s_sethalt Experimental: Stop execution of the kernel.
s_sleep Experimental: Sleeps for approximately COUNT * 64 cycles.
sched_barrier ⚠ Experimental: Prevent movement of some instruction types.
sched_group_barrier ⚠ Experimental: Creates schedule groups with specific properties to create custom scheduling pipelines.
update_dpp ⚠ Experimental: The update_dpp intrinsic represents the update.dpp operation in AMDGPU. It takes an old value, a source operand, a DPP control operand, a row mask, a bank mask, and a bound control. This operation is equivalent to a sequence of v_mov_b32 operations.
wave_barrier Experimental: A barrier for only the threads within the current wavefront.
wave_id Experimental: Get the index of the current wavefront in the workgroup.
wave_reduce_add Experimental: Performs an arithmetic add reduction on the values provided by each lane in the wavefront.
wave_reduce_and Experimental: Performs a logical and reduction on the unsigned values provided by each lane in the wavefront.
wave_reduce_max Experimental: Performs an arithmetic max reduction on the signed values provided by each lane in the wavefront.
wave_reduce_min Experimental: Performs an arithmetic min reduction on the signed values provided by each lane in the wavefront.
wave_reduce_or Experimental: Performs a logical or reduction on the unsigned values provided by each lane in the wavefront.
wave_reduce_umax Experimental: Performs an arithmetic max reduction on the unsigned values provided by each lane in the wavefront.
wave_reduce_umin Experimental: Performs an arithmetic min reduction on the unsigned values provided by each lane in the wavefront.
wave_reduce_xor Experimental: Performs a logical xor reduction on the unsigned values provided by each lane in the wavefront.
wavefrontsize Experimental: Returns the number of threads in a wavefront.
workgroup_id_x Experimental: Returns the x coordinate of the workgroup index within the dispatch.
workgroup_id_y Experimental: Returns the y coordinate of the workgroup index within the dispatch.
workgroup_id_z Experimental: Returns the z coordinate of the workgroup index within the dispatch.
workitem_id_x Experimental: Returns the x coordinate of the workitem index within the workgroup.
workitem_id_y Experimental: Returns the y coordinate of the workitem index within the workgroup.
workitem_id_z Experimental: Returns the z coordinate of the workitem index within the workgroup.
writelane_u32 ⚠ Experimental: Return value for the lane at index lane in the wavefront. Return default for all other lanes.
writelane_u64 ⚠ Experimental: Return value for the lane at index lane in the wavefront. Return default for all other lanes.
