Module amdgpu

🔬This is a nightly-only experimental API. (stdarch_amdgpu #149988)
Available on target_arch=amdgpu only.

Platform-specific intrinsics for the amdgpu platform.

See the module documentation for more details.

Functions§

ballot Experimental
Returns a bitfield (u32 or u64) containing the result of its i1 argument in all active lanes, and zero in all inactive lanes.
dispatch_id Experimental
Returns the ID of the dispatch that is currently executing.
ds_bpermute ⚠ Experimental
Gather data across all lanes in a wavefront.
ds_permute ⚠ Experimental
Scatter data across all lanes in a wavefront.
endpgm Experimental
Stop execution of the wavefront.
groupstaticsize Experimental
Returns the size of statically allocated shared memory for this program in bytes.
inverse_ballot Experimental
Indexes into the value with the current lane ID and returns, for each lane, whether the corresponding bit is set.
mbcnt_hi Experimental
Masked bit count, high 32 lanes.
mbcnt_lo Experimental
Masked bit count, low 32 lanes.
perm ⚠ Experimental
Permute a 64-bit value.
permlane16_swap ⚠ Experimental
Provides direct access to the v_permlane16_swap_b32 instruction on supported targets.
permlane16_u32 ⚠ Experimental
Performs an arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
permlane16_var ⚠ Experimental
Performs an arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
permlane32_swap ⚠ Experimental
Provides direct access to the v_permlane32_swap_b32 instruction on supported targets.
permlane64_u32 ⚠ Experimental
Swap value between upper and lower 32 lanes in a wavefront.
permlanex16_u32 ⚠ Experimental
Performs an arbitrary gather-style operation across two rows (16 contiguous lanes) of the second input operand.
permlanex16_var ⚠ Experimental
Performs an arbitrary gather-style operation across two rows (16 contiguous lanes) of the second input operand.
readfirstlane_u32 Experimental
Get value from the first active lane in the wavefront.
readfirstlane_u64 Experimental
Get value from the first active lane in the wavefront.
readlane_u32 ⚠ Experimental
Get value from the lane at index lane in the wavefront.
readlane_u64 ⚠ Experimental
Get value from the lane at index lane in the wavefront.
s_barrier Experimental
Synchronize all wavefronts in a workgroup.
s_barrier_signal ⚠ Experimental
Signal a specific barrier type.
s_barrier_signal_isfirst ⚠ Experimental
Signal a specific barrier type.
s_barrier_wait ⚠ Experimental
Wait for a specific barrier type.
s_get_barrier_state ⚠ Experimental
Get the state of a specific barrier type.
s_get_waveid_in_workgroup Experimental
Get the index of the current wavefront in the workgroup.
s_getpc Experimental
Returns the current program counter.
s_memrealtime Experimental
Measures time based on a fixed frequency.
s_sethalt Experimental
Stop execution of the kernel.
s_sleep Experimental
Sleeps for approximately COUNT * 64 cycles.
sched_barrier ⚠ Experimental
Prevent movement of some instruction types.
sched_group_barrier ⚠ Experimental
Creates schedule groups with specific properties to create custom scheduling pipelines.
update_dpp ⚠ Experimental
The update_dpp intrinsic represents the update.dpp operation in AMDGPU. It takes an old value, a source operand, a DPP control operand, a row mask, a bank mask, and a bound control. This operation is equivalent to a sequence of v_mov_b32 operations.
wave_barrier Experimental
A barrier for only the threads within the current wavefront.
wave_id Experimental
Get the index of the current wavefront in the workgroup.
wave_reduce_add Experimental
Performs an arithmetic add reduction on the values provided by each lane in the wavefront.
wave_reduce_and Experimental
Performs a logical and reduction on the unsigned values provided by each lane in the wavefront.
wave_reduce_max Experimental
Performs an arithmetic max reduction on the signed values provided by each lane in the wavefront.
wave_reduce_min Experimental
Performs an arithmetic min reduction on the signed values provided by each lane in the wavefront.
wave_reduce_or Experimental
Performs a logical or reduction on the unsigned values provided by each lane in the wavefront.
wave_reduce_umax Experimental
Performs an arithmetic max reduction on the unsigned values provided by each lane in the wavefront.
wave_reduce_umin Experimental
Performs an arithmetic min reduction on the unsigned values provided by each lane in the wavefront.
wave_reduce_xor Experimental
Performs a logical xor reduction on the unsigned values provided by each lane in the wavefront.
wavefrontsize Experimental
Returns the number of threads in a wavefront.
workgroup_id_x Experimental
Returns the x coordinate of the workgroup index within the dispatch.
workgroup_id_y Experimental
Returns the y coordinate of the workgroup index within the dispatch.
workgroup_id_z Experimental
Returns the z coordinate of the workgroup index within the dispatch.
workitem_id_x Experimental
Returns the x coordinate of the workitem index within the workgroup.
workitem_id_y Experimental
Returns the y coordinate of the workitem index within the workgroup.
workitem_id_z Experimental
Returns the z coordinate of the workitem index within the workgroup.
writelane_u32 ⚠ Experimental
Return value for the lane at index lane in the wavefront. Return default for all other lanes.
writelane_u64 ⚠ Experimental
Return value for the lane at index lane in the wavefront. Return default for all other lanes.
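
The one-line summaries above for ballot, mbcnt_lo/mbcnt_hi, ds_bpermute, ds_permute and wave_reduce_add can be hard to picture without a GPU at hand. The following is a minimal host-side sketch of those semantics in plain Rust. It does not call the intrinsics and does not reproduce their signatures: every function name ending in `_model`, the slice-per-wavefront representation, and the explicit `active` mask are hypothetical and exist only for illustration. It also assumes that lane selectors are plain lane indices (the hardware instructions may use a different encoding) and that lanes never written by a scatter read back as zero.

```rust
// Host-side models only. A wavefront is represented as a slice with one
// element per lane; `active` is a bitmask of the lanes that are executing.
// None of these names exist in core::arch::amdgpu.

/// Model of `ballot`: bit i is set when lane i is active and its predicate
/// is true; bits of inactive lanes are zero.
fn ballot_model(pred: &[bool], active: u64) -> u64 {
    assert!(pred.len() <= 64, "a wavefront has at most 64 lanes");
    pred.iter()
        .enumerate()
        .filter(|&(lane, &p)| p && (active >> lane) & 1 == 1)
        .fold(0u64, |mask, (lane, _)| mask | (1u64 << lane))
}

/// Model of chaining `mbcnt_lo` and `mbcnt_hi` with a zero base: counts the
/// set bits of `mask` that belong to lanes strictly below `lane`.
fn mbcnt_model(mask: u64, lane: u32) -> u32 {
    assert!(lane < 64, "lane index out of range");
    let below = (1u64 << lane) - 1; // bits of all lower-numbered lanes
    let lo = (mask & below) as u32; // low 32 lanes (mbcnt_lo part)
    let hi = ((mask & below) >> 32) as u32; // high 32 lanes (mbcnt_hi part)
    lo.count_ones() + hi.count_ones()
}

/// Model of a gather (`ds_bpermute` style): lane i reads from lane index[i].
fn bpermute_model(src: &[u32], index: &[usize]) -> Vec<u32> {
    index.iter().map(|&i| src[i % src.len()]).collect()
}

/// Model of a scatter (`ds_permute` style): lane i writes to lane index[i].
/// Assumption: lanes that nobody writes to end up holding zero.
fn permute_model(src: &[u32], index: &[usize]) -> Vec<u32> {
    let mut dst = vec![0u32; src.len()];
    for (lane, &i) in index.iter().enumerate() {
        dst[i % src.len()] = src[lane];
    }
    dst
}

/// Model of `wave_reduce_add`: wrapping sum of the values of active lanes.
fn wave_reduce_add_model(values: &[u32], active: u64) -> u32 {
    values
        .iter()
        .enumerate()
        .filter(|&(lane, _)| (active >> lane) & 1 == 1)
        .fold(0u32, |acc, (_, &v)| acc.wrapping_add(v))
}
```

With a four-lane toy wavefront, `ballot_model(&[true, false, true, true], 0b1111)` yields `0b1101`, and `mbcnt_model(0b1101, 3)` yields `2`, which is the usual way to compute a lane's rank among the lanes that voted true. For the same index vector `[1, 2, 3, 0]` and source `[10, 20, 30, 40]`, the gather model produces `[20, 30, 40, 10]` while the scatter model produces `[40, 10, 20, 30]`, which is the practical difference between the two permute directions.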