🔬This is a nightly-only experimental API. (stdarch_amdgpu #149988)
Available on target_arch=amdgpu only.
Platform-specific intrinsics for the amdgpu platform.
See the module documentation for more details.
Functions
- ballot
Experimental - Returns a bitfield (`u32` or `u64`) containing the result of its i1 argument in all active lanes, and zero in all inactive lanes.
- dispatch_id
Experimental - Returns the id of the dispatch that is currently executed.
- ds_bpermute ⚠
Experimental - Gather data across all lanes in a wavefront.
- ds_permute ⚠
Experimental - Scatter data across all lanes in a wavefront.
- endpgm
Experimental - Stop execution of the wavefront.
- groupstaticsize
Experimental - Returns the size of statically allocated shared memory for this program in bytes.
- inverse_ballot
Experimental - Indexes into `value` with the current lane id and returns, for each lane, whether the corresponding bit is set.
- mbcnt_hi
Experimental - Masked bit count, high 32 lanes.
- mbcnt_lo
Experimental - Masked bit count, low 32 lanes.
- perm ⚠
Experimental - Permute a 64-bit value.
- permlane16_swap ⚠
Experimental - Provide direct access to the `v_permlane16_swap_b32` instruction on supported targets.
- permlane16_u32 ⚠
Experimental - Performs arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
- permlane16_var ⚠
Experimental - Performs arbitrary gather-style operation within a row (16 contiguous lanes) of the second input operand.
- permlane32_swap ⚠
Experimental - Provide direct access to the `v_permlane32_swap_b32` instruction on supported targets.
- permlane64_u32 ⚠
Experimental - Swap `value` between upper and lower 32 lanes in a wavefront.
- permlanex16_u32 ⚠
Experimental - Performs arbitrary gather-style operation across two rows (16 contiguous lanes) of the second input operand.
- permlanex16_var ⚠
Experimental - Performs arbitrary gather-style operation across two rows (16 contiguous lanes) of the second input operand.
- readfirstlane_u32
Experimental - Get `value` from the first active lane in the wavefront.
- readfirstlane_u64
Experimental - Get `value` from the first active lane in the wavefront.
- readlane_u32 ⚠
Experimental - Get `value` from the lane at index `lane` in the wavefront.
- readlane_u64 ⚠
Experimental - Get `value` from the lane at index `lane` in the wavefront.
- s_barrier
Experimental - Synchronize all wavefronts in a workgroup.
- s_barrier_signal ⚠
Experimental - Signal a specific barrier type.
- s_barrier_signal_isfirst ⚠
Experimental - Signal a specific barrier type.
- s_barrier_wait ⚠
Experimental - Wait for a specific barrier type.
- s_get_barrier_state ⚠
Experimental - Get the state of a specific barrier type.
- s_get_waveid_in_workgroup
Experimental - Get the index of the current wavefront in the workgroup.
- s_getpc
Experimental - Returns the current program counter.
- s_memrealtime
Experimental - Measures time based on a fixed frequency.
- s_sethalt
Experimental - Stop execution of the kernel.
- s_sleep
Experimental - Sleeps for approximately `COUNT * 64` cycles.
- sched_barrier ⚠
Experimental - Prevent movement of some instruction types.
- sched_group_barrier ⚠
Experimental - Creates schedule groups with specific properties to create custom scheduling pipelines.
- update_dpp ⚠
Experimental - The `update_dpp` intrinsic represents the `update.dpp` operation in AMDGPU. It takes an old value, a source operand, a DPP control operand, a row mask, a bank mask, and a bound control. This operation is equivalent to a sequence of `v_mov_b32` operations.
- wave_barrier
Experimental - A barrier for only the threads within the current wavefront.
- wave_id
Experimental - Get the index of the current wavefront in the workgroup.
- wave_reduce_add
Experimental - Performs an arithmetic add reduction on the values provided by each lane in the wavefront.
- wave_reduce_and
Experimental - Performs a logical and reduction on the unsigned values provided by each lane in the wavefront.
- wave_reduce_max
Experimental - Performs an arithmetic max reduction on the signed values provided by each lane in the wavefront.
- wave_reduce_min
Experimental - Performs an arithmetic min reduction on the signed values provided by each lane in the wavefront.
- wave_reduce_or
Experimental - Performs a logical or reduction on the unsigned values provided by each lane in the wavefront.
- wave_reduce_umax
Experimental - Performs an arithmetic max reduction on the unsigned values provided by each lane in the wavefront.
- wave_reduce_umin
Experimental - Performs an arithmetic min reduction on the unsigned values provided by each lane in the wavefront.
- wave_reduce_xor
Experimental - Performs a logical xor reduction on the unsigned values provided by each lane in the wavefront.
- wavefrontsize
Experimental - Returns the number of threads in a wavefront.
- workgroup_id_x
Experimental - Returns the x coordinate of the workgroup index within the dispatch.
- workgroup_id_y
Experimental - Returns the y coordinate of the workgroup index within the dispatch.
- workgroup_id_z
Experimental - Returns the z coordinate of the workgroup index within the dispatch.
- workitem_id_x
Experimental - Returns the x coordinate of the workitem index within the workgroup.
- workitem_id_y
Experimental - Returns the y coordinate of the workitem index within the workgroup.
- workitem_id_z
Experimental - Returns the z coordinate of the workitem index within the workgroup.
- writelane_u32 ⚠
Experimental - Return `value` for the lane at index `lane` in the wavefront. Return `default` for all other lanes.
- writelane_u64 ⚠
Experimental - Return `value` for the lane at index `lane` in the wavefront. Return `default` for all other lanes.
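
The one-line summaries for ballot, mbcnt_lo, mbcnt_hi and readfirstlane are terse, so a host-side model may help illustrate what they describe. The sketch below does not call the intrinsics (they are nightly-only and available on the amdgpu target only). It simulates a wavefront on the CPU, and the following are assumptions made for illustration: a wavefront size of 64, an explicit `active` lane mask, and an explicit `lane` index. All `model_*` names are hypothetical and are not part of this module; the real signatures may differ.

```rust
/// Hypothetical host-side model of `ballot` for an assumed 64-lane wavefront.
/// Bit `i` of `active` is set when lane `i` is active.
/// Bit `i` of the result is set when lane `i` is active and its predicate is true;
/// inactive lanes contribute zero.
fn model_ballot(active: u64, predicate: &[bool; 64]) -> u64 {
    let mut bits = 0u64;
    for lane in 0..64usize {
        let is_active = (active >> lane) & 1 == 1;
        if is_active && predicate[lane] {
            bits |= 1u64 << lane;
        }
    }
    bits
}

/// Hypothetical model of a masked bit count over the low 32 lanes:
/// counts the set bits of `mask_lo` that belong to lanes below `lane`, plus `base`.
fn model_mbcnt_lo(mask_lo: u32, base: u32, lane: u32) -> u32 {
    debug_assert!(lane < 64);
    let below = (1u64 << lane) - 1; // one bit for every lane below `lane`
    base + (mask_lo & (below as u32)).count_ones()
}

/// Hypothetical model of a masked bit count over the high 32 lanes:
/// counts the set bits of `mask_hi` that belong to lanes 32.. below `lane`, plus `base`.
fn model_mbcnt_hi(mask_hi: u32, base: u32, lane: u32) -> u32 {
    debug_assert!(lane < 64);
    let below = (1u64 << lane) - 1;
    base + (mask_hi & ((below >> 32) as u32)).count_ones()
}

/// Hypothetical model of `readfirstlane`: the value held by the lowest-numbered
/// active lane. Returns `None` when no lane is active (a choice of this model only).
fn model_readfirstlane(active: u64, values: &[u32; 64]) -> Option<u32> {
    if active == 0 {
        None
    } else {
        Some(values[active.trailing_zeros() as usize])
    }
}

/// Rank of `lane` among the set bits of a 64-bit ballot result, obtained by
/// chaining the low and high masked bit counts.
fn model_rank_in_ballot(ballot: u64, lane: u32) -> u32 {
    let lo = ballot as u32;
    let hi = (ballot >> 32) as u32;
    model_mbcnt_hi(hi, model_mbcnt_lo(lo, 0, lane), lane)
}
```

In this model, chaining the low and high counts gives each lane the number of set ballot bits below it, which is one common reason to combine a ballot with the two masked bit counts (for example, to compute a compacted output index).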