Available on x86 only.
Platform-specific intrinsics for the x86 platform.
See the module documentation for more details.
Structs
- CpuidResult
- Result of the cpuid instruction.
- __m128
- 128-bit wide set of four f32 types, x86-specific
- __m256
- 256-bit wide set of eight f32 types, x86-specific
- __m512
- 512-bit wide set of sixteen f32 types, x86-specific
- __m128bh
- 128-bit wide set of eight u16 types, x86-specific
- __m128d
- 128-bit wide set of two f64 types, x86-specific
- __m128i
- 128-bit wide integer vector type, x86-specific
- __m256bh
- 256-bit wide set of 16 u16 types, x86-specific
- __m256d
- 256-bit wide set of four f64 types, x86-specific
- __m256i
- 256-bit wide integer vector type, x86-specific
- __m512bh
- 512-bit wide set of 32 u16 types, x86-specific
- __m512d
- 512-bit wide set of eight f64 types, x86-specific
- __m512i
- 512-bit wide integer vector type, x86-specific
- __m128h Experimental
- 128-bit wide set of 8 f16 types, x86-specific
- __m256h Experimental
- 256-bit wide set of 16 f16 types, x86-specific
- __m512h Experimental
- 512-bit wide set of 32 f16 types, x86-specific
- bf16 Experimental
- The BFloat16 type used in AVX-512 intrinsics.
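As an illustration of how the vector types above are typically used, the following is a minimal sketch that loads two groups of four f32 values into __m128 registers, adds them lane by lane, and stores the result. The helper names add4 and add4_sse are hypothetical and exist only for this example; the sketch assumes the std re-exports of these intrinsics and falls back to scalar code when SSE is unavailable or the target is not x86.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Adds two groups of four f32 values lane by lane.
/// `add4` is a hypothetical helper, not part of the module.
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse") {
            // SAFETY: SSE support was verified at run time just above.
            return unsafe { add4_sse(a, b) };
        }
    }
    // Scalar fallback for other targets or CPUs without SSE.
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse")]
unsafe fn add4_sse(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    unsafe {
        // Unaligned loads: arrays of f32 are not guaranteed 16-byte aligned.
        let va: __m128 = _mm_loadu_ps(a.as_ptr());
        let vb: __m128 = _mm_loadu_ps(b.as_ptr());
        let sum: __m128 = _mm_add_ps(va, vb);
        let mut out = [0.0f32; 4];
        _mm_storeu_ps(out.as_mut_ptr(), sum);
        out
    }
}

fn main() {
    let sum = add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]);
    println!("{:?}", sum);
}
```

The run-time feature check keeps the unsafe call sound: the function marked with target_feature is only reached on CPUs that report SSE.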
Constants
- _CMP_EQ_OQ
- Equal (ordered, non-signaling)
- _CMP_EQ_OS
- Equal (ordered, signaling)
- _CMP_EQ_UQ
- Equal (unordered, non-signaling)
- _CMP_EQ_US
- Equal (unordered, signaling)
- _CMP_FALSE_OQ
- False (ordered, non-signaling)
- _CMP_FALSE_OS
- False (ordered, signaling)
- _CMP_GE_OQ
- Greater-than-or-equal (ordered, non-signaling)
- _CMP_GE_OS
- Greater-than-or-equal (ordered, signaling)
- _CMP_GT_OQ
- Greater-than (ordered, non-signaling)
- _CMP_GT_OS
- Greater-than (ordered, signaling)
- _CMP_LE_OQ
- Less-than-or-equal (ordered, non-signaling)
- _CMP_LE_OS
- Less-than-or-equal (ordered, signaling)
- _CMP_LT_OQ
- Less-than (ordered, non-signaling)
- _CMP_LT_OS
- Less-than (ordered, signaling)
- _CMP_NEQ_OQ
- Not-equal (ordered, non-signaling)
- _CMP_NEQ_OS
- Not-equal (ordered, signaling)
- _CMP_NEQ_UQ
- Not-equal (unordered, non-signaling)
- _CMP_NEQ_US
- Not-equal (unordered, signaling)
- _CMP_NGE_UQ
- Not-greater-than-or-equal (unordered, non-signaling)
- _CMP_NGE_US
- Not-greater-than-or-equal (unordered, signaling)
- _CMP_NGT_UQ
- Not-greater-than (unordered, non-signaling)
- _CMP_NGT_US
- Not-greater-than (unordered, signaling)
- _CMP_NLE_UQ
- Not-less-than-or-equal (unordered, non-signaling)
- _CMP_NLE_US
- Not-less-than-or-equal (unordered, signaling)
- _CMP_NLT_UQ
- Not-less-than (unordered, non-signaling)
- _CMP_NLT_US
- Not-less-than (unordered, signaling)
- _CMP_ORD_Q
- Ordered (non-signaling)
- _CMP_ORD_S
- Ordered (signaling)
- _CMP_TRUE_UQ
- True (unordered, non-signaling)
- _CMP_TRUE_US
- True (unordered, signaling)
- _CMP_UNORD_Q
- Unordered (non-signaling)
- _CMP_UNORD_S
- Unordered (signaling)
- _MM_CMPINT_EQ
- Equal
- _MM_CMPINT_FALSE
- False
- _MM_CMPINT_LE
- Less-than-or-equal
- _MM_CMPINT_LT
- Less-than
- _MM_CMPINT_NE
- Not-equal
- _MM_CMPINT_NLE
- Not less-than-or-equal
- _MM_CMPINT_NLT
- Not less-than
- _MM_CMPINT_TRUE
- True
- _MM_EXCEPT_DENORM
- See _mm_setcsr
- _MM_EXCEPT_DIV_ZERO
- See _mm_setcsr
- _MM_EXCEPT_INEXACT
- See _mm_setcsr
- _MM_EXCEPT_INVALID
- See _mm_setcsr
- _MM_EXCEPT_MASK
- See _MM_GET_EXCEPTION_STATE
- _MM_EXCEPT_OVERFLOW
- See _mm_setcsr
- _MM_EXCEPT_UNDERFLOW
- See _mm_setcsr
- _MM_FLUSH_ZERO_MASK
- See _MM_GET_FLUSH_ZERO_MODE
- _MM_FLUSH_ZERO_OFF
- See _mm_setcsr
- _MM_FLUSH_ZERO_ON
- See _mm_setcsr
- _MM_FROUND_CEIL
- round up and do not suppress exceptions
- _MM_FROUND_CUR_DIRECTION
- use MXCSR.RC; see vendor::_MM_SET_ROUNDING_MODE
- _MM_FROUND_FLOOR
- round down and do not suppress exceptions
- _MM_FROUND_NEARBYINT
- use MXCSR.RC and suppress exceptions; see vendor::_MM_SET_ROUNDING_MODE
- _MM_FROUND_NINT
- round to nearest and do not suppress exceptions
- _MM_FROUND_NO_EXC
- suppress exceptions
- _MM_FROUND_RAISE_EXC
- do not suppress exceptions
- _MM_FROUND_RINT
- use MXCSR.RC and do not suppress exceptions; see vendor::_MM_SET_ROUNDING_MODE
- _MM_FROUND_TO_NEAREST_INT
- round to nearest
- _MM_FROUND_TO_NEG_INF
- round down
- _MM_FROUND_TO_POS_INF
- round up
- _MM_FROUND_TO_ZERO
- truncate
- _MM_FROUND_TRUNC
- truncate and do not suppress exceptions
- _MM_HINT_ET0
- See _mm_prefetch.
- _MM_HINT_ET1
- See _mm_prefetch.
- _MM_HINT_NTA
- See _mm_prefetch.
- _MM_HINT_T0
- See _mm_prefetch.
- _MM_HINT_T1
- See _mm_prefetch.
- _MM_HINT_T2
- See _mm_prefetch.
- _MM_MANT_NORM_1_2
- interval [1, 2)
- _MM_MANT_NORM_P5_1
- interval [0.5, 1)
- _MM_MANT_NORM_P5_2
- interval [0.5, 2)
- _MM_MANT_NORM_P75_1P5
- interval [0.75, 1.5)
- _MM_MANT_SIGN_NAN
- DEST = NaN if sign(SRC) = 1
- _MM_MANT_SIGN_SRC
- sign = sign(SRC)
- _MM_MANT_SIGN_ZERO
- sign = 0
- _MM_MASK_DENORM
- See _mm_setcsr
- _MM_MASK_DIV_ZERO
- See _mm_setcsr
- _MM_MASK_INEXACT
- See _mm_setcsr
- _MM_MASK_INVALID
- See _mm_setcsr
- _MM_MASK_MASK
- See _MM_GET_EXCEPTION_MASK
- _MM_MASK_OVERFLOW
- See _mm_setcsr
- _MM_MASK_UNDERFLOW
- See _mm_setcsr
- _MM_PERM_AAAA
- _MM_PERM_AAAB
- _MM_PERM_AAAC
- _MM_PERM_AAAD
- _MM_PERM_AABA
- _MM_PERM_AABB
- _MM_PERM_AABC
- _MM_PERM_AABD
- _MM_PERM_AACA
- _MM_PERM_AACB
- _MM_PERM_AACC
- _MM_PERM_AACD
- _MM_PERM_AADA
- _MM_PERM_AADB
- _MM_PERM_AADC
- _MM_PERM_AADD
- _MM_PERM_ABAA
- _MM_PERM_ABAB
- _MM_PERM_ABAC
- _MM_PERM_ABAD
- _MM_PERM_ABBA
- _MM_PERM_ABBB
- _MM_PERM_ABBC
- _MM_PERM_ABBD
- _MM_PERM_ABCA
- _MM_PERM_ABCB
- _MM_PERM_ABCC
- _MM_PERM_ABCD
- _MM_PERM_ABDA
- _MM_PERM_ABDB
- _MM_PERM_ABDC
- _MM_PERM_ABDD
- _MM_PERM_ACAA
- _MM_PERM_ACAB
- _MM_PERM_ACAC
- _MM_PERM_ACAD
- _MM_PERM_ACBA
- _MM_PERM_ACBB
- _MM_PERM_ACBC
- _MM_PERM_ACBD
- _MM_PERM_ACCA
- _MM_PERM_ACCB
- _MM_PERM_ACCC
- _MM_PERM_ACCD
- _MM_PERM_ACDA
- _MM_PERM_ACDB
- _MM_PERM_ACDC
- _MM_PERM_ACDD
- _MM_PERM_ADAA
- _MM_PERM_ADAB
- _MM_PERM_ADAC
- _MM_PERM_ADAD
- _MM_PERM_ADBA
- _MM_PERM_ADBB
- _MM_PERM_ADBC
- _MM_PERM_ADBD
- _MM_PERM_ADCA
- _MM_PERM_ADCB
- _MM_PERM_ADCC
- _MM_PERM_ADCD
- _MM_PERM_ADDA
- _MM_PERM_ADDB
- _MM_PERM_ADDC
- _MM_PERM_ADDD
- _MM_PERM_BAAA
- _MM_PERM_BAAB
- _MM_PERM_BAAC
- _MM_PERM_BAAD
- _MM_PERM_BABA
- _MM_PERM_BABB
- _MM_PERM_BABC
- _MM_PERM_BABD
- _MM_PERM_BACA
- _MM_PERM_BACB
- _MM_PERM_BACC
- _MM_PERM_BACD
- _MM_PERM_BADA
- _MM_PERM_BADB
- _MM_PERM_BADC
- _MM_PERM_BADD
- _MM_PERM_BBAA
- _MM_PERM_BBAB
- _MM_PERM_BBAC
- _MM_PERM_BBAD
- _MM_PERM_BBBA
- _MM_PERM_BBBB
- _MM_PERM_BBBC
- _MM_PERM_BBBD
- _MM_PERM_BBCA
- _MM_PERM_BBCB
- _MM_PERM_BBCC
- _MM_PERM_BBCD
- _MM_PERM_BBDA
- _MM_PERM_BBDB
- _MM_PERM_BBDC
- _MM_PERM_BBDD
- _MM_PERM_BCAA
- _MM_PERM_BCAB
- _MM_PERM_BCAC
- _MM_PERM_BCAD
- _MM_PERM_BCBA
- _MM_PERM_BCBB
- _MM_PERM_BCBC
- _MM_PERM_BCBD
- _MM_PERM_BCCA
- _MM_PERM_BCCB
- _MM_PERM_BCCC
- _MM_PERM_BCCD
- _MM_PERM_BCDA
- _MM_PERM_BCDB
- _MM_PERM_BCDC
- _MM_PERM_BCDD
- _MM_PERM_BDAA
- _MM_PERM_BDAB
- _MM_PERM_BDAC
- _MM_PERM_BDAD
- _MM_PERM_BDBA
- _MM_PERM_BDBB
- _MM_PERM_BDBC
- _MM_PERM_BDBD
- _MM_PERM_BDCA
- _MM_PERM_BDCB
- _MM_PERM_BDCC
- _MM_PERM_BDCD
- _MM_PERM_BDDA
- _MM_PERM_BDDB
- _MM_PERM_BDDC
- _MM_PERM_BDDD
- _MM_PERM_CAAA
- _MM_PERM_CAAB
- _MM_PERM_CAAC
- _MM_PERM_CAAD
- _MM_PERM_CABA
- _MM_PERM_CABB
- _MM_PERM_CABC
- _MM_PERM_CABD
- _MM_PERM_CACA
- _MM_PERM_CACB
- _MM_PERM_CACC
- _MM_PERM_CACD
- _MM_PERM_CADA
- _MM_PERM_CADB
- _MM_PERM_CADC
- _MM_PERM_CADD
- _MM_PERM_CBAA
- _MM_PERM_CBAB
- _MM_PERM_CBAC
- _MM_PERM_CBAD
- _MM_PERM_CBBA
- _MM_PERM_CBBB
- _MM_PERM_CBBC
- _MM_PERM_CBBD
- _MM_PERM_CBCA
- _MM_PERM_CBCB
- _MM_PERM_CBCC
- _MM_PERM_CBCD
- _MM_PERM_CBDA
- _MM_PERM_CBDB
- _MM_PERM_CBDC
- _MM_PERM_CBDD
- _MM_PERM_CCAA
- _MM_PERM_CCAB
- _MM_PERM_CCAC
- _MM_PERM_CCAD
- _MM_PERM_CCBA
- _MM_PERM_CCBB
- _MM_PERM_CCBC
- _MM_PERM_CCBD
- _MM_PERM_CCCA
- _MM_PERM_CCCB
- _MM_PERM_CCCC
- _MM_PERM_CCCD
- _MM_PERM_CCDA
- _MM_PERM_CCDB
- _MM_PERM_CCDC
- _MM_PERM_CCDD
- _MM_PERM_CDAA
- _MM_PERM_CDAB
- _MM_PERM_CDAC
- _MM_PERM_CDAD
- _MM_PERM_CDBA
- _MM_PERM_CDBB
- _MM_PERM_CDBC
- _MM_PERM_CDBD
- _MM_PERM_CDCA
- _MM_PERM_CDCB
- _MM_PERM_CDCC
- _MM_PERM_CDCD
- _MM_PERM_CDDA
- _MM_PERM_CDDB
- _MM_PERM_CDDC
- _MM_PERM_CDDD
- _MM_PERM_DAAA
- _MM_PERM_DAAB
- _MM_PERM_DAAC
- _MM_PERM_DAAD
- _MM_PERM_DABA
- _MM_PERM_DABB
- _MM_PERM_DABC
- _MM_PERM_DABD
- _MM_PERM_DACA
- _MM_PERM_DACB
- _MM_PERM_DACC
- _MM_PERM_DACD
- _MM_PERM_DADA
- _MM_PERM_DADB
- _MM_PERM_DADC
- _MM_PERM_DADD
- _MM_PERM_DBAA
- _MM_PERM_DBAB
- _MM_PERM_DBAC
- _MM_PERM_DBAD
- _MM_PERM_DBBA
- _MM_PERM_DBBB
- _MM_PERM_DBBC
- _MM_PERM_DBBD
- _MM_PERM_DBCA
- _MM_PERM_DBCB
- _MM_PERM_DBCC
- _MM_PERM_DBCD
- _MM_PERM_DBDA
- _MM_PERM_DBDB
- _MM_PERM_DBDC
- _MM_PERM_DBDD
- _MM_PERM_DCAA
- _MM_PERM_DCAB
- _MM_PERM_DCAC
- _MM_PERM_DCAD
- _MM_PERM_DCBA
- _MM_PERM_DCBB
- _MM_PERM_DCBC
- _MM_PERM_DCBD
- _MM_PERM_DCCA
- _MM_PERM_DCCB
- _MM_PERM_DCCC
- _MM_PERM_DCCD
- _MM_PERM_DCDA
- _MM_PERM_DCDB
- _MM_PERM_DCDC
- _MM_PERM_DCDD
- _MM_PERM_DDAA
- _MM_PERM_DDAB
- _MM_PERM_DDAC
- _MM_PERM_DDAD
- _MM_PERM_DDBA
- _MM_PERM_DDBB
- _MM_PERM_DDBC
- _MM_PERM_DDBD
- _MM_PERM_DDCA
- _MM_PERM_DDCB
- _MM_PERM_DDCC
- _MM_PERM_DDCD
- _MM_PERM_DDDA
- _MM_PERM_DDDB
- _MM_PERM_DDDC
- _MM_PERM_DDDD
- _MM_ROUND_DOWN
- See _mm_setcsr
- _MM_ROUND_MASK
- See _MM_GET_ROUNDING_MODE
- _MM_ROUND_NEAREST
- See _mm_setcsr
- _MM_ROUND_TOWARD_ZERO
- See _mm_setcsr
- _MM_ROUND_UP
- See _mm_setcsr
- _SIDD_BIT_MASK
- Mask only: return the bit mask
- _SIDD_CMP_EQUAL_ANY
- For each character in a, find if it is in b (Default)
- _SIDD_CMP_EQUAL_EACH
- The strings defined by a and b are equal
- _SIDD_CMP_EQUAL_ORDERED
- Search for the defined substring in the target
- _SIDD_CMP_RANGES
- For each character in a, determine if b[0] <= c <= b[1] or b[1] <= c <= b[2]...
- _SIDD_LEAST_SIGNIFICANT
- Index only: return the least significant bit (Default)
- _SIDD_MASKED_NEGATIVE_POLARITY
- Negates results only before the end of the string
- _SIDD_MASKED_POSITIVE_POLARITY
- Do not negate results before the end of the string
- _SIDD_MOST_SIGNIFICANT
- Index only: return the most significant bit
- _SIDD_NEGATIVE_POLARITY
- Negates results
- _SIDD_POSITIVE_POLARITY
- Do not negate results (Default)
- _SIDD_SBYTE_OPS
- String contains signed 8-bit characters
- _SIDD_SWORD_OPS
- String contains signed 16-bit characters
- _SIDD_UBYTE_OPS
- String contains unsigned 8-bit characters (Default)
- _SIDD_UNIT_MASK
- Mask only: return the byte mask
- _SIDD_UWORD_OPS
- String contains unsigned 16-bit characters
- _XCR_XFEATURE_ENABLED_MASK
- XFEATURE_ENABLED_MASK for XCR
- _XABORT_CAPACITY Experimental 
- Transaction abort due to the transaction using too much memory.
- _XABORT_CONFLICT Experimental 
- Transaction abort due to a memory conflict with another thread.
- _XABORT_DEBUG Experimental 
- Transaction abort due to a debug trap.
- _XABORT_EXPLICIT Experimental 
- Transaction explicitly aborted with xabort. The parameter passed to xabort is available with _xabort_code(status).
- _XABORT_NESTED Experimental
- Transaction abort in an inner nested transaction.
- _XABORT_RETRY Experimental 
- Transaction retry is possible.
- _XBEGIN_STARTED Experimental 
- Transaction successfully started.
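The difference between the ordered and unordered _CMP_ predicates listed above only shows up when a lane holds NaN. The following is a minimal sketch, assuming the AVX intrinsic _mm_cmp_ps and the SSE intrinsic _mm_movemask_ps as re-exported by std; the helper names less_than_masks and less_than_masks_avx are hypothetical. It compares four lanes with _CMP_LT_OQ and with _CMP_NGE_UQ and returns one bit per lane for each predicate, with a scalar fallback that follows the same rules.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Returns (ordered, unordered) lane masks; bit i corresponds to lane i.
/// `less_than_masks` is a hypothetical helper, not part of the module.
fn less_than_masks(a: [f32; 4], b: [f32; 4]) -> (i32, i32) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was verified at run time just above.
            return unsafe { less_than_masks_avx(a, b) };
        }
    }
    // Scalar fallback with the same semantics.
    let mut ordered = 0;
    let mut unordered = 0;
    for i in 0..4 {
        // Ordered less-than: false when either operand is NaN.
        if a[i] < b[i] {
            ordered |= 1 << i;
        }
        // Unordered not-greater-than-or-equal: true when either operand is NaN.
        if !(a[i] >= b[i]) {
            unordered |= 1 << i;
        }
    }
    (ordered, unordered)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn less_than_masks_avx(a: [f32; 4], b: [f32; 4]) -> (i32, i32) {
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        // The predicate is passed as a const generic argument.
        let ordered = _mm_cmp_ps::<_CMP_LT_OQ>(va, vb);
        let unordered = _mm_cmp_ps::<_CMP_NGE_UQ>(va, vb);
        // movemask collects the sign bit of each lane into the low 4 bits.
        (_mm_movemask_ps(ordered), _mm_movemask_ps(unordered))
    }
}

fn main() {
    let a = [1.0, 5.0, f32::NAN, 2.0];
    let b = [2.0, 3.0, 1.0, 2.0];
    let (ordered, unordered) = less_than_masks(a, b);
    println!("ordered = {:#06b}, unordered = {:#06b}", ordered, unordered);
}
```

In this input only lane 2 holds NaN, so the two masks differ in exactly that bit: the ordered predicate reports false for the NaN lane and the unordered predicate reports true.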
Functions
- _MM_GET_EXCEPTION_MASK ⚠ Deprecated sse
- See _mm_setcsr
- _MM_GET_EXCEPTION_STATE ⚠ Deprecated sse
- See _mm_setcsr
- _MM_GET_FLUSH_ZERO_MODE ⚠ Deprecated sse
- See _mm_setcsr
- _MM_GET_ROUNDING_MODE ⚠ Deprecated sse
- See _mm_setcsr
- _MM_SET_EXCEPTION_MASK ⚠ Deprecated sse
- See _mm_setcsr
- _MM_SET_EXCEPTION_STATE ⚠ Deprecated sse
- See _mm_setcsr
- _MM_SET_FLUSH_ZERO_MODE ⚠ Deprecated sse
- See _mm_setcsr
- _MM_SET_ROUNDING_MODE ⚠ Deprecated sse
- See _mm_setcsr
- _MM_TRANSPOSE4_PS sse
- Transpose the 4x4 matrix formed by 4 rows of __m128 in place.
- __cpuid ⚠
- See __cpuid_count.
- __cpuid_count ⚠
- Returns the result of the cpuid instruction for a given leaf (EAX) and sub_leaf (ECX).
- __get_cpuid_max ⚠
- Returns the highest-supported leaf (EAX) and sub-leaf (ECX) cpuid values.
- __rdtscp ⚠
- Reads the current value of the processor’s time-stamp counter and the IA32_TSC_AUX MSR.
- _addcarry_u32 ⚠
- Adds unsigned 32-bit integers a and b with unsigned 8-bit carry-in c_in (carry or overflow flag), stores the unsigned 32-bit result in out, and returns the carry-out (carry or overflow flag).
- _addcarryx_u32 ⚠ adx
- Adds unsigned 32-bit integers a and b with unsigned 8-bit carry-in c_in (carry or overflow flag), stores the unsigned 32-bit result in out, and returns the carry-out (carry or overflow flag).
- _andn_u32 bmi1
- Bitwise logical AND of inverted a with b.
- _bextr2_u32 bmi1
- Extracts bits of a specified by control into the least significant bits of the result.
- _bextr_u32 bmi1
- Extracts bits in range [start, start + length) from a into the least significant bits of the result.
- _bextri_u32 ⚠ tbm
- Extracts bits of a specified by control into the least significant bits of the result.
- _bittest ⚠
- Returns the bit in position b of the memory addressed by p.
- _bittestandcomplement ⚠
- Returns the bit in position b of the memory addressed by p, then inverts that bit.
- _bittestandreset ⚠
- Returns the bit in position b of the memory addressed by p, then resets that bit to 0.
- _bittestandset ⚠
- Returns the bit in position b of the memory addressed by p, then sets the bit to 1.
- _blcfill_u32 ⚠ tbm
- Clears all bits below the least significant zero bit of x.
- _blci_u32 ⚠ tbm
- Sets all bits of x to 1 except for the least significant zero bit.
- _blcic_u32 ⚠ tbm
- Sets the least significant zero bit of x and clears all other bits.
- _blcmsk_u32 ⚠ tbm
- Sets the least significant zero bit of x and clears all bits above that bit.
- _blcs_u32 ⚠ tbm
- Sets the least significant zero bit of x.
- _blsfill_u32 ⚠ tbm
- Sets all bits of x below the least significant one.
- _blsi_u32 bmi1
- Extracts lowest set isolated bit.
- _blsic_u32 ⚠ tbm
- Clears least significant bit and sets all other bits.
- _blsmsk_u32 bmi1
- Gets mask up to lowest set bit.
- _blsr_u32 bmi1
- Resets the lowest set bit of x.
- _bswap ⚠
- Returns an integer with the reversed byte order of x.
- _bzhi_u32 bmi2
- Zeroes higher bits of a >= index.
- _cvtmask8_u32 avx512dq
- Convert 8-bit mask a to a 32-bit integer value and store the result in dst.
- _cvtmask16_u32 avx512f
- Convert 16-bit mask a into an integer value, and store the result in dst.
- _cvtmask32_u32 avx512bw
- Convert 32-bit mask a into an integer value, and store the result in dst.
- _cvtu32_mask8 avx512dq
- Convert 32-bit integer value a to an 8-bit mask and store the result in dst.
- _cvtu32_mask16 avx512f
- Convert 32-bit integer value a to a 16-bit mask and store the result in dst.
- _cvtu32_mask32 avx512bw
- Convert integer value a into a 32-bit mask, and store the result in k.
- _fxrstor ⚠ fxsr
- Restores the XMM, MMX, MXCSR, and x87 FPU registers from the 512-byte-long 16-byte-aligned memory region mem_addr.
- _fxsave ⚠ fxsr
- Saves the x87 FPU, MMX technology, XMM, and MXCSR registers to the 512-byte-long 16-byte-aligned memory region mem_addr.
- _kadd_mask8 avx512dq
- Add 8-bit masks a and b, and store the result in dst.
- _kadd_mask16 avx512dq
- Add 16-bit masks a and b, and store the result in dst.
- _kadd_mask32 avx512bw
- Add 32-bit masks in a and b, and store the result in k.
- _kadd_mask64 avx512bw
- Add 64-bit masks in a and b, and store the result in k.
- _kand_mask8 avx512dq
- Bitwise AND of 8-bit masks a and b, and store the result in dst.
- _kand_mask16 avx512f
- Compute the bitwise AND of 16-bit masks a and b, and store the result in k.
- _kand_mask32 avx512bw
- Compute the bitwise AND of 32-bit masks a and b, and store the result in k.
- _kand_mask64 avx512bw
- Compute the bitwise AND of 64-bit masks a and b, and store the result in k.
- _kandn_mask8 avx512dq
- Bitwise AND NOT of 8-bit masks a and b, and store the result in dst.
- _kandn_mask16 avx512f
- Compute the bitwise NOT of 16-bit masks a and then AND with b, and store the result in k.
- _kandn_mask32 avx512bw
- Compute the bitwise NOT of 32-bit masks a and then AND with b, and store the result in k.
- _kandn_mask64 avx512bw
- Compute the bitwise NOT of 64-bit masks a and then AND with b, and store the result in k.
- _knot_mask8 avx512dq
- Bitwise NOT of 8-bit mask a, and store the result in dst.
- _knot_mask16 avx512f
- Compute the bitwise NOT of 16-bit mask a, and store the result in k.
- _knot_mask32 avx512bw
- Compute the bitwise NOT of 32-bit mask a, and store the result in k.
- _knot_mask64 avx512bw
- Compute the bitwise NOT of 64-bit mask a, and store the result in k.
- _kor_mask8 avx512dq
- Bitwise OR of 8-bit masks a and b, and store the result in dst.
- _kor_mask16 avx512f
- Compute the bitwise OR of 16-bit masks a and b, and store the result in k.
- _kor_mask32 avx512bw
- Compute the bitwise OR of 32-bit masks a and b, and store the result in k.
- _kor_mask64 avx512bw
- Compute the bitwise OR of 64-bit masks a and b, and store the result in k.
- _kortest_mask8_u8 ⚠ avx512dq
- Compute the bitwise OR of 8-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst. If the result is all ones, store 1 in all_ones, otherwise store 0 in all_ones.
- _kortest_mask16_u8 ⚠ avx512f
- Compute the bitwise OR of 16-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst. If the result is all ones, store 1 in all_ones, otherwise store 0 in all_ones.
- _kortest_mask32_u8 ⚠ avx512bw
- Compute the bitwise OR of 32-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst. If the result is all ones, store 1 in all_ones, otherwise store 0 in all_ones.
- _kortest_mask64_u8 ⚠ avx512bw
- Compute the bitwise OR of 64-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst. If the result is all ones, store 1 in all_ones, otherwise store 0 in all_ones.
- _kortestc_mask8_u8 avx512dq
- Compute the bitwise OR of 8-bit masks a and b. If the result is all ones, store 1 in dst, otherwise store 0 in dst.
- _kortestc_mask16_u8 avx512f
- Compute the bitwise OR of 16-bit masks a and b. If the result is all ones, store 1 in dst, otherwise store 0 in dst.
- _kortestc_mask32_u8 avx512bw
- Compute the bitwise OR of 32-bit masks a and b. If the result is all ones, store 1 in dst, otherwise store 0 in dst.
- _kortestc_mask64_u8 avx512bw
- Compute the bitwise OR of 64-bit masks a and b. If the result is all ones, store 1 in dst, otherwise store 0 in dst.
- _kortestz_mask8_u8 avx512dq
- Compute the bitwise OR of 8-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _kortestz_mask16_u8 avx512f
- Compute the bitwise OR of 16-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _kortestz_mask32_u8 avx512bw
- Compute the bitwise OR of 32-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _kortestz_mask64_u8 avx512bw
- Compute the bitwise OR of 64-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _kshiftli_mask8 avx512dq
- Shift 8-bit mask a left by count bits while shifting in zeros, and store the result in dst.
- _kshiftli_mask16 avx512f
- Shift 16-bit mask a left by count bits while shifting in zeros, and store the result in dst.
- _kshiftli_mask32 avx512bw
- Shift the bits of 32-bit mask a left by count while shifting in zeros, and store the least significant 32 bits of the result in k.
- _kshiftli_mask64 avx512bw
- Shift the bits of 64-bit mask a left by count while shifting in zeros, and store the least significant 64 bits of the result in k.
- _kshiftri_mask8 avx512dq
- Shift 8-bit mask a right by count bits while shifting in zeros, and store the result in dst.
- _kshiftri_mask16 avx512f
- Shift 16-bit mask a right by count bits while shifting in zeros, and store the result in dst.
- _kshiftri_mask32 avx512bw
- Shift the bits of 32-bit mask a right by count while shifting in zeros, and store the least significant 32 bits of the result in k.
- _kshiftri_mask64 avx512bw
- Shift the bits of 64-bit mask a right by count while shifting in zeros, and store the least significant 64 bits of the result in k.
- _ktest_mask8_u8 ⚠ avx512dq
- Compute the bitwise AND of 8-bit masks a and b, and if the result is all zeros, store 1 in dst, otherwise store 0 in dst. Compute the bitwise NOT of a and then AND with b, if the result is all zeros, store 1 in and_not, otherwise store 0 in and_not.
- _ktest_mask16_u8 ⚠ avx512dq
- Compute the bitwise AND of 16-bit masks a and b, and if the result is all zeros, store 1 in dst, otherwise store 0 in dst. Compute the bitwise NOT of a and then AND with b, if the result is all zeros, store 1 in and_not, otherwise store 0 in and_not.
- _ktest_mask32_u8 ⚠ avx512bw
- Compute the bitwise AND of 32-bit masks a and b, and if the result is all zeros, store 1 in dst, otherwise store 0 in dst. Compute the bitwise NOT of a and then AND with b, if the result is all zeros, store 1 in and_not, otherwise store 0 in and_not.
- _ktest_mask64_u8 ⚠ avx512bw
- Compute the bitwise AND of 64-bit masks a and b, and if the result is all zeros, store 1 in dst, otherwise store 0 in dst. Compute the bitwise NOT of a and then AND with b, if the result is all zeros, store 1 in and_not, otherwise store 0 in and_not.
- _ktestc_mask8_u8 avx512dq
- Compute the bitwise NOT of 8-bit mask a and then AND with 8-bit mask b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestc_mask16_u8 avx512dq
- Compute the bitwise NOT of 16-bit mask a and then AND with 16-bit mask b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestc_mask32_u8 avx512bw
- Compute the bitwise NOT of 32-bit mask a and then AND with 32-bit mask b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestc_mask64_u8 avx512bw
- Compute the bitwise NOT of 64-bit mask a and then AND with 64-bit mask b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestz_mask8_u8 avx512dq
- Compute the bitwise AND of 8-bit masks a and b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestz_mask16_u8 avx512dq
- Compute the bitwise AND of 16-bit masks a and b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestz_mask32_u8 avx512bw
- Compute the bitwise AND of 32-bit masks a and b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestz_mask64_u8 avx512bw
- Compute the bitwise AND of 64-bit masks a and b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _kxnor_mask8 avx512dq
- Bitwise XNOR of 8-bit masks a and b, and store the result in dst.
- _kxnor_mask16 avx512f
- Compute the bitwise XNOR of 16-bit masks a and b, and store the result in k.
- _kxnor_mask32 avx512bw
- Compute the bitwise XNOR of 32-bit masks a and b, and store the result in k.
- _kxnor_mask64 avx512bw
- Compute the bitwise XNOR of 64-bit masks a and b, and store the result in k.
- _kxor_mask8 avx512dq
- Bitwise XOR of 8-bit masks a and b, and store the result in dst.
- _kxor_mask16 avx512f
- Compute the bitwise XOR of 16-bit masks a and b, and store the result in k.
- _kxor_mask32 avx512bw
- Compute the bitwise XOR of 32-bit masks a and b, and store the result in k.
- _kxor_mask64 avx512bw
- Compute the bitwise XOR of 64-bit masks a and b, and store the result in k.
- _load_mask8 ⚠ avx512dq
- Load 8-bit mask from memory
- _load_mask16 ⚠ avx512f
- Load 16-bit mask from memory
- _load_mask32 ⚠ avx512bw
- Load 32-bit mask from memory into k.
- _load_mask64 ⚠ avx512bw
- Load 64-bit mask from memory into k.
- _lzcnt_u32 lzcnt
- Counts the leading most significant zero bits.
- _mm256_abs_epi8 avx2
- Computes the absolute values of packed 8-bit integers in a.
- _mm256_abs_epi16 avx2
- Computes the absolute values of packed 16-bit integers in a.
- _mm256_abs_epi32 avx2
- Computes the absolute values of packed 32-bit integers in a.
- _mm256_abs_epi64 avx512f and avx512vl
- Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst.
- _mm256_add_epi8 avx2
- Adds packed 8-bit integers in a and b.
- _mm256_add_epi16 avx2
- Adds packed 16-bit integers in a and b.
- _mm256_add_epi32 avx2
- Adds packed 32-bit integers in a and b.
- _mm256_add_epi64 avx2
- Adds packed 64-bit integers in a and b.
- _mm256_add_pd avx
- Adds packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_add_ps avx
- Adds packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_adds_epi8 avx2
- Adds packed 8-bit integers in a and b using saturation.
- _mm256_adds_epi16 avx2
- Adds packed 16-bit integers in a and b using saturation.
- _mm256_adds_epu8 avx2
- Adds packed unsigned 8-bit integers in a and b using saturation.
- _mm256_adds_epu16 avx2
- Adds packed unsigned 16-bit integers in a and b using saturation.
- _mm256_addsub_pd avx
- Alternately adds and subtracts packed double-precision (64-bit) floating-point elements in a to/from packed elements in b.
- _mm256_addsub_ps avx
- Alternately adds and subtracts packed single-precision (32-bit) floating-point elements in a to/from packed elements in b.
- _mm256_aesdec_epi128 vaes
- Performs one round of an AES decryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm256_aesdeclast_epi128 vaes
- Performs the last round of an AES decryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm256_aesenc_epi128 vaes
- Performs one round of an AES encryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm256_aesenclast_epi128 vaes
- Performs the last round of an AES encryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm256_alignr_epi8 avx2
- Concatenates pairs of 16-byte blocks in a and b into a 32-byte temporary result, shifts the result right by n bytes, and returns the low 16 bytes.
- _mm256_alignr_epi32 avx512f and avx512vl
- Concatenate a and b into a 64-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 32 bytes (8 elements) in dst.
- _mm256_alignr_epi64 avx512f and avx512vl
- Concatenate a and b into a 64-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 32 bytes (4 elements) in dst.
- _mm256_and_pd avx
- Computes the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_and_ps avx
- Computes the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_and_si256 avx2
- Computes the bitwise AND of 256 bits (representing integer data) in a and b.
- _mm256_andnot_pd avx
- Computes the bitwise NOT of packed double-precision (64-bit) floating-point elements in a, and then AND with b.
- _mm256_andnot_ps avx
- Computes the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b.
- _mm256_andnot_si256 avx2
- Computes the bitwise NOT of 256 bits (representing integer data) in a and then AND with b.
- _mm256_avg_epu8 avx2
- Averages packed unsigned 8-bit integers in a and b.
- _mm256_avg_epu16 avx2
- Averages packed unsigned 16-bit integers in a and b.
- _mm256_bitshuffle_epi64_mask avx512bitalg and avx512vl
- Considers the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm256_blend_epi16 avx2
- Blends packed 16-bit integers from a and b using control mask IMM8.
- _mm256_blend_epi32 avx2
- Blends packed 32-bit integers from a and b using control mask IMM8.
- _mm256_blend_pd avx
- Blends packed double-precision (64-bit) floating-point elements from a and b using control mask imm8.
- _mm256_blend_ps avx
- Blends packed single-precision (32-bit) floating-point elements from a and b using control mask imm8.
- _mm256_blendv_epi8 avx2
- Blends packed 8-bit integers from a and b using mask.
- _mm256_blendv_pd avx
- Blends packed double-precision (64-bit) floating-point elements from a and b using c as a mask.
- _mm256_blendv_ps avx
- Blends packed single-precision (32-bit) floating-point elements from a and b using c as a mask.
- _mm256_broadcast_f32x2 avx512dq and avx512vl
- Broadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst.
- _mm256_broadcast_f32x4 avx512f and avx512vl
- Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst.
- _mm256_broadcast_f64x2 avx512dq and avx512vl
- Broadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst.
- _mm256_broadcast_i32x2 avx512dq and avx512vl
- Broadcasts the lower 2 packed 32-bit integers from a to all elements of dst.
- _mm256_broadcast_i32x4 avx512f and avx512vl
- Broadcast the 4 packed 32-bit integers from a to all elements of dst.
- _mm256_broadcast_i64x2 avx512dq and avx512vl
- Broadcasts the 2 packed 64-bit integers from a to all elements of dst.
- _mm256_broadcast_pd avx
- Broadcasts 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector.
- _mm256_broadcast_ps avx
- Broadcasts 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector.
- _mm256_broadcast_sd avx
- Broadcasts a double-precision (64-bit) floating-point element from memory to all elements of the returned vector.
- _mm256_broadcast_ss avx
- Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
- _mm256_broadcastb_epi8 avx2
- Broadcasts the low packed 8-bit integer from a to all elements of the 256-bit returned value.
- _mm256_broadcastd_epi32 avx2
- Broadcasts the low packed 32-bit integer from a to all elements of the 256-bit returned value.
- _mm256_broadcastmb_epi64 avx512cd and avx512vl
- Broadcast the low 8-bits from input mask k to all 64-bit elements of dst.
- _mm256_broadcastmw_epi32 avx512cd and avx512vl
- Broadcast the low 16-bits from input mask k to all 32-bit elements of dst.
- _mm256_broadcastq_epi64 avx2
- Broadcasts the low packed 64-bit integer from a to all elements of the 256-bit returned value.
- _mm256_broadcastsd_pd avx2
- Broadcasts the low double-precision (64-bit) floating-point element from a to all elements of the 256-bit returned value.
- _mm256_broadcastsi128_si256 avx2
- Broadcasts 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value.
- _mm256_broadcastss_ps avx2
- Broadcasts the low single-precision (32-bit) floating-point element from a to all elements of the 256-bit returned value.
- _mm256_broadcastw_epi16 avx2
- Broadcasts the low packed 16-bit integer from a to all elements of the 256-bit returned value.
- _mm256_bslli_epi128 avx2
- Shifts 128-bit lanes in a left by imm8 bytes while shifting in zeros.
- _mm256_bsrli_epi128 avx2
- Shifts 128-bit lanes in a right by imm8 bytes while shifting in zeros.
- _mm256_castpd128_pd256 avx
- Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined.
- _mm256_castpd256_pd128 avx
- Casts vector of type __m256d to type __m128d.
- _mm256_castpd_ps avx
- Casts vector of type __m256d to type __m256.
- _mm256_castpd_si256 avx
- Casts vector of type __m256d to type __m256i.
- _mm256_castps128_ps256 avx
- Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined.
- _mm256_castps256_ps128 avx
- Casts vector of type __m256 to type __m128.
- _mm256_castps_pd avx
- Casts vector of type __m256 to type __m256d.
- _mm256_castps_si256 avx
- Casts vector of type __m256 to type __m256i.
- _mm256_castsi128_si256 avx
- Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined.
- _mm256_castsi256_pd avx
- Casts vector of type __m256i to type __m256d.
- _mm256_castsi256_ps avx
- Casts vector of type __m256i to type __m256.
- _mm256_castsi256_si128 avx
- Casts vector of type __m256i to type __m128i.
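The cast intrinsics listed above change only the static vector type; the bits are left as they are. A minimal sketch of that behavior, assuming an x86 or x86_64 target and runtime AVX detection. The helper names `first_lane_bits` and `first_lane_bits_avx` are hypothetical and not part of this module; the scalar fallback relies on `f32::to_bits`.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: fills a vector with `x`, reinterprets it as
/// integer lanes, and returns lane 0.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn first_lane_bits_avx(x: f32) -> u32 {
    let v: __m256 = _mm256_set1_ps(x);
    // No value conversion happens here: only the type changes.
    let bits: __m256i = _mm256_castps_si256(v);
    let mut out = [0u32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, bits);
    out[0]
}

/// Hypothetical safe wrapper with a scalar fallback.
pub fn first_lane_bits(x: f32) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was checked at runtime just above.
            return unsafe { first_lane_bits_avx(x) };
        }
    }
    x.to_bits()
}
```

Calling `first_lane_bits(1.0)` yields the IEEE 754 bit pattern of `1.0f32` rather than the integer `1`.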
- _mm256_ceil_pd avx
- Rounds packed double-precision (64-bit) floating point elements in a toward positive infinity.
- _mm256_ceil_ps avx
- Rounds packed single-precision (32-bit) floating point elements in a toward positive infinity.
- _mm256_clmulepi64_epi128 vpclmulqdq
- Performs a carry-less multiplication of two 64-bit polynomials over the finite field GF(2) in each of the two 128-bit lanes.
- _mm256_cmp_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_epi32_mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_pd avx
- Compares packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
- _mm256_cmp_pd_mask avx512f and avx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ps avx
- Compares packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
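The comparison operand of `_mm256_cmp_ps` is supplied as a const generic argument, using one of the `_CMP_*` constants listed earlier on this page. A minimal sketch, assuming an x86 or x86_64 target with runtime AVX detection; `less_than_mask` and `less_than_mask_avx` are hypothetical helper names, and the scalar loop is a fallback added for illustration.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: bit i of the result is set when a[i] < b[i].
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn less_than_mask_avx(a: &[f32; 8], b: &[f32; 8]) -> u8 {
    let va = _mm256_loadu_ps(a.as_ptr());
    let vb = _mm256_loadu_ps(b.as_ptr());
    // Ordered, non-signaling less-than: lanes containing NaN compare false.
    let lt = _mm256_cmp_ps::<_CMP_LT_OQ>(va, vb);
    // Gather the sign bit of each 32-bit lane into the low 8 bits.
    _mm256_movemask_ps(lt) as u8
}

/// Hypothetical safe wrapper with a scalar fallback.
pub fn less_than_mask(a: &[f32; 8], b: &[f32; 8]) -> u8 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was checked at runtime just above.
            return unsafe { less_than_mask_avx(a, b) };
        }
    }
    let mut mask = 0u8;
    for i in 0..8 {
        if a[i] < b[i] {
            mask |= 1 << i;
        }
    }
    mask
}
```

The vector result of the comparison is all ones or all zeros per lane; the sketch reduces it to a bitmask only to make the outcome easy to inspect.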
- _mm256_cmp_ps_mask avx512f and avx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmpeq_epi8 avx2
- Compares packed 8-bit integers in a and b for equality.
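A common use of `_mm256_cmpeq_epi8` is to compare 32 bytes at once and reduce the result with `_mm256_movemask_epi8`. A minimal sketch, assuming an x86 or x86_64 target with runtime AVX2 detection; `count_equal` and `count_equal_avx2` are hypothetical helper names, and the iterator-based fallback is added for illustration.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: counts positions where two 32-byte blocks agree.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn count_equal_avx2(a: &[u8; 32], b: &[u8; 32]) -> u32 {
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    // Each equal byte becomes 0xFF, each differing byte becomes 0x00.
    let eq = _mm256_cmpeq_epi8(va, vb);
    // One bit per byte: the top bit of every lane.
    (_mm256_movemask_epi8(eq) as u32).count_ones()
}

/// Hypothetical safe wrapper with a scalar fallback.
pub fn count_equal(a: &[u8; 32], b: &[u8; 32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked at runtime just above.
            return unsafe { count_equal_avx2(a, b) };
        }
    }
    a.iter().zip(b.iter()).filter(|(x, y)| x == y).count() as u32
}
```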
- _mm256_cmpeq_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_epi16 avx2
- Compares packed 16-bit integers in a and b for equality.
- _mm256_cmpeq_epi32 avx2
- Compares packed 32-bit integers in a and b for equality.
- _mm256_cmpeq_epi64 avx2
- Compares packed 64-bit integers in a and b for equality.
- _mm256_cmpeq_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_epi32_mask avx512f and avx512vl
- Compare packed 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_epi64_mask avx512f and avx512vl
- Compare packed 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpge_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_epi32_mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpgt_epi8 avx2
- Compares packed 8-bit integers in a and b for greater-than.
- _mm256_cmpgt_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_epi16 avx2
- Compares packed 16-bit integers in a and b for greater-than.
- _mm256_cmpgt_epi32 avx2
- Compares packed 32-bit integers in a and b for greater-than.
- _mm256_cmpgt_epi64 avx2
- Compares packed 64-bit integers in a and b for greater-than.
- _mm256_cmpgt_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_epi32_mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmple_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_epi32_mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmplt_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_epi32_mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmpneq_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_epi32_mask avx512f and avx512vl
- Compare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_conflict_epi32 avx512cd and avx512vl
- Test each 32-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
- _mm256_conflict_epi64 avx512cd and avx512vl
- Test each 64-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
- _mm256_cvtepi8_epi16 avx2
- Sign-extend 8-bit integers to 16-bit integers.
- _mm256_cvtepi8_epi32 avx2
- Sign-extend 8-bit integers to 32-bit integers.
- _mm256_cvtepi8_epi64 avx2
- Sign-extend 8-bit integers to 64-bit integers.
- _mm256_cvtepi16_epi8 avx512bw and avx512vl
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi16_epi32 avx2
- Sign-extend 16-bit integers to 32-bit integers.
- _mm256_cvtepi16_epi64 avx2
- Sign-extend 16-bit integers to 64-bit integers.
- _mm256_cvtepi32_epi8 avx512f and avx512vl
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi32_epi16 avx512f and avx512vl
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi32_epi64 avx2
- Sign-extend 32-bit integers to 64-bit integers.
- _mm256_cvtepi32_pd avx
- Converts packed 32-bit integers in a to packed double-precision (64-bit) floating-point elements.
- _mm256_cvtepi32_ps avx
- Converts packed 32-bit integers in a to packed single-precision (32-bit) floating-point elements.
- _mm256_cvtepi64_epi8 avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi64_epi16 avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi64_epi32 avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi64_pd avx512dq and avx512vl
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepi64_ps avx512dq and avx512vl
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu8_epi16 avx2
- Zero-extend unsigned 8-bit integers in a to 16-bit integers.
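The sign-extending (`cvtepi8`) and zero-extending (`cvtepu8`) conversions differ only in how bytes with the top bit set are widened. A minimal sketch comparing `_mm256_cvtepi8_epi16` and `_mm256_cvtepu8_epi16` on the same input, assuming an x86 or x86_64 target with runtime AVX2 detection; `widen` and `widen_avx2` are hypothetical helper names, and the scalar loop is a fallback added for illustration.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: returns (sign-extended, zero-extended) lanes.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn widen_avx2(bytes: &[u8; 16]) -> ([i16; 16], [i16; 16]) {
    let v = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
    let mut signed = [0i16; 16];
    let mut unsigned = [0i16; 16];
    // Treat each byte as i8 and widen it to i16.
    _mm256_storeu_si256(signed.as_mut_ptr() as *mut __m256i, _mm256_cvtepi8_epi16(v));
    // Treat each byte as u8 and widen it to 16 bits.
    _mm256_storeu_si256(unsigned.as_mut_ptr() as *mut __m256i, _mm256_cvtepu8_epi16(v));
    (signed, unsigned)
}

/// Hypothetical safe wrapper with a scalar fallback.
pub fn widen(bytes: &[u8; 16]) -> ([i16; 16], [i16; 16]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked at runtime just above.
            return unsafe { widen_avx2(bytes) };
        }
    }
    let mut signed = [0i16; 16];
    let mut unsigned = [0i16; 16];
    for i in 0..16 {
        signed[i] = bytes[i] as i8 as i16;
        unsigned[i] = bytes[i] as i16;
    }
    (signed, unsigned)
}
```

For the input byte `0xFF`, the sign-extended lane holds `-1` and the zero-extended lane holds `255`.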
- _mm256_cvtepu8_epi32 avx2
- Zero-extend the lower eight unsigned 8-bit integers in a to 32-bit integers. The upper eight elements of a are unused.
- _mm256_cvtepu8_epi64 avx2
- Zero-extend the lower four unsigned 8-bit integers in a to 64-bit integers. The upper twelve elements of a are unused.
- _mm256_cvtepu16_epi32 avx2
- Zero-extends packed unsigned 16-bit integers in a to packed 32-bit integers, and stores the results in dst.
- _mm256_cvtepu16_epi64 avx2
- Zero-extend the lower four unsigned 16-bit integers in a to 64-bit integers. The upper four elements of a are unused.
- _mm256_cvtepu32_epi64 avx2
- Zero-extend unsigned 32-bit integers in a to 64-bit integers.
- _mm256_cvtepu32_pd avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu64_pd avx512dq and avx512vl
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu64_ps avx512dq and avx512vl
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtne2ps_pbh avx512bf16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in two 256-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 256-bit wide vector.
- _mm256_cvtneebf16_ps ⚠ avxneconvert
- Convert packed BF16 (16-bit) floating-point even-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtneobf16_ps ⚠ avxneconvert
- Convert packed BF16 (16-bit) floating-point odd-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtneps_avx_pbh avxneconvert
- Convert packed single precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtneps_pbh avx512bf16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtpbh_ps avx512bf16 and avx512vl
- Converts packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and stores the results in dst.
- _mm256_cvtpd_epi32 avx
- Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers.
- _mm256_cvtpd_epi64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm256_cvtpd_epu32 avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm256_cvtpd_epu64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm256_cvtpd_ps avx
- Converts packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements.
- _mm256_cvtph_ps f16c
- Converts the 8 x 16-bit half-precision float values in the 128-bit vector a into 8 x 32-bit float values stored in a 256-bit wide vector.
- _mm256_cvtps_epi32 avx
- Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.
- _mm256_cvtps_epi64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm256_cvtps_epu32 avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm256_cvtps_epu64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm256_cvtps_pd avx
- Converts packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements.
- _mm256_cvtps_ph f16c
- Converts the 8 x 32-bit float values in the 256-bit vector a into 8 x 16-bit half-precision float values stored in a 128-bit wide vector.
- _mm256_cvtsd_f64 avx
- Returns the first element of the input vector of [4 x double].
- _mm256_cvtsepi16_epi8 avx512bw and avx512vl
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsepi32_epi8 avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsepi32_epi16 avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsepi64_epi8 avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsepi64_epi16 avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsepi64_epi32 avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsi256_si32 avx
- Returns the first element of the input vector of [8 x i32].
- _mm256_cvtss_f32 avx
- Returns the first element of the input vector of [8 x float].
- _mm256_cvttpd_epi32 avx
- Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation.
- _mm256_cvttpd_epi64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm256_cvttpd_epu32 avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm256_cvttpd_epu64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm256_cvttps_epi32 avx
- Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.
- _mm256_cvttps_epi64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm256_cvttps_epu32 avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm256_cvttps_epu64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm256_cvtusepi16_epi8 avx512bw and avx512vl
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtusepi32_epi8 avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtusepi32_epi16 avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtusepi64_epi8 avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtusepi64_epi16 avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtusepi64_epi32 avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst.
- _mm256_dbsad_epu8 avx512bw and avx512vl
- Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst. Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm256_div_pd avx
- Computes the division of each of the 4 packed 64-bit floating-point elements in a by the corresponding packed elements in b.
- _mm256_div_ps avx
- Computes the division of each of the 8 packed 32-bit floating-point elements in a by the corresponding packed elements in b.
- _mm256_dp_ps avx
- Conditionally multiplies the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sums the four products, and conditionally returns the sum using the low 4 bits of imm8.
- _mm256_dpbf16_ps avx512bf16 and avx512vl
- Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst.
- _mm256_dpbssd_epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpbssds_epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpbsud_epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpbsuds_epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpbusd_avx_epi32 avxvnni
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpbusd_epi32 avx512vnni and avx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpbusds_avx_epi32 avxvnni
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpbusds_epi32 avx512vnni and avx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpbuud_epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpbuuds_epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpwssd_avx_epi32 avxvnni
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpwssd_epi32 avx512vnni and avx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpwssds_avx_epi32 avxvnni
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpwssds_epi32 avx512vnni and avx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpwsud_epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpwsuds_epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpwusd_epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding signed 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpwusds_epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding signed 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpwuud_epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpwuuds_epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_extract_ epi8 avx2
- Extracts an 8-bit integer from a, selected withINDEX. Returns a 32-bit integer containing the zero-extended integer data.
- _mm256_extract_ epi16 avx2
- Extracts a 16-bit integer from a, selected withINDEX. Returns a 32-bit integer containing the zero-extended integer data.
- _mm256_extract_ epi32 avx
- Extracts a 32-bit integer from a, selected withINDEX.
- _mm256_extractf32x4_ ps avx512fandavx512vl
- Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the result in dst.
- _mm256_extractf64x2_ pd avx512dqandavx512vl
- Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst.
- _mm256_extractf128_ pd avx
- Extracts 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from a, selected withimm8.
- _mm256_extractf128_ ps avx
- Extracts 128 bits (composed of 4 packed single-precision (32-bit)
floating-point elements) from a, selected withimm8.
- _mm256_extractf128_ si256 avx
- Extracts 128 bits (composed of integer data) from a, selected withimm8.
- _mm256_extracti32x4_ epi32 avx512fandavx512vl
- Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM1, and store the result in dst.
- _mm256_extracti64x2_ epi64 avx512dqandavx512vl
- Extracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst.
- _mm256_extracti128_ si256 avx2
- Extracts 128 bits (of integer data) from a, selected with IMM1.
- _mm256_fixupimm_ pd avx512fandavx512vl
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm256_fixupimm_ ps avx512fandavx512vl
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm256_floor_ pd avx
- Rounds packed double-precision (64-bit) floating point elements in a toward negative infinity.
- _mm256_floor_ ps avx
- Rounds packed single-precision (32-bit) floating point elements in a toward negative infinity.
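A short sketch of rounding toward negative infinity with `_mm256_floor_pd`. The wrapper names `floor4_avx` and `floor4` are hypothetical; the code assumes an x86_64 target and falls back to `f64::floor` when AVX is not detected.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn floor4_avx(x: &[f64; 4]) -> [f64; 4] {
    let v = _mm256_loadu_pd(x.as_ptr());
    // Rounds every lane toward negative infinity.
    let r = _mm256_floor_pd(v);
    let mut out = [0.0f64; 4];
    _mm256_storeu_pd(out.as_mut_ptr(), r);
    out
}

pub fn floor4(x: [f64; 4]) -> [f64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            return unsafe { floor4_avx(&x) };
        }
    }
    // Portable fallback.
    [x[0].floor(), x[1].floor(), x[2].floor(), x[3].floor()]
}
```

Note that negative inputs move away from zero: -1.5 becomes -2.0, not -1.0.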
- _mm256_fmadd_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
- _mm256_fmadd_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
- _mm256_fmaddsub_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result.
- _mm256_fmaddsub_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result.
- _mm256_fmsub_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
- _mm256_fmsub_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
- _mm256_fmsubadd_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result.
- _mm256_fmsubadd_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result.
- _mm256_fnmadd_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
- _mm256_fnmadd_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
- _mm256_fnmsub_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
- _mm256_fnmsub_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
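The four non-alternating FMA variants above differ only in the signs applied to the product and to c. The sketch below evaluates all four on broadcast inputs and returns lane 0 of each; `fma_variants_simd` and `fma_variants` are hypothetical names, and the fallback uses `f64::mul_add` when AVX or FMA is not detected.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,fma")]
unsafe fn fma_variants_simd(a: f64, b: f64, c: f64) -> [f64; 4] {
    let (va, vb, vc) = (_mm256_set1_pd(a), _mm256_set1_pd(b), _mm256_set1_pd(c));
    let results = [
        _mm256_fmadd_pd(va, vb, vc),  //  (a * b) + c
        _mm256_fmsub_pd(va, vb, vc),  //  (a * b) - c
        _mm256_fnmadd_pd(va, vb, vc), // -(a * b) + c
        _mm256_fnmsub_pd(va, vb, vc), // -(a * b) - c
    ];
    let mut out = [0.0f64; 4];
    for (slot, r) in out.iter_mut().zip(results.iter()) {
        let mut lanes = [0.0f64; 4];
        _mm256_storeu_pd(lanes.as_mut_ptr(), *r);
        // All lanes are equal because the inputs were broadcast.
        *slot = lanes[0];
    }
    out
}

/// Returns [fmadd, fmsub, fnmadd, fnmsub] for scalar inputs.
pub fn fma_variants(a: f64, b: f64, c: f64) -> [f64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            return unsafe { fma_variants_simd(a, b, c) };
        }
    }
    // Portable fallback with the same sign conventions.
    [
        a.mul_add(b, c),
        a.mul_add(b, -c),
        (-a).mul_add(b, c),
        (-a).mul_add(b, -c),
    ]
}
```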
- _mm256_fpclass_ pd_ mask avx512dqandavx512vl
- Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm256_fpclass_ ps_ mask avx512dqandavx512vl
- Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm256_getexp_ pd avx512fandavx512vl
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_getexp_ ps avx512fandavx512vl
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
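As an illustration of the floor(log2(x)) behaviour described above, here is a scalar model of one f32 lane in plain Rust. It is not the intrinsic itself: the hypothetical `getexp_model` assumes a normal, finite, nonzero input and ignores the special cases (zeros, denormals, infinities, NaN) that the hardware instruction defines.

```rust
/// Scalar model of one lane of a getexp operation on f32.
/// Assumes `x` is a normal, finite, nonzero value.
pub fn getexp_model(x: f32) -> f32 {
    // Bits 23..=30 hold the exponent, biased by 127. The sign bit is ignored.
    let biased = ((x.to_bits() >> 23) & 0xFF) as i32;
    (biased - 127) as f32
}
```

For example, 10.0 is 1.25 * 2^3, so the model returns 3.0; the sign of the input does not affect the result.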
- _mm256_getmant_ pd avx512fandavx512vl
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm256_getmant_ ps avx512fandavx512vl
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
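A scalar sketch of mantissa normalization for one f32 lane, covering only the `_MM_MANT_NORM_1_2` interval with `_MM_MANT_SIGN_src`. The function name `getmant_1_2_model` is hypothetical, and the model assumes a normal, finite, nonzero input; it is a plain-Rust illustration, not the intrinsic.

```rust
/// Scalar model of getmant for the interval [1, 2) with the source sign kept.
/// Assumes `x` is a normal, finite, nonzero value.
pub fn getmant_1_2_model(x: f32) -> f32 {
    let bits = x.to_bits();
    // Keep the sign bit and the 23 fraction bits, then force the biased
    // exponent to 127 so the magnitude lands in [1, 2).
    f32::from_bits((bits & 0x807F_FFFF) | (127 << 23))
}
```

For example, 10.0 is 1.25 * 2^3, so the normalized mantissa is 1.25; -0.75 is -1.5 * 2^-1, giving -1.5.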
- _mm256_gf2p8affine_ epi64_ epi8 gfniandavx
- Performs an affine transformation on the packed bytes in x. That is computes a*x+b over the Galois Field 2^8 for each packed byte with a being a 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_gf2p8affineinv_ epi64_ epi8 gfniandavx
- Performs an affine transformation on the inverted packed bytes in x. That is computes a*inv(x)+b over the Galois Field 2^8 for each packed byte with a being a 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_gf2p8mul_ epi8 gfniandavx
- Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
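The field multiplication described above can be modelled per byte in plain Rust. This is a scalar sketch using the stated reduction polynomial x^8 + x^4 + x^3 + x + 1 (low byte 0x1B); the name `gf2p8_mul` is hypothetical and the code does not call the intrinsic.

```rust
/// Scalar model of multiplying two bytes in GF(2^8) with the reduction
/// polynomial x^8 + x^4 + x^3 + x + 1.
pub fn gf2p8_mul(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    for _ in 0..8 {
        // Add (XOR) the current multiple of `a` if the low bit of `b` is set.
        if b & 1 != 0 {
            acc ^= a;
        }
        // Multiply `a` by x, reducing when the x^8 term appears.
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1B;
        }
        b >>= 1;
    }
    acc
}
```

The vector intrinsic applies this operation independently to each pair of corresponding bytes.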
- _mm256_hadd_ epi16 avx2
- Horizontally adds adjacent pairs of 16-bit integers in a and b.
- _mm256_hadd_ epi32 avx2
- Horizontally adds adjacent pairs of 32-bit integers in a and b.
- _mm256_hadd_ pd avx
- Horizontal addition of adjacent pairs in the two packed vectors of 4 64-bit floating points a and b. In the result, sums of elements from a are returned in even locations, while sums of elements from b are returned in odd locations.
- _mm256_hadd_ ps avx
- Horizontal addition of adjacent pairs in the two packed vectors of 8 32-bit floating points a and b. In the result, sums of elements from a are returned in locations of indices 0, 1, 4, 5; while sums of elements from b are returned in locations 2, 3, 6, 7.
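The interleaved result layout of `_mm256_hadd_ps` is easy to get wrong, so here is a small sketch that makes it concrete. The names `hadd_avx` and `hadd8` are hypothetical; the code assumes an x86_64 target and uses a scalar fallback with the same lane order when AVX is not detected.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn hadd_avx(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    let r = _mm256_hadd_ps(_mm256_loadu_ps(a.as_ptr()), _mm256_loadu_ps(b.as_ptr()));
    let mut out = [0.0f32; 8];
    _mm256_storeu_ps(out.as_mut_ptr(), r);
    out
}

pub fn hadd8(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            return unsafe { hadd_avx(&a, &b) };
        }
    }
    // Scalar fallback: pair sums from `a` land in indices 0, 1, 4, 5 and
    // pair sums from `b` land in indices 2, 3, 6, 7.
    [
        a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3],
        a[4] + a[5], a[6] + a[7], b[4] + b[5], b[6] + b[7],
    ]
}
```

The operation works within each 128-bit half separately, which is why the sums from a and b alternate in groups of two.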
- _mm256_hadds_ epi16 avx2
- Horizontally adds adjacent pairs of 16-bit integers in a and b using saturation.
- _mm256_hsub_ epi16 avx2
- Horizontally subtracts adjacent pairs of 16-bit integers in a and b.
- _mm256_hsub_ epi32 avx2
- Horizontally subtracts adjacent pairs of 32-bit integers in a and b.
- _mm256_hsub_ pd avx
- Horizontal subtraction of adjacent pairs in the two packed vectors of 4 64-bit floating points a and b. In the result, differences of elements from a are returned in even locations, while differences of elements from b are returned in odd locations.
- _mm256_hsub_ ps avx
- Horizontal subtraction of adjacent pairs in the two packed vectors of 8 32-bit floating points a and b. In the result, differences of elements from a are returned in locations of indices 0, 1, 4, 5; while differences of elements from b are returned in locations 2, 3, 6, 7.
- _mm256_hsubs_ epi16 avx2
- Horizontally subtracts adjacent pairs of 16-bit integers in a and b using saturation.
- _mm256_i32gather_ ⚠epi32 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i32gather_ ⚠epi64 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i32gather_ ⚠pd avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i32gather_ ⚠ps avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i32scatter_ ⚠epi32 avx512fandavx512vl
- Stores 8 32-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm256_i32scatter_ ⚠epi64 avx512fandavx512vl
- Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm256_i32scatter_ ⚠pd avx512fandavx512vl
- Stores 4 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm256_i32scatter_ ⚠ps avx512fandavx512vl
- Stores 8 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm256_i64gather_ ⚠epi32 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i64gather_ ⚠epi64 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i64gather_ ⚠pd avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i64gather_ ⚠ps avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
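A sketch of a gather with `_mm256_i32gather_epi32`, where the scale is chosen as 4 (the size of an `i32` in bytes) so that each offset behaves like an element index. The wrapper names `gather8_avx2` and `gather8` are hypothetical. The intrinsic does no bounds checking, so the wrapper validates the indices itself; it falls back to ordinary indexing when AVX2 is not detected.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn gather8_avx2(table: &[i32], idx: &[i32; 8]) -> [i32; 8] {
    let offsets = _mm256_loadu_si256(idx.as_ptr() as *const __m256i);
    // SCALE = 4 bytes per i32, so each offset acts as an element index.
    let r = _mm256_i32gather_epi32::<4>(table.as_ptr(), offsets);
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
    out
}

pub fn gather8(table: &[i32], idx: [i32; 8]) -> [i32; 8] {
    // Validate first: out-of-range offsets would read arbitrary memory.
    assert!(idx.iter().all(|&i| i >= 0 && (i as usize) < table.len()));
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { gather8_avx2(table, &idx) };
        }
    }
    // Portable fallback.
    let mut out = [0i32; 8];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = table[i as usize];
    }
    out
}
```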
- _mm256_i64scatter_ ⚠epi32 avx512fandavx512vl
- Stores 4 32-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm256_i64scatter_ ⚠epi64 avx512fandavx512vl
- Stores 4 64-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm256_i64scatter_ ⚠pd avx512fandavx512vl
- Stores 4 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm256_i64scatter_ ⚠ps avx512fandavx512vl
- Stores 4 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm256_insert_ epi8 avx
- Copies a to result, and inserts the 8-bit integer i into result at the location specified by index.
- _mm256_insert_ epi16 avx
- Copies a to result, and inserts the 16-bit integer i into result at the location specified by index.
- _mm256_insert_ epi32 avx
- Copies a to result, and inserts the 32-bit integer i into result at the location specified by index.
- _mm256_insertf32x4 avx512fandavx512vl
- Copy a to dst, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into dst at the location specified by imm8.
- _mm256_insertf64x2 avx512dqandavx512vl
- Copy a to dst, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into dst at the location specified by IMM8.
- _mm256_insertf128_ pd avx
- Copies a to result, then inserts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into result at the location specified by imm8.
- _mm256_insertf128_ ps avx
- Copies a to result, then inserts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into result at the location specified by imm8.
- _mm256_insertf128_ si256 avx
- Copies a to result, then inserts 128 bits from b into result at the location specified by imm8.
- _mm256_inserti32x4 avx512fandavx512vl
- Copy a to dst, then insert 128 bits (composed of 4 packed 32-bit integers) from b into dst at the location specified by imm8.
- _mm256_inserti64x2 avx512dqandavx512vl
- Copy a to dst, then insert 128 bits (composed of 2 packed 64-bit integers) from b into dst at the location specified by IMM8.
- _mm256_inserti128_ si256 avx2
- Copies a to dst, then inserts 128 bits (of integer data) from b at the location specified by IMM1.
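A sketch of a 128-bit insert followed by the matching extract, using `_mm256_inserti128_si256` and `_mm256_extracti128_si256` with IMM1 = 1 (the upper half). The names `replace_upper_avx2` and `replace_upper_half` are hypothetical; the code assumes x86_64 and falls back to slice copies when AVX2 is not detected.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn replace_upper_avx2(a: &[i32; 8], b: &[i32; 4]) -> ([i32; 8], [i32; 4]) {
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
    // IMM1 = 1 selects the upper 128 bits; 0 would select the lower 128 bits.
    let merged = _mm256_inserti128_si256::<1>(va, vb);
    let upper = _mm256_extracti128_si256::<1>(merged);
    let mut out = [0i32; 8];
    let mut hi = [0i32; 4];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, merged);
    _mm_storeu_si128(hi.as_mut_ptr() as *mut __m128i, upper);
    (out, hi)
}

/// Returns (a with its upper half replaced by b, the upper half read back).
pub fn replace_upper_half(a: [i32; 8], b: [i32; 4]) -> ([i32; 8], [i32; 4]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { replace_upper_avx2(&a, &b) };
        }
    }
    // Portable fallback.
    let mut out = a;
    out[4..].copy_from_slice(&b);
    (out, b)
}
```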
- _mm256_lddqu_ ⚠si256 avx
- Loads 256-bits of integer data from unaligned memory into result.
This intrinsic may perform better than _mm256_loadu_si256 when the data crosses a cache line boundary.
- _mm256_load_ ⚠epi32 avx512fandavx512vl
- Load 256-bits (composed of 8 packed 32-bit integers) from memory into dst. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_load_ ⚠epi64 avx512fandavx512vl
- Load 256-bits (composed of 4 packed 64-bit integers) from memory into dst. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_load_ ⚠pd avx
- Loads 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result.
mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_load_ ⚠ps avx
- Loads 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result.
mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_load_ ⚠si256 avx
- Loads 256-bits of integer data from memory into result.
mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_loadu2_ ⚠m128 avx
- Loads two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combines them into a 256-bit value.
hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_loadu2_ ⚠m128d avx
- Loads two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combines them into a 256-bit value.
hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_loadu2_ ⚠m128i avx
- Loads two 128-bit values (composed of integer data) from memory, and combines them into a 256-bit value.
hiaddr and loaddr do not need to be aligned on any particular boundary.
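A sketch of combining two separate 128-bit loads with `_mm256_loadu2_m128`. Note the argument order: the first pointer supplies the upper half and the second the lower half. The names `combine_avx` and `combine_halves` are hypothetical; a scalar fallback is used when AVX is not detected.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn combine_avx(hi: &[f32; 4], lo: &[f32; 4]) -> [f32; 8] {
    // First argument: upper 128 bits. Second argument: lower 128 bits.
    let v = _mm256_loadu2_m128(hi.as_ptr(), lo.as_ptr());
    let mut out = [0.0f32; 8];
    _mm256_storeu_ps(out.as_mut_ptr(), v);
    out
}

pub fn combine_halves(hi: [f32; 4], lo: [f32; 4]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            return unsafe { combine_avx(&hi, &lo) };
        }
    }
    // Portable fallback: low half first, then high half.
    let mut out = [0.0f32; 8];
    out[..4].copy_from_slice(&lo);
    out[4..].copy_from_slice(&hi);
    out
}
```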
- _mm256_loadu_ ⚠epi8 avx512bwandavx512vl
- Load 256-bits (composed of 32 packed 8-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ ⚠epi16 avx512bwandavx512vl
- Load 256-bits (composed of 16 packed 16-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ ⚠epi32 avx512fandavx512vl
- Load 256-bits (composed of 8 packed 32-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ ⚠epi64 avx512fandavx512vl
- Load 256-bits (composed of 4 packed 64-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ ⚠pd avx
- Loads 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result.
mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ ⚠ps avx
- Loads 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result.
mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ ⚠si256 avx
- Loads 256-bits of integer data from memory into result.
mem_addr does not need to be aligned on any particular boundary.
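To contrast the aligned and unaligned loads listed above, the sketch below uses a hypothetical wrapper type `Aligned32` with `#[repr(align(32))]` to satisfy the 32-byte requirement of `_mm256_load_ps`, and reads the second operand from an arbitrary slice with `_mm256_loadu_ps`. The function names are assumptions for illustration, and a scalar fallback is used when AVX is not detected.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Wrapper that forces 32-byte alignment, as `_mm256_load_ps` requires.
#[repr(C, align(32))]
pub struct Aligned32(pub [f32; 8]);

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_avx(a: &Aligned32, b: &[f32]) -> [f32; 8] {
    // Aligned load: the pointer must be a multiple of 32 bytes.
    let va = _mm256_load_ps(a.0.as_ptr());
    // Unaligned load: any address is acceptable.
    let vb = _mm256_loadu_ps(b.as_ptr());
    let mut out = [0.0f32; 8];
    _mm256_storeu_ps(out.as_mut_ptr(), _mm256_add_ps(va, vb));
    out
}

/// Adds the aligned block to the first eight elements of `b`.
pub fn add_aligned_unaligned(a: &Aligned32, b: &[f32]) -> [f32; 8] {
    assert!(b.len() >= 8, "need at least eight elements to load");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            return unsafe { add_avx(a, b) };
        }
    }
    // Portable fallback.
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a.0[i] + b[i];
    }
    out
}
```

Passing a sub-slice such as `&data[1..]` exercises the unaligned path, since that address is generally not a multiple of 32.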
- _mm256_lzcnt_ epi32 avx512cdandavx512vl
- Counts the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst.
- _mm256_lzcnt_ epi64 avx512cdandavx512vl
- Counts the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst.
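The per-lane behaviour of the leading-zero count can be modelled in plain Rust with `u32::leading_zeros`. This is a scalar sketch with a hypothetical name, `lzcnt_epi32_model`, not a call to the AVX-512 intrinsic.

```rust
/// Scalar model of a packed 32-bit leading-zero count over eight lanes.
pub fn lzcnt_epi32_model(a: [u32; 8]) -> [u32; 8] {
    let mut dst = [0u32; 8];
    for (d, &x) in dst.iter_mut().zip(a.iter()) {
        // A zero input has all 32 bits counted as leading zeros.
        *d = x.leading_zeros();
    }
    dst
}
```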
- _mm256_madd52hi_ avx_ epu64 avxifma
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm256_madd52hi_ epu64 avx512ifmaandavx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm256_madd52lo_ avx_ epu64 avxifma
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm256_madd52lo_ epu64 avx512ifmaandavx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
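The 52-bit multiply-add can be modelled per 64-bit lane with a `u128` intermediate. The functions `madd52lo_model` and `madd52hi_model` are hypothetical scalar sketches in plain Rust; they assume the final addition wraps modulo 2^64, and they do not call the intrinsics.

```rust
const MASK52: u64 = (1u64 << 52) - 1;

/// 104-bit product of the low 52 bits of `b` and `c`.
fn product52(b: u64, c: u64) -> u128 {
    ((b & MASK52) as u128) * ((c & MASK52) as u128)
}

/// Scalar model: adds the low 52 bits of the product to `a`.
pub fn madd52lo_model(a: u64, b: u64, c: u64) -> u64 {
    a.wrapping_add((product52(b, c) as u64) & MASK52)
}

/// Scalar model: adds the high 52 bits of the product to `a`.
pub fn madd52hi_model(a: u64, b: u64, c: u64) -> u64 {
    a.wrapping_add(((product52(b, c) >> 52) as u64) & MASK52)
}
```

Bits 52 and above of b and c are ignored, so an input such as `(1 << 52) | 3` contributes only the value 3.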
- _mm256_madd_ epi16 avx2
- Multiplies packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers.
- _mm256_maddubs_ epi16 avx2
- Vertically multiplies each unsigned 8-bit integer from a with the corresponding signed 8-bit integer from b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers.
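A sketch of the multiply-then-pairwise-add pattern using `_mm256_madd_epi16`, which turns sixteen 16-bit products into eight 32-bit sums. The names `madd_avx2` and `madd16` are hypothetical; a scalar fallback is used when AVX2 is not detected.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn madd_avx2(a: &[i16; 16], b: &[i16; 16]) -> [i32; 8] {
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    let r = _mm256_madd_epi16(va, vb);
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
    out
}

pub fn madd16(a: [i16; 16], b: [i16; 16]) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { madd_avx2(&a, &b) };
        }
    }
    // Scalar fallback: out[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1].
    let mut out = [0i32; 8];
    for i in 0..8 {
        let p0 = a[2 * i] as i32 * b[2 * i] as i32;
        let p1 = a[2 * i + 1] as i32 * b[2 * i + 1] as i32;
        out[i] = p0.wrapping_add(p1);
    }
    out
}
```

Summing the eight outputs gives a 16-element dot product, which is a common use of this intrinsic.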
- _mm256_mask2_ permutex2var_ epi8 avx512vbmiandavx512vl
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask2_ permutex2var_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm256_mask2_ permutex2var_ epi32 avx512fandavx512vl
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm256_mask2_ permutex2var_ epi64 avx512fandavx512vl
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm256_mask2_ permutex2var_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set)
- _mm256_mask2_ permutex2var_ ps avx512fandavx512vl
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
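The two-source shuffle with a `mask2` writemask can be modelled in plain Rust. The sketch below covers eight 32-bit lanes and assumes that the low three bits of each index select the element and the fourth bit selects between a and b; lanes whose mask bit is clear keep the value from idx. The name `mask2_permutex2var_epi32_model` is hypothetical and the code does not call the intrinsic.

```rust
/// Scalar model of a masked two-source permute over eight 32-bit lanes.
pub fn mask2_permutex2var_epi32_model(
    a: [i32; 8],
    idx: [i32; 8],
    k: u8,
    b: [i32; 8],
) -> [i32; 8] {
    // Unselected lanes are copied from idx.
    let mut dst = idx;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let sel = (idx[i] as u32) & 0xF;
            let off = (sel & 7) as usize;
            // Bit 3 of the index picks the source vector.
            dst[i] = if sel & 8 != 0 { b[off] } else { a[off] };
        }
    }
    dst
}
```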
- _mm256_mask3_ fmadd_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fmadd_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fmaddsub_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fmaddsub_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fmsub_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fmsub_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fmsubadd_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fmsubadd_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fnmadd_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fnmadd_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fnmsub_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ fnmsub_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
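The `mask3` variants above differ from the `mask` variants in where unselected lanes come from: they are copied from c. A scalar model of the masked fused multiply-add over four f64 lanes makes this concrete. The name `mask3_fmadd_pd_model` is hypothetical, and the sketch is plain Rust rather than the AVX-512 intrinsic.

```rust
/// Scalar model of a mask3 fused multiply-add over four f64 lanes.
pub fn mask3_fmadd_pd_model(a: [f64; 4], b: [f64; 4], c: [f64; 4], k: u8) -> [f64; 4] {
    // Lanes with a clear mask bit keep the value from c.
    let mut dst = c;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // Fused multiply-add: a * b + c with a single rounding.
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```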
- _mm256_mask_ abs_ epi8 avx512bwandavx512vl
- Compute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ abs_ epi16 avx512bwandavx512vl
- Compute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ abs_ epi32 avx512fandavx512vl
- Compute the absolute value of packed signed 32-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ abs_ epi64 avx512fandavx512vl
- Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ add_ epi8 avx512bwandavx512vl
- Add packed 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ add_ epi16 avx512bwandavx512vl
- Add packed 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ add_ epi32 avx512fandavx512vl
- Add packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ add_ epi64 avx512fandavx512vl
- Add packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ add_ pd avx512fandavx512vl
- Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ add_ ps avx512fandavx512vl
- Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
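The writemask behaviour shared by the masked add intrinsics above can be sketched as a scalar model: bit i of k decides whether lane i receives the computed sum or the value from src. The function `mask_add_epi32_model` is a hypothetical illustration in plain Rust, not the intrinsic.

```rust
/// Scalar model of a writemasked 32-bit integer add over eight lanes.
pub fn mask_add_epi32_model(src: [i32; 8], k: u8, a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    // Start from src so that unselected lanes are passed through unchanged.
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Packed integer adds wrap on overflow.
            dst[i] = a[i].wrapping_add(b[i]);
        }
    }
    dst
}
```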
- _mm256_mask_ adds_ epi8 avx512bwandavx512vl
- Add packed signed 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ adds_ epi16 avx512bwandavx512vl
- Add packed signed 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ adds_ epu8 avx512bwandavx512vl
- Add packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ adds_ epu16 avx512bwandavx512vl
- Add packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
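The difference between the plain adds and the saturating `adds` variants shows up only on overflow. The per-lane sketch below uses the standard integer methods in plain Rust; the function names are hypothetical.

```rust
/// One lane of a signed saturating 8-bit add.
pub fn adds_epi8_lane(a: i8, b: i8) -> i8 {
    a.saturating_add(b)
}

/// One lane of an unsigned saturating 8-bit add.
pub fn adds_epu8_lane(a: u8, b: u8) -> u8 {
    a.saturating_add(b)
}

/// One lane of a plain (wrapping) 8-bit add, for comparison.
pub fn add_epi8_lane(a: i8, b: i8) -> i8 {
    a.wrapping_add(b)
}
```

Saturation clamps to the type's limits, while the plain add wraps around.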
- _mm256_mask_ alignr_ epi8 avx512bwandavx512vl
- Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ alignr_ epi32 avx512fandavx512vl
- Concatenate a and b into a 64-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 32 bytes (8 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ alignr_ epi64 avx512fandavx512vl
- Concatenate a and b into a 64-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 32 bytes (4 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
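A scalar model of the element-wise align-right on eight 32-bit lanes, without the writemask step. It assumes that a forms the upper half of the concatenation, b the lower half, and that only the low three bits of the shift count are used for 256-bit vectors. The name `alignr_epi32_model` is hypothetical and the code is plain Rust.

```rust
/// Scalar model of an unmasked 32-bit element align-right over eight lanes.
pub fn alignr_epi32_model(a: [i32; 8], b: [i32; 8], imm8: u32) -> [i32; 8] {
    // Assumption: the shift count is taken modulo 8 for 256-bit vectors.
    let shift = (imm8 & 7) as usize;
    // Build the 16-element concatenation with b in the low half.
    let mut concat = [0i32; 16];
    concat[..8].copy_from_slice(&b);
    concat[8..].copy_from_slice(&a);
    // Shifting right by `shift` elements means starting the window there.
    let mut dst = [0i32; 8];
    dst.copy_from_slice(&concat[shift..shift + 8]);
    dst
}
```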
- _mm256_mask_ and_ epi32 avx512fandavx512vl
- Performs element-by-element bitwise AND between packed 32-bit integer elements of a and b, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ and_ epi64 avx512fandavx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ and_ pd avx512dqandavx512vl
- Compute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ and_ ps avx512dqandavx512vl
- Compute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ andnot_ epi32 avx512fandavx512vl
- Compute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ andnot_ epi64 avx512fandavx512vl
- Compute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ andnot_ pd avx512dqandavx512vl
- Compute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ andnot_ ps avx512dqandavx512vl
- Compute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ avg_ epu8 avx512bwandavx512vl
- Average packed unsigned 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ avg_ epu16 avx512bwandavx512vl
- Average packed unsigned 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
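The packed average rounds up on ties. A per-lane scalar sketch, assuming the usual (a + b + 1) >> 1 definition computed in a wider type so the sum cannot overflow; the name `avg_epu8_lane` is hypothetical.

```rust
/// One lane of an unsigned 8-bit rounding average.
pub fn avg_epu8_lane(a: u8, b: u8) -> u8 {
    // Widen to u16 so that a + b + 1 cannot overflow.
    ((a as u16 + b as u16 + 1) >> 1) as u8
}
```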
- _mm256_mask_ bitshuffle_ epi64_ mask avx512bitalgandavx512vl
- Considers the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm256_mask_ blend_ epi8 avx512bwandavx512vl
- Blend packed 8-bit integers from a and b using control mask k, and store the results in dst.
- _mm256_mask_ blend_ epi16 avx512bwandavx512vl
- Blend packed 16-bit integers from a and b using control mask k, and store the results in dst.
- _mm256_mask_ blend_ epi32 avx512fandavx512vl
- Blend packed 32-bit integers from a and b using control mask k, and store the results in dst.
- _mm256_mask_ blend_ epi64 avx512fandavx512vl
- Blend packed 64-bit integers from a and b using control mask k, and store the results in dst.
- _mm256_mask_ blend_ pd avx512fandavx512vl
- Blend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm256_mask_ blend_ ps avx512fandavx512vl
- Blend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and store the results in dst.
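A scalar model of the mask blend over eight 32-bit lanes, assuming the convention that a set bit in k selects the lane from b and a clear bit selects it from a. The name `mask_blend_epi32_model` is hypothetical, and the sketch is plain Rust rather than the AVX-512 intrinsic.

```rust
/// Scalar model of a mask-controlled blend over eight 32-bit lanes.
pub fn mask_blend_epi32_model(k: u8, a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut dst = a;
    for i in 0..8 {
        // A set mask bit takes the lane from b; otherwise a is kept.
        if (k >> i) & 1 == 1 {
            dst[i] = b[i];
        }
    }
    dst
}
```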
- _mm256_mask_ broadcast_ f32x2 avx512dqandavx512vl
- Broadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ broadcast_ f32x4 avx512fandavx512vl
- Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ broadcast_ f64x2 avx512dqandavx512vl
- Broadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_broadcast_i32x2 avx512dq and avx512vl
- Broadcast the lower 2 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_broadcast_i32x4 avx512f and avx512vl
- Broadcast the 4 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_broadcast_i64x2 avx512dq and avx512vl
- Broadcast the 2 packed 64-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_broadcastb_epi8 avx512bw and avx512vl
- Broadcast the low packed 8-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_broadcastd_epi32 avx512f and avx512vl
- Broadcast the low packed 32-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_broadcastq_epi64 avx512f and avx512vl
- Broadcast the low packed 64-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_broadcastsd_pd avx512f and avx512vl
- Broadcast the low double-precision (64-bit) floating-point element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_broadcastss_ps avx512f and avx512vl
- Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_broadcastw_epi16 avx512bw and avx512vl
- Broadcast the low packed 16-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cmp_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmp_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmp_epi32_mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmp_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmp_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmp_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmp_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmp_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmp_pd_mask avx512f and avx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmp_ps_mask avx512f and avx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpeq_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpeq_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpeq_epi32_mask avx512f and avx512vl
- Compare packed 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpeq_epi64_mask avx512f and avx512vl
- Compare packed 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpeq_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpeq_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpeq_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpeq_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpge_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpge_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpge_epi32_mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpge_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpge_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpge_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpge_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpge_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpgt_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpgt_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpgt_epi32_mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpgt_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpgt_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpgt_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpgt_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpgt_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmple_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmple_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmple_epi32_mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmple_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmple_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmple_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmple_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmple_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmplt_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmplt_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmplt_epi32_mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmplt_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmplt_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmplt_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmplt_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmplt_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpneq_epi8_mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpneq_epi16_mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpneq_epi32_mask avx512f and avx512vl
- Compare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpneq_epi64_mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpneq_epu8_mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpneq_epu16_mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpneq_epu32_mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmpneq_epu64_mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_compress_epi8 avx512vbmi2 and avx512vl
- Contiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_compress_epi16 avx512vbmi2 and avx512vl
- Contiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_compress_epi32 avx512f and avx512vl
- Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_compress_epi64 avx512f and avx512vl
- Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_compress_pd avx512f and avx512vl
- Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_compress_ps avx512f and avx512vl
- Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_compressstoreu_epi8⚠ avx512vbmi2 and avx512vl
- Contiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_compressstoreu_epi16⚠ avx512vbmi2 and avx512vl
- Contiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_compressstoreu_epi32⚠ avx512f and avx512vl
- Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_compressstoreu_epi64⚠ avx512f and avx512vl
- Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_compressstoreu_pd⚠ avx512f and avx512vl
- Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_compressstoreu_ps⚠ avx512f and avx512vl
- Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_conflict_epi32 avx512cd and avx512vl
- Test each 32-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero-extended bit vector in dst.
- _mm256_mask_conflict_epi64 avx512cd and avx512vl
- Test each 64-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero-extended bit vector in dst.
- _mm256_mask_cvt_roundps_ph avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter.
- _mm256_mask_cvtepi8_epi16 avx512bw and avx512vl
- Sign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi8_epi32 avx512f and avx512vl
- Sign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi8_epi64 avx512f and avx512vl
- Sign extend packed 8-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi16_epi8 avx512bw and avx512vl
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi16_epi32 avx512f and avx512vl
- Sign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi16_epi64 avx512f and avx512vl
- Sign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi16_storeu_epi8⚠ avx512bw and avx512vl
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtepi32_epi8 avx512f and avx512vl
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi32_epi16 avx512f and avx512vl
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi32_epi64 avx512f and avx512vl
- Sign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi32_pd avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi32_ps avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi32_storeu_epi8⚠ avx512f and avx512vl
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtepi32_storeu_epi16⚠ avx512f and avx512vl
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtepi64_epi8 avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi64_epi16 avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi64_epi32 avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepi64_pd avx512dq and avx512vl
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvtepi64_ps avx512dq and avx512vl
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvtepi64_storeu_epi8⚠ avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtepi64_storeu_epi16⚠ avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtepi64_storeu_epi32⚠ avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtepu8_epi16 avx512bw and avx512vl
- Zero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepu8_epi32 avx512f and avx512vl
- Zero extend packed unsigned 8-bit integers in the low 8 bytes of a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepu8_epi64 avx512f and avx512vl
- Zero extend packed unsigned 8-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepu16_epi32 avx512f and avx512vl
- Zero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepu16_epi64 avx512f and avx512vl
- Zero extend packed unsigned 16-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepu32_epi64 avx512f and avx512vl
- Zero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepu32_pd avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtepu64_pd avx512dq and avx512vl
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvtepu64_ps avx512dq and avx512vl
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvtne2ps_pbh avx512bf16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtneps_pbh avx512bf16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtpbh_ps avx512bf16 and avx512vl
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtpd_epi32 avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtpd_epi64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvtpd_epu32 avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtpd_epu64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvtpd_ps avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_ps avx512f and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtps_epi32 avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtps_epi64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvtps_epu32 avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtps_epu64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvtps_ph avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm256_mask_cvtsepi16_epi8 avx512bw and avx512vl
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtsepi16_storeu_epi8 ⚠ avx512bw and avx512vl
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtsepi32_epi8 avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtsepi32_epi16 avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtsepi32_storeu_epi8 ⚠ avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtsepi32_storeu_epi16 ⚠ avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtsepi64_epi8 avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtsepi64_epi16 avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtsepi64_epi32 avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtsepi64_storeu_epi8 ⚠ avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtsepi64_storeu_epi16 ⚠ avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtsepi64_storeu_epi32 ⚠ avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvttpd_epi32 avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttpd_epi64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvttpd_epu32 avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttpd_epu64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvttps_epi32 avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttps_epi64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvttps_epu32 avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttps_epu64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_cvtusepi16_epi8 avx512bw and avx512vl
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtusepi16_storeu_epi8 ⚠ avx512bw and avx512vl
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtusepi32_epi8 avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtusepi32_epi16 avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtusepi32_storeu_epi8 ⚠ avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtusepi32_storeu_epi16 ⚠ avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtusepi64_epi8 avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtusepi64_epi16 avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtusepi64_epi32 avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtusepi64_storeu_epi8 ⚠ avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtusepi64_storeu_epi16 ⚠ avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_cvtusepi64_storeu_epi32 ⚠ avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
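The saturating narrowing conversions above can be pictured with a scalar reference model. The sketch below is not the intrinsic itself: the helper names are hypothetical, plain arrays stand in for the vector types, and only the 32-bit to 16-bit case is modelled, assuming bit j of k controls lane j.

```rust
/// Scalar model of a masked, signed-saturating 32-bit -> 16-bit narrowing.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn model_mask_cvtsepi32_epi16(src: [i16; 8], k: u8, a: [i32; 8]) -> [i16; 8] {
    let mut dst = src; // lanes with a clear mask bit keep the value from src
    for j in 0..8 {
        if (k >> j) & 1 == 1 {
            // Clamp into the i16 range first, then narrow.
            dst[j] = a[j].clamp(i16::MIN as i32, i16::MAX as i32) as i16;
        }
    }
    dst
}

/// Unsigned counterpart: values above u16::MAX saturate to u16::MAX.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn model_mask_cvtusepi32_epi16(src: [u16; 8], k: u8, a: [u32; 8]) -> [u16; 8] {
    let mut dst = src;
    for j in 0..8 {
        if (k >> j) & 1 == 1 {
            dst[j] = a[j].min(u16::MAX as u32) as u16;
        }
    }
    dst
}
```

In this model, out-of-range inputs clamp to the nearest representable value rather than wrapping, and lanes whose mask bit is clear pass through from src unchanged.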
- _mm256_mask_dbsad_epu8 avx512bw and avx512vl
- Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm256_mask_div_pd avx512f and avx512vl
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_div_ps avx512f and avx512vl
- Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_dpbf16_ps avx512bf16 and avx512vl
- Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_dpbusd_epi32 avx512vnni and avx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_dpbusds_epi32 avx512vnni and avx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_dpwssd_epi32 avx512vnni and avx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_dpwssds_epi32 avx512vnni and avx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
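As a rough illustration of the VNNI dot-product entries above, here is a scalar sketch of the unsigned-by-signed byte form. The function name and the `saturate` flag are assumptions made for this example: a single hypothetical helper covers both the wrapping (dpbusd-style) and the saturating (dpbusds-style) accumulation, with each 32-bit lane represented as four bytes.

```rust
/// Scalar model of a masked u8 x i8 dot-product accumulate into 32-bit lanes.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn model_mask_dpbusd(
    src: [i32; 8],
    k: u8,
    a: [[u8; 4]; 8],
    b: [[i8; 4]; 8],
    saturate: bool,
) -> [i32; 8] {
    let mut dst = src; // lanes with a clear mask bit keep the value from src
    for j in 0..8 {
        if (k >> j) & 1 == 1 {
            // Accumulate in 64 bits so the final step can wrap or saturate.
            let mut sum = src[j] as i64;
            for i in 0..4 {
                // Each product of a zero-extended u8 and a sign-extended i8
                // fits in a signed 16-bit value.
                sum += (a[j][i] as i64) * (b[j][i] as i64);
            }
            dst[j] = if saturate {
                sum.clamp(i32::MIN as i64, i32::MAX as i64) as i32
            } else {
                sum as i32 // truncating cast, i.e. wrapping 32-bit addition
            };
        }
    }
    dst
}
```

The only difference between the two modes in this model is the final step: the non-saturating form wraps on overflow, while the saturating form clamps to the i32 range.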
- _mm256_mask_expand_epi8 avx512vbmi2 and avx512vl
- Load contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expand_epi16 avx512vbmi2 and avx512vl
- Load contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expand_epi32 avx512f and avx512vl
- Load contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expand_epi64 avx512f and avx512vl
- Load contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expand_pd avx512f and avx512vl
- Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expand_ps avx512f and avx512vl
- Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expandloadu_epi8 ⚠ avx512vbmi2 and avx512vl
- Load contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expandloadu_epi16 ⚠ avx512vbmi2 and avx512vl
- Load contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expandloadu_epi32 ⚠ avx512f and avx512vl
- Load contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expandloadu_epi64 ⚠ avx512f and avx512vl
- Load contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expandloadu_pd ⚠ avx512f and avx512vl
- Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_expandloadu_ps ⚠ avx512f and avx512vl
- Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
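The expand entries above are easier to follow with a scalar picture: the low elements of the source are consumed in order and placed into the lanes whose mask bit is set. The sketch below models the 32-bit register form with plain arrays; the helper name is hypothetical, and bit j of k is assumed to control lane j.

```rust
/// Scalar model of a masked expand over eight 32-bit lanes.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn model_mask_expand_epi32(src: [i32; 8], k: u8, a: [i32; 8]) -> [i32; 8] {
    let mut dst = src; // lanes with a clear mask bit keep the value from src
    let mut next = 0; // index of the next contiguous element to take from a
    for j in 0..8 {
        if (k >> j) & 1 == 1 {
            dst[j] = a[next];
            next += 1;
        }
    }
    dst
}
```

With k = 0b1010_0101, the first four elements of a land in lanes 0, 2, 5 and 7, and the remaining lanes come from src. The expandloadu forms follow the same pattern in this model, reading the contiguous elements from memory instead of from a.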
- _mm256_mask_extractf32x4_ps avx512f and avx512vl
- Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_extractf64x2_pd avx512dq and avx512vl
- Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_extracti32x4_epi32 avx512f and avx512vl
- Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM1, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_extracti64x2_epi64 avx512dq and avx512vl
- Extracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_fixupimm_pd avx512f and avx512vl
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm256_mask_fixupimm_ps avx512f and avx512vl
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm256_mask_fmadd_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmadd_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmaddsub_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmaddsub_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmsub_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmsub_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmsubadd_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmsubadd_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fnmadd_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fnmadd_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fnmsub_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_fnmsub_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
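Note that the masked fused multiply-add entries above fall back to a, not to a separate src operand, when a mask bit is clear. A scalar sketch of the fmadd case may make that concrete. The helper name is hypothetical, arrays stand in for the vector type, and `f64::mul_add` is used as a stand-in for the fused operation.

```rust
/// Scalar model of a masked fused multiply-add over four f64 lanes.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn model_mask_fmadd_pd(a: [f64; 4], k: u8, b: [f64; 4], c: [f64; 4]) -> [f64; 4] {
    let mut dst = a; // lanes with a clear mask bit keep the value from a
    for j in 0..4 {
        if (k >> j) & 1 == 1 {
            // a * b + c with a single rounding step.
            dst[j] = a[j].mul_add(b[j], c[j]);
        }
    }
    dst
}
```

Under the same assumptions, the fmsub, fnmadd and fnmsub variants differ only in the signs applied to the product and to c.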
- _mm256_mask_fpclass_pd_mask avx512dq and avx512vl
- Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm256_mask_fpclass_ps_mask avx512dq and avx512vl
- Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm256_mask_getexp_pd avx512f and avx512vl
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_mask_getexp_ps avx512f and avx512vl
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
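The floor(log2(x)) description above can be sketched in scalar form by reading the exponent field directly. This is only a simplified model under stated assumptions: the helper names are hypothetical, and it covers normal, finite, non-zero inputs only (zeros, denormals, infinities and NaNs are deliberately left out).

```rust
/// Scalar model of getexp for one normal, finite, non-zero f32:
/// returns the unbiased exponent as a float, i.e. floor(log2(|x|)).
/// Hypothetical helper for illustration; not part of `core::arch`.
fn model_getexp_f32(x: f32) -> f32 {
    let biased = ((x.to_bits() >> 23) & 0xff) as i32; // 8-bit exponent field
    (biased - 127) as f32 // remove the IEEE 754 single-precision bias
}

/// Masked form over eight lanes; lanes with a clear mask bit come from src.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn model_mask_getexp_ps(src: [f32; 8], k: u8, a: [f32; 8]) -> [f32; 8] {
    let mut dst = src;
    for j in 0..8 {
        if (k >> j) & 1 == 1 {
            dst[j] = model_getexp_f32(a[j]);
        }
    }
    dst
}
```

The sign of the input is ignored because only the exponent field is read, so -10.0 and 10.0 give the same result in this model.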
- _mm256_mask_getmant_pd avx512f and avx512vl
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm256_mask_getmant_ps avx512f and avx512vl
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm256_mask_gf2p8affine_epi64_epi8 gfni and avx512bw and avx512vl
- Performs an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_mask_gf2p8affineinv_epi64_epi8 gfni and avx512bw and avx512vl
- Performs an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_mask_gf2p8mul_epi8 gfni and avx512bw and avx512vl
- Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
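For readers unfamiliar with GF(2^8) arithmetic, the per-byte multiplication with the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) can be sketched with a shift-and-xor loop. The helper names below are hypothetical, and the masked wrapper assumes the same writemask convention as the other `_mask_` entries in this list (lanes with a clear bit are copied from src).

```rust
/// Scalar model of one byte multiplication in GF(2^8) modulo 0x11B.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn model_gf2p8mul(a: u8, b: u8) -> u8 {
    let mut a = a as u16;
    let mut b = b;
    let mut acc: u16 = 0;
    while b != 0 {
        if b & 1 == 1 {
            acc ^= a; // addition in GF(2^8) is xor
        }
        a <<= 1; // multiply a by x
        if a & 0x100 != 0 {
            a ^= 0x11B; // reduce modulo x^8 + x^4 + x^3 + x + 1
        }
        b >>= 1;
    }
    acc as u8
}

/// Masked form over 32 byte lanes (assumed writemask convention: src fallback).
/// Hypothetical helper for illustration; not part of `core::arch`.
fn model_mask_gf2p8mul_epi8(src: [u8; 32], k: u32, a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    let mut dst = src;
    for j in 0..32 {
        if (k >> j) & 1 == 1 {
            dst[j] = model_gf2p8mul(a[j], b[j]);
        }
    }
    dst
}
```

This is the same field used by AES, so the textbook product 0x57 * 0x83 = 0xC1 serves as a convenient sanity check.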
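- _mm256_mask_i32gather_epi32 ⚠ avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Elements are loaded from memory only where the highest bit of the corresponding mask element is set; otherwise the value from src in that position is used instead.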
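- _mm256_mask_i32gather_epi64 ⚠ avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Elements are loaded from memory only where the highest bit of the corresponding mask element is set; otherwise the value from src in that position is used instead.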
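- _mm256_mask_i32gather_pd ⚠ avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Elements are loaded from memory only where the highest bit of the corresponding mask element is set; otherwise the value from src in that position is used instead.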
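- _mm256_mask_i32gather_ps ⚠ avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Elements are loaded from memory only where the highest bit of the corresponding mask element is set; otherwise the value from src in that position is used instead.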
- _mm256_mask_i32scatter_epi32 ⚠ avx512f and avx512vl
- Stores 8 32-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_i32scatter_epi64 ⚠ avx512f and avx512vl
- Stores 4 64-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_i32scatter_pd ⚠ avx512f and avx512vl
- Stores 4 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_i32scatter_ps ⚠ avx512f and avx512vl
- Stores 8 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
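- _mm256_mask_i64gather_epi32 ⚠ avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Elements are loaded from memory only where the highest bit of the corresponding mask element is set; otherwise the value from src in that position is used instead.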
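- _mm256_mask_i64gather_epi64 ⚠ avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Elements are loaded from memory only where the highest bit of the corresponding mask element is set; otherwise the value from src in that position is used instead.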
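- _mm256_mask_i64gather_pd ⚠ avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Elements are loaded from memory only where the highest bit of the corresponding mask element is set; otherwise the value from src in that position is used instead.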
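- _mm256_mask_i64gather_ps ⚠ avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Elements are loaded from memory only where the highest bit of the corresponding mask element is set; otherwise the value from src in that position is used instead.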
- _mm256_mask_i64scatter_epi32 ⚠ avx512f and avx512vl
- Stores 4 32-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_i64scatter_epi64 ⚠ avx512f and avx512vl
- Stores 4 64-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_i64scatter_pd ⚠ avx512f and avx512vl
- Stores 4 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_i64scatter_ps ⚠ avx512f and avx512vl
- Stores 4 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
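The masked scatter entries above can be modelled in safe scalar code as a loop that writes only the active lanes. In this sketch the helper name is hypothetical, memory is a mutable slice of i32, and indices are treated as element indices (which corresponds to a scale of 4 for 32-bit elements) and assumed to be non-negative and in bounds.

```rust
/// Scalar model of a masked scatter of eight 32-bit integers.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn model_mask_i32scatter_epi32(mem: &mut [i32], k: u8, vindex: [i32; 8], a: [i32; 8]) {
    for j in 0..8 {
        // Lanes whose mask bit is clear are not written to memory at all.
        if (k >> j) & 1 == 1 {
            mem[vindex[j] as usize] = a[j];
        }
    }
}
```

Unlike the register-to-register masked operations in this list, there is no src fallback here: inactive lanes simply leave memory untouched.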
- _mm256_mask_insertf32x4 avx512f and avx512vl
- Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_insertf64x2 avx512dq and avx512vl
- Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_inserti32x4 avx512f and avx512vl
- Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_inserti64x2 avx512dq and avx512vl
- Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_load_epi32 ⚠ avx512f and avx512vl
- Load packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_load_epi64 ⚠ avx512f and avx512vl
- Load packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_load_pd ⚠ avx512f and avx512vl
- Load packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_load_ps ⚠ avx512f and avx512vl
- Load packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_loadu_epi8 ⚠ avx512bw and avx512vl
- Load packed 8-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_loadu_epi16 ⚠ avx512bw and avx512vl
- Load packed 16-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_loadu_epi32 ⚠ avx512f and avx512vl
- Load packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_loadu_epi64 ⚠ avx512f and avx512vl
- Load packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_loadu_pd ⚠ avx512f and avx512vl
- Load packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_loadu_ps ⚠ avx512f and avx512vl
- Load packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_lzcnt_epi32 avx512cd and avx512vl
- Count the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_lzcnt_epi64 avx512cd and avx512vl
- Count the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_madd52hi_epu64 avx512ifma and avx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_madd52lo_epu64 avx512ifma and avx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_madd_epi16 avx512bw and avx512vl
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_maddubs_epi16 avx512bw and avx512vl
- Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_max_epi8 avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_max_epi16 avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_max_epi32 avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_max_epi64 avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_max_epu8 avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_max_epu16 avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_max_epu32 avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_max_epu64 avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_max_pd avx512f and avx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_max_ps avx512f and avx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_min_epi8 avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_min_epi16 avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_min_epi32 avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_min_epi64 avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_min_epu8 avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_min_epu16 avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_min_epu32 avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_min_epu64 avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_min_pd avx512f and avx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_min_ps avx512f and avx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mov_epi8 avx512bw and avx512vl
- Move packed 8-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mov_epi16 avx512bw and avx512vl
- Move packed 16-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mov_epi32 avx512f and avx512vl
- Move packed 32-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mov_epi64 avx512f and avx512vl
- Move packed 64-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mov_pd avx512f and avx512vl
- Move packed double-precision (64-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mov_ps avx512f and avx512vl
- Move packed single-precision (32-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_movedup_pd avx512f and avx512vl
- Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_movehdup_ps avx512f and avx512vl
- Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_moveldup_ps avx512f and avx512vl
- Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mul_epi32 avx512f and avx512vl
- Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mul_epu32 avx512f and avx512vl
- Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mul_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mul_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mulhi_epi16 avx512bw and avx512vl
- Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mulhi_epu16 avx512bw and avx512vl
- Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mulhrs_epi16 avx512bw and avx512vl
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mullo_epi16 avx512bw and avx512vl
- Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mullo_epi32 avx512f and avx512vl
- Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_mullo_epi64 avx512dq and avx512vl
- Multiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_multishift_epi64_epi8 avx512vbmi and avx512vl
- For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_or_epi32 avx512f and avx512vl
- Compute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_or_epi64 avx512f and avx512vl
- Compute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_or_pd avx512dq and avx512vl
- Compute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_or_ps avx512dq and avx512vl
- Compute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_packs_epi16 avx512bw and avx512vl
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_packs_epi32 avx512bw and avx512vl
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_packus_epi16 avx512bw and avx512vl
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_packus_epi32 avx512bw and avx512vl
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permute_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permute_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permutevar_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permutevar_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permutex2var_epi8 avx512vbmi and avx512vl
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_permutex2var_epi16 avx512bw and avx512vl
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_permutex2var_epi32 avx512f and avx512vl
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_permutex2var_epi64 avx512f and avx512vl
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_permutex2var_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_permutex2var_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_permutex_epi64 avx512f and avx512vl
- Shuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permutex_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permutexvar_epi8 avx512vbmi and avx512vl
- Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permutexvar_epi16 avx512bw and avx512vl
- Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permutexvar_epi32 avx512f and avx512vl
- Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permutexvar_epi64 avx512f and avx512vl
- Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permutexvar_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_permutexvar_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
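- _mm256_mask_popcnt_epi8 avx512bitalg and avx512vl
- Count the number of logical 1 bits in each packed 8-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).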
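- _mm256_mask_popcnt_epi16 avx512bitalg and avx512vl
- Count the number of logical 1 bits in each packed 16-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).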
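- _mm256_mask_popcnt_epi32 avx512vpopcntdq and avx512vl
- Count the number of logical 1 bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).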
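- _mm256_mask_popcnt_epi64 avx512vpopcntdq and avx512vl
- Count the number of logical 1 bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).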
- _mm256_mask_range_pd avx512dq and avx512vl
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). The lower 2 bits of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_mask_range_ps avx512dq and avx512vl
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). The lower 2 bits of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_mask_rcp14_pd avx512f and avx512vl
- Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_mask_rcp14_ps avx512f and avx512vl
- Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_mask_reduce_add_epi8 avx512bw and avx512vl
- Reduce the packed 8-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm256_mask_reduce_add_epi16 avx512bw and avx512vl
- Reduce the packed 16-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm256_mask_reduce_and_epi8 avx512bw and avx512vl
- Reduce the packed 8-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm256_mask_reduce_and_epi16 avx512bw and avx512vl
- Reduce the packed 16-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm256_mask_reduce_max_epi8 avx512bw and avx512vl
- Reduce the packed 8-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm256_mask_reduce_max_epi16 avx512bw and avx512vl
- Reduce the packed 16-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm256_mask_reduce_max_epu8 avx512bw and avx512vl
- Reduce the packed unsigned 8-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm256_mask_reduce_max_epu16 avx512bw and avx512vl
- Reduce the packed unsigned 16-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm256_mask_reduce_min_epi8 avx512bw and avx512vl
- Reduce the packed 8-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm256_mask_reduce_min_epi16 avx512bw and avx512vl
- Reduce the packed 16-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm256_mask_reduce_min_epu8 avx512bw and avx512vl
- Reduce the packed unsigned 8-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm256_mask_reduce_min_epu16 avx512bw and avx512vl
- Reduce the packed unsigned 16-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm256_mask_reduce_mul_epi8 avx512bw and avx512vl
- Reduce the packed 8-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm256_mask_reduce_mul_epi16 avx512bw and avx512vl
- Reduce the packed 16-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm256_mask_reduce_or_epi8 avx512bw and avx512vl
- Reduce the packed 8-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
- _mm256_mask_reduce_or_epi16 avx512bw and avx512vl
- Reduce the packed 16-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
- _mm256_mask_reduce_pd avx512dq and avx512vl
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm256_mask_reduce_ps avx512dq and avx512vl
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm256_mask_rol_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_rol_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_rolv_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_rolv_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ror_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ror_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_rorv_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_rorv_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_roundscale_pd avx512f and avx512vl
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm256_mask_roundscale_ps avx512f and avx512vl
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm256_mask_rsqrt14_pd avx512f and avx512vl
- Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_mask_rsqrt14_ps avx512f and avx512vl
- Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_mask_scalef_pd avx512f and avx512vl
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_scalef_ps avx512f and avx512vl
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_set1_epi8 avx512bw and avx512vl
- Broadcast 8-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_set1_epi16 avx512bw and avx512vl
- Broadcast 16-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_set1_epi32 avx512f and avx512vl
- Broadcast 32-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_set1_epi64 avx512f and avx512vl
- Broadcast 64-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_shldi_epi16 avx512vbmi2 and avx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16 bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shldi_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shldi_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shldv_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ shldv_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ shldv_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ shrdi_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shrdi_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shrdi_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shrdv_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ shrdv_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ shrdv_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
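The concatenate-and-shift entries above (shldi, shldv, shrdi, shrdv) can be modeled one lane at a time in scalar code. The following is a minimal sketch for a single 16-bit lane, not a call to the intrinsics themselves. The helper names `shld16`, `shrd16`, and `apply_writemask` are hypothetical, and reducing the count to its low 4 bits is an assumption about how out-of-range counts behave for 16-bit lanes.

```rust
/// Scalar model of one 16-bit lane of a left funnel shift (shldi/shldv):
/// concatenate `a` (upper half) with `b` (lower half), shift left,
/// and keep the upper 16 bits of the 32-bit intermediate.
fn shld16(a: u16, b: u16, count: u32) -> u16 {
    let concat = ((a as u32) << 16) | b as u32;
    // Assumption: only the low 4 bits of the count matter for 16-bit lanes.
    ((concat << (count & 15)) >> 16) as u16
}

/// Scalar model of one 16-bit lane of a right funnel shift (shrdi/shrdv):
/// concatenate `b` (upper half) with `a` (lower half), shift right,
/// and keep the lower 16 bits of the 32-bit intermediate.
fn shrd16(a: u16, b: u16, count: u32) -> u16 {
    let concat = ((b as u32) << 16) | a as u32;
    // Assumption: only the low 4 bits of the count matter for 16-bit lanes.
    (concat >> (count & 15)) as u16
}

/// Writemask model for sixteen 16-bit lanes: lane `i` takes `computed[i]`
/// when bit `i` of `k` is set, otherwise it keeps `fallback[i]`.
/// `fallback` plays the role of src (shldi/shrdi) or a (shldv/shrdv).
fn apply_writemask(k: u16, computed: [u16; 16], fallback: [u16; 16]) -> [u16; 16] {
    let mut dst = fallback;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = computed[i];
        }
    }
    dst
}
```

Note the operand order: the left shift places a in the upper half of the intermediate, while the right shift places b there, which is why the descriptions say "a and b" for shld and "b and a" for shrd.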
- _mm256_mask_ shuffle_ epi8 avx512bwandavx512vl
- Shuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shuffle_ epi32 avx512fandavx512vl
- Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
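The "within 128-bit lanes" wording for the 32-bit shuffle can be made concrete with a scalar model. This is a sketch only: `model_mask_shuffle_epi32` is a hypothetical name, arrays stand in for the 256-bit vectors, and element 0 is taken to be the lowest 32 bits.

```rust
/// Scalar model of a writemasked 32-bit shuffle over a 256-bit vector.
/// Each 128-bit lane holds four elements; the two-bit field `i` of `imm8`
/// selects which element of the SAME lane lands in position `i`.
fn model_mask_shuffle_epi32(src: [i32; 8], k: u8, a: [i32; 8], imm8: u8) -> [i32; 8] {
    let mut dst = src;
    for lane in 0..2 {
        for i in 0..4 {
            let out = lane * 4 + i;
            if (k >> out) & 1 == 1 {
                let sel = ((imm8 >> (2 * i)) & 3) as usize;
                dst[out] = a[lane * 4 + sel];
            }
            // Otherwise dst[out] keeps the value copied from src.
        }
    }
    dst
}
```

With `imm8 = 0x1B` and all mask bits set, each lane is reversed independently; no element crosses from the low lane to the high lane or back.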
- _mm256_mask_ shuffle_ f32x4 avx512fandavx512vl
- Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shuffle_ f64x2 avx512fandavx512vl
- Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shuffle_ i32x4 avx512fandavx512vl
- Shuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shuffle_ i64x2 avx512fandavx512vl
- Shuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shuffle_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shuffle_ ps avx512fandavx512vl
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shufflehi_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ shufflelo_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sll_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sll_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sll_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ slli_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ slli_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ slli_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sllv_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sllv_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sllv_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sqrt_ pd avx512fandavx512vl
- Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sqrt_ ps avx512fandavx512vl
- Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sra_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sra_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sra_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srai_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srai_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srai_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srav_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srav_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srav_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srl_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srl_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srl_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srli_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srli_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srli_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srlv_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srlv_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ srlv_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
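The difference between "shifting in sign bits" (sra, srai, srav) and "shifting in zeros" (srl, srli, srlv) shows up most clearly when the count reaches the element width. A scalar sketch for one 32-bit lane follows; `model_srav32` and `model_srlv32` are hypothetical helper names, and the model covers the per-element variable-count forms only.

```rust
/// Arithmetic right shift of one 32-bit lane. A count of 32 or more
/// fills the lane with copies of the sign bit, which for i32 is the
/// same result as shifting by 31.
fn model_srav32(a: i32, count: u32) -> i32 {
    a >> count.min(31)
}

/// Logical right shift of one 32-bit lane. A count of 32 or more
/// yields zero, since every original bit has been shifted out.
fn model_srlv32(a: i32, count: u32) -> i32 {
    if count > 31 {
        0
    } else {
        ((a as u32) >> count) as i32
    }
}
```

Plain Rust `>>` on a 32-bit integer panics in debug builds when the count is 32 or larger, so the explicit clamping above is what keeps the model total.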
- _mm256_mask_ ⚠store_ epi32 avx512fandavx512vl
- Store packed 32-bit integers from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠store_ epi64 avx512fandavx512vl
- Store packed 64-bit integers from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠store_ pd avx512fandavx512vl
- Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠store_ ps avx512fandavx512vl
- Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠storeu_ epi8 avx512bwandavx512vl
- Store packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠storeu_ epi16 avx512bwandavx512vl
- Store packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠storeu_ epi32 avx512fandavx512vl
- Store packed 32-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠storeu_ epi64 avx512fandavx512vl
- Store packed 64-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠storeu_ pd avx512fandavx512vl
- Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠storeu_ ps avx512fandavx512vl
- Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ sub_ epi8 avx512bwandavx512vl
- Subtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sub_ epi16 avx512bwandavx512vl
- Subtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sub_ epi32 avx512fandavx512vl
- Subtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sub_ epi64 avx512fandavx512vl
- Subtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sub_ pd avx512fandavx512vl
- Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ sub_ ps avx512fandavx512vl
- Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ subs_ epi8 avx512bwandavx512vl
- Subtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ subs_ epi16 avx512bwandavx512vl
- Subtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ subs_ epu8 avx512bwandavx512vl
- Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ subs_ epu16 avx512bwandavx512vl
- Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
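Saturating subtraction clamps at the ends of the element's range instead of wrapping. The sketch below models the 8-bit signed and unsigned forms with a writemask, using Rust's built-in `saturating_sub`; the function names `model_mask_subs_epi8` and `model_mask_subs_epu8` are hypothetical and arrays stand in for the 256-bit vectors.

```rust
/// Scalar model of writemasked signed saturating subtraction on 32 bytes.
/// Results are clamped to the i8 range [-128, 127].
fn model_mask_subs_epi8(src: [i8; 32], k: u32, a: [i8; 32], b: [i8; 32]) -> [i8; 32] {
    let mut dst = src;
    for i in 0..32 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].saturating_sub(b[i]);
        }
    }
    dst
}

/// Scalar model of writemasked unsigned saturating subtraction on 32 bytes.
/// Results are clamped to the u8 range [0, 255], so a - b never goes below 0.
fn model_mask_subs_epu8(src: [u8; 32], k: u32, a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    let mut dst = src;
    for i in 0..32 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].saturating_sub(b[i]);
        }
    }
    dst
}
```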
- _mm256_mask_ ternarylogic_ epi32 avx512fandavx512vl
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in imm8. For each bit in each packed 32-bit integer, the corresponding bits from src, a, and b are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 32-bit granularity (32-bit elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ternarylogic_ epi64 avx512fandavx512vl
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in imm8. For each bit in each packed 64-bit integer, the corresponding bits from src, a, and b are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 64-bit granularity (64-bit elements are copied from src when the corresponding mask bit is not set).
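The ternary-logic description amounts to treating imm8 as an 8-entry truth table that is looked up once per bit position. A scalar sketch for one 32-bit lane is shown here; `ternarylogic32` is a hypothetical helper, and the index ordering (src as the most significant index bit, b as the least) is an assumption of this sketch rather than something the text above spells out.

```rust
/// Scalar model of ternary logic on one 32-bit lane: for every bit
/// position, build a 3-bit index from the three inputs and read the
/// output bit from the truth table stored in `imm8`.
fn ternarylogic32(src: u32, a: u32, b: u32, imm8: u8) -> u32 {
    let mut dst = 0u32;
    for bit in 0..32 {
        let s = (src >> bit) & 1;
        let x = (a >> bit) & 1;
        let y = (b >> bit) & 1;
        // Assumption: src is the high index bit, a the middle, b the low.
        let index = (s << 2) | (x << 1) | y;
        let out = ((imm8 as u32) >> index) & 1;
        dst |= out << bit;
    }
    dst
}
```

Under that ordering, `0x96` gives a three-way XOR, `0xE8` gives a bitwise majority vote, and `0xCA` gives a bitwise select (`src ? a : b`). The first two are symmetric in their inputs, so they do not depend on the ordering assumption.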
- _mm256_mask_ test_ epi8_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm256_mask_ test_ epi16_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm256_mask_ test_ epi32_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm256_mask_ test_ epi64_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm256_mask_ testn_ epi8_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm256_mask_ testn_ epi16_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm256_mask_ testn_ epi32_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm256_mask_ testn_ epi64_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
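The test and testn entries produce a bitmask rather than a vector: one result bit per element, gated by the incoming writemask. A scalar sketch for eight 32-bit lanes follows. The names `model_mask_test32` and `model_mask_testn32` are hypothetical, and the testn model sets a bit when `a & b` is zero for that lane, which is this sketch's reading of the underlying instruction.

```rust
/// Scalar model of the masked "test" family for eight 32-bit lanes:
/// result bit `i` is set when bit `i` of `k` is set AND (a[i] & b[i]) != 0.
fn model_mask_test32(k: u8, a: [u32; 8], b: [u32; 8]) -> u8 {
    let mut result = 0u8;
    for i in 0..8 {
        if (k >> i) & 1 == 1 && (a[i] & b[i]) != 0 {
            result |= 1 << i;
        }
    }
    result
}

/// Scalar model of the masked "testn" family: result bit `i` is set
/// when bit `i` of `k` is set AND (a[i] & b[i]) == 0.
fn model_mask_testn32(k: u8, a: [u32; 8], b: [u32; 8]) -> u8 {
    let mut result = 0u8;
    for i in 0..8 {
        if (k >> i) & 1 == 1 && (a[i] & b[i]) == 0 {
            result |= 1 << i;
        }
    }
    result
}
```

For any k, the two results are disjoint and their union equals k, because each enabled lane is either zero or non-zero after the AND.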
- _mm256_mask_ unpackhi_ epi8 avx512bwandavx512vl
- Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpackhi_ epi16 avx512bwandavx512vl
- Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpackhi_ epi32 avx512fandavx512vl
- Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpackhi_ epi64 avx512fandavx512vl
- Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpackhi_ pd avx512fandavx512vl
- Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpackhi_ ps avx512fandavx512vl
- Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpacklo_ epi8 avx512bwandavx512vl
- Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpacklo_ epi16 avx512bwandavx512vl
- Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpacklo_ epi32 avx512fandavx512vl
- Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpacklo_ epi64 avx512fandavx512vl
- Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpacklo_ pd avx512fandavx512vl
- Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ unpacklo_ ps avx512fandavx512vl
- Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ xor_ epi32 avx512fandavx512vl
- Compute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ xor_ epi64 avx512fandavx512vl
- Compute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ xor_ pd avx512dqandavx512vl
- Compute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ xor_ ps avx512dqandavx512vl
- Compute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_maskload_ ⚠epi32 avx2
- Loads packed 32-bit integers from memory pointed by mem_addr using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
- _mm256_maskload_ ⚠epi64 avx2
- Loads packed 64-bit integers from memory pointed by mem_addr using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
- _mm256_maskload_ ⚠pd avx
- Loads packed double-precision (64-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
- _mm256_maskload_ ⚠ps avx
- Loads packed single-precision (32-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
- _mm256_maskstore_ ⚠epi32 avx2
- Stores packed 32-bit integers from a into memory pointed by mem_addr using mask (elements are not stored when the highest bit is not set in the corresponding element).
- _mm256_maskstore_ ⚠epi64 avx2
- Stores packed 64-bit integers from a into memory pointed by mem_addr using mask (elements are not stored when the highest bit is not set in the corresponding element).
- _mm256_maskstore_ ⚠pd avx
- Stores packed double-precision (64-bit) floating-point elements from a into memory using mask.
- _mm256_maskstore_ ⚠ps avx
- Stores packed single-precision (32-bit) floating-point elements from a into memory using mask.
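Unlike the AVX-512 `mask_` forms, which take a compact bitmask k, the maskload and maskstore entries take a full vector as the mask and look only at the highest bit of each element. A scalar sketch of that rule for eight 32-bit elements is below; `model_maskload_epi32` and `model_maskstore_epi32` are hypothetical names, and arrays stand in for both the vector registers and the memory at mem_addr.

```rust
/// Scalar model of a masked load: element `i` is read from memory only
/// when the highest bit of `mask[i]` is set; otherwise it is zero.
fn model_maskload_epi32(mem: &[i32; 8], mask: [i32; 8]) -> [i32; 8] {
    let mut dst = [0i32; 8];
    for i in 0..8 {
        // For a signed 32-bit value, "highest bit set" is the same as "negative".
        if mask[i] < 0 {
            dst[i] = mem[i];
        }
    }
    dst
}

/// Scalar model of a masked store: element `i` of `a` is written to
/// memory only when the highest bit of `mask[i]` is set; other memory
/// elements are left untouched.
fn model_maskstore_epi32(mem: &mut [i32; 8], mask: [i32; 8], a: [i32; 8]) {
    for i in 0..8 {
        if mask[i] < 0 {
            mem[i] = a[i];
        }
    }
}
```

A mask element of `-1` (all bits set) enables its position and `0` disables it; any other negative value enables it as well, since only the top bit is inspected.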
- _mm256_maskz_ abs_ epi8 avx512bwandavx512vl
- Compute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ abs_ epi16 avx512bwandavx512vl
- Compute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ abs_ epi32 avx512fandavx512vl
- Compute the absolute value of packed signed 32-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ abs_ epi64 avx512fandavx512vl
- Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ add_ epi8 avx512bwandavx512vl
- Add packed 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ add_ epi16 avx512bwandavx512vl
- Add packed 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ add_ epi32 avx512fandavx512vl
- Add packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ add_ epi64 avx512fandavx512vl
- Add packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ add_ pd avx512fandavx512vl
- Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ add_ ps avx512fandavx512vl
- Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
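The `maskz_` entries differ from the `mask_` entries only in what happens to disabled elements: they become zero instead of being copied from src. A scalar sketch using 32-bit integer addition shows both side by side. The names `model_mask_add_epi32` and `model_maskz_add_epi32` are hypothetical, and `wrapping_add` is used because packed integer addition wraps on overflow.

```rust
/// Writemask form: disabled lanes are copied from `src`.
fn model_mask_add_epi32(src: [i32; 8], k: u8, a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].wrapping_add(b[i]);
        }
    }
    dst
}

/// Zeromask form: disabled lanes are zeroed. This is the writemask
/// form with an all-zero `src`.
fn model_maskz_add_epi32(k: u8, a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    model_mask_add_epi32([0; 8], k, a, b)
}
```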
- _mm256_maskz_ adds_ epi8 avx512bwandavx512vl
- Add packed signed 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ adds_ epi16 avx512bwandavx512vl
- Add packed signed 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ adds_ epu8 avx512bwandavx512vl
- Add packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ adds_ epu16 avx512bwandavx512vl
- Add packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ alignr_ epi8 avx512bwandavx512vl
- Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ alignr_ epi32 avx512fandavx512vl
- Concatenate a and b into a 64-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 32 bytes (8 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ alignr_ epi64 avx512fandavx512vl
- Concatenate a and b into a 64-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 32 bytes (4 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
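The element-wise alignr forms treat a and b as one double-width vector and slide a window across it. A scalar sketch of the zeromasked 32-bit version is below. `model_maskz_alignr_epi32` is a hypothetical name, a is assumed to occupy the upper half of the concatenation, and reducing imm8 to its low 3 bits for the 256-bit form is also an assumption of this sketch.

```rust
/// Scalar model of a zeromasked element-wise align-right on 256-bit
/// vectors of 32-bit integers: build [b (low 8), a (high 8)], skip
/// `imm8` elements from the bottom, and keep the next 8.
fn model_maskz_alignr_epi32(k: u8, a: [i32; 8], b: [i32; 8], imm8: u32) -> [i32; 8] {
    // Assumption: only the low 3 bits of imm8 are used for 8 elements.
    let shift = (imm8 & 7) as usize;
    let mut concat = [0i32; 16];
    concat[..8].copy_from_slice(&b);
    concat[8..].copy_from_slice(&a);
    let mut dst = [0i32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = concat[i + shift];
        }
        // Otherwise the element stays zero, as the zeromask requires.
    }
    dst
}
```

This differs from the byte-wise `alignr_epi8` entry, which works on each 16-byte block separately rather than on the whole vector.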
- _mm256_maskz_ and_ epi32 avx512fandavx512vl
- Compute the bitwise AND of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ and_ epi64 avx512fandavx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_and_pd avx512dq and avx512vl
- Compute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_and_ps avx512dq and avx512vl
- Compute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_andnot_epi32 avx512f and avx512vl
- Compute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_andnot_epi64 avx512f and avx512vl
- Compute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_andnot_pd avx512dq and avx512vl
- Compute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_andnot_ps avx512dq and avx512vl
- Compute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_avg_epu8 avx512bw and avx512vl
- Average packed unsigned 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_avg_epu16 avx512bw and avx512vl
- Average packed unsigned 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
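The unsigned average is commonly defined as a rounding average, `(a + b + 1) >> 1`, computed without overflow. The sketch below models that definition for 8-bit lanes; the helper names are hypothetical, and the rounding rule is stated here as an assumption about the underlying instruction rather than something the entries above spell out.

```rust
/// Rounding average of two unsigned bytes (hypothetical helper).
fn avg_u8_model(a: u8, b: u8) -> u8 {
    // Widen to u16 so that a + b + 1 cannot overflow, then halve.
    ((a as u16 + b as u16 + 1) >> 1) as u8
}

/// Zero-masked average over the thirty-two u8 lanes of a 256-bit vector.
fn maskz_avg_u8_model(k: u32, a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    let mut dst = [0u8; 32];
    for i in 0..32 {
        if (k >> i) & 1 == 1 {
            dst[i] = avg_u8_model(a[i], b[i]);
        }
    }
    dst
}
```

Under this model the average of 1 and 2 rounds up to 2, and the average of 255 and 255 stays at 255 instead of overflowing.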
- _mm256_maskz_broadcast_f32x2 avx512dq and avx512vl
- Broadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_broadcast_f32x4 avx512f and avx512vl
- Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_broadcast_f64x2 avx512dq and avx512vl
- Broadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_broadcast_i32x2 avx512dq and avx512vl
- Broadcasts the lower 2 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_broadcast_i32x4 avx512f and avx512vl
- Broadcast the 4 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_broadcast_i64x2 avx512dq and avx512vl
- Broadcasts the 2 packed 64-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_broadcastb_epi8 avx512bw and avx512vl
- Broadcast the low packed 8-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_broadcastd_epi32 avx512f and avx512vl
- Broadcast the low packed 32-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_broadcastq_epi64 avx512f and avx512vl
- Broadcast the low packed 64-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_broadcastsd_pd avx512f and avx512vl
- Broadcast the low double-precision (64-bit) floating-point element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_broadcastss_ps avx512f and avx512vl
- Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_broadcastw_epi16 avx512bw and avx512vl
- Broadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_compress_epi8 avx512vbmi2 and avx512vl
- Contiguously store the active 8-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm256_maskz_compress_epi16 avx512vbmi2 and avx512vl
- Contiguously store the active 16-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm256_maskz_compress_epi32 avx512f and avx512vl
- Contiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm256_maskz_compress_epi64 avx512f and avx512vl
- Contiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm256_maskz_compress_pd avx512f and avx512vl
- Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm256_maskz_compress_ps avx512f and avx512vl
- Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
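The compress operation described in the entries above packs the selected lanes toward the low end of the result. A minimal scalar sketch for eight 32-bit lanes follows; `maskz_compress_i32_model` is a hypothetical name, and the model assumes bit `i` of `k` selects lane `i`.

```rust
/// Scalar reference model (hypothetical helper) of a zero-masked compress
/// over eight 32-bit lanes.
fn maskz_compress_i32_model(k: u8, a: [i32; 8]) -> [i32; 8] {
    let mut dst = [0i32; 8]; // unused tail lanes remain zero
    let mut next = 0; // next free slot at the low end of dst
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Active lanes are written back to back, preserving their order.
            dst[next] = a[i];
            next += 1;
        }
    }
    dst
}
```

For example, a mask selecting lanes 0, 2, 5 and 7 moves those four values into lanes 0 through 3 and zeroes lanes 4 through 7.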
- _mm256_maskz_conflict_epi32 avx512cd and avx512vl
- Test each 32-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm256_maskz_conflict_epi64 avx512cd and avx512vl
- Test each 64-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
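The conflict-detection description is easier to follow with a scalar model. In this sketch, `maskz_conflict_u32_model` is a hypothetical name; it assumes that "closer to the least significant bit" means lanes with a lower index, and that bit `j` of the result for lane `i` records a match between lanes `i` and `j`.

```rust
/// Scalar reference model (hypothetical helper) of zero-masked conflict
/// detection over eight 32-bit lanes.
fn maskz_conflict_u32_model(k: u8, a: [u32; 8]) -> [u32; 8] {
    let mut dst = [0u32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 0 {
            continue; // masked-off lanes stay zero
        }
        // Compare lane i only against the lanes below it.
        for j in 0..i {
            if a[i] == a[j] {
                dst[i] |= 1 << j;
            }
        }
    }
    dst
}
```

Lane 0 always yields zero because it has no lower lanes to compare against; a lane that repeats earlier values gets one bit per earlier duplicate.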
- _mm256_maskz_cvt_roundps_ph avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter.
- _mm256_maskz_cvtepi8_epi16 avx512bw and avx512vl
- Sign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi8_epi32 avx512f and avx512vl
- Sign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi8_epi64 avx512f and avx512vl
- Sign extend packed 8-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi16_epi8 avx512bw and avx512vl
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi16_epi32 avx512f and avx512vl
- Sign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi16_epi64 avx512f and avx512vl
- Sign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi32_epi8 avx512f and avx512vl
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi32_epi16 avx512f and avx512vl
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi32_epi64 avx512f and avx512vl
- Sign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi32_pd avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi32_ps avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi64_epi8 avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi64_epi16 avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi64_epi32 avx512f and avx512vl
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi64_pd avx512dq and avx512vl
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvtepi64_ps avx512dq and avx512vl
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvtepu8_epi16 avx512bw and avx512vl
- Zero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu8_epi32 avx512f and avx512vl
- Zero extend packed unsigned 8-bit integers in the low 8 bytes of a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu8_epi64 avx512f and avx512vl
- Zero extend packed unsigned 8-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu16_epi32 avx512f and avx512vl
- Zero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu16_epi64 avx512f and avx512vl
- Zero extend packed unsigned 16-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu32_epi64 avx512f and avx512vl
- Zero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu32_pd avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu64_pd avx512dq and avx512vl
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvtepu64_ps avx512dq and avx512vl
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvtne2ps_pbh avx512bf16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtneps_pbh avx512bf16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtpbh_ps avx512bf16 and avx512vl
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtpd_epi32 avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtpd_epi64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvtpd_epu32 avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtpd_epu64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvtpd_ps avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_ps avx512f and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtps_epi32 avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtps_epi64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvtps_epu32 avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtps_epu64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvtps_ph avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter.
- _mm256_maskz_cvtsepi16_epi8 avx512bw and avx512vl
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtsepi32_epi8 avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtsepi32_epi16 avx512f and avx512vl
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtsepi64_epi8 avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtsepi64_epi16 avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtsepi64_epi32 avx512f and avx512vl
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttpd_epi32 avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttpd_epi64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvttpd_epu32 avx512f and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttpd_epu64 avx512dq and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvttps_epi32 avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttps_epi64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvttps_epu32 avx512f and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttps_epu64 avx512dq and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_cvtusepi16_epi8 avx512bw and avx512vl
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtusepi32_epi8 avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtusepi32_epi16 avx512f and avx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtusepi64_epi8 avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtusepi64_epi16 avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtusepi64_epi32 avx512f and avx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
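The narrowing conversions listed above come in three flavors: truncation (`cvtepi*`), signed saturation (`cvtsepi*`), and unsigned saturation (`cvtusepi*`). The per-element difference can be sketched in scalar Rust; the helper names below are hypothetical, and the 32-bit to 8-bit case is used only as a representative example.

```rust
/// Truncation keeps the low 8 bits and discards the rest.
fn narrow_truncate(x: i32) -> i8 {
    x as i8
}

/// Signed saturation clamps to the i8 range before narrowing.
fn narrow_signed_sat(x: i32) -> i8 {
    x.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

/// Unsigned saturation clamps to the u8 range before narrowing.
fn narrow_unsigned_sat(x: u32) -> u8 {
    x.min(u8::MAX as u32) as u8
}
```

An out-of-range input such as 300 shows the difference: truncation yields 44, signed saturation yields 127, and unsigned saturation yields 255.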
- _mm256_maskz_dbsad_epu8 avx512bw and avx512vl
- Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm256_maskz_div_pd avx512f and avx512vl
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_div_ps avx512f and avx512vl
- Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_dpbf16_ps avx512bf16 and avx512vl
- Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_dpbusd_epi32 avx512vnni and avx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_dpbusds_epi32 avx512vnni and avx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_dpwssd_epi32 avx512vnni and avx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_dpwssds_epi32 avx512vnni and avx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
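The difference between the plain and saturating dot-product accumulations (`dpbusd` versus `dpbusds`) shows up only when the final sum leaves the 32-bit range. The following sketch models a single 32-bit lane; `dpbusd_lane` and `dpbusds_lane` are hypothetical names, the zeromask step is omitted, and wrapping on overflow in the non-saturating form is an assumption of this model.

```rust
/// Sum of four u8 * i8 products for one 32-bit lane (hypothetical helper).
fn dot4(a: [u8; 4], b: [i8; 4]) -> i32 {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| (x as i32) * (y as i32))
        .sum()
}

/// Non-saturating accumulate: the result wraps on overflow.
fn dpbusd_lane(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    src.wrapping_add(dot4(a, b))
}

/// Saturating accumulate: the result clamps to the i32 range.
fn dpbusds_lane(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    src.saturating_add(dot4(a, b))
}
```

The two agree for ordinary inputs; starting from `i32::MAX` and adding a positive product, the wrapping form lands on `i32::MIN` while the saturating form stays at `i32::MAX`.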
- _mm256_maskz_expand_epi8 avx512vbmi2 and avx512vl
- Load contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_expand_epi16 avx512vbmi2 and avx512vl
- Load contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_expand_epi32 avx512f and avx512vl
- Load contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_expand_epi64 avx512f and avx512vl
- Load contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_expand_pd avx512f and avx512vl
- Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_expand_ps avx512f and avx512vl
- Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
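Expand is the inverse of compress: consecutive low elements of the source are scattered to the lanes selected by the mask. A minimal scalar sketch for eight 32-bit lanes follows; `maskz_expand_i32_model` is a hypothetical name, and the model assumes bit `i` of `k` selects destination lane `i`.

```rust
/// Scalar reference model (hypothetical helper) of a zero-masked expand
/// over eight 32-bit lanes.
fn maskz_expand_i32_model(k: u8, a: [i32; 8]) -> [i32; 8] {
    let mut dst = [0i32; 8]; // lanes with a clear mask bit stay zero
    let mut next = 0; // next unread element at the low end of a
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Each selected lane consumes the next contiguous source element.
            dst[i] = a[next];
            next += 1;
        }
    }
    dst
}
```

A mask selecting lanes 0, 2, 5 and 7 places the first four source elements into exactly those lanes.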
- _mm256_maskz_expandloadu_epi8⚠ avx512vbmi2 and avx512vl
- Load contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_expandloadu_epi16⚠ avx512vbmi2 and avx512vl
- Load contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_expandloadu_epi32⚠ avx512f and avx512vl
- Load contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_expandloadu_epi64⚠ avx512f and avx512vl
- Load contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_expandloadu_pd⚠ avx512f and avx512vl
- Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_expandloadu_ps⚠ avx512f and avx512vl
- Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_extractf32x4_ps avx512f and avx512vl
- Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_extractf64x2_pd avx512dq and avx512vl
- Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_extracti32x4_epi32 avx512f and avx512vl
- Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM1, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_extracti64x2_epi64 avx512dq and avx512vl
- Extracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_fixupimm_pd avx512f and avx512vl
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm256_maskz_fixupimm_ps avx512f and avx512vl
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm256_maskz_fmadd_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmadd_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ fmaddsub_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmaddsub_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmsub_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmsub_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmsubadd_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmsubadd_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
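The difference between the `fmaddsub` and `fmsubadd` entries is only which lanes add and which subtract. The following scalar sketch illustrates that pattern for eight f32 lanes. The function names are hypothetical and masking is left out for brevity.

```rust
/// Scalar model of `fmaddsub`: even lanes subtract c, odd lanes add c.
/// Hypothetical helper for illustration; it is not part of `core::arch`.
fn fmaddsub_ps_model(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0; 8];
    for i in 0..8 {
        let addend = if i % 2 == 0 { -c[i] } else { c[i] };
        dst[i] = a[i].mul_add(b[i], addend);
    }
    dst
}

/// Scalar model of `fmsubadd`: even lanes add c, odd lanes subtract c.
/// Hypothetical helper for illustration; it is not part of `core::arch`.
fn fmsubadd_ps_model(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0; 8];
    for i in 0..8 {
        let addend = if i % 2 == 0 { c[i] } else { -c[i] };
        dst[i] = a[i].mul_add(b[i], addend);
    }
    dst
}
```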
- _mm256_maskz_fnmadd_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fnmadd_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fnmsub_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fnmsub_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_getexp_pd avx512f and avx512vl
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_maskz_getexp_ps avx512f and avx512vl
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
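As a rough picture of the floor(log2(x)) description, the exponent can be read straight from the bit pattern of a single element. This is a minimal sketch for normal, finite inputs only. The name `getexp_ps_model` is hypothetical, the sign bit is ignored, and zero, subnormal, infinite and NaN inputs are not modeled.

```rust
/// Unbiased exponent of a normal, finite f32, returned as a float.
/// Hypothetical helper for illustration; special values are not handled.
fn getexp_ps_model(x: f32) -> f32 {
    let biased = (x.to_bits() >> 23) & 0xff; // 8-bit biased exponent field
    (biased as i32 - 127) as f32 // remove the IEEE 754 single-precision bias
}
```

For example, 10.0 lies in [8, 16), so the model returns 3.0.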
- _mm256_maskz_getmant_pd avx512f and avx512vl
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm256_maskz_getmant_ps avx512f and avx512vl
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
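To make the normalization concrete, the sketch below models one element for a single combination of options: the interval [1, 2) with the source sign kept. The name `getmant_ps_model` is hypothetical, and only normal, finite inputs are considered.

```rust
/// Model of mantissa normalization to [1, 2) with the source sign kept
/// (the `_MM_MANT_NORM_1_2` / `_MM_MANT_SIGN_src` combination).
/// Hypothetical helper for illustration; special values are not handled.
fn getmant_ps_model(x: f32) -> f32 {
    let bits = x.to_bits();
    // Keep the sign and fraction bits, force the biased exponent to 127 (2^0).
    f32::from_bits((bits & 0x807f_ffff) | (127 << 23))
}
```

For example, 10.0 = 1.25 * 2^3, so the normalized mantissa is 1.25.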
- _mm256_maskz_gf2p8affine_epi64_epi8 gfni and avx512bw and avx512vl
- Performs an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_maskz_gf2p8affineinv_epi64_epi8 gfni and avx512bw and avx512vl
- Performs an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_maskz_gf2p8mul_epi8 gfni and avx512bw and avx512vl
- Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
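The per-byte multiplication in GF(2^8) can be sketched with the usual shift-and-reduce loop. This is a scalar illustration of the field arithmetic with the reduction polynomial x^8 + x^4 + x^3 + x + 1, not the intrinsic itself. The name `gf2p8mul_byte` is hypothetical.

```rust
/// Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B).
/// Hypothetical helper for illustration; it is not part of `core::arch`.
fn gf2p8mul_byte(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    for _ in 0..8 {
        if b & 1 != 0 {
            acc ^= a; // addition in GF(2^8) is XOR
        }
        let carry = a & 0x80 != 0;
        a <<= 1; // multiply a by x
        if carry {
            a ^= 0x1B; // reduce by the low byte of the polynomial
        }
        b >>= 1;
    }
    acc
}
```

The same field is used by AES, so the well-known product 0x57 * 0x83 = 0xC1 is a convenient check.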
- _mm256_maskz_insertf32x4 avx512f and avx512vl
- Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_insertf64x2 avx512dq and avx512vl
- Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_inserti32x4 avx512f and avx512vl
- Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_inserti64x2 avx512dq and avx512vl
- Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_load_epi32 ⚠ avx512f and avx512vl
- Load packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_maskz_load_epi64 ⚠ avx512f and avx512vl
- Load packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_maskz_load_pd ⚠ avx512f and avx512vl
- Load packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_maskz_load_ps ⚠ avx512f and avx512vl
- Load packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_maskz_loadu_epi8 ⚠ avx512bw and avx512vl
- Load packed 8-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_loadu_epi16 ⚠ avx512bw and avx512vl
- Load packed 16-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_loadu_epi32 ⚠ avx512f and avx512vl
- Load packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_loadu_epi64 ⚠ avx512f and avx512vl
- Load packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_loadu_pd ⚠ avx512f and avx512vl
- Load packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_loadu_ps ⚠ avx512f and avx512vl
- Load packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_lzcnt_epi32 avx512cd and avx512vl
- Count the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_lzcnt_epi64 avx512cd and avx512vl
- Count the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_madd52hi_epu64 avx512ifma and avx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_madd52lo_epu64 avx512ifma and avx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
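The 52-bit multiply-accumulate pair can be modeled per element with a 128-bit intermediate. The sketch below is a scalar illustration with hypothetical function names. It assumes the final addition wraps at 64 bits, and it leaves masking out.

```rust
const MASK52: u64 = (1 << 52) - 1;

/// Add the low 52 bits of the 104-bit product of b and c to a.
/// Hypothetical helper for illustration; it is not part of `core::arch`.
fn madd52lo_model(a: u64, b: u64, c: u64) -> u64 {
    let product = (b & MASK52) as u128 * (c & MASK52) as u128; // up to 104 bits
    a.wrapping_add((product as u64) & MASK52)
}

/// Add the high 52 bits of the 104-bit product of b and c to a.
/// Hypothetical helper for illustration; it is not part of `core::arch`.
fn madd52hi_model(a: u64, b: u64, c: u64) -> u64 {
    let product = (b & MASK52) as u128 * (c & MASK52) as u128;
    a.wrapping_add((product >> 52) as u64)
}
```

With b = 2^51 and c = 4 the product is 2^53, so the low half contributes 0 and the high half contributes 2.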
- _mm256_maskz_madd_epi16 avx512bw and avx512vl
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
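The multiply-then-add-adjacent-pairs step can be sketched on slices of any even length. The name `madd_epi16_model` is hypothetical, masking is omitted, and the pair sum is assumed to wrap in the single case where it exceeds the 32-bit range.

```rust
/// Multiply 16-bit lanes pairwise and add each adjacent pair of 32-bit products.
/// Hypothetical helper for illustration; it is not part of `core::arch`.
fn madd_epi16_model(a: &[i16], b: &[i16]) -> Vec<i32> {
    a.chunks_exact(2)
        .zip(b.chunks_exact(2))
        .map(|(x, y)| {
            let lo = x[0] as i32 * y[0] as i32;
            let hi = x[1] as i32 * y[1] as i32;
            lo.wrapping_add(hi) // only overflows when all four inputs are i16::MIN
        })
        .collect()
}
```

Each output lane is half as numerous and twice as wide as the inputs: four 16-bit inputs per operand produce two 32-bit results.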
- _mm256_maskz_maddubs_epi16 avx512bw and avx512vl
- Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_max_epi8 avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_max_epi16 avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_max_epi32 avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_max_epi64 avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_max_epu8 avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_max_epu16 avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_max_epu32 avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_max_epu64 avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_max_pd avx512f and avx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_max_ps avx512f and avx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_min_epi8 avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_min_epi16 avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_min_epi32 avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_min_epi64 avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_min_epu8 avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_min_epu16 avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_min_epu32 avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_min_epu64 avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_min_pd avx512f and avx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_min_ps avx512f and avx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mov_epi8 avx512bw and avx512vl
- Move packed 8-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mov_epi16 avx512bw and avx512vl
- Move packed 16-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mov_epi32 avx512f and avx512vl
- Move packed 32-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mov_epi64 avx512f and avx512vl
- Move packed 64-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mov_pd avx512f and avx512vl
- Move packed double-precision (64-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mov_ps avx512f and avx512vl
- Move packed single-precision (32-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_movedup_pd avx512f and avx512vl
- Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_movehdup_ps avx512f and avx512vl
- Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_moveldup_ps avx512f and avx512vl
- Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mul_epi32 avx512f and avx512vl
- Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mul_epu32 avx512f and avx512vl
- Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mul_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mul_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mulhi_epi16 avx512bw and avx512vl
- Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mulhi_epu16 avx512bw and avx512vl
- Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mulhrs_epi16 avx512bw and avx512vl
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
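The truncate, round, and extract sequence amounts to a rounded Q15 fixed-point multiply. A one-element sketch follows. The name `mulhrs_epi16_model` is hypothetical and masking is omitted.

```rust
/// Rounded high-half multiply of two signed 16-bit values.
/// Hypothetical helper for illustration; it is not part of `core::arch`.
fn mulhrs_epi16_model(a: i16, b: i16) -> i16 {
    let product = a as i32 * b as i32;
    // Drop the low 14 bits, add the rounding bit, then keep bits [16:1].
    (((product >> 14) + 1) >> 1) as i16
}
```

Reading 16384 as 0.5 in Q15, the product of 0.5 and 0.5 comes out as 8192, which is 0.25.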
- _mm256_maskz_mullo_epi16 avx512bw and avx512vl
- Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mullo_epi32 avx512f and avx512vl
- Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mullo_epi64 avx512dq and avx512vl
- Multiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_multishift_epi64_epi8 avx512vbmi and avx512vl
- For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_or_epi32 avx512f and avx512vl
- Compute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_or_epi64 avx512f and avx512vl
- Compute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_or_pd avx512dq and avx512vl
- Compute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_or_ps avx512dq and avx512vl
- Compute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_packs_epi16 avx512bw and avx512vl
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_packs_epi32 avx512bw and avx512vl
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_packus_epi16 avx512bw and avx512vl
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_packus_epi32 avx512bw and avx512vl
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
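Signed and unsigned saturation differ only in the range the value is clamped to before narrowing. The per-element sketch below shows the 16-bit to 8-bit case. Both function names are hypothetical, and the lane interleaving of a and b is not modeled.

```rust
/// Narrow a signed 16-bit value to 8 bits with signed saturation.
/// Hypothetical helper for illustration; it is not part of `core::arch`.
fn packs_i16_to_i8(x: i16) -> i8 {
    x.clamp(i8::MIN as i16, i8::MAX as i16) as i8
}

/// Narrow a signed 16-bit value to 8 bits with unsigned saturation.
/// Hypothetical helper for illustration; it is not part of `core::arch`.
fn packus_i16_to_u8(x: i16) -> u8 {
    x.clamp(0, u8::MAX as i16) as u8
}
```

Out-of-range inputs pin to the nearest representable value instead of wrapping, so 300 becomes 127 under signed saturation and 255 under unsigned saturation.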
- _mm256_maskz_permute_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permute_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutevar_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutevar_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutex2var_epi8 avx512vbmi and avx512vl
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutex2var_epi16 avx512bw and avx512vl
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutex2var_epi32 avx512f and avx512vl
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutex2var_epi64 avx512f and avx512vl
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutex2var_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutex2var_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
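The "selector and index" wording of the `permutex2var` entries can be illustrated for eight 32-bit lanes: the low 3 bits of each `idx` element pick a lane, and bit 3 picks the source vector. The sketch is a scalar model with a hypothetical name, and masking is omitted.

```rust
/// Two-source shuffle over eight 32-bit lanes driven by `idx`.
/// Hypothetical helper for illustration; it is not part of `core::arch`.
fn permutex2var_epi32_model(a: [i32; 8], idx: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut dst = [0; 8];
    for i in 0..8 {
        let lane = (idx[i] & 0b0111) as usize; // low 3 bits: element index
        let from_b = idx[i] & 0b1000 != 0; // bit 3: source selector
        dst[i] = if from_b { b[lane] } else { a[lane] };
    }
    dst
}
```

An index vector such as `[0, 8, 1, 9, 2, 10, 3, 11]` therefore interleaves the low halves of a and b.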
- _mm256_maskz_permutex_epi64 avx512f and avx512vl
- Shuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutex_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutexvar_epi8 avx512vbmi and avx512vl
- Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutexvar_epi16 avx512bw and avx512vl
- Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutexvar_epi32 avx512f and avx512vl
- Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutexvar_epi64 avx512f and avx512vl
- Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutexvar_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_permutexvar_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_popcnt_epi8 avx512bitalg and avx512vl
- For each packed 8-bit integer, maps the value to the number of logical 1 bits.
- _mm256_maskz_popcnt_epi16 avx512bitalg and avx512vl
- For each packed 16-bit integer, maps the value to the number of logical 1 bits.
- _mm256_maskz_popcnt_epi32 avx512vpopcntdq and avx512vl
- For each packed 32-bit integer, maps the value to the number of logical 1 bits.
- _mm256_maskz_popcnt_epi64 avx512vpopcntdq and avx512vl
- For each packed 64-bit integer, maps the value to the number of logical 1 bits.
- _mm256_maskz_range_pd avx512dq and avx512vl
- Calculate the max, min, absolute max, or absolute min (depending on the control in IMM8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Bits [1:0] of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_maskz_range_ps avx512dq and avx512vl
- Calculate the max, min, absolute max, or absolute min (depending on the control in IMM8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Bits [1:0] of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_maskz_rcp14_pd avx512f and avx512vl
- Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_maskz_rcp14_ps avx512f and avx512vl
- Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_maskz_reduce_pd avx512dq and avx512vl
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm256_maskz_reduce_ps avx512dq and avx512vl
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm256_maskz_rol_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_rol_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_rolv_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_rolv_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ror_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ror_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_rorv_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_rorv_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
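The masked rotates above share one pattern: rotate each lane, then zero every lane whose bit in k is clear. A minimal scalar sketch of that pattern for the 32-bit left rotates follows. The function names are hypothetical, plain arrays stand in for `__m256i`, and the rotate count is assumed to be taken modulo 32.

```rust
/// Scalar model of a zero-masked rotate-left by an immediate (8 x u32).
/// Hypothetical helper, not part of `core::arch`.
pub fn maskz_rol_epi32_model(k: u8, a: [u32; 8], imm8: u32) -> [u32; 8] {
    let mut dst = [0u32; 8]; // lanes start zeroed, as with a zeromask
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Assumption: the count is reduced modulo the element width.
            dst[i] = a[i].rotate_left(imm8 % 32);
        }
    }
    dst
}

/// Variable-count form: lane i rotates by the count held in b[i].
pub fn maskz_rolv_epi32_model(k: u8, a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    let mut dst = [0u32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].rotate_left(b[i] % 32);
        }
    }
    dst
}
```

The right rotates follow the same shape with `rotate_right`, and the 64-bit forms use four `u64` lanes.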
- _mm256_maskz_roundscale_pd avx512f and avx512vl
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter.
- _mm256_maskz_roundscale_ps avx512f and avx512vl
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter.
- _mm256_maskz_rsqrt14_pd avx512f and avx512vl
- Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_maskz_rsqrt14_ps avx512f and avx512vl
- Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_maskz_scalef_pd avx512f and avx512vl
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_scalef_ps avx512f and avx512vl
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
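"Scale using values from b" here means multiplying each element of a by 2 raised to floor(b). The scalar sketch below illustrates that for the double-precision form. The helper names are hypothetical, and special inputs (NaN, infinities, denormals, exponents outside the `i32` range) are deliberately not modelled.

```rust
/// Scalar model of one scalef lane: a * 2^floor(b).
/// Hypothetical helper; special values are not handled.
pub fn scalef_model(a: f64, b: f64) -> f64 {
    a * 2f64.powi(b.floor() as i32)
}

/// Zero-masked form over four f64 lanes (the `_pd` width for 256 bits).
pub fn maskz_scalef_pd_model(k: u8, a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    let mut dst = [0.0f64; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = scalef_model(a[i], b[i]);
        }
    }
    dst
}
```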
- _mm256_maskz_set1_epi8 avx512bw and avx512vl
- Broadcast 8-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_set1_epi16 avx512bw and avx512vl
- Broadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_set1_epi32 avx512f and avx512vl
- Broadcast 32-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_set1_epi64 avx512f and avx512vl
- Broadcast 64-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shldi_epi16 avx512vbmi2 and avx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shldi_epi32 avx512vbmi2 and avx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shldi_epi64 avx512vbmi2 and avx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shldv_epi16 avx512vbmi2 and avx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shldv_epi32 avx512vbmi2 and avx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shldv_epi64 avx512vbmi2 and avx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shrdi_epi16 avx512vbmi2 and avx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shrdi_epi32 avx512vbmi2 and avx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shrdi_epi64 avx512vbmi2 and avx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shrdv_epi16 avx512vbmi2 and avx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shrdv_epi32 avx512vbmi2 and avx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shrdv_epi64 avx512vbmi2 and avx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
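The concatenate-and-shift (funnel shift) entries above can be hard to picture from prose alone. The sketch below models a single 32-bit lane of the immediate forms. The function names are hypothetical, and the shift count is assumed to be reduced modulo the element width (32 here).

```rust
/// One lane of the shldi pattern: concatenate a (high) with b (low),
/// shift left, keep the upper 32 bits. Hypothetical helper.
pub fn shldi_epi32_model(a: u32, b: u32, imm8: u32) -> u32 {
    let concat = ((a as u64) << 32) | b as u64;
    // Assumption: only the low 5 bits of the count are used.
    ((concat << (imm8 & 31)) >> 32) as u32
}

/// One lane of the shrdi pattern: concatenate b (high) with a (low),
/// shift right, keep the lower 32 bits. Hypothetical helper.
pub fn shrdi_epi32_model(a: u32, b: u32, imm8: u32) -> u32 {
    let concat = ((b as u64) << 32) | a as u64;
    (concat >> (imm8 & 31)) as u32
}
```

Passing the same value for both operands turns either helper into a rotate, which is a convenient sanity check for the operand order.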
- _mm256_maskz_shuffle_epi8 avx512bw and avx512vl
- Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shuffle_epi32 avx512f and avx512vl
- Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shuffle_f32x4 avx512f and avx512vl
- Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shuffle_f64x2 avx512f and avx512vl
- Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shuffle_i32x4 avx512f and avx512vl
- Shuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shuffle_i64x2 avx512f and avx512vl
- Shuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shuffle_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shuffle_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
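For the in-lane shuffles, "within 128-bit lanes using the control in imm8" means each 2-bit field of imm8 picks a source element from the same 128-bit lane as the destination element. A scalar sketch of the 32-bit integer form follows; the function name is hypothetical and arrays stand in for `__m256i`.

```rust
/// Scalar model of a zero-masked in-lane shuffle of eight 32-bit integers.
/// Hypothetical helper, not part of `core::arch`.
pub fn maskz_shuffle_epi32_model(k: u8, a: [u32; 8], imm8: u8) -> [u32; 8] {
    let mut dst = [0u32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let lane_base = (i / 4) * 4; // first element of this 128-bit lane
            // Element i within the lane uses imm8 bits [2j+1:2j], j = i % 4.
            let sel = ((imm8 >> (2 * (i % 4))) & 0b11) as usize;
            dst[i] = a[lane_base + sel];
        }
    }
    dst
}
```

With `imm8 = 0b00_01_10_11` each lane is reversed, and the same control is applied to both lanes.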
- _mm256_maskz_shufflehi_epi16 avx512bw and avx512vl
- Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_shufflelo_epi16 avx512bw and avx512vl
- Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sll_epi16 avx512bw and avx512vl
- Shift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sll_epi32 avx512f and avx512vl
- Shift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sll_epi64 avx512f and avx512vl
- Shift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_slli_epi16 avx512bw and avx512vl
- Shift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_slli_epi32 avx512f and avx512vl
- Shift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_slli_epi64 avx512f and avx512vl
- Shift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sllv_epi16 avx512bw and avx512vl
- Shift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sllv_epi32 avx512f and avx512vl
- Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sllv_epi64 avx512f and avx512vl
- Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sqrt_pd avx512f and avx512vl
- Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sqrt_ps avx512f and avx512vl
- Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sra_epi16 avx512bw and avx512vl
- Shift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sra_epi32 avx512f and avx512vl
- Shift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sra_epi64 avx512f and avx512vl
- Shift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srai_epi16 avx512bw and avx512vl
- Shift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srai_epi32 avx512f and avx512vl
- Shift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srai_epi64 avx512f and avx512vl
- Shift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srav_epi16 avx512bw and avx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srav_epi32 avx512f and avx512vl
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srav_epi64 avx512f and avx512vl
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srl_epi16 avx512bw and avx512vl
- Shift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srl_epi32 avx512f and avx512vl
- Shift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srl_epi64 avx512f and avx512vl
- Shift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srli_epi16 avx512bw and avx512vl
- Shift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srli_epi32 avx512f and avx512vl
- Shift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srli_epi64 avx512f and avx512vl
- Shift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srlv_epi16 avx512bw and avx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srlv_epi32 avx512f and avx512vl
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_srlv_epi64 avx512f and avx512vl
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
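The three shift families above differ only in what gets shifted in: zeros for sll and srl, copies of the sign bit for sra. The scalar sketch below models one 32-bit lane of the per-element (`v`) forms, including the out-of-range case, where a count of 32 or more yields all zeros for the logical shifts and a sign fill for the arithmetic shift. The function names are hypothetical.

```rust
/// Logical left shift of one 32-bit lane; counts >= 32 give 0.
pub fn sllv32_model(a: u32, count: u32) -> u32 {
    if count < 32 { a << count } else { 0 }
}

/// Logical right shift of one 32-bit lane; counts >= 32 give 0.
pub fn srlv32_model(a: u32, count: u32) -> u32 {
    if count < 32 { a >> count } else { 0 }
}

/// Arithmetic right shift of one 32-bit lane; counts >= 32 leave only
/// copies of the sign bit (0 or -1), which equals a shift by 31.
pub fn srav32_model(a: i32, count: u32) -> i32 {
    a >> count.min(31)
}
```

Plain Rust `<<` and `>>` panic in debug builds on counts of 32 or more, which is why the model checks the count explicitly.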
- _mm256_maskz_sub_epi8 avx512bw and avx512vl
- Subtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sub_epi16 avx512bw and avx512vl
- Subtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sub_epi32 avx512f and avx512vl
- Subtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sub_epi64 avx512f and avx512vl
- Subtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sub_pd avx512f and avx512vl
- Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sub_ps avx512f and avx512vl
- Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_subs_epi8 avx512bw and avx512vl
- Subtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_subs_epi16 avx512bw and avx512vl
- Subtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_subs_epu8 avx512bw and avx512vl
- Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_subs_epu16 avx512bw and avx512vl
- Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
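"Using saturation" means a result that would overflow is clamped to the nearest representable value instead of wrapping. The sketch below models the 8-bit signed and unsigned forms with Rust's `saturating_sub`; the function names are hypothetical and arrays stand in for `__m256i`.

```rust
/// Scalar model of zero-masked saturating subtraction, 32 signed bytes.
pub fn maskz_subs_epi8_model(k: u32, a: [i8; 32], b: [i8; 32]) -> [i8; 32] {
    let mut dst = [0i8; 32];
    for i in 0..32 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].saturating_sub(b[i]); // clamps to -128..=127
        }
    }
    dst
}

/// Scalar model of zero-masked saturating subtraction, 32 unsigned bytes.
pub fn maskz_subs_epu8_model(k: u32, a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    let mut dst = [0u8; 32];
    for i in 0..32 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].saturating_sub(b[i]); // clamps at 0
        }
    }
    dst
}
```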
- _mm256_maskz_ternarylogic_epi32 avx512f and avx512vl
- Bitwise ternary logic that can implement any three-operand binary function; the specific function is selected by the value of imm8. For each bit in each packed 32-bit integer, the corresponding bits from a, b, and c form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ternarylogic_epi64 avx512f and avx512vl
- Bitwise ternary logic that can implement any three-operand binary function; the specific function is selected by the value of imm8. For each bit in each packed 64-bit integer, the corresponding bits from a, b, and c form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).
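The ternary-logic description is easiest to follow as a truth-table lookup: imm8 is an 8-entry table indexed by the three input bits, with a as the most significant index bit. The scalar sketch below applies that lookup to one 32-bit element; the function name is hypothetical.

```rust
/// Scalar model of ternary logic on one 32-bit element.
/// imm8 is treated as a truth table indexed by (a_bit, b_bit, c_bit).
pub fn ternarylogic_u32_model(a: u32, b: u32, c: u32, imm8: u8) -> u32 {
    let mut dst = 0u32;
    for bit in 0..32 {
        let idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1);
        dst |= (((imm8 as u32) >> idx) & 1) << bit;
    }
    dst
}
```

Some commonly used tables: `0x96` is a three-way XOR, `0xE8` is a bitwise majority vote, and `0xCA` selects b where a is set and c elsewhere.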
- _mm256_maskz_unpackhi_epi8 avx512bw and avx512vl
- Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpackhi_epi16 avx512bw and avx512vl
- Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpackhi_epi32 avx512f and avx512vl
- Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpackhi_epi64 avx512f and avx512vl
- Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpackhi_pd avx512f and avx512vl
- Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpackhi_ps avx512f and avx512vl
- Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpacklo_epi8 avx512bw and avx512vl
- Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpacklo_epi16 avx512bw and avx512vl
- Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpacklo_epi32 avx512f and avx512vl
- Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpacklo_epi64 avx512f and avx512vl
- Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpacklo_pd avx512f and avx512vl
- Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_unpacklo_ps avx512f and avx512vl
- Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
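"Unpack and interleave from the low (or high) half of each 128-bit lane" means the two inputs are zipped element by element, separately in each lane. The sketch below models the unmasked 32-bit case so the element order is visible; the function names are hypothetical, and a zeromask would be applied to the result afterwards.

```rust
/// Scalar model of interleaving the low halves of each 128-bit lane.
pub fn unpacklo_epi32_model(a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    let mut dst = [0u32; 8];
    for lane in 0..2 {
        let base = lane * 4;
        dst[base] = a[base];
        dst[base + 1] = b[base];
        dst[base + 2] = a[base + 1];
        dst[base + 3] = b[base + 1];
    }
    dst
}

/// Scalar model of interleaving the high halves of each 128-bit lane.
pub fn unpackhi_epi32_model(a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    let mut dst = [0u32; 8];
    for lane in 0..2 {
        let base = lane * 4;
        dst[base] = a[base + 2];
        dst[base + 1] = b[base + 2];
        dst[base + 2] = a[base + 3];
        dst[base + 3] = b[base + 3];
    }
    dst
}
```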
- _mm256_maskz_xor_epi32 avx512f and avx512vl
- Compute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_xor_epi64 avx512f and avx512vl
- Compute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_xor_pd avx512dq and avx512vl
- Compute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_xor_ps avx512dq and avx512vl
- Compute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_max_epi8 avx2
- Compares packed 8-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epi16 avx2
- Compares packed 16-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epi32 avx2
- Compares packed 32-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epi64 avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b, and store packed maximum values in dst.
- _mm256_max_epu8 avx2
- Compares packed unsigned 8-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epu16 avx2
- Compares packed unsigned 16-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epu32 avx2
- Compares packed unsigned 32-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epu64 avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst.
- _mm256_max_pd avx
- Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed maximum values.
- _mm256_max_ps avx
- Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed maximum values.
- _mm256_min_epi8 avx2
- Compares packed 8-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epi16 avx2
- Compares packed 16-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epi32 avx2
- Compares packed 32-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epi64 avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b, and store packed minimum values in dst.
- _mm256_min_epu8 avx2
- Compares packed unsigned 8-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epu16 avx2
- Compares packed unsigned 16-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epu32 avx2
- Compares packed unsigned 32-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epu64 avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst.
- _mm256_min_pd avx
- Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed minimum values.
- _mm256_min_ps avx
- Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed minimum values.
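The AVX2 max/min entries above can be called from safe code once the feature has been detected at run time. The sketch below wraps `_mm256_max_epi32` that way. The wrapper names `max_i32x8` and `max_i32x8_avx2` are hypothetical, and a portable scalar loop is used when AVX2 is unavailable or the target is not x86.

```rust
/// Lane-wise signed maximum of two 8-element vectors.
/// Uses AVX2 when detected at run time, otherwise a scalar loop.
pub fn max_i32x8(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified just above.
            return unsafe { max_i32x8_avx2(a, b) };
        }
    }
    // Portable fallback with the same per-lane result.
    let mut out = [0i32; 8];
    for i in 0..8 {
        out[i] = a[i].max(b[i]);
    }
    out
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn max_i32x8_avx2(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let mut out = [0i32; 8];
    unsafe {
        // Unaligned loads and stores, since arrays are only 4-byte aligned.
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        let vmax = _mm256_max_epi32(va, vb);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, vmax);
    }
    out
}
```

The `epi64` and `epu64` variants in this list need `avx512f` and `avx512vl` instead, so a wrapper for them would have to detect those features rather than `avx2`.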
- _mm256_mmask_i32gather_epi32 ⚠ avx512f and avx512vl
- Loads 8 32-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_i32gather_epi64 ⚠ avx512f and avx512vl
- Loads 4 64-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_i32gather_pd ⚠ avx512f and avx512vl
- Loads 4 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_i32gather_ps ⚠ avx512f and avx512vl
- Loads 8 single-precision (32-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_i64gather_epi32 ⚠ avx512f and avx512vl
- Loads 4 32-bit integer elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_i64gather_epi64 ⚠ avx512f and avx512vl
- Loads 4 64-bit integer elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_i64gather_pd ⚠ avx512f and avx512vl
- Loads 4 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_i64gather_ps ⚠ avx512f and avx512vl
- Loads 4 single-precision (32-bit) floating-point elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
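A masked gather combines three ideas: per-lane addressing (base plus index times scale), a writemask, and a src vector that supplies the unselected lanes. The scalar sketch below models the eight-lane 32-bit integer case using a byte slice in place of raw memory. The function name is hypothetical, and the model assumes non-negative indices and little-endian byte order (as on x86).

```rust
/// Scalar model of a write-masked gather of eight 32-bit integers.
/// `base` stands in for memory at base_addr; `scale` is the byte
/// multiplier applied to each index (1, 2, 4 or 8).
pub fn mmask_i32gather_epi32_model(
    src: [i32; 8],
    k: u8,
    vindex: [i32; 8],
    base: &[u8],
    scale: usize,
) -> [i32; 8] {
    let mut dst = src; // unselected lanes keep the value from src
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Model limitation: negative or out-of-range indices panic.
            let off = vindex[i] as usize * scale;
            dst[i] = i32::from_le_bytes([base[off], base[off + 1], base[off + 2], base[off + 3]]);
        }
    }
    dst
}
```

Unlike this bounds-checked model, the real intrinsics read through raw addresses, which is why they carry the ⚠ (unsafe) marker in the listing.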
- _mm256_movedup_ pd avx
- Duplicates even-indexed double-precision (64-bit) floating-point elements
from a, and returns the results.
- _mm256_movehdup_ ps avx
- Duplicates odd-indexed single-precision (32-bit) floating-point elements
from a, and returns the results.
- _mm256_moveldup_ ps avx
- Duplicates even-indexed single-precision (32-bit) floating-point elements
from a, and returns the results.
- _mm256_movemask_ epi8 avx2
- Creates a mask from the most significant bit of each 8-bit element in a, and returns the result.
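A common use of the byte movemask is turning a vector comparison into a bit position. The following is a minimal sketch under stated assumptions, not a definitive implementation: `find_byte` and `find_byte_avx2` are hypothetical helper names, and the scalar branch is only there so the function still works when AVX2 is not detected at runtime.

```rust
/// Returns the index of the first byte equal to `needle` in a 32-byte block.
/// (Hypothetical helper, for illustration only.)
pub fn find_byte(block: &[u8; 32], needle: u8) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { find_byte_avx2(block, needle) };
        }
    }
    // Portable fallback with the same result.
    block.iter().position(|&b| b == needle)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn find_byte_avx2(block: &[u8; 32], needle: u8) -> Option<usize> {
    use std::arch::x86_64::*;
    // SAFETY: `block` is 32 readable bytes and the load is unaligned.
    let v = unsafe { _mm256_loadu_si256(block.as_ptr() as *const __m256i) };
    // Each matching byte becomes 0xFF, so its most significant bit is set.
    let eq = _mm256_cmpeq_epi8(v, _mm256_set1_epi8(needle as i8));
    // Bit i of the mask is the most significant bit of byte i.
    let mask = _mm256_movemask_epi8(eq) as u32;
    if mask == 0 {
        None
    } else {
        Some(mask.trailing_zeros() as usize)
    }
}
```

Because bit i of the mask corresponds to byte i, counting trailing zeros yields the lowest matching index.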
- _mm256_movemask_ pd avx
- Sets each bit of the returned mask based on the most significant bit of the
corresponding packed double-precision (64-bit) floating-point element in
a.
- _mm256_movemask_ ps avx
- Sets each bit of the returned mask based on the most significant bit of the
corresponding packed single-precision (32-bit) floating-point element in
a.
- _mm256_movepi8_ mask avx512bwandavx512vl
- Set each bit of mask register k based on the most significant bit of the corresponding packed 8-bit integer in a.
- _mm256_movepi16_ mask avx512bwandavx512vl
- Set each bit of mask register k based on the most significant bit of the corresponding packed 16-bit integer in a.
- _mm256_movepi32_ mask avx512dqandavx512vl
- Set each bit of mask register k based on the most significant bit of the corresponding packed 32-bit integer in a.
- _mm256_movepi64_ mask avx512dqandavx512vl
- Set each bit of mask register k based on the most significant bit of the corresponding packed 64-bit integer in a.
- _mm256_movm_ epi8 avx512bwandavx512vl
- Set each packed 8-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm256_movm_ epi16 avx512bwandavx512vl
- Set each packed 16-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm256_movm_ epi32 avx512dqandavx512vl
- Set each packed 32-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm256_movm_ epi64 avx512dqandavx512vl
- Set each packed 64-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm256_mpsadbw_ epu8 avx2
- Computes the sum of absolute differences (SADs) of quadruplets of unsigned
8-bit integers in a compared to those in b, and stores the 16-bit results in dst. Eight SADs are performed for each 128-bit lane using one quadruplet from b and eight quadruplets from a. One quadruplet is selected from b starting at the offset specified in imm8. Eight quadruplets are formed from sequential 8-bit integers selected from a starting at the offset specified in imm8.
- _mm256_mul_ epi32 avx2
- Multiplies the low 32-bit integers from each packed 64-bit element in
a and b
- _mm256_mul_ epu32 avx2
- Multiplies the low unsigned 32-bit integers from each packed 64-bit
element in a and b
- _mm256_mul_ pd avx
- Multiplies packed double-precision (64-bit) floating-point elements
in a and b.
- _mm256_mul_ ps avx
- Multiplies packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_mulhi_ epi16 avx2
- Multiplies the packed 16-bit integers in a and b, producing intermediate 32-bit integers and returning the high 16 bits of the intermediate integers.
- _mm256_mulhi_ epu16 avx2
- Multiplies the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers and returning the high 16 bits of the intermediate integers.
- _mm256_mulhrs_ epi16 avx2
- Multiplies packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and return bits [16:1].
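The rounding rule above is the one used for Q15 fixed-point multiplication. As a hedged sketch (the names `q15_mul` and `q15_mul_avx2` are hypothetical), the scalar fallback below spells out the same arithmetic per lane, `((a * b >> 14) + 1) >> 1`, next to the AVX2 path:

```rust
/// Multiplies 16 Q15 fixed-point values lane by lane with rounding.
/// (Hypothetical helper, for illustration only.)
pub fn q15_mul(a: &[i16; 16], b: &[i16; 16]) -> [i16; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { q15_mul_avx2(a, b) };
        }
    }
    // Scalar reference: keep the 18 most significant bits of the 32-bit
    // product, add 1 to round, then drop the lowest bit.
    let mut out = [0i16; 16];
    for i in 0..16 {
        let wide = a[i] as i32 * b[i] as i32;
        out[i] = (((wide >> 14) + 1) >> 1) as i16;
    }
    out
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn q15_mul_avx2(a: &[i16; 16], b: &[i16; 16]) -> [i16; 16] {
    use std::arch::x86_64::*;
    // SAFETY: both arrays are 32 readable bytes; the loads are unaligned.
    let va = unsafe { _mm256_loadu_si256(a.as_ptr() as *const __m256i) };
    let vb = unsafe { _mm256_loadu_si256(b.as_ptr() as *const __m256i) };
    let prod = _mm256_mulhrs_epi16(va, vb);
    let mut out = [0i16; 16];
    // SAFETY: `out` is 32 writable bytes; the store is unaligned.
    unsafe { _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, prod) };
    out
}
```

In Q15, 16384 represents 0.5, so multiplying 16384 by 16384 gives 8192 (0.25).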
- _mm256_mullo_ epi16 avx2
- Multiplies the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and returns the low 16 bits of the intermediate integers.
- _mm256_mullo_ epi32 avx2
- Multiplies the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and returns the low 32 bits of the intermediate integers.
- _mm256_mullo_ epi64 avx512dqandavx512vl
- Multiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst.
- _mm256_multishift_ epi64_ epi8 avx512vbmiandavx512vl
- For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst.
- _mm256_or_ epi32 avx512fandavx512vl
- Compute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst.
- _mm256_or_ epi64 avx512fandavx512vl
- Compute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst.
- _mm256_or_ pd avx
- Computes the bitwise OR of packed double-precision (64-bit) floating-point
elements in a and b.
- _mm256_or_ ps avx
- Computes the bitwise OR of packed single-precision (32-bit) floating-point
elements in a and b.
- _mm256_or_ si256 avx2
- Computes the bitwise OR of 256 bits (representing integer data) in a and b.
- _mm256_packs_ epi16 avx2
- Converts packed 16-bit integers from a and b to packed 8-bit integers using signed saturation
- _mm256_packs_ epi32 avx2
- Converts packed 32-bit integers from a and b to packed 16-bit integers using signed saturation
- _mm256_packus_ epi16 avx2
- Converts packed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation
- _mm256_packus_ epi32 avx2
- Converts packed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation
- _mm256_permute2f128_ pd avx
- Shuffles 256 bits (composed of 4 packed double-precision (64-bit)
floating-point elements) selected by imm8 from a and b.
- _mm256_permute2f128_ ps avx
- Shuffles 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) selected by imm8 from a and b.
- _mm256_permute2f128_ si256 avx
- Shuffles 128-bits (composed of integer data) selected by imm8 from a and b.
- _mm256_permute2x128_ si256 avx2
- Shuffles 128-bits of integer data selected by imm8 from a and b.
- _mm256_permute4x64_ epi64 avx2
- Permutes 64-bit integers from a using control mask imm8.
- _mm256_permute4x64_ pd avx2
- Shuffles 64-bit floating-point elements in a across lanes using the control in imm8.
- _mm256_permute_ pd avx
- Shuffles double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
- _mm256_permute_ ps avx
- Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
- _mm256_permutevar8x32_ epi32 avx2
- Permutes packed 32-bit integers from a according to the content of b.
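Unlike the in-lane shuffles, this permute can move elements across the two 128-bit lanes. A minimal sketch, assuming the hypothetical wrapper names `permute8` and `permute8_avx2`: each output element i takes the input element selected by the low three bits of index i.

```rust
/// Returns `[a[idx[0] & 7], a[idx[1] & 7], ..., a[idx[7] & 7]]`.
/// (Hypothetical helper, for illustration only.)
pub fn permute8(a: &[i32; 8], idx: &[i32; 8]) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { permute8_avx2(a, idx) };
        }
    }
    // Scalar fallback: only the low 3 bits of each index are used.
    let mut out = [0i32; 8];
    for i in 0..8 {
        out[i] = a[(idx[i] & 7) as usize];
    }
    out
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn permute8_avx2(a: &[i32; 8], idx: &[i32; 8]) -> [i32; 8] {
    use std::arch::x86_64::*;
    // SAFETY: both arrays are 32 readable bytes; the loads are unaligned.
    let va = unsafe { _mm256_loadu_si256(a.as_ptr() as *const __m256i) };
    let vidx = unsafe { _mm256_loadu_si256(idx.as_ptr() as *const __m256i) };
    let r = _mm256_permutevar8x32_epi32(va, vidx);
    let mut out = [0i32; 8];
    // SAFETY: `out` is 32 writable bytes; the store is unaligned.
    unsafe { _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r) };
    out
}
```

Passing the indices 7 down to 0 reverses the whole vector, which an in-lane shuffle alone cannot do.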
- _mm256_permutevar8x32_ ps avx2
- Shuffles eight 32-bit floating-point elements in a across lanes using the corresponding 32-bit integer index in idx.
- _mm256_permutevar_ pd avx
- Shuffles double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b.
- _mm256_permutevar_ ps avx
- Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b.
- _mm256_permutex2var_ epi8 avx512vbmiandavx512vl
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ epi32 avx512fandavx512vl
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ epi64 avx512fandavx512vl
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ ps avx512fandavx512vl
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex_ epi64 avx512fandavx512vl
- Shuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst.
- _mm256_permutex_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst.
- _mm256_permutexvar_ epi8 avx512vbmiandavx512vl
- Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ epi32 avx512fandavx512vl
- Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ epi64 avx512fandavx512vl
- Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ ps avx512fandavx512vl
- Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx.
- _mm256_popcnt_ epi8 avx512bitalgandavx512vl
- For each packed 8-bit integer maps the value to the number of logical 1 bits.
- _mm256_popcnt_ epi16 avx512bitalgandavx512vl
- For each packed 16-bit integer maps the value to the number of logical 1 bits.
- _mm256_popcnt_ epi32 avx512vpopcntdqandavx512vl
- For each packed 32-bit integer maps the value to the number of logical 1 bits.
- _mm256_popcnt_ epi64 avx512vpopcntdqandavx512vl
- For each packed 64-bit integer maps the value to the number of logical 1 bits.
- _mm256_range_ pd avx512dqandavx512vl
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_range_ ps avx512dqandavx512vl
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_rcp14_ pd avx512fandavx512vl
- Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm256_rcp14_ ps avx512fandavx512vl
- Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm256_rcp_ ps avx
- Computes the approximate reciprocal of packed single-precision (32-bit)
floating-point elements in a, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_reduce_ add_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by addition. Returns the sum of all elements in a.
- _mm256_reduce_ add_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by addition. Returns the sum of all elements in a.
- _mm256_reduce_ and_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm256_reduce_ and_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm256_reduce_ max_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_ max_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_ max_ epu8 avx512bwandavx512vl
- Reduce the packed unsigned 8-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_ max_ epu16 avx512bwandavx512vl
- Reduce the packed unsigned 16-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_ min_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_ min_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_ min_ epu8 avx512bwandavx512vl
- Reduce the packed unsigned 8-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_ min_ epu16 avx512bwandavx512vl
- Reduce the packed unsigned 16-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_ mul_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm256_reduce_ mul_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm256_reduce_ or_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm256_reduce_ or_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm256_reduce_ pd avx512dqandavx512vl
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm256_reduce_ ps avx512dqandavx512vl
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm256_rol_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm256_rol_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm256_rolv_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm256_rolv_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm256_ror_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm256_ror_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm256_rorv_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm256_rorv_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm256_round_ pd avx
- Rounds packed double-precision (64-bit) floating point elements in a according to the flag ROUNDING. The value of ROUNDING may be as follows:
- _mm256_round_ ps avx
- Rounds packed single-precision (32-bit) floating point elements in a according to the flag ROUNDING. The value of ROUNDING may be as follows:
- _mm256_roundscale_ pd avx512fandavx512vl
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm256_roundscale_ ps avx512fandavx512vl
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm256_rsqrt14_ pd avx512fandavx512vl
- Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm256_rsqrt14_ ps avx512fandavx512vl
- Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm256_rsqrt_ ps avx
- Computes the approximate reciprocal square root of packed single-precision
(32-bit) floating-point elements in a, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_sad_ epu8 avx2
- Computes the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sums each consecutive 8 differences to produce four unsigned 16-bit integers, and packs these unsigned 16-bit integers in the low 16 bits of each 64-bit element of the return value.
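Since each 64-bit element of the result holds one partial sum, a full 32-byte sum of absolute differences needs one more horizontal add over the four elements. The sketch below illustrates this under stated assumptions; `sad32` and `sad32_avx2` are hypothetical names and the scalar branch is a fallback for machines without AVX2.

```rust
/// Sum of absolute differences over two 32-byte blocks.
/// (Hypothetical helper, for illustration only.)
pub fn sad32(a: &[u8; 32], b: &[u8; 32]) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { sad32_avx2(a, b) };
        }
    }
    // Scalar fallback.
    a.iter().zip(b.iter()).map(|(&x, &y)| x.abs_diff(y) as u64).sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sad32_avx2(a: &[u8; 32], b: &[u8; 32]) -> u64 {
    use std::arch::x86_64::*;
    // SAFETY: both arrays are 32 readable bytes; the loads are unaligned.
    let va = unsafe { _mm256_loadu_si256(a.as_ptr() as *const __m256i) };
    let vb = unsafe { _mm256_loadu_si256(b.as_ptr() as *const __m256i) };
    // Four partial sums, one in the low 16 bits of each 64-bit element.
    let sums = _mm256_sad_epu8(va, vb);
    let mut parts = [0u64; 4];
    // SAFETY: `parts` is 32 writable bytes; the store is unaligned.
    unsafe { _mm256_storeu_si256(parts.as_mut_ptr() as *mut __m256i, sums) };
    parts.iter().sum()
}
```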
- _mm256_scalef_ pd avx512fandavx512vl
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm256_scalef_ ps avx512fandavx512vl
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm256_set1_ epi8 avx
- Broadcasts 8-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastb.
- _mm256_set1_ epi16 avx
- Broadcasts 16-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastw.
- _mm256_set1_ epi32 avx
- Broadcasts 32-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastd.
- _mm256_set1_ epi64x avx
- Broadcasts 64-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastq.
- _mm256_set1_ pd avx
- Broadcasts double-precision (64-bit) floating-point value a to all elements of returned vector.
- _mm256_set1_ ps avx
- Broadcasts single-precision (32-bit) floating-point value a to all elements of returned vector.
- _mm256_set_ epi8 avx
- Sets packed 8-bit integers in returned vector with the supplied values.
- _mm256_set_ epi16 avx
- Sets packed 16-bit integers in returned vector with the supplied values.
- _mm256_set_ epi32 avx
- Sets packed 32-bit integers in returned vector with the supplied values.
- _mm256_set_ epi64x avx
- Sets packed 64-bit integers in returned vector with the supplied values.
- _mm256_set_ m128 avx
- Sets the returned __m256 vector from the two supplied __m128 values.
- _mm256_set_ m128d avx
- Sets the returned __m256d vector from the two supplied __m128d values.
- _mm256_set_ m128i avx
- Sets the returned __m256i vector from the two supplied __m128i values.
- _mm256_set_ pd avx
- Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values.
- _mm256_set_ ps avx
- Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values.
- _mm256_setr_ epi8 avx
- Sets packed 8-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_ epi16 avx
- Sets packed 16-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_ epi32 avx
- Sets packed 32-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_ epi64x avx
- Sets packed 64-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_ m128 avx
- Sets the returned __m256 vector from the two supplied __m128 values in reverse order.
- _mm256_setr_ m128d avx
- Sets the returned __m256d vector from the two supplied __m128d values in reverse order.
- _mm256_setr_ m128i avx
- Sets the returned __m256i vector from the two supplied __m128i values in reverse order.
- _mm256_setr_ pd avx
- Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order.
- _mm256_setr_ ps avx
- Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order.
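The difference between the `set` and `setr` families is only the argument order: `set` takes the highest element first, while `setr` takes the lowest element first, which matches memory order. A minimal sketch, assuming the hypothetical helper names `set_vs_setr` and `set_vs_setr_avx`, and returning `None` when AVX is not detected at runtime:

```rust
/// Builds the same vector with `_mm256_set_epi32` and `_mm256_setr_epi32`
/// and returns both as they appear in memory.
/// (Hypothetical helper, for illustration only.)
pub fn set_vs_setr() -> Option<([i32; 8], [i32; 8])> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was verified at runtime just above.
            return Some(unsafe { set_vs_setr_avx() });
        }
    }
    None
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn set_vs_setr_avx() -> ([i32; 8], [i32; 8]) {
    use std::arch::x86_64::*;
    // `set`: the first argument becomes the highest element.
    let s = _mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    // `setr`: the first argument becomes the lowest element.
    let r = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    let mut s_mem = [0i32; 8];
    let mut r_mem = [0i32; 8];
    // SAFETY: both arrays are 32 writable bytes; the stores are unaligned.
    unsafe {
        _mm256_storeu_si256(s_mem.as_mut_ptr() as *mut __m256i, s);
        _mm256_storeu_si256(r_mem.as_mut_ptr() as *mut __m256i, r);
    }
    (s_mem, r_mem)
}
```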
- _mm256_setzero_ pd avx
- Returns vector of type __m256d with all elements set to zero.
- _mm256_setzero_ ps avx
- Returns vector of type __m256 with all elements set to zero.
- _mm256_setzero_ si256 avx
- Returns vector of type __m256i with all elements set to zero.
- _mm256_sha512msg1_ epi64 sha512andavx
- This intrinsic is one of the two SHA512 message scheduling instructions. The intrinsic performs an intermediate calculation for the next four SHA512 message qwords. The calculated results are stored in dst.
- _mm256_sha512msg2_ epi64 sha512andavx
- This intrinsic is one of the two SHA512 message scheduling instructions. The intrinsic performs the final calculation for the next four SHA512 message qwords. The calculated results are stored in dst.
- _mm256_sha512rnds2_ epi64 sha512andavx
- This intrinsic performs two rounds of SHA512 operation using initial SHA512 state
(C,D,G,H) from a, an initial SHA512 state (A,B,E,F) from b, and a pre-computed sum of the next two round message qwords and the corresponding round constants from c (only the two lower qwords of the third operand). The updated SHA512 state (A,B,E,F) is written to dst, and dst can be used as the updated state (C,D,G,H) in later rounds.
- _mm256_shldi_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst.
- _mm256_shldi_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst.
- _mm256_shldi_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst.
- _mm256_shldv_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst.
- _mm256_shldv_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst.
- _mm256_shldv_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst.
- _mm256_shrdi_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst.
- _mm256_shrdi_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst.
- _mm256_shrdi_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst.
- _mm256_shrdv_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst.
- _mm256_shrdv_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst.
- _mm256_shrdv_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst.
- _mm256_shuffle_ epi8 avx2
- Shuffles bytes from a according to the content of b.
- _mm256_shuffle_ epi32 avx2
- Shuffles 32-bit integers in 128-bit lanes of a using the control in imm8.
- _mm256_shuffle_ f32x4 avx512fandavx512vl
- Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
- _mm256_shuffle_ f64x2 avx512fandavx512vl
- Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
- _mm256_shuffle_ i32x4 avx512fandavx512vl
- Shuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst.
- _mm256_shuffle_ i64x2 avx512fandavx512vl
- Shuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst.
- _mm256_shuffle_ pd avx
- Shuffles double-precision (64-bit) floating-point elements within 128-bit
lanes using the control in imm8.
- _mm256_shuffle_ ps avx
- Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
- _mm256_shufflehi_ epi16 avx2
- Shuffles 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. The low 64 bits of 128-bit lanes of a are copied to the output.
- _mm256_shufflelo_ epi16 avx2
- Shuffles 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. The high 64 bits of 128-bit lanes of a are copied to the output.
- _mm256_sign_ epi8 avx2
- Negates packed 8-bit integers in a when the corresponding signed 8-bit integer in b is negative, and returns the results. Results are zeroed out when the corresponding element in b is zero.
- _mm256_sign_ epi16 avx2
- Negates packed 16-bit integers in a when the corresponding signed 16-bit integer in b is negative, and returns the results. Results are zeroed out when the corresponding element in b is zero.
- _mm256_sign_ epi32 avx2
- Negates packed 32-bit integers in a when the corresponding signed 32-bit integer in b is negative, and returns the results. Results are zeroed out when the corresponding element in b is zero.
- _mm256_sll_ epi16 avx2
- Shifts packed 16-bit integers in a left by count while shifting in zeros, and returns the result.
- _mm256_sll_ epi32 avx2
- Shifts packed 32-bit integers in a left by count while shifting in zeros, and returns the result.
- _mm256_sll_ epi64 avx2
- Shifts packed 64-bit integers in a left by count while shifting in zeros, and returns the result.
- _mm256_slli_ epi16 avx2
- Shifts packed 16-bit integers in a left by IMM8 while shifting in zeros, and returns the results.
- _mm256_slli_ epi32 avx2
- Shifts packed 32-bit integers in a left by IMM8 while shifting in zeros, and returns the results.
- _mm256_slli_ epi64 avx2
- Shifts packed 64-bit integers in a left by IMM8 while shifting in zeros, and returns the results.
- _mm256_slli_ si256 avx2
- Shifts 128-bit lanes in a left by imm8 bytes while shifting in zeros.
- _mm256_sllv_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm256_sllv_ epi32 avx2
- Shifts packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and returns the result.
- _mm256_sllv_ epi64 avx2
- Shifts packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and returns the result.
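The per-element variable shifts differ from Rust's `<<` operator in one important way: a count of 32 or more is not an error, it simply produces zero for that element. The following sketch shows this for the 32-bit variant; `shl_each` and `shl_each_avx2` are hypothetical names, and the scalar branch mirrors the same saturating-to-zero behavior.

```rust
/// Shifts each 32-bit element of `a` left by the matching element of `count`.
/// Counts of 32 or more yield 0. (Hypothetical helper, for illustration only.)
pub fn shl_each(a: &[u32; 8], count: &[u32; 8]) -> [u32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { shl_each_avx2(a, count) };
        }
    }
    // Scalar fallback with the same out-of-range behavior.
    let mut out = [0u32; 8];
    for i in 0..8 {
        out[i] = if count[i] < 32 { a[i] << count[i] } else { 0 };
    }
    out
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn shl_each_avx2(a: &[u32; 8], count: &[u32; 8]) -> [u32; 8] {
    use std::arch::x86_64::*;
    // SAFETY: both arrays are 32 readable bytes; the loads are unaligned.
    let va = unsafe { _mm256_loadu_si256(a.as_ptr() as *const __m256i) };
    let vc = unsafe { _mm256_loadu_si256(count.as_ptr() as *const __m256i) };
    let r = _mm256_sllv_epi32(va, vc);
    let mut out = [0u32; 8];
    // SAFETY: `out` is 32 writable bytes; the store is unaligned.
    unsafe { _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r) };
    out
}
```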
- _mm256_sm4key4_ epi32 sm4andavx
- This intrinsic performs four rounds of SM4 key expansion. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in dst.
- _mm256_sm4rnds4_ epi32 sm4andavx
- This intrinsic performs four rounds of SM4 encryption. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in dst.
- _mm256_sqrt_ pd avx
- Returns the square root of packed double-precision (64-bit) floating point
elements in a.
- _mm256_sqrt_ ps avx
- Returns the square root of packed single-precision (32-bit) floating point
elements in a.
- _mm256_sra_ epi16 avx2
- Shifts packed 16-bit integers in a right by count while shifting in sign bits.
- _mm256_sra_ epi32 avx2
- Shifts packed 32-bit integers in a right by count while shifting in sign bits.
- _mm256_sra_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst.
- _mm256_srai_ epi16 avx2
- Shifts packed 16-bit integers in a right by IMM8 while shifting in sign bits.
- _mm256_srai_ epi32 avx2
- Shifts packed 32-bit integers in a right by IMM8 while shifting in sign bits.
- _mm256_srai_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
- _mm256_srav_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm256_srav_ epi32 avx2
- Shifts packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits.
- _mm256_srav_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm256_srl_ epi16 avx2
- Shifts packed 16-bit integers in a right by count while shifting in zeros.
- _mm256_srl_ epi32 avx2
- Shifts packed 32-bit integers in a right by count while shifting in zeros.
- _mm256_srl_ epi64 avx2
- Shifts packed 64-bit integers in a right by count while shifting in zeros.
- _mm256_srli_ epi16 avx2
- Shifts packed 16-bit integers in a right by IMM8 while shifting in zeros.
- _mm256_srli_ epi32 avx2
- Shifts packed 32-bit integers in a right by IMM8 while shifting in zeros.
- _mm256_srli_ epi64 avx2
- Shifts packed 64-bit integers in a right by IMM8 while shifting in zeros.
- _mm256_srli_ si256 avx2
- Shifts 128-bit lanes in a right by imm8 bytes while shifting in zeros.
- _mm256_srlv_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm256_srlv_ epi32 avx2
- Shifts packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros.
- _mm256_srlv_ epi64 avx2
- Shifts packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros.
- _mm256_store_ ⚠epi32 avx512fandavx512vl
- Store 256-bits (composed of 8 packed 32-bit integers) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_store_ ⚠epi64 avx512fandavx512vl
- Store 256-bits (composed of 4 packed 64-bit integers) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_store_ ⚠pd avx
- Stores 256-bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_store_ ⚠ps avx
- Stores 256-bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_store_ ⚠si256 avx
- Stores 256-bits of integer data from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_storeu2_ ⚠m128 avx
- Stores the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_storeu2_ ⚠m128d avx
- Stores the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_storeu2_ ⚠m128i avx
- Stores the high and low 128-bit halves (each composed of integer data) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠epi8 avx512bw and avx512vl
- Store 256-bits (composed of 32 packed 8-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠epi16 avx512bw and avx512vl
- Store 256-bits (composed of 16 packed 16-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠epi32 avx512f and avx512vl
- Store 256-bits (composed of 8 packed 32-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠epi64 avx512f and avx512vl
- Store 256-bits (composed of 4 packed 64-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠pd avx
- Stores 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠ps avx
- Stores 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠si256 avx
- Stores 256-bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.
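The alignment difference between the store and storeu families can be illustrated with a minimal sketch. The `Aligned` wrapper and the `store_both` and `store_demo` helpers are hypothetical names introduced only for this example; the sketch assumes an x86_64 target and checks for AVX at run time, returning `None` when it is unavailable.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical wrapper: forces 32-byte alignment, as `_mm256_store_ps` requires.
#[repr(C, align(32))]
struct Aligned([f32; 8]);

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn store_both(src: &[f32; 8]) -> ([f32; 8], [f32; 9]) {
    unsafe {
        let v = _mm256_loadu_ps(src.as_ptr());

        // Aligned store: the destination starts on a 32-byte boundary.
        let mut aligned = Aligned([0.0; 8]);
        _mm256_store_ps(aligned.0.as_mut_ptr(), v);

        // Unaligned store: deliberately start one element (4 bytes) in.
        let mut loose = [0.0f32; 9];
        _mm256_storeu_ps(loose.as_mut_ptr().add(1), v);

        (aligned.0, loose)
    }
}

/// Returns `None` when AVX is not available on the running CPU.
fn store_demo(src: &[f32; 8]) -> Option<([f32; 8], [f32; 9])> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Safe to call: AVX support was just verified.
            return Some(unsafe { store_both(src) });
        }
    }
    let _ = src;
    None
}
```

Passing the offset pointer to `_mm256_store_ps` instead would violate its alignment requirement, which is why the sketch uses the storeu form for that destination.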
- _mm256_stream_ ⚠load_ si256 avx2
- Load 256-bits of integer data from memory into dst using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm256_stream_ ⚠pd avx
- Moves double-precision values from a 256-bit vector of [4 x double] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm256_stream_ ⚠ps avx
- Moves single-precision floating point values from a 256-bit vector of [8 x float] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm256_stream_ ⚠si256 avx
- Moves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm256_sub_ epi8 avx2
- Subtract packed 8-bit integers in b from packed 8-bit integers in a.
- _mm256_sub_ epi16 avx2
- Subtract packed 16-bit integers in b from packed 16-bit integers in a.
- _mm256_sub_ epi32 avx2
- Subtract packed 32-bit integers in b from packed 32-bit integers in a.
- _mm256_sub_ epi64 avx2
- Subtract packed 64-bit integers in b from packed 64-bit integers in a.
- _mm256_sub_ pd avx
- Subtracts packed double-precision (64-bit) floating-point elements in b from packed elements in a.
- _mm256_sub_ ps avx
- Subtracts packed single-precision (32-bit) floating-point elements in b from packed elements in a.
- _mm256_subs_ epi8 avx2
- Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation.
- _mm256_subs_ epi16 avx2
- Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation.
- _mm256_subs_ epu8 avx2
- Subtract packed unsigned 8-bit integers in b from packed 8-bit integers in a using saturation.
- _mm256_subs_ epu16 avx2
- Subtract packed unsigned 16-bit integers in b from packed 16-bit integers in a using saturation.
- _mm256_ternarylogic_ epi32 avx512f and avx512vl
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in imm8. For each bit in each packed 32-bit integer, the corresponding bits from a, b, and c are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst.
- _mm256_ternarylogic_ epi64 avx512f and avx512vl
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in imm8. For each bit in each packed 64-bit integer, the corresponding bits from a, b, and c are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst.
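The imm8 lookup described for the ternarylogic intrinsics can be modeled in scalar code. The following is a sketch for a single 32-bit lane, not the intrinsic itself; `ternary_logic_u32` is a hypothetical helper name, and the index order (a as the most significant bit, c as the least) is an assumption based on the usual truth-table constants such as 0xF0 for a and 0xAA for c.

```rust
/// Scalar model of the per-bit truth-table lookup for one 32-bit lane.
/// Assumption: the 3-bit index is (a_bit << 2) | (b_bit << 1) | c_bit.
fn ternary_logic_u32(a: u32, b: u32, c: u32, imm8: u8) -> u32 {
    let mut dst = 0u32;
    for bit in 0..32 {
        let a_bit = (a >> bit) & 1;
        let b_bit = (b >> bit) & 1;
        let c_bit = (c >> bit) & 1;
        // Build the 3-bit index into the 8-entry truth table held in imm8.
        let idx = (a_bit << 2) | (b_bit << 1) | c_bit;
        // Read the selected truth-table bit and place it in the result.
        let out = ((imm8 >> idx) & 1) as u32;
        dst |= out << bit;
    }
    dst
}
```

Under this model, an imm8 of 0x96 gives a three-way XOR and 0xCA gives a bitwise select that takes bits from b where a is set and from c elsewhere.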
- _mm256_test_ epi8_ mask avx512bw and avx512vl
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm256_test_ epi16_ mask avx512bw and avx512vl
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm256_test_ epi32_ mask avx512f and avx512vl
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm256_test_ epi64_ mask avx512f and avx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm256_testc_ pd avx
- Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
- _mm256_testc_ ps avx
- Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
- _mm256_testc_ si256 avx
- Computes the bitwise AND of 256 bits (representing integer data) in a and b, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns the CF value.
- _mm256_testn_ epi8_ mask avx512bw and avx512vl
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm256_testn_ epi16_ mask avx512bw and avx512vl
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm256_testn_ epi32_ mask avx512f and avx512vl
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm256_testn_ epi64_ mask avx512f and avx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm256_testnzc_ pd avx
- Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- _mm256_testnzc_ ps avx
- Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- _mm256_testnzc_ si256 avx
- Computes the bitwise AND of 256 bits (representing integer data) in a and b, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- _mm256_testz_ pd avx
- Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
- _mm256_testz_ ps avx
- Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
- _mm256_testz_ si256 avx
- Computes the bitwise AND of 256 bits (representing integer data) in a and b, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns the ZF value.
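The three si256 test intrinsics differ only in which flag they return, which a short sketch may make easier to see. The helper names `flags_model`, `flags_avx` and `test_flags` are hypothetical; the sketch assumes an x86_64 target, uses the AVX intrinsics when the CPU reports AVX at run time, and otherwise falls back to a scalar model written directly from the descriptions above.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar model of the flags for the si256 variants: returns (ZF, CF).
fn flags_model(a: [u64; 4], b: [u64; 4]) -> (bool, bool) {
    // ZF: (a AND b) is all zero.
    let zf = a.iter().zip(b.iter()).all(|(x, y)| (*x & *y) == 0);
    // CF: ((NOT a) AND b) is all zero.
    let cf = a.iter().zip(b.iter()).all(|(x, y)| (!*x & *y) == 0);
    (zf, cf)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn flags_avx(a: [u64; 4], b: [u64; 4]) -> (bool, bool, bool) {
    unsafe {
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        (
            _mm256_testz_si256(va, vb) == 1,   // ZF
            _mm256_testc_si256(va, vb) == 1,   // CF
            _mm256_testnzc_si256(va, vb) == 1, // ZF == 0 and CF == 0
        )
    }
}

/// Returns (testz, testc, testnzc) for the two 256-bit inputs.
fn test_flags(a: [u64; 4], b: [u64; 4]) -> (bool, bool, bool) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Safe to call: AVX support was just verified.
            return unsafe { flags_avx(a, b) };
        }
    }
    let (zf, cf) = flags_model(a, b);
    (zf, cf, !zf && !cf)
}
```

A common use of the testz form is checking whether two bit sets are disjoint; the testc form answers whether every bit set in b is also set in a.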
- _mm256_undefined_ pd avx
- Returns vector of type __m256d with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm256_undefined_ ps avx
- Returns vector of type __m256 with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm256_undefined_ si256 avx
- Returns vector of type __m256i with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm256_unpackhi_ epi8 avx2
- Unpacks and interleaves 8-bit integers from the high half of each 128-bit lane in a and b.
- _mm256_unpackhi_ epi16 avx2
- Unpacks and interleaves 16-bit integers from the high half of each 128-bit lane of a and b.
- _mm256_unpackhi_ epi32 avx2
- Unpacks and interleaves 32-bit integers from the high half of each 128-bit lane of a and b.
- _mm256_unpackhi_ epi64 avx2
- Unpacks and interleaves 64-bit integers from the high half of each 128-bit lane of a and b.
- _mm256_unpackhi_ pd avx
- Unpacks and interleaves double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b.
- _mm256_unpackhi_ ps avx
- Unpacks and interleaves single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b.
- _mm256_unpacklo_ epi8 avx2
- Unpacks and interleaves 8-bit integers from the low half of each 128-bit lane of a and b.
- _mm256_unpacklo_ epi16 avx2
- Unpacks and interleaves 16-bit integers from the low half of each 128-bit lane of a and b.
- _mm256_unpacklo_ epi32 avx2
- Unpacks and interleaves 32-bit integers from the low half of each 128-bit lane of a and b.
- _mm256_unpacklo_ epi64 avx2
- Unpacks and interleaves 64-bit integers from the low half of each 128-bit lane of a and b.
- _mm256_unpacklo_ pd avx
- Unpacks and interleaves double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b.
- _mm256_unpacklo_ ps avx
- Unpacks and interleaves single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b.
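The phrase "of each 128-bit lane" matters here: the 256-bit unpack operations interleave within each lane rather than across the whole vector. The scalar sketch below models this for the ps variants; `unpacklo_ps_model` and `unpackhi_ps_model` are hypothetical helper names, and the element order is written from the usual per-lane definition rather than taken from this page.

```rust
/// Scalar model of a per-lane unpack on eight f32 elements.
/// `offset` is 0 for the low half of each lane and 2 for the high half.
fn unpack_ps_model(a: [f32; 8], b: [f32; 8], offset: usize) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    // Two 128-bit lanes, each holding four f32 elements.
    for lane in 0..2 {
        let base = lane * 4;
        dst[base] = a[base + offset];
        dst[base + 1] = b[base + offset];
        dst[base + 2] = a[base + offset + 1];
        dst[base + 3] = b[base + offset + 1];
    }
    dst
}

/// Model of the "lo" form: interleaves elements 0 and 1 of each lane.
fn unpacklo_ps_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    unpack_ps_model(a, b, 0)
}

/// Model of the "hi" form: interleaves elements 2 and 3 of each lane.
fn unpackhi_ps_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    unpack_ps_model(a, b, 2)
}
```

Note that elements 4 and 5 of the inputs appear in the "lo" result, because they form the low half of the second lane.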
- _mm256_xor_ epi32 avx512f and avx512vl
- Compute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst.
- _mm256_xor_ epi64 avx512f and avx512vl
- Compute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst.
- _mm256_xor_ pd avx
- Computes the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_xor_ ps avx
- Computes the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_xor_ si256 avx2
- Computes the bitwise XOR of 256 bits (representing integer data) in a and b.
- _mm256_zeroall avx
- Zeroes the contents of all XMM or YMM registers.
- _mm256_zeroupper avx
- Zeroes the upper 128 bits of all YMM registers; the lower 128-bits of the registers are unmodified.
- _mm256_zextpd128_ pd256 avx
- Constructs a 256-bit floating-point vector of [4 x double] from a 128-bit floating-point vector of [2 x double]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- _mm256_zextps128_ ps256 avx
- Constructs a 256-bit floating-point vector of [8 x float] from a 128-bit floating-point vector of [4 x float]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- _mm256_zextsi128_ si256 avx
- Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- _mm512_abs_ epi8 avx512bw
- Compute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst.
- _mm512_abs_ epi16 avx512bw
- Compute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst.
- _mm512_abs_ epi32 avx512f
- Computes the absolute values of packed 32-bit integers in a.
- _mm512_abs_ epi64 avx512f
- Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst.
- _mm512_abs_ pd avx512f
- Finds the absolute value of each packed double-precision (64-bit) floating-point element in v2, storing the results in dst.
- _mm512_abs_ ps avx512f
- Finds the absolute value of each packed single-precision (32-bit) floating-point element in v2, storing the results in dst.
- _mm512_add_ epi8 avx512bw
- Add packed 8-bit integers in a and b, and store the results in dst.
- _mm512_add_ epi16 avx512bw
- Add packed 16-bit integers in a and b, and store the results in dst.
- _mm512_add_ epi32 avx512f
- Add packed 32-bit integers in a and b, and store the results in dst.
- _mm512_add_ epi64 avx512f
- Add packed 64-bit integers in a and b, and store the results in dst.
- _mm512_add_ pd avx512f
- Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_add_ ps avx512f
- Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_add_ round_ pd avx512f
- Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_add_ round_ ps avx512f
- Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_adds_ epi8 avx512bw
- Add packed signed 8-bit integers in a and b using saturation, and store the results in dst.
- _mm512_adds_ epi16 avx512bw
- Add packed signed 16-bit integers in a and b using saturation, and store the results in dst.
- _mm512_adds_ epu8 avx512bw
- Add packed unsigned 8-bit integers in a and b using saturation, and store the results in dst.
- _mm512_adds_ epu16 avx512bw
- Add packed unsigned 16-bit integers in a and b using saturation, and store the results in dst.
- _mm512_aesdec_ epi128 vaes and avx512f
- Performs one round of an AES decryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm512_aesdeclast_ epi128 vaes and avx512f
- Performs the last round of an AES decryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm512_aesenc_ epi128 vaes and avx512f
- Performs one round of an AES encryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm512_aesenclast_ epi128 vaes and avx512f
- Performs the last round of an AES encryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm512_alignr_ epi8 avx512bw
- Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst. Unlike the _mm_alignr_epi8 and _mm256_alignr_epi8 functions, where the entire input vectors are concatenated to the temporary result, this concatenation happens in 4 steps, where each step builds a 32-byte temporary result.
- _mm512_alignr_ epi32 avx512f
- Concatenate a and b into a 128-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 64 bytes (16 elements) in dst.
- _mm512_alignr_ epi64 avx512f
- Concatenate a and b into a 128-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 64 bytes (8 elements) in dst.
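The element-wise alignr operations can be pictured as a sliding window over the concatenation of the two inputs. The sketch below is a scalar model of the 32-bit form, not the intrinsic; `alignr_epi32_model` is a hypothetical name, and two details are assumptions from my reading of the instruction: a supplies the upper half of the concatenation (b the lower), and only the low 4 bits of the shift count are used.

```rust
/// Scalar model: concatenate a (upper) and b (lower), shift right by
/// `imm8` 32-bit elements, and keep the low 16 elements.
fn alignr_epi32_model(a: [i32; 16], b: [i32; 16], imm8: u32) -> [i32; 16] {
    // Assumption: only the low 4 bits of the count take effect.
    let shift = (imm8 & 0xF) as usize;

    // Element 0 of the concatenation is element 0 of b.
    let mut concat = [0i32; 32];
    concat[..16].copy_from_slice(&b);
    concat[16..].copy_from_slice(&a);

    // Shifting right by `shift` elements selects this 16-element window.
    let mut dst = [0i32; 16];
    dst.copy_from_slice(&concat[shift..shift + 16]);
    dst
}
```

With a shift of 3, for example, the result holds elements 3 through 15 of b followed by elements 0 through 2 of a.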
- _mm512_and_ epi32 avx512f
- Compute the bitwise AND of packed 32-bit integers in a and b, and store the results in dst.
- _mm512_and_ epi64 avx512f
- Compute the bitwise AND of 512 bits (composed of packed 64-bit integers) in a and b, and store the results in dst.
- _mm512_and_ pd avx512dq
- Compute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst.
- _mm512_and_ ps avx512dq
- Compute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst.
- _mm512_and_ si512 avx512f
- Compute the bitwise AND of 512 bits (representing integer data) in a and b, and store the result in dst.
- _mm512_andnot_ epi32 avx512f
- Compute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst.
- _mm512_andnot_ epi64 avx512f
- Compute the bitwise NOT of 512 bits (composed of packed 64-bit integers) in a and then AND with b, and store the results in dst.
- _mm512_andnot_ pd avx512dq
- Compute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst.
- _mm512_andnot_ ps avx512dq
- Compute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst.
- _mm512_andnot_ si512 avx512f
- Compute the bitwise NOT of 512 bits (representing integer data) in a and then AND with b, and store the result in dst.
- _mm512_avg_ epu8 avx512bw
- Average packed unsigned 8-bit integers in a and b, and store the results in dst.
- _mm512_avg_ epu16 avx512bw
- Average packed unsigned 16-bit integers in a and b, and store the results in dst.
- _mm512_bitshuffle_ epi64_ mask avx512bitalg
- Considers the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm512_broadcast_ f32x2 avx512dq
- Broadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst.
- _mm512_broadcast_ f32x4 avx512f
- Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst.
- _mm512_broadcast_ f32x8 avx512dq
- Broadcasts the 8 packed single-precision (32-bit) floating-point elements from a to all elements of dst.
- _mm512_broadcast_ f64x2 avx512dq
- Broadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst.
- _mm512_broadcast_ f64x4 avx512f
- Broadcast the 4 packed double-precision (64-bit) floating-point elements from a to all elements of dst.
- _mm512_broadcast_ i32x2 avx512dq
- Broadcasts the lower 2 packed 32-bit integers from a to all elements of dst.
- _mm512_broadcast_ i32x4 avx512f
- Broadcast the 4 packed 32-bit integers from a to all elements of dst.
- _mm512_broadcast_ i32x8 avx512dq
- Broadcasts the 8 packed 32-bit integers from a to all elements of dst.
- _mm512_broadcast_ i64x2 avx512dq
- Broadcasts the 2 packed 64-bit integers from a to all elements of dst.
- _mm512_broadcast_ i64x4 avx512f
- Broadcast the 4 packed 64-bit integers from a to all elements of dst.
- _mm512_broadcastb_ epi8 avx512bw
- Broadcast the low packed 8-bit integer from a to all elements of dst.
- _mm512_broadcastd_ epi32 avx512f
- Broadcast the low packed 32-bit integer from a to all elements of dst.
- _mm512_broadcastmb_ epi64 avx512cd
- Broadcast the low 8-bits from input mask k to all 64-bit elements of dst.
- _mm512_broadcastmw_ epi32 avx512cd
- Broadcast the low 16-bits from input mask k to all 32-bit elements of dst.
- _mm512_broadcastq_ epi64 avx512f
- Broadcast the low packed 64-bit integer from a to all elements of dst.
- _mm512_broadcastsd_ pd avx512f
- Broadcast the low double-precision (64-bit) floating-point element from a to all elements of dst.
- _mm512_broadcastss_ ps avx512f
- Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst.
- _mm512_broadcastw_ epi16 avx512bw
- Broadcast the low packed 16-bit integer from a to all elements of dst.
- _mm512_bslli_ epi128 avx512bw
- Shift 128-bit lanes in a left by imm8 bytes while shifting in zeros, and store the results in dst.
- _mm512_bsrli_ epi128 avx512bw
- Shift 128-bit lanes in a right by imm8 bytes while shifting in zeros, and store the results in dst.
- _mm512_castpd128_ pd512 avx512f
- Cast vector of type __m128d to type __m512d; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd256_ pd512 avx512f
- Cast vector of type __m256d to type __m512d; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd512_ pd128 avx512f
- Cast vector of type __m512d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd512_ pd256 avx512f
- Cast vector of type __m512d to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd_ ps avx512f
- Cast vector of type __m512d to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd_ si512 avx512f
- Cast vector of type __m512d to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps128_ ps512 avx512f
- Cast vector of type __m128 to type __m512; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps256_ ps512 avx512f
- Cast vector of type __m256 to type __m512; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps512_ ps128 avx512f
- Cast vector of type __m512 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps512_ ps256 avx512f
- Cast vector of type __m512 to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps_ pd avx512f
- Cast vector of type __m512 to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps_ si512 avx512f
- Cast vector of type __m512 to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi128_ si512 avx512f
- Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi256_ si512 avx512f
- Cast vector of type __m256i to type __m512i; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ pd avx512f
- Cast vector of type __m512i to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ ps avx512f
- Cast vector of type __m512i to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ si128 avx512f
- Cast vector of type __m512i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ si256 avx512f
- Cast vector of type __m512i to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_clmulepi64_ epi128 vpclmulqdq and avx512f
- Performs a carry-less multiplication of two 64-bit polynomials over the finite field GF(2) in each of the four 128-bit lanes.
- _mm512_cmp_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ epi32_ mask avx512f
- Compare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b based on the comparison operand specified by IMM8, and store the results in mask vector k.
- _mm512_cmp_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
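For the integer cmp intrinsics, the result is a bit mask with one bit per element rather than a vector. The sketch below is a scalar model of the signed 32-bit form; `cmp_epi32_mask_model` is a hypothetical name, and the predicate numbering (0 = equal, 1 = less-than, 2 = less-or-equal, 3 = false, 4 = not-equal, 5 = not-less-than, 6 = not-less-or-equal, 7 = true) is assumed to follow the _MM_CMPINT_* constants.

```rust
/// Scalar model: bit i of the returned mask is set when the predicate
/// selected by `imm8` holds for elements a[i] and b[i].
fn cmp_epi32_mask_model(a: [i32; 16], b: [i32; 16], imm8: u8) -> u16 {
    let mut k = 0u16;
    for i in 0..16 {
        // Assumption: only the low 3 bits select the predicate.
        let hit = match imm8 & 0b111 {
            0 => a[i] == b[i], // equal
            1 => a[i] < b[i],  // less-than
            2 => a[i] <= b[i], // less-than-or-equal
            3 => false,        // always false
            4 => a[i] != b[i], // not-equal
            5 => a[i] >= b[i], // not-less-than
            6 => a[i] > b[i],  // not-less-than-or-equal
            _ => true,         // always true
        };
        if hit {
            k |= 1 << i;
        }
    }
    k
}
```

The named forms listed on this page (cmpeq, cmplt, cmple, cmpge, cmpgt and so on) can be read as this operation with the predicate fixed.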
- _mm512_cmp_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ round_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cmp_ round_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cmpeq_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ epi32_ mask avx512f
- Compare packed 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ epi64_ mask avx512f
- Compare packed 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for equality, and store the results in mask vector k.
- _mm512_cmpge_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ epi32_ mask avx512f
- Compare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpgt_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ epi32_ mask avx512f
- Compare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmple_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ epi32_ mask avx512f
- Compare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmplt_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ epi32_ mask avx512f
- Compare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for less-than, and store the results in mask vector k.
- _mm512_cmpneq_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ epi32_ mask avx512f
- Compare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpnle_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in mask vector k.
- _mm512_cmpnle_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in mask vector k.
- _mm512_cmpnlt_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for not-less-than, and store the results in mask vector k.
- _mm512_cmpnlt_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than, and store the results in mask vector k.
- _mm512_cmpord_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b to see if neither is NaN, and store the results in mask vector k.
- _mm512_cmpord_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b to see if neither is NaN, and store the results in mask vector k.
- _mm512_cmpunord_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b to see if either is NaN, and store the results in mask vector k.
- _mm512_cmpunord_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b to see if either is NaN, and store the results in mask vector k.
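The mask-producing comparisons above all follow one pattern: each lane is compared independently and contributes one bit to the mask k. The following is a minimal scalar sketch of that pattern for sixteen 32-bit lanes, not the intrinsics themselves. The function names ending in `_model` are hypothetical and exist only for illustration.

```rust
/// Build a 16-bit mask by setting bit `i` whenever `pred(i)` holds for lane `i`.
/// Hypothetical helper used only by the models below.
fn lanes_to_mask(pred: impl Fn(usize) -> bool) -> u16 {
    (0..16)
        .filter(|&i| pred(i))
        .fold(0u16, |k, i| k | (1u16 << i))
}

/// Scalar model of an equality compare over sixteen 32-bit lanes.
fn cmpeq_epi32_mask_model(a: [i32; 16], b: [i32; 16]) -> u16 {
    lanes_to_mask(|i| a[i] == b[i])
}

/// Scalar model of a signed greater-than compare.
fn cmpgt_epi32_mask_model(a: [i32; 16], b: [i32; 16]) -> u16 {
    lanes_to_mask(|i| a[i] > b[i])
}

/// Scalar model of an unsigned greater-than compare: the same bit patterns,
/// reinterpreted as unsigned before comparing.
fn cmpgt_epu32_mask_model(a: [i32; 16], b: [i32; 16]) -> u16 {
    lanes_to_mask(|i| (a[i] as u32) > (b[i] as u32))
}

/// Scalar model of the "unordered" compare: a lane's bit is set when
/// either input in that lane is NaN.
fn cmpunord_ps_mask_model(a: [f32; 16], b: [f32; 16]) -> u16 {
    lanes_to_mask(|i| a[i].is_nan() || b[i].is_nan())
}
```

The signed and unsigned variants differ only in how the lane bits are interpreted: a lane holding all ones is -1 under the `epi32` reading but the largest possible value under the `epu32` reading, so the two greater-than models can disagree on the same inputs.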
- _mm512_conflict_ epi32 avx512cd
- Test each 32-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
- _mm512_conflict_ epi64 avx512cd
- Test each 64-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
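Conflict detection, as described in the two entries above, can be illustrated with a scalar sketch. Each output lane is a bit vector in which bit `j` is set when the element equals an earlier element `j` (one closer to the least significant end). The name `conflict_epi32_model` is hypothetical, and this models the described behaviour rather than calling the intrinsic.

```rust
/// Scalar model of conflict detection over sixteen 32-bit lanes.
/// For lane `i`, bit `j` of the result is set when `a[i] == a[j]` and `j < i`.
/// Upper bits stay zero, matching the "zero extended bit vector" wording.
fn conflict_epi32_model(a: [i32; 16]) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for i in 0..16 {
        for j in 0..i {
            if a[i] == a[j] {
                dst[i] |= 1u32 << j;
            }
        }
    }
    dst
}
```

Lane 0 always yields zero because it has no earlier lanes to compare against. A typical use is spotting duplicate indices before a vectorised scatter.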
- _mm512_cvt_ roundepi32_ ps avx512f
- Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ roundepi64_ pd avx512dq
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_cvt_ roundepi64_ ps avx512dq
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_cvt_ roundepu32_ ps avx512f
- Convert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ roundepu64_ pd avx512dq
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_cvt_ roundepu64_ ps avx512dq
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_cvt_ roundpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvt_ roundpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_cvt_ roundpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm512_cvt_ roundpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_cvt_ roundpd_ ps avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ roundph_ ps avx512f
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvt_ roundps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvt_ roundps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_cvt_ roundps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm512_cvt_ roundps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_cvt_ roundps_ pd avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvt_ roundps_ ph avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of:
- _mm512_cvtepi8_ epi16 avx512bw
- Sign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst.
- _mm512_cvtepi8_ epi32 avx512f
- Sign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtepi8_ epi64 avx512f
- Sign extend packed 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepi16_ epi8 avx512bw
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi16_ epi32 avx512f
- Sign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtepi16_ epi64 avx512f
- Sign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepi32_ epi8 avx512f
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi32_ epi16 avx512f
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi32_ epi64 avx512f
- Sign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepi32_ pd avx512f
- Convert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi32_ ps avx512f
- Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi32lo_ pd avx512f
- Performs element-by-element conversion of the lower half of packed 32-bit integer elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst.
- _mm512_cvtepi64_ epi8 avx512f
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi64_ epi16 avx512f
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi64_ epi32 avx512f
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi64_ pd avx512dq
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi64_ ps avx512dq
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu8_ epi16 avx512bw
- Zero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst.
- _mm512_cvtepu8_ epi32 avx512f
- Zero extend packed unsigned 8-bit integers in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtepu8_ epi64 avx512f
- Zero extend packed unsigned 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepu16_ epi32 avx512f
- Zero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtepu16_ epi64 avx512f
- Zero extend packed unsigned 16-bit integers in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepu32_ epi64 avx512f
- Zero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepu32_ pd avx512f
- Convert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu32_ ps avx512f
- Convert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu32lo_ pd avx512f
- Performs element-by-element conversion of the lower half of packed 32-bit unsigned integer elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst.
- _mm512_cvtepu64_ pd avx512dq
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu64_ ps avx512dq
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtne2ps_ pbh avx512bf16 and avx512f
- Convert packed single-precision (32-bit) floating-point elements in two 512-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 512-bit wide vector. Intel’s documentation
- _mm512_cvtneps_ pbh avx512bf16 and avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. Intel’s documentation
- _mm512_cvtpbh_ ps avx512bf16 and avx512f
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm512_cvtpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm512_cvtpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm512_cvtpd_ ps avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtpd_ pslo avx512f
- Performs an element-by-element conversion of packed double-precision (64-bit) floating-point elements in v2 to single-precision (32-bit) floating-point elements and stores them in dst. The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0.
- _mm512_cvtph_ ps avx512f
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm512_cvtps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm512_cvtps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm512_cvtps_ pd avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtps_ ph avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of:
- _mm512_cvtpslo_ pd avx512f
- Performs element-by-element conversion of the lower half of packed single-precision (32-bit) floating-point elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst.
- _mm512_cvtsd_ f64 avx512f
- Copy the lower double-precision (64-bit) floating-point element of a to dst.
- _mm512_cvtsepi16_ epi8 avx512bw
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsepi32_ epi8 avx512f
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsepi32_ epi16 avx512f
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsepi64_ epi8 avx512f
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsepi64_ epi16 avx512f
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsepi64_ epi32 avx512f
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsi512_ si32 avx512f
- Copy the lower 32-bit integer in a to dst.
- _mm512_cvtss_ f32 avx512f
- Copy the lower single-precision (32-bit) floating-point element of a to dst.
- _mm512_cvtt_ roundpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvtt_ roundpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_cvtt_ roundpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvtt_ roundpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_cvtt_ roundps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvtt_ roundps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_cvtt_ roundps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvtt_ roundps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_cvttpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm512_cvttpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm512_cvttps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm512_cvttps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
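The `cvtt` entries above differ from their `cvt` counterparts in how the fractional part is handled: the `cvtt` forms truncate toward zero. The following single-lane scalar sketch contrasts the two. It assumes the `cvt` forms run under the default round-to-nearest-even mode, and it does not model out-of-range inputs or NaN (Rust's saturating `as` cast should not be read as the hardware behaviour there). All function names are hypothetical.

```rust
/// Round to the nearest integer, with ties going to the even neighbour.
/// Written out by hand so the sketch does not depend on newer std methods.
fn round_nearest_even(x: f32) -> f32 {
    if (x - x.trunc()).abs() == 0.5 {
        // Exact tie: halving, rounding, and doubling lands on the even neighbour.
        2.0 * (x / 2.0).round()
    } else {
        x.round()
    }
}

/// One-lane model of a rounding conversion (assumes round-to-nearest-even).
fn cvt_lane_model(x: f32) -> i32 {
    round_nearest_even(x) as i32
}

/// One-lane model of a truncating conversion: the fraction is discarded.
fn cvtt_lane_model(x: f32) -> i32 {
    x.trunc() as i32
}
```

For an input such as 2.7 the rounding model gives 3 while the truncating model gives 2. The two agree whenever the input is already an integer.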
- _mm512_cvtusepi16_ epi8 avx512bw
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtusepi32_ epi8 avx512f
- Convert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtusepi32_ epi16 avx512f
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtusepi64_ epi8 avx512f
- Convert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtusepi64_ epi16 avx512f
- Convert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtusepi64_ epi32 avx512f
- Convert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst.
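The narrowing conversions in this listing come in three flavours: truncation (`cvtepi…`), signed saturation (`cvtsepi…`), and unsigned saturation (`cvtusepi…`). A single-lane scalar sketch of the 32-bit to 8-bit case shows the difference. The function names are hypothetical, and the sketch models the described behaviour rather than calling the intrinsics.

```rust
/// Truncation: keep only the low 8 bits of the 32-bit lane.
fn truncate_lane_model(x: i32) -> u8 {
    x as u8
}

/// Signed saturation: clamp to the range of i8 before narrowing.
fn saturate_signed_lane_model(x: i32) -> i8 {
    x.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

/// Unsigned saturation: the lane is read as unsigned and clamped to u8::MAX.
fn saturate_unsigned_lane_model(x: u32) -> u8 {
    x.min(u8::MAX as u32) as u8
}
```

With an input of 300, truncation yields 44 (300 modulo 256), while both saturating forms pin the result to the largest representable value, 127 for signed and 255 for unsigned.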
- _mm512_dbsad_ epu8 avx512bw
- Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst. Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm512_div_ pd avx512f
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst.
- _mm512_div_ ps avx512f
- Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst.
- _mm512_div_ round_ pd avx512f
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst.
- _mm512_div_ round_ ps avx512f
- Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst.
- _mm512_dpbf16_ ps avx512bf16 and avx512f
- Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. Intel’s documentation
- _mm512_dpbusd_ epi32 avx512vnni
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm512_dpbusds_ epi32 avx512vnni
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm512_dpwssd_ epi32 avx512vnni
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm512_dpwssds_ epi32 avx512vnni
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
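The VNNI dot-product entries above can be modelled one 32-bit lane at a time. The sketch below covers the 8-bit case: four unsigned bytes from a are multiplied with four signed bytes from b, and the products are added to the accumulator from src. The saturating variant is modelled on the assumption that saturation applies once to the full sum. Function names are hypothetical.

```rust
/// One-lane model of the non-saturating 8-bit dot product:
/// the sum wraps around on overflow.
fn dpbusd_lane_model(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut acc = src;
    for k in 0..4 {
        // Each product of a u8 and an i8 fits in a signed 16-bit value.
        let prod = (a[k] as i32) * (b[k] as i32);
        acc = acc.wrapping_add(prod);
    }
    acc
}

/// One-lane model of the saturating variant: the exact sum is computed in
/// a wider type and then clamped to the i32 range (an assumption of this sketch).
fn dpbusds_lane_model(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut sum = src as i64;
    for k in 0..4 {
        sum += (a[k] as i64) * (b[k] as i64);
    }
    sum.clamp(i32::MIN as i64, i32::MAX as i64) as i32
}
```

The two models agree unless the accumulated sum leaves the i32 range, in which case the first wraps and the second pins to `i32::MAX` or `i32::MIN`.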
- _mm512_extractf32x4_ ps avx512f
- Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the result in dst.
- _mm512_extractf32x8_ ps avx512dq
- Extracts 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst.
- _mm512_extractf64x2_ pd avx512dq
- Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst.
- _mm512_extractf64x4_ pd avx512f
- Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a, selected with imm8, and store the result in dst.
- _mm512_extracti32x4_ epi32 avx512f
- Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM2, and store the result in dst.
- _mm512_extracti32x8_ epi32 avx512dq
- Extracts 256 bits (composed of 8 packed 32-bit integers) from a, selected with IMM8, and stores the result in dst.
- _mm512_extracti64x2_ epi64 avx512dq
- Extracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst.
- _mm512_extracti64x4_ epi64 avx512f
- Extract 256 bits (composed of 4 packed 64-bit integers) from a, selected with IMM1, and store the result in dst.
- _mm512_fixupimm_ pd avx512f
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm512_fixupimm_ ps avx512f
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm512_fixupimm_ round_ pd avx512f
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm512_fixupimm_ round_ ps avx512f
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm512_fmadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmaddsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmaddsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmaddsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmaddsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst.
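The alternating pattern in the fmaddsub and fmsubadd entries above can be sketched with a scalar model. This is not the intrinsic itself: the function names are hypothetical, and the lane assignment (fmaddsub subtracts in even-indexed lanes and adds in odd-indexed lanes, fmsubadd does the reverse) is an assumption taken from Intel's pseudocode for these operations.

```rust
/// Scalar model of the fmaddsub lane pattern (hypothetical helper, not the intrinsic).
/// Even-indexed lanes compute a*b - c, odd-indexed lanes compute a*b + c.
fn fmaddsub_model(a: &[f64; 8], b: &[f64; 8], c: &[f64; 8]) -> [f64; 8] {
    let mut dst = [0.0; 8];
    for i in 0..8 {
        // mul_add rounds once, like a fused multiply-add.
        dst[i] = if i % 2 == 0 {
            a[i].mul_add(b[i], -c[i])
        } else {
            a[i].mul_add(b[i], c[i])
        };
    }
    dst
}

/// Scalar model of the fmsubadd lane pattern (hypothetical helper).
/// Even-indexed lanes compute a*b + c, odd-indexed lanes compute a*b - c.
fn fmsubadd_model(a: &[f64; 8], b: &[f64; 8], c: &[f64; 8]) -> [f64; 8] {
    let mut dst = [0.0; 8];
    for i in 0..8 {
        dst[i] = if i % 2 == 0 {
            a[i].mul_add(b[i], c[i])
        } else {
            a[i].mul_add(b[i], -c[i])
        };
    }
    dst
}
```

With every lane of a set to 2.0, b to 3.0 and c to 1.0, the first model alternates 5.0, 7.0 and the second alternates 7.0, 5.0.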
- _mm512_fnmadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst.
- _mm512_fnmadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst.
- _mm512_fnmadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst.
- _mm512_fnmadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst.
- _mm512_fnmsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fnmsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fnmsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fnmsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
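The four sign variants described above (fmadd, fmsub, fnmadd, fnmsub) differ only in which terms are negated. A minimal per-lane sketch, using hypothetical helper names and `f64::mul_add` as a stand-in for the fused operation:

```rust
/// Per-lane scalar models of the fused multiply-add family (hypothetical helpers).
/// Each uses a single rounding step via mul_add.
fn fmadd_lane(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c) // (a * b) + c
}

fn fmsub_lane(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, -c) // (a * b) - c
}

fn fnmadd_lane(a: f64, b: f64, c: f64) -> f64 {
    (-a).mul_add(b, c) // -(a * b) + c
}

fn fnmsub_lane(a: f64, b: f64, c: f64) -> f64 {
    (-a).mul_add(b, -c) // -(a * b) - c
}
```

For a = 2, b = 3, c = 1 the four helpers return 7, 5, -5 and -7 respectively.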
- _mm512_fpclass_ pd_ mask avx512dq
- Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm512_fpclass_ ps_ mask avx512dq
- Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm512_getexp_ pd avx512f
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_getexp_ ps avx512f
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_getexp_ round_ pd avx512f
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_getexp_ round_ ps avx512f
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
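The floor(log2(x)) behaviour of the getexp entries above can be illustrated per element. The sketch below is a scalar model with a hypothetical name; it assumes the input is a normal, finite, nonzero f64 and does not model zero, denormal, infinite or NaN inputs.

```rust
/// Scalar model of getexp for one f64 lane (hypothetical helper).
/// Assumes x is normal, finite and nonzero; special values are not modelled.
fn getexp_model(x: f64) -> f64 {
    // Bits 52..62 hold the biased exponent; the sign bit is ignored.
    let biased = ((x.to_bits() >> 52) & 0x7ff) as i64;
    // Removing the bias of 1023 yields floor(log2(|x|)) for normal inputs.
    (biased - 1023) as f64
}
```

For example, 8.0 maps to 3.0, 0.75 maps to -1.0, and -10.0 maps to 3.0 because the sign is ignored.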
- _mm512_getmant_ pd avx512f
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_getmant_ ps avx512f
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_getmant_ round_ pd avx512f
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_getmant_ round_ ps avx512f
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
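As a rough illustration of the getmant entries above, the following scalar model covers one combination only: interval [1, 2) (`_MM_MANT_NORM_1_2`) with the sign taken from the source (`_MM_MANT_SIGN_src`). The function name is hypothetical, and the model assumes a normal, finite, nonzero f64 input.

```rust
/// Scalar model of getmant for one f64 lane with interval [1, 2) and sign = sign(src).
/// Hypothetical helper; assumes x is normal, finite and nonzero.
fn getmant_norm_1_2_sign_src(x: f64) -> f64 {
    let bits = x.to_bits();
    // Keep the sign bit and the 52 fraction bits, drop the original exponent.
    let sign_and_fraction = bits & 0x800f_ffff_ffff_ffff;
    // Install a biased exponent of 1023 (2^0), so the magnitude lands in [1, 2).
    f64::from_bits(sign_and_fraction | (1023u64 << 52))
}
```

For example, 10.0 = 1.25 * 2^3 yields 1.25, and -0.375 = -1.5 * 2^-2 yields -1.5.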
- _mm512_gf2p8affine_ epi64_ epi8 gfni and avx512f
- Performs an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_gf2p8affineinv_ epi64_ epi8 gfni and avx512f
- Performs an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_gf2p8mul_ epi8 gfni and avx512f
- Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
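The per-byte multiplication described above can be sketched in scalar form. The helper below is hypothetical (it is not part of this module) and uses the shift-and-reduce method with the reduction polynomial x^8 + x^4 + x^3 + x + 1 named in the entry.

```rust
/// Scalar GF(2^8) multiplication of two bytes (hypothetical helper).
/// Reduction polynomial: x^8 + x^4 + x^3 + x + 1, i.e. 0x11B.
fn gf2p8mul_byte(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    for _ in 0..8 {
        // Add (XOR) the current multiple of a if the low bit of b is set.
        if b & 1 != 0 {
            product ^= a;
        }
        // Multiply a by x; reduce if the x^8 term would appear.
        let carry = a & 0x80;
        a <<= 1;
        if carry != 0 {
            a ^= 0x1b;
        }
        b >>= 1;
    }
    product
}
```

A commonly cited check value from the AES specification is 0x57 times 0x83, which gives 0xC1 in this field.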
- _mm512_i32gather_ ⚠epi32 avx512f
- Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i32gather_ ⚠epi64 avx512f
- Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i32gather_ ⚠pd avx512f
- Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i32gather_ ⚠ps avx512f
- Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i32logather_ ⚠epi64 avx512f
- Loads 8 64-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale and stores them in dst.
- _mm512_i32logather_ ⚠pd avx512f
- Loads 8 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale and stores them in dst.
- _mm512_i32loscatter_ ⚠epi64 avx512f
- Stores 8 64-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale.
- _mm512_i32loscatter_ ⚠pd avx512f
- Stores 8 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale.
- _mm512_i32scatter_ ⚠epi32 avx512f
- Scatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i32scatter_ ⚠epi64 avx512f
- Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i32scatter_ ⚠pd avx512f
- Scatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i32scatter_ ⚠ps avx512f
- Scatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i64gather_ ⚠epi32 avx512f
- Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i64gather_ ⚠epi64 avx512f
- Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i64gather_ ⚠pd avx512f
- Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i64gather_ ⚠ps avx512f
- Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i64scatter_ ⚠epi32 avx512f
- Scatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i64scatter_ ⚠epi64 avx512f
- Scatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i64scatter_ ⚠pd avx512f
- Scatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i64scatter_ ⚠ps avx512f
- Scatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
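The gather and scatter entries above all address memory as base_addr plus index times scale, measured in bytes. The sketch below models that arithmetic over a byte slice. All names are hypothetical, the element encoding is assumed to be little-endian as on x86, and indices are assumed to be non-negative (the real instructions sign-extend them).

```rust
/// Builds a little-endian byte image of i32 values (hypothetical test helper).
fn i32s_to_bytes(values: &[i32]) -> Vec<u8> {
    let mut bytes = Vec::new();
    for v in values.iter() {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    bytes
}

/// Scalar model of a 32-bit gather: element i is read at byte offset vindex[i] * scale.
/// Assumes non-negative indices and in-bounds accesses.
fn gather_i32_model(memory: &[u8], vindex: &[i32], scale: usize) -> Vec<i32> {
    let mut dst = Vec::new();
    for &idx in vindex.iter() {
        let offset = idx as usize * scale;
        let mut raw = [0u8; 4];
        raw.copy_from_slice(&memory[offset..offset + 4]);
        dst.push(i32::from_le_bytes(raw));
    }
    dst
}

/// Scalar model of a 32-bit scatter: values[i] is written at byte offset vindex[i] * scale.
fn scatter_i32_model(memory: &mut [u8], vindex: &[i32], values: &[i32], scale: usize) {
    for (&idx, &v) in vindex.iter().zip(values.iter()) {
        let offset = idx as usize * scale;
        memory[offset..offset + 4].copy_from_slice(&v.to_le_bytes());
    }
}
```

With scale 4 the indices count whole 32-bit elements; with scale 1 they are raw byte offsets, so index 4 at scale 1 reaches the same element as index 1 at scale 4.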
- _mm512_insertf32x4 avx512f
- Copy a to dst, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into dst at the location specified by imm8.
- _mm512_insertf32x8 avx512dq
- Copy a to dst, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into dst at the location specified by IMM8.
- _mm512_insertf64x2 avx512dq
- Copy a to dst, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into dst at the location specified by IMM8.
- _mm512_insertf64x4 avx512f
- Copy a to dst, then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from b into dst at the location specified by imm8.
- _mm512_inserti32x4 avx512f
- Copy a to dst, then insert 128 bits (composed of 4 packed 32-bit integers) from b into dst at the location specified by imm8.
- _mm512_inserti32x8 avx512dq
- Copy a to dst, then insert 256 bits (composed of 8 packed 32-bit integers) from b into dst at the location specified by IMM8.
- _mm512_inserti64x2 avx512dq
- Copy a to dst, then insert 128 bits (composed of 2 packed 64-bit integers) from b into dst at the location specified by IMM8.
- _mm512_inserti64x4 avx512f
- Copy a to dst, then insert 256 bits (composed of 4 packed 64-bit integers) from b into dst at the location specified by imm8.
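The insert entries above share one shape: copy a, then overwrite a single sub-vector selected by the immediate. A scalar sketch for the 4 x f32 case follows; the function name is hypothetical, and the use of only the low two bits of imm8 as the lane selector is an assumption based on there being four 128-bit lanes in a 512-bit vector.

```rust
/// Scalar model of inserting a 128-bit group of four f32 values into a 512-bit vector.
/// Hypothetical helper; the low two bits of imm8 pick one of the four 128-bit lanes.
fn insertf32x4_model(a: [f32; 16], b: [f32; 4], imm8: usize) -> [f32; 16] {
    let mut dst = a; // start from a copy of a
    let lane = imm8 & 0b11;
    dst[lane * 4..lane * 4 + 4].copy_from_slice(&b);
    dst
}
```

Selecting lane 2 replaces elements 8 through 11 and leaves the other twelve elements of a untouched.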
- _mm512_int2mask avx512f
- Converts integer mask into bitmask, storing the result in dst.
- _mm512_kand avx512f
- Compute the bitwise AND of 16-bit masks a and b, and store the result in k.
- _mm512_kandn avx512f
- Compute the bitwise NOT of 16-bit masks a and then AND with b, and store the result in k.
- _mm512_kmov avx512f
- Copy 16-bit mask a to k.
- _mm512_knot avx512f
- Compute the bitwise NOT of 16-bit mask a, and store the result in k.
- _mm512_kor avx512f
- Compute the bitwise OR of 16-bit masks a and b, and store the result in k.
- _mm512_kortestc avx512f
- Performs bitwise OR between k1 and k2, storing the result in dst. CF flag is set if dst consists of all 1’s.
- _mm512_kortestz avx512f
- Performs bitwise OR between k1 and k2, storing the result in dst. ZF flag is set if dst is 0.
- _mm512_kunpackb avx512f
- Unpack and interleave 8 bits from masks a and b, and store the 16-bit result in k.
- _mm512_kunpackd avx512bw
- Unpack and interleave 32 bits from masks a and b, and store the 64-bit result in k.
- _mm512_kunpackw avx512bw
- Unpack and interleave 16 bits from masks a and b, and store the 32-bit result in k.
- _mm512_kxnor avx512f
- Compute the bitwise XNOR of 16-bit masks a and b, and store the result in k.
- _mm512_kxor avx512f
- Compute the bitwise XOR of 16-bit masks a and b, and store the result in k.
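The 16-bit mask operations above correspond to ordinary integer bit operations. The following sketch models several of them on plain u16 values. The names are hypothetical, the test helpers return bool rather than a flag value, and the operand order of the unpack model (low byte from b, high byte from a) is an assumption based on Intel's pseudocode.

```rust
/// Scalar models of 16-bit mask operations (hypothetical helpers).
fn kandn_model(a: u16, b: u16) -> u16 {
    !a & b // NOT a, then AND with b
}

fn kxnor_model(a: u16, b: u16) -> u16 {
    !(a ^ b)
}

/// Assumed layout: result bits 7..0 come from b, bits 15..8 come from the low byte of a.
fn kunpackb_model(a: u16, b: u16) -> u16 {
    ((a & 0xff) << 8) | (b & 0xff)
}

/// True when the OR of both masks is zero (the ZF condition).
fn kortestz_model(k1: u16, k2: u16) -> bool {
    (k1 | k2) == 0
}

/// True when the OR of both masks is all ones (the CF condition).
fn kortestc_model(k1: u16, k2: u16) -> bool {
    (k1 | k2) == 0xffff
}
```

For instance, the unpack model combines 0x12AB and 0x34CD into 0xABCD, and the all-ones test succeeds for 0xFF00 together with 0x00FF.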
- _mm512_load_ ⚠epi32 avx512f
- Load 512-bits (composed of 16 packed 32-bit integers) from memory into dst. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_load_ ⚠epi64 avx512f
- Load 512-bits (composed of 8 packed 64-bit integers) from memory into dst. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_load_ ⚠pd avx512f
- Load 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from memory into dst. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_load_ ⚠ps avx512f
- Load 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from memory into dst. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_load_ ⚠si512 avx512f
- Load 512-bits of integer data from memory into dst. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_loadu_ ⚠epi8 avx512bw
- Load 512-bits (composed of 64 packed 8-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠epi16 avx512bw
- Load 512-bits (composed of 32 packed 16-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠epi32 avx512f
- Load 512-bits (composed of 16 packed 32-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠epi64 avx512f
- Load 512-bits (composed of 8 packed 64-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠pd avx512f
- Loads 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠ps avx512f
- Loads 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠si512 avx512f
- Load 512-bits of integer data from memory into dst. mem_addr does not need to be aligned on any particular boundary.
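The aligned load entries above require a 64-byte boundary, while the loadu variants do not. One way to obtain a suitably aligned buffer in Rust is a wrapper type with `#[repr(align(64))]`. The type and helper names below are hypothetical, and the sketch only checks addresses; it does not call any intrinsic.

```rust
/// A 64-byte-aligned buffer of sixteen i32 values (hypothetical wrapper type).
#[repr(C, align(64))]
struct Aligned64([i32; 16]);

/// Returns true when the pointer satisfies the 64-byte alignment that the
/// aligned load variants expect.
fn is_64_byte_aligned(ptr: *const i32) -> bool {
    (ptr as usize) % 64 == 0
}
```

A pointer to the start of an `Aligned64` passes the check, while a pointer one element further in (4 bytes later) does not and would call for one of the loadu variants.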
- _mm512_lzcnt_ epi32 avx512cd
- Counts the number of leading zero bits in each packed 32-bit integer in a, and stores the results in dst.
- _mm512_lzcnt_ epi64 avx512cd
- Counts the number of leading zero bits in each packed 64-bit integer in a, and stores the results in dst.
- _mm512_madd52hi_ epu64 avx512ifma
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm512_madd52lo_ epu64 avx512ifma
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
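The two 52-bit multiply-add entries above can be modelled per 64-bit element with a 128-bit intermediate. The helper names are hypothetical, and wrapping addition on overflow is an assumption of this sketch.

```rust
/// Mask selecting the low 52 bits of a 64-bit element.
const MASK52: u64 = (1u64 << 52) - 1;

/// Scalar model of the low-half variant (hypothetical helper):
/// a + low 52 bits of (b[51:0] * c[51:0]).
fn madd52lo_model(a: u64, b: u64, c: u64) -> u64 {
    let product = (b & MASK52) as u128 * (c & MASK52) as u128;
    a.wrapping_add((product as u64) & MASK52)
}

/// Scalar model of the high-half variant (hypothetical helper):
/// a + bits 103..52 of (b[51:0] * c[51:0]).
fn madd52hi_model(a: u64, b: u64, c: u64) -> u64 {
    let product = (b & MASK52) as u128 * (c & MASK52) as u128;
    a.wrapping_add((product >> 52) as u64)
}
```

For example, 2^51 times 4 is 2^53, whose low 52 bits are zero and whose high part is 2, so with a = 10 the low variant returns 10 and the high variant returns 12.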
- _mm512_madd_ epi16 avx512bw
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst.
- _mm512_maddubs_ epi16 avx512bw
- Vertically multiply each unsigned 8-bit integer from a with the corresponding signed 8-bit integer from b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst.
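The two multiply-and-add-pairs entries above differ in element types and in overflow handling. The sketch below models a single output element of each. The names are hypothetical, and wrapping on the one overflowing case of the 16-bit variant is an assumption of this model.

```rust
/// One output element of the signed 16-bit variant (hypothetical helper):
/// two i16 * i16 products summed into an i32.
fn madd_pair_i16(a: [i16; 2], b: [i16; 2]) -> i32 {
    let p0 = a[0] as i32 * b[0] as i32;
    let p1 = a[1] as i32 * b[1] as i32;
    // Only all four inputs equal to i16::MIN can overflow; this model wraps.
    p0.wrapping_add(p1)
}

/// One output element of the unsigned-by-signed 8-bit variant (hypothetical helper):
/// two u8 * i8 products summed and saturated to the i16 range.
fn maddubs_pair(a: [u8; 2], b: [i8; 2]) -> i16 {
    let p0 = a[0] as i32 * b[0] as i32;
    let p1 = a[1] as i32 * b[1] as i32;
    let sum = p0 + p1;
    sum.max(i16::MIN as i32).min(i16::MAX as i32) as i16
}
```

The saturation matters in practice: 255 * 127 + 255 * 127 is 64770, which the 8-bit model clamps to 32767.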
- _mm512_mask2_ permutex2var_ epi8 avx512vbmi
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm512_mask2_ permutex2var_ epi16 avx512bw
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm512_mask2_ permutex2var_ epi32 avx512f
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm512_mask2_ permutex2var_ epi64 avx512f
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm512_mask2_ permutex2var_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm512_mask2_ permutex2var_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
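The two-source permute entries above combine a selector bit, an index, and a writemask. A scalar sketch for sixteen 32-bit lanes follows. The function name is hypothetical, and the bit layout (low four bits of each idx element as the index, bit 4 as the a-or-b selector) is an assumption based on Intel's pseudocode for the 512-bit, 32-bit-element form.

```rust
/// Scalar model of a two-source permute with writemask over sixteen i32 lanes.
/// Hypothetical helper. Lanes whose mask bit is clear keep the value from idx.
fn mask2_permutex2var_epi32_model(
    a: [i32; 16],
    idx: [i32; 16],
    k: u16,
    b: [i32; 16],
) -> [i32; 16] {
    let mut dst = idx; // masked-off lanes are copied from idx
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            let index = (idx[i] & 0xf) as usize; // low four bits: element index
            let from_b = (idx[i] >> 4) & 1 == 1; // bit 4: choose b instead of a
            dst[i] = if from_b { b[index] } else { a[index] };
        }
    }
    dst
}
```

With mask 0b11, only lanes 0 and 1 are shuffled; every other lane simply carries its idx value through.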
- _mm512_mask2int avx512f
- Converts bit mask k1 into an integer value, storing the results in dst.
- _mm512_mask3_ fmadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmaddsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmaddsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmaddsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmaddsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsubadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsubadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsubadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsubadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask_ abs_ epi8 avx512bw
- Compute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ abs_ epi16 avx512bw
- Compute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ abs_ epi32 avx512f
- Computes the absolute value of packed 32-bit integers in a, and stores the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ abs_ epi64 avx512f
- Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ abs_ pd avx512f
- Finds the absolute value of each packed double-precision (64-bit) floating-point element in v2, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ abs_ ps avx512f
- Finds the absolute value of each packed single-precision (32-bit) floating-point element in v2, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ add_ epi8 avx512bw
- Add packed 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ add_ epi16 avx512bw
- Add packed 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ add_ epi32 avx512f
- Add packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ add_ epi64 avx512f
- Add packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ add_ pd avx512f
- Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ add_ ps avx512f
- Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ add_ round_ pd avx512f
- Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ add_ round_ ps avx512f
- Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ adds_ epi8 avx512bw
- Add packed signed 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ adds_ epi16 avx512bw
- Add packed signed 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ adds_ epu8 avx512bw
- Add packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ adds_ epu16 avx512bw
- Add packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ alignr_ epi8 avx512bw
- Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ alignr_ epi32 avx512f
- Concatenate a and b into a 128-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 64 bytes (16 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ alignr_ epi64 avx512f
- Concatenate a and b into a 128-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 64 bytes (8 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ and_ epi32 avx512f
- Performs element-by-element bitwise AND between packed 32-bit integer elements of a and b, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ and_ epi64 avx512f
- Compute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ and_ pd avx512dq
- Compute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ and_ ps avx512dq
- Compute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ andnot_ epi32 avx512f
- Compute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ andnot_ epi64 avx512f
- Compute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ andnot_ pd avx512dq
- Compute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ andnot_ ps avx512dq
- Compute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ avg_ epu8 avx512bw
- Average packed unsigned 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ avg_ epu16 avx512bw
- Average packed unsigned 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ bitshuffle_ epi64_ mask avx512bitalg
- Considers the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm512_mask_ blend_ epi8 avx512bw
- Blend packed 8-bit integers from a and b using control mask k, and store the results in dst.
- _mm512_mask_ blend_ epi16 avx512bw
- Blend packed 16-bit integers from a and b using control mask k, and store the results in dst.
- _mm512_mask_ blend_ epi32 avx512f
- Blend packed 32-bit integers from a and b using control mask k, and store the results in dst.
- _mm512_mask_ blend_ epi64 avx512f
- Blend packed 64-bit integers from a and b using control mask k, and store the results in dst.
- _mm512_mask_ blend_ pd avx512f
- Blend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm512_mask_ blend_ ps avx512f
- Blend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm512_mask_ broadcast_ f32x2 avx512dq
- Broadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ broadcast_ f32x4 avx512f
- Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ broadcast_ f32x8 avx512dq
- Broadcasts the 8 packed single-precision (32-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ broadcast_ f64x2 avx512dq
- Broadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ broadcast_ f64x4 avx512f
- Broadcast the 4 packed double-precision (64-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ broadcast_ i32x2 avx512dq
- Broadcasts the lower 2 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ broadcast_ i32x4 avx512f
- Broadcast the 4 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ broadcast_ i32x8 avx512dq
- Broadcasts the 8 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ broadcast_ i64x2 avx512dq
- Broadcasts the 2 packed 64-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ broadcast_ i64x4 avx512f
- Broadcast the 4 packed 64-bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ broadcastb_ epi8 avx512bw
- Broadcast the low packed 8-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ broadcastd_ epi32 avx512f
- Broadcast the low packed 32-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ broadcastq_ epi64 avx512f
- Broadcast the low packed 64-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ broadcastsd_ pd avx512f
- Broadcast the low double-precision (64-bit) floating-point element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ broadcastss_ ps avx512f
- Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ broadcastw_ epi16 avx512bw
- Broadcast the low packed 16-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ epi32_ mask avx512f
- Compare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ round_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ cmp_ round_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ cmpeq_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpeq_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpeq_ epi32_ mask avx512f
- Compare packed 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpeq_ epi64_ mask avx512f
- Compare packed 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpeq_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpeq_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpeq_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpeq_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpeq_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpeq_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpge_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpge_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpge_ epi32_ mask avx512f
- Compare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpge_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpge_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpge_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpge_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpge_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpgt_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpgt_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpgt_ epi32_ mask avx512f
- Compare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpgt_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpgt_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpgt_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpgt_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpgt_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmple_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmple_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmple_ epi32_ mask avx512f
- Compare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmple_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmple_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmple_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmple_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmple_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmple_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmple_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmplt_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmplt_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmplt_ epi32_ mask avx512f
- Compare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmplt_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmplt_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmplt_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmplt_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmplt_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmplt_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmplt_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpneq_ epi8_ mask avx512bw
- Compare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpneq_ epi16_ mask avx512bw
- Compare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpneq_ epi32_ mask avx512f
- Compare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpneq_ epi64_ mask avx512f
- Compare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpneq_ epu8_ mask avx512bw
- Compare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpneq_ epu16_ mask avx512bw
- Compare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpneq_ epu32_ mask avx512f
- Compare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpneq_ epu64_ mask avx512f
- Compare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpneq_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpneq_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpnle_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpnle_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpnlt_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b for not-less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpnlt_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpord_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b to see if neither is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpord_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b to see if neither is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpunord_ pd_ mask avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b to see if either is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmpunord_ ps_ mask avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b to see if either is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ compress_ epi8 avx512vbmi2
- Contiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ compress_ epi16 avx512vbmi2
- Contiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ compress_ epi32 avx512f
- Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ compress_ epi64 avx512f
- Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ compress_ pd avx512f
- Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ compress_ ps avx512f
- Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ ⚠compressstoreu_ epi8 avx512vbmi2
- Contiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠compressstoreu_ epi16 avx512vbmi2
- Contiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠compressstoreu_ epi32 avx512f
- Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠compressstoreu_ epi64 avx512f
- Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠compressstoreu_ pd avx512f
- Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠compressstoreu_ ps avx512f
- Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ conflict_ epi32 avx512cd
- Test each 32-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero-extended bit vector in dst.
- _mm512_mask_ conflict_ epi64 avx512cd
- Test each 64-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero-extended bit vector in dst.
- _mm512_mask_ cvt_ roundepi32_ ps avx512f
- Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundepi64_ pd avx512dq
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_mask_ cvt_ roundepi64_ ps avx512dq
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_mask_ cvt_ roundepu32_ ps avx512f
- Convert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundepu64_ pd avx512dq
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_mask_ cvt_ roundepu64_ ps avx512dq
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_mask_ cvt_ roundpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_mask_ cvt_ roundpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_mask_ cvt_ roundpd_ ps avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundph_ ps avx512f
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ cvt_ roundps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_mask_ cvt_ roundps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_mask_ cvt_ roundps_ pd avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ cvt_ roundps_ ph avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the rounding[3:0] parameter.
- _mm512_mask_ cvtepi8_ epi16 avx512bw
- Sign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi8_ epi32 avx512f
- Sign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi8_ epi64 avx512f
- Sign extend packed 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi16_ epi8 avx512bw
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi16_ epi32 avx512f
- Sign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi16_ epi64 avx512f
- Sign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi16_ storeu_ epi8 avx512bw
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ cvtepi32_ epi8 avx512f
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi32_ epi16 avx512f
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi32_ epi64 avx512f
- Sign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi32_ pd avx512f
- Convert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi32_ ps avx512f
- Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi32_ storeu_ epi8 avx512f
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtepi32_ storeu_ epi16 avx512f
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ cvtepi32lo_ pd avx512f
- Performs element-by-element conversion of the lower half of packed 32-bit integer elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi64_ epi8 avx512f
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi64_ epi16 avx512f
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi64_ epi32 avx512f
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi64_ pd avx512dq
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvtepi64_ ps avx512dq
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvtepi64_ storeu_ epi8 avx512f
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtepi64_ storeu_ epi16 avx512f
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtepi64_ storeu_ epi32 avx512f
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ cvtepu8_ epi16 avx512bw
- Zero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu8_ epi32 avx512f
- Zero extend packed unsigned 8-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu8_ epi64 avx512f
- Zero extend packed unsigned 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu16_ epi32 avx512f
- Zero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu16_ epi64 avx512f
- Zero extend packed unsigned 16-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu32_ epi64 avx512f
- Zero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu32_ pd avx512f
- Convert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu32_ ps avx512f
- Convert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu32lo_ pd avx512f
- Performs element-by-element conversion of the lower half of 32-bit unsigned integer elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu64_ pd avx512dq
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvtepu64_ ps avx512dq
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvtne2ps_ pbh avx512bf16 and avx512f
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtneps_ pbh avx512bf16 and avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtpbh_ ps avx512bf16 and avx512f
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvtpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvtpd_ ps avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtpd_ pslo avx512f
- Performs an element-by-element conversion of packed double-precision (64-bit) floating-point elements in v2 to single-precision (32-bit) floating-point elements and stores them in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0.
- _mm512_mask_ cvtph_ ps avx512f
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvtps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvtps_ pd avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtps_ ph avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the rounding[3:0] parameter.
- _mm512_mask_ cvtpslo_ pd avx512f
- Performs element-by-element conversion of the lower half of packed single-precision (32-bit) floating-point elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtsepi16_ epi8 avx512bw
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtsepi16_ storeu_ epi8 avx512bw
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ cvtsepi32_ epi8 avx512f
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtsepi32_ epi16 avx512f
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtsepi32_ storeu_ epi8 avx512f
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtsepi32_ storeu_ epi16 avx512f
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ cvtsepi64_ epi8 avx512f
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtsepi64_ epi16 avx512f
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtsepi64_ epi32 avx512f
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtsepi64_ storeu_ epi8 avx512f
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtsepi64_ storeu_ epi16 avx512f
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtsepi64_ storeu_ epi32 avx512f
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ cvtt_ roundpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ cvtt_ roundpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_mask_ cvtt_ roundpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ cvtt_ roundpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_mask_ cvtt_ roundps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ cvtt_ roundps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_mask_ cvtt_ roundps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ cvtt_ roundps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_mask_ cvttpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvttpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvttpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvttpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvttps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvttps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvttps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvttps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ cvtusepi16_ epi8 avx512bw
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtusepi16_ storeu_ epi8 avx512bw
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ cvtusepi32_ epi8 avx512f
- Convert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtusepi32_ epi16 avx512f
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtusepi32_ storeu_ epi8 avx512f
- Convert packed unsigned 32-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtusepi32_ storeu_ epi16 avx512f
- Convert packed unsigned 32-bit integers in a to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ cvtusepi64_ epi8 avx512f
- Convert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtusepi64_ epi16 avx512f
- Convert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtusepi64_ epi32 avx512f
- Convert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtusepi64_ storeu_ epi8 avx512f
- Convert packed unsigned 64-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtusepi64_ storeu_ epi16 avx512f
- Convert packed unsigned 64-bit integers in a to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtusepi64_ storeu_ epi32 avx512f
- Convert packed unsigned 64-bit integers in a to packed 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ dbsad_ epu8 avx512bw
- Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm512_mask_ div_ pd avx512f
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ div_ ps avx512f
- Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ div_ round_ pd avx512f
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ div_ round_ ps avx512f
- Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ dpbf16_ ps avx512bf16 and avx512f
- Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ dpbusd_ epi32 avx512vnni
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ dpbusds_ epi32 avx512vnni
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ dpwssd_ epi32 avx512vnni
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ dpwssds_ epi32 avx512vnni
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
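The difference between the wrapping and saturating VNNI forms above is easiest to see on a single 32-bit lane. The following is a scalar sketch under stated assumptions, not the intrinsics themselves: `dpbusd_lane`, `dpbusds_lane`, and `mask_dpbusd_epi32_model` are hypothetical names, and each 32-bit lane is represented as an array of four bytes.

```rust
/// One 32-bit lane of the unsigned-by-signed byte dot product, wrapping on overflow.
fn dpbusd_lane(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut sum = 0i32;
    for j in 0..4 {
        // Each u8 * i8 product fits in a signed 16-bit value; i32 is used for convenience.
        sum += a[j] as i32 * b[j] as i32;
    }
    src.wrapping_add(sum)
}

/// Same lane computation, but the final accumulation saturates to the i32 range.
fn dpbusds_lane(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut sum = 0i64;
    for j in 0..4 {
        sum += a[j] as i64 * b[j] as i64;
    }
    (src as i64 + sum).clamp(i32::MIN as i64, i32::MAX as i64) as i32
}

/// Masked 16-lane model: src is both the accumulator and the fallback for inactive lanes.
fn mask_dpbusd_epi32_model(
    src: [i32; 16],
    k: u16,
    a: [[u8; 4]; 16],
    b: [[i8; 4]; 16],
) -> [i32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = dpbusd_lane(src[i], a[i], b[i]);
        }
    }
    dst
}
```

In the masked forms, src plays two roles: it supplies the running sum for active lanes and is passed through unchanged for inactive ones.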
- _mm512_mask_ expand_ epi8 avx512vbmi2
- Load contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ expand_ epi16 avx512vbmi2
- Load contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ expand_ epi32 avx512f
- Load contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ expand_ epi64 avx512f
- Load contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ expand_ pd avx512f
- Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ expand_ ps avx512f
- Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
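The expand entries above are terse about which element goes where, so a scalar sketch may help. It assumes sixteen 32-bit lanes and a 16-bit mask; `mask_expand_epi32_model` is a hypothetical name and does not call the intrinsic.

```rust
/// Scalar model of a masked expand over sixteen 32-bit lanes.
fn mask_expand_epi32_model(src: [i32; 16], k: u16, a: [i32; 16]) -> [i32; 16] {
    let mut dst = src; // inactive lanes keep the value from src
    let mut next = 0; // index of the next unread low element of a
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            // Consecutive low elements of a are spread out to the active lanes.
            dst[i] = a[next];
            next += 1;
        }
    }
    dst
}
```

With k = 0b1010, lane 1 receives a[0] and lane 3 receives a[1]; every other lane is copied from src.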
- _mm512_mask_ ⚠expandloadu_ epi8 avx512vbmi2
- Load contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ epi16 avx512vbmi2
- Load contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ epi32 avx512f
- Load contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ epi64 avx512f
- Load contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ pd avx512f
- Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ ps avx512f
- Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ extractf32x4_ ps avx512f
- Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ extractf32x8_ ps avx512dq
- Extracts 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ extractf64x2_ pd avx512dq
- Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ extractf64x4_ pd avx512f
- Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a, selected with imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ extracti32x4_ epi32 avx512f
- Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM2, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ extracti32x8_ epi32 avx512dq
- Extracts 256 bits (composed of 8 packed 32-bit integers) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ extracti64x2_ epi64 avx512dq
- Extracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ extracti64x4_ epi64 avx512f
- Extract 256 bits (composed of 4 packed 64-bit integers) from a, selected with IMM1, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ fixupimm_ pd avx512f
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_mask_ fixupimm_ ps avx512f
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_mask_ fixupimm_ round_ pd avx512f
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_mask_ fixupimm_ round_ ps avx512f
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_mask_ fmadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmaddsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmaddsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmaddsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmaddsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
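Which lanes add and which subtract in the fmaddsub forms is not obvious from the wording, so here is a scalar sketch. It assumes eight f64 lanes and follows Intel's pseudocode as the editor understands it (even-indexed lanes subtract c, odd-indexed lanes add c); `mask_fmaddsub_pd_model` is a hypothetical name, not a library function.

```rust
/// Scalar model of a masked fused multiply with alternating subtract/add over eight f64 lanes.
fn mask_fmaddsub_pd_model(a: [f64; 8], k: u8, b: [f64; 8], c: [f64; 8]) -> [f64; 8] {
    let mut dst = a; // inactive lanes are copied from a, not from a separate src
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Assumption of this sketch: even lanes compute a*b - c, odd lanes a*b + c.
            let addend = if i % 2 == 0 { -c[i] } else { c[i] };
            // mul_add rounds once, mirroring the fused behavior.
            dst[i] = a[i].mul_add(b[i], addend);
        }
    }
    dst
}
```

The fmsubadd family would swap the two branches. Note that the fallback operand in these mask_ forms is a itself.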
- _mm512_mask_ fmsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsubadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsubadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsubadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsubadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fpclass_ pd_ mask avx512dq
- Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm512_mask_ fpclass_ ps_ mask avx512dq
- Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm512_mask_ getexp_ pd avx512f
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_mask_ getexp_ ps avx512f
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_mask_ getexp_ round_ pd avx512f
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ getexp_ round_ ps avx512f
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
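The "floor(log2(x))" summary in the getexp entries above can be made concrete with a scalar sketch that reads the exponent field directly. `getexp_model` and `mask_getexp_pd_model` are hypothetical names; the handling of zero, infinity, NaN, and subnormal inputs follows the editor's reading of Intel's description and should be treated as an assumption.

```rust
/// Scalar model of getexp for one f64: the unbiased exponent of |x| as a float.
fn getexp_model(x: f64) -> f64 {
    let bits = x.to_bits();
    let biased = ((bits >> 52) & 0x7ff) as i64;
    let mantissa = bits & ((1u64 << 52) - 1);
    if biased == 0x7ff {
        // Assumed special cases: infinity maps to +infinity, NaN stays NaN.
        return if mantissa == 0 { f64::INFINITY } else { x };
    }
    if biased == 0 {
        if mantissa == 0 {
            return f64::NEG_INFINITY; // assumed result for +0.0 and -0.0
        }
        // Subnormal: value is mantissa * 2^-1074, so normalize by the top set bit.
        let top = 63 - mantissa.leading_zeros() as i64;
        return (top - 1074) as f64;
    }
    (biased - 1023) as f64
}

/// Masked eight-lane wrapper: inactive lanes are copied from src.
fn mask_getexp_pd_model(src: [f64; 8], k: u8, a: [f64; 8]) -> [f64; 8] {
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = getexp_model(a[i]);
        }
    }
    dst
}
```

Reading the exponent bits avoids the rounding problems that a literal `x.abs().log2().floor()` can hit just below a power of two.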
- _mm512_mask_ getmant_ pd avx512f
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_mask_ getmant_ ps avx512f
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_mask_ getmant_ round_ pd avx512f
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ getmant_ round_ ps avx512f
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ gf2p8affine_ epi64_ epi8 gfni and avx512bw and avx512f
- Performs an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_mask_ gf2p8affineinv_ epi64_ epi8 gfni and avx512bw and avx512f
- Performs an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_mask_ gf2p8mul_ epi8 gfni and avx512bw and avx512f
- Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
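The per-byte GF(2^8) multiplication described above can be sketched in scalar code with the classic shift-and-reduce loop. `gf2p8mul_byte` and `mask_gf2p8mul_epi8_model` are hypothetical names; the masked wrapper assumes 64 byte lanes, a 64-bit mask, and a src fallback for inactive lanes.

```rust
/// Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B).
fn gf2p8mul_byte(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    for _ in 0..8 {
        if b & 1 != 0 {
            product ^= a; // addition in GF(2^8) is XOR
        }
        let carry = a & 0x80 != 0;
        a <<= 1; // multiply a by x
        if carry {
            a ^= 0x1B; // reduce by the low byte of the polynomial
        }
        b >>= 1;
    }
    product
}

/// Masked 64-lane model: inactive bytes are copied from src (assumed fallback).
fn mask_gf2p8mul_epi8_model(src: [u8; 64], k: u64, a: [u8; 64], b: [u8; 64]) -> [u8; 64] {
    let mut dst = src;
    for i in 0..64 {
        if (k >> i) & 1 == 1 {
            dst[i] = gf2p8mul_byte(a[i], b[i]);
        }
    }
    dst
}
```

This is the same field used by AES, so the worked products from the AES specification (for example 0x57 times 0x83) make convenient test vectors.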
- _mm512_mask_ ⚠i32gather_ epi32 avx512f
- Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32gather_ epi64 avx512f
- Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32gather_ pd avx512f
- Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32gather_ ps avx512f
- Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32logather_ epi64 avx512f
- Loads 8 64-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale and stores them in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠i32logather_ pd avx512f
- Loads 8 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale and stores them in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠i32loscatter_ epi64 avx512f
- Stores 8 64-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm512_mask_ ⚠i32loscatter_ pd avx512f
- Stores 8 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm512_mask_ ⚠i32scatter_ epi32 avx512f
- Scatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32scatter_ epi64 avx512f
- Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32scatter_ pd avx512f
- Scatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32scatter_ ps avx512f
- Scatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64gather_ epi32 avx512f
- Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64gather_ epi64 avx512f
- Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64gather_ pd avx512f
- Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64gather_ ps avx512f
- Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
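The addressing rule shared by the gather entries above (base address plus index times scale, in bytes) can be sketched with a scalar model over a byte slice. `mask_i32gather_epi32_model` is a hypothetical name; the sketch assumes sixteen 32-bit lanes, little-endian memory, and non-negative indices that stay inside the slice.

```rust
/// Scalar model of a masked gather of 32-bit integers using 32-bit indices.
/// `memory` stands in for the bytes starting at base_addr (an assumption of this sketch).
fn mask_i32gather_epi32_model(
    src: [i32; 16],
    k: u16,
    vindex: [i32; 16],
    memory: &[u8],
    scale: usize,
) -> [i32; 16] {
    let mut dst = src; // inactive lanes keep the value from src
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            // Byte offset = index * scale; scale is expected to be 1, 2, 4 or 8.
            let o = vindex[i] as usize * scale;
            dst[i] = i32::from_le_bytes([memory[o], memory[o + 1], memory[o + 2], memory[o + 3]]);
        }
    }
    dst
}
```

Inactive lanes are never dereferenced in this model, so their indices may be arbitrary. The safe slice indexing here panics on out-of-range offsets, whereas the real intrinsics are unsafe and make no such check.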
- _mm512_mask_ ⚠i64scatter_ epi32 avx512f
- Scatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64scatter_ epi64 avx512f
- Scatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64scatter_ pd avx512f
- Scatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64scatter_ ps avx512f
- Scatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ insertf32x4 avx512f
- Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ insertf32x8 avx512dq
- Copy a to tmp, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ insertf64x2 avx512dq
- Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ insertf64x4 avx512f
- Copy a to tmp, then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ inserti32x4 avx512f
- Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ inserti32x8 avx512dq
- Copy a to tmp, then insert 256 bits (composed of 8 packed 32-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ inserti64x2 avx512dq
- Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ inserti64x4 avx512f
- Copy a to tmp, then insert 256 bits (composed of 4 packed 64-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
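The masked inserts above all follow the same two steps: build tmp from a with one block replaced by b, then merge tmp with src under the writemask. The following is a scalar sketch of those steps for the 4 x 32-bit case. The helper name `mask_inserti32x4_model` is hypothetical, and imm8 is modeled as a plain block index in 0..4.

```rust
/// Scalar model of a masked 128-bit insert into a 512-bit vector of 32-bit lanes.
fn mask_inserti32x4_model(
    src: [i32; 16],
    k: u16,
    a: [i32; 16],
    b: [i32; 4],
    imm8: usize,
) -> [i32; 16] {
    assert!(imm8 < 4, "a 512-bit vector holds four 128-bit blocks");
    // Step 1: copy a to tmp and overwrite the selected 128-bit block with b.
    let mut tmp = a;
    tmp[imm8 * 4..imm8 * 4 + 4].copy_from_slice(&b);
    // Step 2: merge tmp into dst under the writemask.
    let mut dst = src;
    for lane in 0..16 {
        if (k >> lane) & 1 == 1 {
            dst[lane] = tmp[lane];
        }
    }
    dst
}
```

Note that the mask applies per 32-bit element of the final result, not per inserted block, so a lane inside the inserted block can still be taken from src.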
- _mm512_mask_ ⚠load_ epi32 avx512f
- Load packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠load_ epi64 avx512f
- Load packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠load_ pd avx512f
- Load packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠load_ ps avx512f
- Load packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠loadu_ epi8 avx512bw
- Load packed 8-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠loadu_ epi16 avx512bw
- Load packed 16-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠loadu_ epi32 avx512f
- Load packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠loadu_ epi64 avx512f
- Load packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠loadu_ pd avx512f
- Load packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠loadu_ ps avx512f
- Load packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
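A common use of the masked unaligned loads above is reading the final, partial chunk of a slice. The following is a scalar sketch of that pattern, not the intrinsic itself. The helper names `tail_mask16` and `mask_loadu_epi32_model` are hypothetical; the model only reads the lanes whose mask bit is set.

```rust
/// Build a 16-bit writemask with the low `n` bits set (n <= 16).
fn tail_mask16(n: usize) -> u16 {
    assert!(n <= 16);
    if n == 16 { u16::MAX } else { (1u16 << n) - 1 }
}

/// Scalar model of a masked unaligned load of 32-bit integers.
/// Lanes whose mask bit is clear are not read and keep the value from src.
fn mask_loadu_epi32_model(src: [i32; 16], k: u16, mem: &[i32]) -> [i32; 16] {
    let mut dst = src;
    for lane in 0..16 {
        if (k >> lane) & 1 == 1 {
            dst[lane] = mem[lane];
        }
    }
    dst
}
```

For a five-element tail, `tail_mask16(5)` yields `0b1_1111`, so only the first five lanes are loaded and the remaining eleven come from src.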
- _mm512_mask_ lzcnt_ epi32 avx512cd
- Count the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ lzcnt_ epi64 avx512cd
- Count the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ madd52hi_ epu64 avx512ifma
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ madd52lo_ epu64 avx512ifma
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
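The 52-bit multiply-add entries above split a 104-bit product into a low and a high 52-bit half. The following is a scalar sketch of one 64-bit lane of each operation, plus a masked wrapper, using `u128` for the intermediate product. All function names are hypothetical; the sketch assumes the accumulate step wraps modulo 2^64 and that inactive lanes take their value from the accumulator.

```rust
const MASK52: u64 = (1u64 << 52) - 1;

/// One lane of the low-half variant: a + low 52 bits of (b[51:0] * c[51:0]).
fn madd52lo_lane(a: u64, b: u64, c: u64) -> u64 {
    let product = (b & MASK52) as u128 * (c & MASK52) as u128; // up to 104 bits
    a.wrapping_add(product as u64 & MASK52)
}

/// One lane of the high-half variant: a + high 52 bits of (b[51:0] * c[51:0]).
fn madd52hi_lane(a: u64, b: u64, c: u64) -> u64 {
    let product = (b & MASK52) as u128 * (c & MASK52) as u128;
    a.wrapping_add((product >> 52) as u64)
}

/// Masked model over eight lanes: inactive lanes keep the accumulator value.
fn mask_madd52lo_model(a: [u64; 8], k: u8, b: [u64; 8], c: [u64; 8]) -> [u64; 8] {
    let mut dst = a;
    for lane in 0..8 {
        if (k >> lane) & 1 == 1 {
            dst[lane] = madd52lo_lane(a[lane], b[lane], c[lane]);
        }
    }
    dst
}
```

Bits 52 and above of each multiplicand are ignored, which is what lets these operations serve as building blocks for multi-limb big-integer arithmetic with 52-bit limbs.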
- _mm512_mask_ madd_ epi16 avx512bw
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ maddubs_ epi16 avx512bw
- Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
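The two multiply-add entries above differ in operand signedness and in overflow handling. The following is a scalar sketch of a single output lane of each, before any writemask is applied. The function names are hypothetical.

```rust
/// One output lane of the 16-bit multiply-add: a0*b0 + a1*b1 as a 32-bit integer.
fn madd_epi16_pair(a: [i16; 2], b: [i16; 2]) -> i32 {
    let p0 = a[0] as i32 * b[0] as i32;
    let p1 = a[1] as i32 * b[1] as i32;
    // The sum can only wrap when all four inputs are i16::MIN.
    p0.wrapping_add(p1)
}

/// One output lane of the unsigned-by-signed 8-bit multiply-add,
/// with the pair sum saturated to the signed 16-bit range.
fn maddubs_epi16_pair(a: [u8; 2], b: [i8; 2]) -> i16 {
    let sum = a[0] as i32 * b[0] as i32 + a[1] as i32 * b[1] as i32;
    sum.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}
```

The saturation in the 8-bit variant matters in practice: two maximal products (255 * 127 each) already exceed `i16::MAX`.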
- _mm512_mask_ max_ epi8 avx512bw
- Compare packed signed 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ max_ epi16 avx512bw
- Compare packed signed 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ max_ epi32 avx512f
- Compare packed signed 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ max_ epi64 avx512f
- Compare packed signed 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ max_ epu8 avx512bw
- Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ max_ epu16 avx512bw
- Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ max_ epu32 avx512f
- Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ max_ epu64 avx512f
- Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ max_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ max_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ max_ round_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ max_ round_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ min_ epi8 avx512bw
- Compare packed signed 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ min_ epi16 avx512bw
- Compare packed signed 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ min_ epi32 avx512f
- Compare packed signed 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ min_ epi64 avx512f
- Compare packed signed 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ min_ epu8 avx512bw
- Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ min_ epu16 avx512bw
- Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ min_ epu32 avx512f
- Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ min_ epu64 avx512f
- Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ min_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ min_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ min_ round_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ min_ round_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ mov_ epi8 avx512bw
- Move packed 8-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mov_ epi16 avx512bw
- Move packed 16-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mov_ epi32 avx512f
- Move packed 32-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mov_ epi64 avx512f
- Move packed 64-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mov_ pd avx512f
- Move packed double-precision (64-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mov_ ps avx512f
- Move packed single-precision (32-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ movedup_ pd avx512f
- Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ movehdup_ ps avx512f
- Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ moveldup_ ps avx512f
- Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mul_ epi32 avx512f
- Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mul_ epu32 avx512f
- Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
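The two widening multiplies above use only the low 32 bits of each 64-bit element and differ in how those bits are extended. The following is a scalar sketch of one lane of each. The function names are hypothetical.

```rust
/// One lane of the signed variant: sign-extend the low 32 bits, then multiply.
fn mul_epi32_lane(a: i64, b: i64) -> i64 {
    (a as i32 as i64) * (b as i32 as i64)
}

/// One lane of the unsigned variant: zero-extend the low 32 bits, then multiply.
fn mul_epu32_lane(a: u64, b: u64) -> u64 {
    (a as u32 as u64) * (b as u32 as u64)
}
```

Neither product can overflow its 64-bit result, because each operand is at most 32 bits wide after truncation.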
- _mm512_mask_ mul_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mul_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mul_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mul_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mulhi_ epi16 avx512bw
- Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mulhi_ epu16 avx512bw
- Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mulhrs_ epi16 avx512bw
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
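The truncate, round and extract steps described above amount to a rounded multiply of two Q15 fixed-point values. The following is a scalar sketch of a single lane under that reading. The function name is hypothetical.

```rust
/// One lane of the rounded high multiply on signed 16-bit integers.
fn mulhrs_epi16_lane(a: i16, b: i16) -> i16 {
    let product = a as i32 * b as i32;
    // Keep the 18 most significant bits, add 1 to round,
    // then drop the lowest bit to obtain bits [16:1].
    (((product >> 14) + 1) >> 1) as i16
}
```

For example, 16384 represents 0.5 in Q15, and multiplying it by itself yields 8192, which represents 0.25. The one input pair that wraps is `i16::MIN` times `i16::MIN`, which produces `i16::MIN` again.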
- _mm512_mask_ mullo_ epi16 avx512bw
- Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mullo_ epi32 avx512f
- Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mullo_ epi64 avx512dq
- Multiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ mullox_ epi64 avx512f
- Multiplies elements in packed 64-bit integer vectors a and b together, storing the lower 64 bits of the result in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ multishift_ epi64_ epi8 avx512vbmi
- For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
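The byte-granular shift control described above can be read as eight independent bit-field extractions per 64-bit element. The following is a scalar sketch of one 64-bit lane, before the writemask is applied. The function name is hypothetical; the sketch assumes each control byte uses only its low 6 bits and that the 8-bit window wraps around the end of the element.

```rust
/// One 64-bit lane of the multishift: byte j of the result is the 8-bit field
/// of `b` that starts at the bit position given by control byte j of `a`.
fn multishift_epi64_epi8_lane(a: u64, b: u64) -> u64 {
    let ctrl = a.to_le_bytes();
    let mut out = [0u8; 8];
    for j in 0..8 {
        let shift = (ctrl[j] & 63) as u32; // only the low 6 bits are used
        // Rotating right brings bit `shift` to bit 0; truncating keeps 8 bits,
        // including any that wrapped around from the low end of `b`.
        out[j] = b.rotate_right(shift) as u8;
    }
    u64::from_le_bytes(out)
}
```

Control bytes 0, 8, 16, ..., 56 reproduce b unchanged, and the reverse sequence performs a byte swap.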
- _mm512_mask_ or_ epi32 avx512f
- Compute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ or_ epi64 avx512f
- Compute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ or_ pd avx512dq
- Compute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ or_ ps avx512dq
- Compute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ packs_ epi16 avx512bw
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ packs_ epi32 avx512bw
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ packus_ epi16 avx512bw
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ packus_ epi32 avx512bw
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permute_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permute_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permutevar_ epi32 avx512f
- Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Note that this intrinsic shuffles across 128-bit lanes, unlike past intrinsics that use the permutevar name. This intrinsic is identical to _mm512_mask_permutexvar_epi32, and it is recommended that you use that intrinsic name.
- _mm512_mask_ permutevar_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permutevar_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permutex2var_ epi8 avx512vbmi
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ permutex2var_ epi16 avx512bw
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ permutex2var_ epi32 avx512f
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ permutex2var_ epi64 avx512f
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ permutex2var_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ permutex2var_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
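The "selector and index" wording in the two-source shuffles above can be made concrete for the 32-bit case: with 16 elements per source, the low 4 bits of each idx element pick the element and the next bit picks the source. The following is a scalar sketch of that reading, not the intrinsic itself. The helper name `mask_permutex2var_epi32_model` is hypothetical.

```rust
/// Scalar model of a masked two-source shuffle of 32-bit integers.
fn mask_permutex2var_epi32_model(
    a: [i32; 16],
    k: u16,
    idx: [i32; 16],
    b: [i32; 16],
) -> [i32; 16] {
    let mut dst = a; // inactive lanes are copied from a
    for lane in 0..16 {
        if (k >> lane) & 1 == 1 {
            let offset = (idx[lane] & 0xF) as usize; // bits 3:0 select the element
            let from_b = (idx[lane] >> 4) & 1 == 1; // bit 4 selects the source vector
            dst[lane] = if from_b { b[offset] } else { a[offset] };
        }
    }
    dst
}
```

Because inactive lanes fall back to a rather than to a separate src operand, this mask form doubles as an in-place update of a.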
- _mm512_mask_ permutex_ epi64 avx512f
- Shuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permutex_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permutexvar_ epi8 avx512vbmi
- Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permutexvar_ epi16 avx512bw
- Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permutexvar_ epi32 avx512f
- Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permutexvar_ epi64 avx512f
- Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permutexvar_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ permutexvar_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ popcnt_ epi8 avx512bitalg
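- Count the number of logical 1 bits in each packed 8-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).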
- _mm512_mask_ popcnt_ epi16 avx512bitalg
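- Count the number of logical 1 bits in each packed 16-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).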
- _mm512_mask_ popcnt_ epi32 avx512vpopcntdq
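- Count the number of logical 1 bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).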
- _mm512_mask_ popcnt_ epi64 avx512vpopcntdq
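- Count the number of logical 1 bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).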
- _mm512_mask_ range_ pd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Bits 1:0 of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits 3:2 of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_mask_ range_ ps avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Bits 1:0 of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits 3:2 of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_mask_ range_ round_ pd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Bits 1:0 of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits 3:2 of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ range_ round_ ps avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Bits 1:0 of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits 3:2 of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_mask_ rcp14_ pd avx512f
- Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_mask_ rcp14_ ps avx512f
- Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_mask_ reduce_ add_ epi32 avx512f
- Reduce the packed 32-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm512_mask_ reduce_ add_ epi64 avx512f
- Reduce the packed 64-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm512_mask_ reduce_ add_ pd avx512f
- Reduce the packed double-precision (64-bit) floating-point elements in a by addition using mask k. Returns the sum of all active elements in a.
- _mm512_mask_ reduce_ add_ ps avx512f
- Reduce the packed single-precision (32-bit) floating-point elements in a by addition using mask k. Returns the sum of all active elements in a.
- _mm512_mask_ reduce_ and_ epi32 avx512f
- Reduce the packed 32-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm512_mask_ reduce_ and_ epi64 avx512f
- Reduce the packed 64-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm512_mask_ reduce_ max_ epi32 avx512f
- Reduce the packed signed 32-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ reduce_ max_ epi64 avx512f
- Reduce the packed signed 64-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ reduce_ max_ epu32 avx512f
- Reduce the packed unsigned 32-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ reduce_ max_ epu64 avx512f
- Reduce the packed unsigned 64-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ reduce_ max_ pd avx512f
- Reduce the packed double-precision (64-bit) floating-point elements in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ reduce_ max_ ps avx512f
- Reduce the packed single-precision (32-bit) floating-point elements in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ reduce_ min_ epi32 avx512f
- Reduce the packed signed 32-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ reduce_ min_ epi64 avx512f
- Reduce the packed signed 64-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ reduce_ min_ epu32 avx512f
- Reduce the packed unsigned 32-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ reduce_ min_ epu64 avx512f
- Reduce the packed unsigned 64-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ reduce_ min_ pd avx512f
- Reduce the packed double-precision (64-bit) floating-point elements in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ reduce_ min_ ps avx512f
- Reduce the packed single-precision (32-bit) floating-point elements in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ reduce_ mul_ epi32 avx512f
- Reduce the packed 32-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm512_mask_ reduce_ mul_ epi64 avx512f
- Reduce the packed 64-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm512_mask_ reduce_ mul_ pd avx512f
- Reduce the packed double-precision (64-bit) floating-point elements in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm512_mask_ reduce_ mul_ ps avx512f
- Reduce the packed single-precision (32-bit) floating-point elements in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm512_mask_ reduce_ or_ epi32 avx512f
- Reduce the packed 32-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
- _mm512_mask_ reduce_ or_ epi64 avx512f
- Reduce the packed 64-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
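The masked reductions above only combine "active" elements, that is, lanes whose bit in k is set. A minimal scalar sketch of that behaviour for the 32-bit integer variants follows. The helper names are hypothetical and are not part of `core::arch`; the sketch assumes that an empty mask yields the identity of the operation (0 for add, all ones for AND, `i32::MAX` for signed minimum).

```rust
/// Hypothetical scalar model of a masked reduction over 16 x 32-bit lanes.
/// Lanes whose mask bit is clear are skipped, which is equivalent to
/// substituting the identity element of `op` for them.
fn mask_reduce_i32(k: u16, a: [i32; 16], identity: i32, op: fn(i32, i32) -> i32) -> i32 {
    let mut acc = identity;
    for (i, &x) in a.iter().enumerate() {
        if (k >> i) & 1 == 1 {
            acc = op(acc, x);
        }
    }
    acc
}

/// Models the result described for `_mm512_mask_reduce_add_epi32` (wrapping sum).
fn model_mask_reduce_add_epi32(k: u16, a: [i32; 16]) -> i32 {
    mask_reduce_i32(k, a, 0, i32::wrapping_add)
}

/// Models the result described for `_mm512_mask_reduce_and_epi32`.
fn model_mask_reduce_and_epi32(k: u16, a: [i32; 16]) -> i32 {
    mask_reduce_i32(k, a, -1, |x, y| x & y)
}

/// Models the result described for `_mm512_mask_reduce_min_epi32`.
fn model_mask_reduce_min_epi32(k: u16, a: [i32; 16]) -> i32 {
    mask_reduce_i32(k, a, i32::MAX, i32::min)
}
```

With lanes holding 1 through 16 and `k = 0b101`, only lanes 0 and 2 are active, so the add model returns 1 + 3 = 4.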
- _mm512_mask_ reduce_ pd avx512dq
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_mask_ reduce_ ps avx512dq
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_mask_ reduce_ round_ pd avx512dq
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_mask_ reduce_ round_ ps avx512dq
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_mask_ rol_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ rol_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ rolv_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ rolv_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ror_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ror_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ rorv_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ rorv_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
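The rotate entries above differ only in direction and in where the count comes from (an immediate for `rol`/`ror`, a per-lane vector for `rolv`/`rorv`). A minimal scalar sketch of the writemask behaviour, using hypothetical helper names and assuming the count is taken modulo the element width:

```rust
/// Hypothetical scalar model of `_mm512_mask_rol_epi32`:
/// active lanes are rotated left by `imm8`, inactive lanes keep `src`.
fn model_mask_rol_epi32(src: [u32; 16], k: u16, a: [u32; 16], imm8: u32) -> [u32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].rotate_left(imm8 % 32);
        }
    }
    dst
}

/// Hypothetical scalar model of `_mm512_mask_rorv_epi32`:
/// each active lane is rotated right by the count in the same lane of `b`.
fn model_mask_rorv_epi32(src: [u32; 16], k: u16, a: [u32; 16], b: [u32; 16]) -> [u32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].rotate_right(b[i] % 32);
        }
    }
    dst
}
```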
- _mm512_mask_ roundscale_ pd avx512f
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm512_mask_ roundscale_ ps avx512f
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm512_mask_ roundscale_ round_ pd avx512f
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm512_mask_ roundscale_ round_ ps avx512f
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
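"Rounding to a number of fraction bits" can be read as: scale by 2^M, round to an integer, scale back. The sketch below is a scalar model under two stated assumptions: M is passed directly as a small integer rather than packed into imm8, and the rounding mode is fixed to round toward negative infinity. The other modes and the exception-suppression bit are not modelled, and all names are hypothetical.

```rust
/// Hypothetical scalar model of one roundscale lane with `m` fraction bits,
/// assuming round-toward-negative-infinity (floor).
fn model_roundscale_floor(x: f64, m: i32) -> f64 {
    let scale = 2f64.powi(m);
    (x * scale).floor() / scale
}

/// Hypothetical model of the writemask form over 8 double-precision lanes:
/// active lanes are rounded, inactive lanes are copied from `src`.
fn model_mask_roundscale_pd(src: [f64; 8], k: u8, a: [f64; 8], m: i32) -> [f64; 8] {
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = model_roundscale_floor(a[i], m);
        }
    }
    dst
}
```

For example, 2.71875 with two fraction bits becomes floor(10.875) / 4 = 2.5.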
- _mm512_mask_ rsqrt14_ pd avx512f
- Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_mask_ rsqrt14_ ps avx512f
- Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_mask_ scalef_ pd avx512f
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ scalef_ ps avx512f
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ scalef_ round_ pd avx512f
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ scalef_ round_ ps avx512f
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
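"Scale using values from b" is terse. As a rough scalar sketch, each active lane computes a * 2^floor(b). The model below assumes ordinary finite inputs and deliberately ignores the NaN, infinity and denormal special cases as well as the rounding-mode parameter of the `_round_` forms. Helper names are hypothetical.

```rust
/// Hypothetical scalar model of one scalef lane: a * 2^floor(b).
/// Only meaningful for finite inputs with a modest exponent.
fn model_scalef(a: f64, b: f64) -> f64 {
    a * 2f64.powi(b.floor() as i32)
}

/// Hypothetical model of `_mm512_mask_scalef_pd` over 8 lanes:
/// active lanes are scaled, inactive lanes are copied from `src`.
fn model_mask_scalef_pd(src: [f64; 8], k: u8, a: [f64; 8], b: [f64; 8]) -> [f64; 8] {
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = model_scalef(a[i], b[i]);
        }
    }
    dst
}
```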
- _mm512_mask_ set1_ epi8 avx512bw
- Broadcast 8-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ set1_ epi16 avx512bw
- Broadcast 16-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ set1_ epi32 avx512f
- Broadcast 32-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ set1_ epi64 avx512f
- Broadcast 64-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shldi_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shldi_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shldi_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shldv_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ shldv_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ shldv_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ shrdi_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shrdi_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shrdi_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shrdv_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ shrdv_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ shrdv_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
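The concatenate-and-shift entries above are easy to misread because the operand order flips between the left and right forms: `shld*` concatenates a (high) with b (low) and keeps the upper half, while `shrd*` concatenates b (high) with a (low) and keeps the lower half. A minimal per-element sketch for the 16-bit case, with hypothetical helper names and assuming the shift count is reduced modulo 16:

```rust
/// Hypothetical scalar model of one `shldi_epi16` lane:
/// (a:b) shifted left, upper 16 bits kept.
fn model_shldi_epi16(a: u16, b: u16, imm8: u32) -> u16 {
    let concat = ((a as u32) << 16) | (b as u32);
    ((concat << (imm8 & 15)) >> 16) as u16
}

/// Hypothetical scalar model of one `shrdi_epi16` lane:
/// (b:a) shifted right, lower 16 bits kept.
fn model_shrdi_epi16(a: u16, b: u16, imm8: u32) -> u16 {
    let concat = ((b as u32) << 16) | (a as u32);
    (concat >> (imm8 & 15)) as u16
}
```

With a = 0x1234, b = 0xABCD and a shift of 4, the left form yields 0x234A (the top nibble of b is shifted into a) and the right form yields 0xD123 (the bottom nibble of b is shifted into a).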
- _mm512_mask_ shuffle_ epi8 avx512bw
- Shuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shuffle_ epi32 avx512f
- Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shuffle_ f32x4 avx512f
- Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shuffle_ f64x2 avx512f
- Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shuffle_ i32x4 avx512f
- Shuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shuffle_ i64x2 avx512f
- Shuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shuffle_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shuffle_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shufflehi_ epi16 avx512bw
- Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ shufflelo_ epi16 avx512bw
- Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
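"Within 128-bit lanes" means the same imm8 control is applied independently to each group of four 32-bit elements, and an element never crosses into another group. A minimal scalar sketch of `_mm512_mask_shuffle_epi32` under that reading (the helper name is hypothetical):

```rust
/// Hypothetical scalar model of `_mm512_mask_shuffle_epi32`.
/// Each 2-bit field of `imm8` selects one of the four elements
/// of the same 128-bit lane of `a`.
fn model_mask_shuffle_epi32(src: [i32; 16], k: u16, a: [i32; 16], imm8: u8) -> [i32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            let lane_base = i / 4 * 4; // first element of this 128-bit lane
            let sel = ((imm8 >> (2 * (i % 4))) & 3) as usize;
            dst[i] = a[lane_base + sel];
        }
    }
    dst
}
```

With `imm8 = 0x1B` (fields 3, 2, 1, 0 from low to high) each lane is reversed in place, so lanes holding 0..=3 and 4..=7 become 3, 2, 1, 0 and 7, 6, 5, 4.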
- _mm512_mask_ sll_ epi16 avx512bw
- Shift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sll_ epi32 avx512f
- Shift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sll_ epi64 avx512f
- Shift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ slli_ epi16 avx512bw
- Shift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ slli_ epi32 avx512f
- Shift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ slli_ epi64 avx512f
- Shift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sllv_ epi16 avx512bw
- Shift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sllv_ epi32 avx512f
- Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sllv_ epi64 avx512f
- Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sqrt_ pd avx512f
- Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sqrt_ ps avx512f
- Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sqrt_ round_ pd avx512f
- Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sqrt_ round_ ps avx512f
- Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sra_ epi16 avx512bw
- Shift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sra_ epi32 avx512f
- Shift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sra_ epi64 avx512f
- Shift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srai_ epi16 avx512bw
- Shift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srai_ epi32 avx512f
- Shift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srai_ epi64 avx512f
- Shift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srav_ epi16 avx512bw
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srav_ epi32 avx512f
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srav_ epi64 avx512f
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srl_ epi16 avx512bw
- Shift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srl_ epi32 avx512f
- Shift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srl_ epi64 avx512f
- Shift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srli_ epi16 avx512bw
- Shift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srli_ epi32 avx512f
- Shift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srli_ epi64 avx512f
- Shift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srlv_ epi16 avx512bw
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srlv_ epi32 avx512f
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ srlv_ epi64 avx512f
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
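The three shift families above differ in what is shifted in: `sll*` and `srl*` shift in zeros, `sra*` shifts in copies of the sign bit. Unlike Rust's `<<` and `>>`, a count of 32 or more is well defined for these operations. The per-element sketch below models that for 32-bit lanes, with hypothetical helper names:

```rust
/// Hypothetical scalar model of one `slli_epi32` lane (zeros shifted in).
fn model_slli_epi32(a: i32, imm8: u32) -> i32 {
    if imm8 > 31 { 0 } else { ((a as u32) << imm8) as i32 }
}

/// Hypothetical scalar model of one `srli_epi32` lane (zeros shifted in).
fn model_srli_epi32(a: i32, imm8: u32) -> i32 {
    if imm8 > 31 { 0 } else { ((a as u32) >> imm8) as i32 }
}

/// Hypothetical scalar model of one `srai_epi32` lane (sign bits shifted in).
/// An oversized count fills the whole lane with the sign bit.
fn model_srai_epi32(a: i32, imm8: u32) -> i32 {
    if imm8 > 31 { a >> 31 } else { a >> imm8 }
}
```

For a = -16 and a count of 2, the logical right shift gives 1073741820 while the arithmetic right shift gives -4.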
- _mm512_mask_ ⚠store_ epi32 avx512f
- Store packed 32-bit integers from a into memory using writemask k. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠store_ epi64 avx512f
- Store packed 64-bit integers from a into memory using writemask k. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠store_ pd avx512f
- Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠store_ ps avx512f
- Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠storeu_ epi8 avx512bw
- Store packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠storeu_ epi16 avx512bw
- Store packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠storeu_ epi32 avx512f
- Store packed 32-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠storeu_ epi64 avx512f
- Store packed 64-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠storeu_ pd avx512f
- Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠storeu_ ps avx512f
- Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
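For the masked stores above, "using writemask k" means memory is only written for lanes whose mask bit is set, and the remaining locations keep their previous contents. A minimal scalar sketch of that effect on a 16-element buffer follows. The helper name is hypothetical, and the alignment requirement of the `store` (as opposed to `storeu`) forms is not modelled.

```rust
/// Hypothetical scalar model of `_mm512_mask_storeu_epi32`:
/// only lanes selected by `k` are written to `mem`.
fn model_mask_storeu_epi32(mem: &mut [i32; 16], k: u16, a: [i32; 16]) {
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            mem[i] = a[i];
        }
    }
}
```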
- _mm512_mask_ sub_ epi8 avx512bw
- Subtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sub_ epi16 avx512bw
- Subtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sub_ epi32 avx512f
- Subtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sub_ epi64 avx512f
- Subtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sub_ pd avx512f
- Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sub_ ps avx512f
- Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sub_ round_ pd avx512f
- Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sub_ round_ ps avx512f
- Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ subs_ epi8 avx512bw
- Subtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ subs_ epi16 avx512bw
- Subtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ subs_ epu8 avx512bw
- Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ subs_ epu16 avx512bw
- Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
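"Using saturation" means a result that would overflow is clamped to the nearest representable value instead of wrapping. The sketch below models the 8-bit signed and unsigned forms over 64 lanes with Rust's `saturating_sub`. The helper names are hypothetical.

```rust
/// Hypothetical scalar model of `_mm512_mask_subs_epi8` (signed saturation).
fn model_mask_subs_epi8(src: [i8; 64], k: u64, a: [i8; 64], b: [i8; 64]) -> [i8; 64] {
    let mut dst = src;
    for i in 0..64 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].saturating_sub(b[i]);
        }
    }
    dst
}

/// Hypothetical scalar model of `_mm512_mask_subs_epu8` (unsigned saturation).
fn model_mask_subs_epu8(src: [u8; 64], k: u64, a: [u8; 64], b: [u8; 64]) -> [u8; 64] {
    let mut dst = src;
    for i in 0..64 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].saturating_sub(b[i]);
        }
    }
    dst
}
```

In the signed model -128 - 1 stays at -128, and in the unsigned model 5 - 10 clamps to 0.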
- _mm512_mask_ ternarylogic_ epi32 avx512f
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bit from src, a, and b are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 32-bit granularity (32-bit elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ternarylogic_ epi64 avx512f
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bit from src, a, and b are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 64-bit granularity (64-bit elements are copied from src when the corresponding mask bit is not set).
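The ternary-logic description above treats imm8 as an 8-entry truth table: for each bit position, the bits of src, a and b form a 3-bit index (src is the most significant bit), and the indexed bit of imm8 is the result. A minimal scalar sketch on a single 32-bit element, with a hypothetical helper name and without the writemask step:

```rust
/// Hypothetical scalar model of the per-element ternary logic operation.
/// `imm8` is read as a truth table indexed by (src_bit << 2) | (a_bit << 1) | b_bit.
fn model_ternarylogic_u32(src: u32, a: u32, b: u32, imm8: u8) -> u32 {
    let mut out = 0u32;
    for bit in 0..32u32 {
        let idx = (((src >> bit) & 1) << 2) | (((a >> bit) & 1) << 1) | ((b >> bit) & 1);
        out |= (((imm8 as u32) >> idx) & 1) << bit;
    }
    out
}
```

Some commonly useful tables follow directly from this indexing: 0x96 is a three-way XOR, 0x80 a three-way AND, 0xFE a three-way OR, and 0xCA a bitwise select that takes a where src is 1 and b where src is 0.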
- _mm512_mask_ test_ epi8_ mask avx512bw
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm512_mask_ test_ epi16_ mask avx512bw
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm512_mask_ test_ epi32_ mask avx512f
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm512_mask_ test_ epi64_ mask avx512f
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm512_mask_ testn_ epi8_ mask avx512bw
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm512_mask_ testn_ epi16_ mask avx512bw
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm512_mask_ testn_ epi32_ mask avx512f
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm512_mask_ testn_ epi64_ mask avx512f
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
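The masked test operations above can be modelled in scalar Rust for the 32-bit case. This is a sketch, not a call to the intrinsics: the function names are hypothetical, and the `testn` model assumes the mask bit is set when the AND of the two elements is zero.

```rust
/// Model of a masked `test`: bit i is set when k[i] is set and (a[i] & b[i]) != 0.
fn mask_test_epi32_mask_model(k: u16, a: [u32; 16], b: [u32; 16]) -> u16 {
    let mut out = 0u16;
    for i in 0..16 {
        if (k >> i) & 1 == 1 && (a[i] & b[i]) != 0 {
            out |= 1u16 << i;
        }
    }
    out
}

/// Model of a masked `testn`: bit i is set when k[i] is set and (a[i] & b[i]) == 0.
fn mask_testn_epi32_mask_model(k: u16, a: [u32; 16], b: [u32; 16]) -> u16 {
    let mut out = 0u16;
    for i in 0..16 {
        if (k >> i) & 1 == 1 && (a[i] & b[i]) == 0 {
            out |= 1u16 << i;
        }
    }
    out
}
```

For any inputs, the two results are disjoint and their union equals k, so the pair partitions the active lanes.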
- _mm512_mask_ unpackhi_ epi8 avx512bw
- Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpackhi_ epi16 avx512bw
- Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpackhi_ epi32 avx512f
- Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpackhi_ epi64 avx512f
- Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpackhi_ pd avx512f
- Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpackhi_ ps avx512f
- Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpacklo_ epi8 avx512bw
- Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpacklo_ epi16 avx512bw
- Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpacklo_ epi32 avx512f
- Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpacklo_ epi64 avx512f
- Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpacklo_ pd avx512f
- Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ unpacklo_ ps avx512f
- Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
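The per-lane interleave used by the unpack entries above can be sketched for 32-bit elements as follows. This is a scalar model under the assumption that each 128-bit lane holds four consecutive elements; `mask_unpack_epi32_model` and its `high` flag are hypothetical and not part of the API.

```rust
/// Model of masked unpacklo/unpackhi on sixteen 32-bit elements.
/// `high == false` models the low-half form, `high == true` the high-half form.
fn mask_unpack_epi32_model(
    src: [i32; 16],
    k: u16,
    a: [i32; 16],
    b: [i32; 16],
    high: bool,
) -> [i32; 16] {
    let mut dst = src; // lanes whose mask bit is clear keep the src element
    let half = if high { 2 } else { 0 };
    for lane in 0..4 {
        let base = lane * 4; // first element of this 128-bit lane
        let interleaved = [
            a[base + half],
            b[base + half],
            a[base + half + 1],
            b[base + half + 1],
        ];
        for j in 0..4 {
            if (k >> (base + j)) & 1 == 1 {
                dst[base + j] = interleaved[j];
            }
        }
    }
    dst
}
```

The interleave never crosses a 128-bit lane boundary, which is why the result is not simply the low or high half of the whole 512-bit vector.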
- _mm512_mask_ xor_ epi32 avx512f
- Compute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ xor_ epi64 avx512f
- Compute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ xor_ pd avx512dq
- Compute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ xor_ ps avx512dq
- Compute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_maskz_ abs_ epi8 avx512bw
- Compute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ abs_ epi16 avx512bw
- Compute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ abs_ epi32 avx512f
- Compute the absolute value of packed signed 32-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ abs_ epi64 avx512f
- Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ add_ epi8 avx512bw
- Add packed 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ add_ epi16 avx512bw
- Add packed 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ add_ epi32 avx512f
- Add packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ add_ epi64 avx512f
- Add packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ add_ pd avx512f
- Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ add_ ps avx512f
- Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
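The difference between a writemask (`mask`) and a zeromask (`maskz`) can be illustrated with a scalar model of the packed single-precision add. This is a sketch that does not use the intrinsics; both function names are hypothetical.

```rust
/// Writemask model: inactive lanes are copied from `src`.
fn mask_add_ps_model(src: [f32; 16], k: u16, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}

/// Zeromask model: inactive lanes are set to zero, which is the same as
/// the writemask form with an all-zero `src`.
fn maskz_add_ps_model(k: u16, a: [f32; 16], b: [f32; 16]) -> [f32; 16] {
    mask_add_ps_model([0.0; 16], k, a, b)
}
```

With only bit 0 set in k, the zeromask form yields the sum in lane 0 and zero elsewhere, while the writemask form leaves lanes 1 to 15 equal to src.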
- _mm512_maskz_ add_ round_ pd avx512f
- Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ add_ round_ ps avx512f
- Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ adds_ epi8 avx512bw
- Add packed signed 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ adds_ epi16 avx512bw
- Add packed signed 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ adds_ epu8 avx512bw
- Add packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ adds_ epu16 avx512bw
- Add packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
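Saturating addition clamps to the range of the element type instead of wrapping. A minimal scalar sketch of the 8-bit zeromask forms, using hypothetical function names and the standard `saturating_add` method rather than the intrinsics:

```rust
/// Model of the signed 8-bit saturating add with a zeromask.
fn maskz_adds_epi8_model(k: u64, a: [i8; 64], b: [i8; 64]) -> [i8; 64] {
    let mut dst = [0i8; 64];
    for i in 0..64 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].saturating_add(b[i]); // clamps to -128..=127
        }
    }
    dst
}

/// Model of the unsigned 8-bit saturating add with a zeromask.
fn maskz_adds_epu8_model(k: u64, a: [u8; 64], b: [u8; 64]) -> [u8; 64] {
    let mut dst = [0u8; 64];
    for i in 0..64 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].saturating_add(b[i]); // clamps to 0..=255
        }
    }
    dst
}
```

For example, 100 + 100 saturates to 127 in the signed model, and 200 + 100 saturates to 255 in the unsigned one.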
- _mm512_maskz_ alignr_ epi8 avx512bw
- Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ alignr_ epi32 avx512f
- Concatenate a and b into a 128-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 64 bytes (16 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ alignr_ epi64 avx512f
- Concatenate a and b into a 128-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 64 bytes (8 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
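The element-wise align for 32-bit elements can be sketched as a scalar model. Two assumptions are made here that the summary above does not state: a forms the upper half of the concatenation and b the lower half, and only the low four bits of imm8 are used as the shift count. The function name is hypothetical.

```rust
/// Model of the zeromask element-wise align on sixteen 32-bit elements.
/// Assumptions: `a` is the high half, `b` the low half, shift = imm8 & 0x0F.
fn maskz_alignr_epi32_model(k: u16, a: [i32; 16], b: [i32; 16], imm8: u8) -> [i32; 16] {
    let shift = (imm8 & 0x0F) as usize;
    let mut dst = [0i32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            let idx = i + shift; // position in the 32-element concatenation
            dst[i] = if idx < 16 { b[idx] } else { a[idx - 16] };
        }
    }
    dst
}
```

With a shift of 1, the result is b with its lowest element dropped and the lowest element of a appended at the top.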
- _mm512_maskz_ and_ epi32 avx512f
- Compute the bitwise AND of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ and_ epi64 avx512f
- Compute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ and_ pd avx512dq
- Compute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ and_ ps avx512dq
- Compute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ andnot_ epi32 avx512f
- Compute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ andnot_ epi64 avx512f
- Compute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ andnot_ pd avx512dq
- Compute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ andnot_ ps avx512dq
- Compute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ avg_ epu8 avx512bw
- Average packed unsigned 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ avg_ epu16 avx512bw
- Average packed unsigned 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
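A scalar sketch of the unsigned 8-bit average follows. It assumes the rounding-up definition `(a + b + 1) >> 1` computed in a wider type, which is how the underlying packed-average instruction is specified; the function name is hypothetical.

```rust
/// Model of the zeromask unsigned 8-bit average, rounding halves up.
fn maskz_avg_epu8_model(k: u64, a: [u8; 64], b: [u8; 64]) -> [u8; 64] {
    let mut dst = [0u8; 64];
    for i in 0..64 {
        if (k >> i) & 1 == 1 {
            // Widen to u16 so the intermediate sum cannot overflow.
            dst[i] = ((u16::from(a[i]) + u16::from(b[i]) + 1) >> 1) as u8;
        }
    }
    dst
}
```

Because of the `+ 1`, the average of 1 and 2 is 2 rather than 1.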
- _mm512_maskz_ broadcast_ f32x2 avx512dq
- Broadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ broadcast_ f32x4 avx512f
- Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ broadcast_ f32x8 avx512dq
- Broadcasts the 8 packed single-precision (32-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ broadcast_ f64x2 avx512dq
- Broadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ broadcast_ f64x4 avx512f
- Broadcast the 4 packed double-precision (64-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ broadcast_ i32x2 avx512dq
- Broadcasts the lower 2 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ broadcast_ i32x4 avx512f
- Broadcast the 4 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ broadcast_ i32x8 avx512dq
- Broadcasts the 8 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ broadcast_ i64x2 avx512dq
- Broadcasts the 2 packed 64-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ broadcast_ i64x4 avx512f
- Broadcast the 4 packed 64-bit integers from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ broadcastb_ epi8 avx512bw
- Broadcast the low packed 8-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ broadcastd_ epi32 avx512f
- Broadcast the low packed 32-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ broadcastq_ epi64 avx512f
- Broadcast the low packed 64-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ broadcastsd_ pd avx512f
- Broadcast the low double-precision (64-bit) floating-point element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ broadcastss_ ps avx512f
- Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ broadcastw_ epi16 avx512bw
- Broadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ compress_ epi8 avx512vbmi2
- Contiguously store the active 8-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm512_maskz_ compress_ epi16 avx512vbmi2
- Contiguously store the active 16-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm512_maskz_ compress_ epi32 avx512f
- Contiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm512_maskz_ compress_ epi64 avx512f
- Contiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm512_maskz_ compress_ pd avx512f
- Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm512_maskz_ compress_ ps avx512f
- Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
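The compress operation packs the active elements toward the low end of the result. A minimal scalar model for 32-bit integers, with a hypothetical function name:

```rust
/// Model of the zeromask compress on sixteen 32-bit integers.
fn maskz_compress_epi32_model(k: u16, a: [i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16]; // elements past the packed run stay zero
    let mut next = 0;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[next] = a[i]; // active elements keep their relative order
            next += 1;
        }
    }
    dst
}
```

The number of non-padding elements in the result equals the number of set bits in k.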
- _mm512_maskz_ conflict_ epi32 avx512cd
- Test each 32-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm512_maskz_ conflict_ epi64 avx512cd
- Test each 64-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
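Conflict detection can be sketched as a scalar model for 32-bit elements: element i of the result has bit j set when an earlier element j (j < i) holds the same value. The function name is hypothetical and the code does not call the intrinsic.

```rust
/// Model of the zeromask conflict detection on sixteen 32-bit elements.
fn maskz_conflict_epi32_model(k: u16, a: [u32; 16]) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            // Compare only against elements closer to the least significant end.
            for j in 0..i {
                if a[j] == a[i] {
                    dst[i] |= 1u32 << j;
                }
            }
        }
    }
    dst
}
```

A result element of zero means the value has not appeared in any lower-indexed element, which is what makes the operation useful for detecting duplicate indices before a scatter.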
- _mm512_maskz_ cvt_ roundepi32_ ps avx512f
- Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundepi64_ pd avx512dq
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_maskz_ cvt_ roundepi64_ ps avx512dq
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_maskz_ cvt_ roundepu32_ ps avx512f
- Convert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundepu64_ pd avx512dq
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_maskz_ cvt_ roundepu64_ ps avx512dq
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_maskz_ cvt_ roundpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_maskz_ cvt_ roundpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_maskz_ cvt_ roundpd_ ps avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundph_ ps avx512f
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ cvt_ roundps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_maskz_ cvt_ roundps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_maskz_ cvt_ roundps_ pd avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ cvt_ roundps_ ph avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding[3:0] parameter, which can be one of:
- _mm512_maskz_ cvtepi8_ epi16 avx512bw
- Sign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi8_ epi32 avx512f
- Sign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi8_ epi64 avx512f
- Sign extend packed 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi16_ epi8 avx512bw
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi16_ epi32 avx512f
- Sign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi16_ epi64 avx512f
- Sign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi32_ epi8 avx512f
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi32_ epi16 avx512f
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi32_ epi64 avx512f
- Sign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi32_ pd avx512f
- Convert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi32_ ps avx512f
- Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi64_ epi8 avx512f
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi64_ epi16 avx512f
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi64_ epi32 avx512f
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi64_ pd avx512dq
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvtepi64_ ps avx512dq
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvtepu8_ epi16 avx512bw
- Zero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu8_ epi32 avx512f
- Zero extend packed unsigned 8-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu8_ epi64 avx512f
- Zero extend packed unsigned 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu16_ epi32 avx512f
- Zero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu16_ epi64 avx512f
- Zero extend packed unsigned 16-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu32_ epi64 avx512f
- Zero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu32_ pd avx512f
- Convert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu32_ ps avx512f
- Convert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu64_ pd avx512dq
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvtepu64_ ps avx512dq
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvtne2ps_ pbh avx512bf16 and avx512f
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtneps_ pbh avx512bf16 and avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtpbh_ ps avx512bf16 and avx512f
- Convert packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvtpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvtpd_ ps avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtph_ ps avx512f
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvtps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvtps_ pd avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtps_ ph avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding[3:0] parameter, which can be one of:
- _mm512_maskz_ cvtsepi16_ epi8 avx512bw
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtsepi32_ epi8 avx512f
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtsepi32_ epi16 avx512f
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtsepi64_ epi8 avx512f
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtsepi64_ epi16 avx512f
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtsepi64_ epi32 avx512f
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
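Signed saturation in the narrowing conversions above means that out-of-range inputs clamp to the destination type's limits instead of wrapping. A minimal scalar sketch for the 32-bit to 8-bit case follows; `maskz_cvtsepi32_epi8_model` is a hypothetical name and the arrays stand in for the vector registers.

```rust
/// Scalar model (hypothetical helper): narrow sixteen `i32` lanes to `i8`
/// with signed saturation, zeroing lanes whose mask bit is clear.
fn maskz_cvtsepi32_epi8_model(k: u16, a: [i32; 16]) -> [i8; 16] {
    let mut dst = [0i8; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            // Clamp into the i8 range first, so the final cast cannot wrap.
            dst[i] = a[i].clamp(i8::MIN as i32, i8::MAX as i32) as i8;
        }
    }
    dst
}
```

In this model an input of `300` becomes `127` and `-300` becomes `-128`, while a lane whose mask bit is clear is `0` regardless of its input.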
- _mm512_maskz_ cvtt_ roundpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ cvtt_ roundpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_maskz_ cvtt_ roundpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ cvtt_ roundpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_maskz_ cvtt_ roundps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ cvtt_ roundps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_maskz_ cvtt_ roundps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ cvtt_ roundps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_maskz_ cvttpd_ epi32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvttpd_ epi64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvttpd_ epu32 avx512f
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvttpd_ epu64 avx512dq
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvttps_ epi32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvttps_ epi64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ cvttps_ epu32 avx512f
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvttps_ epu64 avx512dq
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
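Conversion "with truncation" rounds toward zero. The scalar sketch below models the signed 32-bit case under two assumptions that are not stated in the entries above: that NaN and out-of-range inputs produce `i32::MIN` (the x86 "integer indefinite" value, which differs from Rust's saturating `as` cast), and that the hypothetical helper `maskz_cvttps_epi32_model` may use arrays in place of vector registers.

```rust
/// Scalar model (hypothetical helper): truncate sixteen `f32` lanes to `i32`,
/// zeroing lanes whose mask bit is clear.
fn maskz_cvttps_epi32_model(k: u16, a: [f32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            let x = a[i];
            dst[i] = if x >= -2147483648.0_f32 && x < 2147483648.0_f32 {
                x as i32 // `as` truncates toward zero for in-range values
            } else {
                // Assumption: NaN and out-of-range inputs give 0x8000_0000.
                i32::MIN
            };
        }
    }
    dst
}
```

The explicit range check is there because a plain `x as i32` in Rust would saturate large values to `i32::MAX` and map NaN to `0`, which is not what this model assumes for the hardware.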
- _mm512_maskz_ cvtusepi16_ epi8 avx512bw
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtusepi32_ epi8 avx512f
- Convert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtusepi32_ epi16 avx512f
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtusepi64_ epi8 avx512f
- Convert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtusepi64_ epi16 avx512f
- Convert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtusepi64_ epi32 avx512f
- Convert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
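Unsigned saturation only needs an upper clamp, since the inputs cannot be negative. A minimal scalar sketch for the 64-bit to 32-bit case, using the hypothetical helper `maskz_cvtusepi64_epi32_model` on plain arrays:

```rust
/// Scalar model (hypothetical helper): narrow eight `u64` lanes to `u32`
/// with unsigned saturation, zeroing lanes whose mask bit is clear.
fn maskz_cvtusepi64_epi32_model(k: u8, a: [u64; 8]) -> [u32; 8] {
    let mut dst = [0u32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Values above u32::MAX clamp to u32::MAX instead of wrapping.
            dst[i] = a[i].min(u32::MAX as u64) as u32;
        }
    }
    dst
}
```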
- _mm512_maskz_ dbsad_ epu8 avx512bw
- Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm512_maskz_ div_ pd avx512f
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ div_ ps avx512f
- Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ div_ round_ pd avx512f
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ div_ round_ ps avx512f
- Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ dpbf16_ ps avx512bf16 and avx512f
- Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ dpbusd_ epi32 avx512vnni
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ dpbusds_ epi32 avx512vnni
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ dpwssd_ epi32 avx512vnni
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ dpwssds_ epi32 avx512vnni
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
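The difference between the plain and the saturating VNNI dot products can be sketched for a single 32-bit lane. The helpers `dpbusd_lane`, `dpbusds_lane` and `zeromask_lane` below are hypothetical scalar models; they assume the non-saturating form wraps on overflow and that the saturating form clamps the full-precision sum.

```rust
/// One 32-bit lane of the unsigned-by-signed byte dot product, wrapping on overflow.
fn dpbusd_lane(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut acc = src;
    for j in 0..4 {
        // Each u8 * i8 product fits in a signed 16-bit value.
        acc = acc.wrapping_add(a[j] as i32 * b[j] as i32);
    }
    acc
}

/// Same lane computation, but the final sum saturates to the i32 range.
fn dpbusds_lane(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut sum = src as i64;
    for j in 0..4 {
        sum += a[j] as i64 * b[j] as i64;
    }
    sum.clamp(i32::MIN as i64, i32::MAX as i64) as i32
}

/// Zeromask step: lane `i` keeps `value` only if bit `i` of `k` is set.
fn zeromask_lane(k: u16, i: usize, value: i32) -> i32 {
    if (k >> i) & 1 == 1 { value } else { 0 }
}
```

For example, `dpbusd_lane(10, [1, 2, 3, 4], [1, -1, 2, -2])` evaluates 10 + 1 - 2 + 6 - 8 = 7.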
- _mm512_maskz_ expand_ epi8 avx512vbmi2
- Load contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ expand_ epi16 avx512vbmi2
- Load contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ expand_ epi32 avx512f
- Load contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ expand_ epi64 avx512f
- Load contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ expand_ pd avx512f
- Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ expand_ ps avx512f
- Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
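The expand operation reads elements from the low end of a contiguously and scatters them to the positions whose mask bits are set. A scalar sketch for 32-bit integers, using the hypothetical helper `maskz_expand_epi32_model` on plain arrays:

```rust
/// Scalar model (hypothetical helper): expand the low elements of `a`
/// into the lanes selected by `k`; unselected lanes are zero.
fn maskz_expand_epi32_model(k: u16, a: [i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    let mut next = 0; // index of the next unread low element of `a`
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[next];
            next += 1;
        }
    }
    dst
}
```

With `k = 0b1010`, lane 1 receives `a[0]` and lane 3 receives `a[1]`; the remaining elements of `a` are not used.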
- _mm512_maskz_ ⚠expandloadu_ epi8 avx512vbmi2
- Load contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ epi16 avx512vbmi2
- Load contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ epi32 avx512f
- Load contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ epi64 avx512f
- Load contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ pd avx512f
- Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ ps avx512f
- Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
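The expand-load variants follow the same pattern, with memory as the source. The sketch below models this with a slice instead of a raw pointer; `maskz_expandload_model` is a hypothetical helper, and it assumes that only as many elements as there are set mask bits are consumed from memory.

```rust
/// Scalar model (hypothetical helper): expand-load from a slice.
/// Returns `None` if the slice holds fewer elements than `k` has set bits.
fn maskz_expandload_model(k: u16, mem: &[i32]) -> Option<[i32; 16]> {
    // Assumption of this sketch: only `k.count_ones()` elements are read.
    if mem.len() < k.count_ones() as usize {
        return None;
    }
    let mut src = mem.iter();
    let mut dst = [0i32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = *src.next()?;
        }
    }
    Some(dst)
}
```

The real intrinsics are marked ⚠ because they take a raw pointer; the bounds check in this model only stands in for the guarantee the caller has to provide.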
- _mm512_maskz_ extractf32x4_ ps avx512f
- Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ extractf32x8_ ps avx512dq
- Extracts 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ extractf64x2_ pd avx512dq
- Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ extractf64x4_ pd avx512f
- Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a, selected with imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ extracti32x4_ epi32 avx512f
- Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM2, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ extracti32x8_ epi32 avx512dq
- Extracts 256 bits (composed of 8 packed 32-bit integers) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ extracti64x2_ epi64 avx512dq
- Extracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ extracti64x4_ epi64 avx512f
- Extract 256 bits (composed of 4 packed 64-bit integers) from a, selected with IMM1, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
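For the extract entries, the immediate selects which chunk of the source is copied and the zeromask then applies to the lanes of the smaller result. A scalar sketch of the 4 x f32 case follows; `maskz_extractf32x4_model` is a hypothetical helper, and masking the immediate with `0b11` is a choice of this model (the real intrinsics take the immediate as a const generic).

```rust
/// Scalar model (hypothetical helper): pick one of the four 128-bit chunks
/// of a 16-lane vector, then zero result lanes whose mask bit is clear.
fn maskz_extractf32x4_model(k: u8, a: [f32; 16], imm: usize) -> [f32; 4] {
    let chunk = imm & 0b11; // four 128-bit chunks in a 512-bit vector
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        // The mask indexes the result lanes, not the source lanes.
        if (k >> i) & 1 == 1 {
            dst[i] = a[chunk * 4 + i];
        }
    }
    dst
}
```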
- _mm512_maskz_ fixupimm_ pd avx512f
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_maskz_ fixupimm_ ps avx512f
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_maskz_ fixupimm_ round_ pd avx512f
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_maskz_ fixupimm_ round_ ps avx512f
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_maskz_ fmadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
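A fused multiply-add computes a*b+c with a single rounding. In scalar Rust that corresponds to `f32::mul_add`, which the sketch below uses per lane; `maskz_fmadd_ps_model` is a hypothetical helper operating on plain arrays.

```rust
/// Scalar model (hypothetical helper): fused multiply-add per lane,
/// zeroing lanes whose mask bit is clear.
fn maskz_fmadd_ps_model(k: u16, a: [f32; 16], b: [f32; 16], c: [f32; 16]) -> [f32; 16] {
    let mut dst = [0.0f32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            // mul_add rounds once, after the multiply and the add.
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```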
- _mm512_maskz_ fmaddsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmaddsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmaddsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmaddsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsubadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsubadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsubadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsubadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
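The fmaddsub and fmsubadd families differ only in which lanes add and which subtract. The scalar sketch below assumes the usual x86 convention that fmaddsub subtracts c in even-indexed lanes and adds it in odd-indexed lanes, with fmsubadd doing the opposite; both function names are hypothetical.

```rust
/// Scalar model (hypothetical helper): even lanes compute a*b - c,
/// odd lanes compute a*b + c; masked-off lanes are zero.
fn maskz_fmaddsub_pd_model(k: u8, a: [f64; 8], b: [f64; 8], c: [f64; 8]) -> [f64; 8] {
    let mut dst = [0.0f64; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let c_i = if i % 2 == 0 { -c[i] } else { c[i] };
            dst[i] = a[i].mul_add(b[i], c_i);
        }
    }
    dst
}

/// Scalar model (hypothetical helper): even lanes compute a*b + c,
/// odd lanes compute a*b - c; masked-off lanes are zero.
fn maskz_fmsubadd_pd_model(k: u8, a: [f64; 8], b: [f64; 8], c: [f64; 8]) -> [f64; 8] {
    let mut dst = [0.0f64; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let c_i = if i % 2 == 0 { c[i] } else { -c[i] };
            dst[i] = a[i].mul_add(b[i], c_i);
        }
    }
    dst
}
```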
- _mm512_maskz_ fnmadd_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fnmadd_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fnmadd_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fnmadd_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fnmsub_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fnmsub_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fnmsub_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fnmsub_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
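The negated forms can be written per lane as -(a*b) + c for fnmadd and -(a*b) - c for fnmsub. The hypothetical helpers below sketch that with `f64::mul_add`, negating one multiplicand so the product is negated before the single rounding step.

```rust
/// One lane of fnmadd: -(a * b) + c, fused.
fn fnmadd_lane(a: f64, b: f64, c: f64) -> f64 {
    (-a).mul_add(b, c)
}

/// One lane of fnmsub: -(a * b) - c, fused.
fn fnmsub_lane(a: f64, b: f64, c: f64) -> f64 {
    (-a).mul_add(b, -c)
}

/// Zeromask step applied to already-computed lanes (hypothetical helper).
fn zeromask_pd(k: u8, lanes: [f64; 8]) -> [f64; 8] {
    let mut dst = [0.0f64; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = lanes[i];
        }
    }
    dst
}
```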
- _mm512_maskz_ getexp_ pd avx512f
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_maskz_ getexp_ ps avx512f
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_maskz_ getexp_ round_ pd avx512f
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ getexp_ round_ ps avx512f
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
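For normal, finite inputs, the floor(log2(x)) value described for getexp is the unbiased exponent field of the floating-point encoding. The sketch below reads it directly from the bits of an `f64`; `getexp_normal` is a hypothetical helper that deliberately returns `None` for zero, subnormal, infinite and NaN inputs, which this model does not cover.

```rust
/// Scalar model (hypothetical helper): unbiased exponent of a normal `f64`,
/// returned as a floating-point number. The sign of `x` is ignored.
fn getexp_normal(x: f64) -> Option<f64> {
    if !x.is_normal() {
        return None; // special cases are outside this sketch
    }
    // Bits 52..=62 hold the exponent, biased by 1023.
    let biased = ((x.to_bits() >> 52) & 0x7ff) as i64;
    Some((biased - 1023) as f64)
}
```

For instance, 8.0 is 1.0 × 2^3 and yields 3.0, while 0.75 is 1.5 × 2^-1 and yields -1.0.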
- _mm512_maskz_ getmant_ pd avx512f
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_maskz_ getmant_ ps avx512f
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_maskz_ getmant_ round_ pd avx512f
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ getmant_ round_ ps avx512f
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
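As a rough illustration of mantissa normalization, the sketch below handles only the _MM_MANT_NORM_1_2 interval for normal `f64` inputs: it keeps the fraction bits and replaces the exponent with the one for 2^0, so the magnitude lands in [1, 2). `getmant_norm_1_2` and its `sign_zero` flag are hypothetical; the flag stands in for choosing between _MM_MANT_SIGN_src and _MM_MANT_SIGN_zero, and the other intervals, the _MM_MANT_SIGN_nan mode and special inputs are not modeled.

```rust
/// Scalar model (hypothetical helper): normalize a normal `f64` to [1, 2).
/// `sign_zero = false` keeps the source sign; `true` forces a positive result.
fn getmant_norm_1_2(x: f64, sign_zero: bool) -> Option<f64> {
    if !x.is_normal() {
        return None; // special cases are outside this sketch
    }
    const SIGN: u64 = 1 << 63;
    const FRACTION: u64 = (1 << 52) - 1;
    const EXP_ZERO: u64 = 1023 << 52; // biased exponent for 2^0
    let bits = x.to_bits();
    let sign = if sign_zero { 0 } else { bits & SIGN };
    Some(f64::from_bits(sign | EXP_ZERO | (bits & FRACTION)))
}
```

Here 12.0 = 1.5 × 2^3 normalizes to 1.5, and -0.375 = -1.5 × 2^-2 normalizes to -1.5, or to 1.5 when the sign is forced to zero.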
- _mm512_maskz_ gf2p8affine_ epi64_ epi8 gfni and avx512bw and avx512f
- Performs an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_maskz_ gf2p8affineinv_ epi64_ epi8 gfni and avx512bw and avx512f
- Performs an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
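The affine transformation a*x+b can be sketched for one byte as a bit-matrix product followed by an XOR with the immediate. The row ordering used below (output bit i comes from byte 7-i of the 64-bit matrix word) is an assumption based on my reading of Intel's pseudocode, and `gf2p8affine_byte` is a hypothetical name; neither the inverse variant nor the zeromask step is modeled.

```rust
/// Scalar model (hypothetical helper) of the per-byte affine transform:
/// each output bit is the parity of (matrix row AND x), then XORed with b.
fn gf2p8affine_byte(matrix: u64, x: u8, b: u8) -> u8 {
    let mut out = 0u8;
    for i in 0..8 {
        // Assumed ordering: output bit i uses byte (7 - i) of the matrix.
        let row = (matrix >> (8 * (7 - i))) as u8;
        let bit = ((row & x).count_ones() & 1) as u8;
        out |= bit << i;
    }
    out ^ b
}
```

Under this ordering the matrix `0x0102040810204080` acts as the identity and `0x8040201008040201` reverses the bits of each byte.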
- _mm512_maskz_ gf2p8mul_ epi8 gfniandavx512bwandavx512f
- Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
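The per-byte arithmetic can be sketched in scalar Rust. This is a reference model under the stated reduction polynomial x^8 + x^4 + x^3 + x + 1, not the vector intrinsic; the function names are hypothetical, and the inverse is computed as x^254, which is one common way to obtain it.

```rust
/// Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B).
fn gf2p8mul_byte(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    for _ in 0..8 {
        if b & 1 != 0 {
            acc ^= a; // add (XOR) the current multiple of a
        }
        let carry = a & 0x80 != 0;
        a <<= 1; // multiply a by x
        if carry {
            a ^= 0x1B; // reduce by the low byte of the polynomial
        }
        b >>= 1;
    }
    acc
}

/// Multiplicative inverse in GF(2^8), with inv(0) defined as 0.
/// Uses x^254 = x^-1 for non-zero x; 0^254 is 0, matching the convention.
fn gf2p8inv_byte(x: u8) -> u8 {
    let mut result = 1u8;
    let mut base = x;
    let mut e = 254u32;
    while e > 0 {
        if e & 1 == 1 {
            result = gf2p8mul_byte(result, base);
        }
        base = gf2p8mul_byte(base, base);
        e >>= 1;
    }
    result
}
```

With this model, `gf2p8mul_byte(0x57, 0x83)` is `0xC1`, the worked example familiar from the AES specification, which uses the same field.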
- _mm512_maskz_ insertf32x4 avx512f
- Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ insertf32x8 avx512dq
- Copy a to tmp, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ insertf64x2 avx512dq
- Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ insertf64x4 avx512f
- Copy a to tmp, then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ inserti32x4 avx512f
- Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ inserti32x8 avx512dq
- Copy a to tmp, then insert 256 bits (composed of 8 packed 32-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ inserti64x2 avx512dq
- Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ inserti64x4 avx512f
- Copy a to tmp, then insert 256 bits (composed of 4 packed 64-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
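The copy, insert, then zeromask sequence shared by the insert family can be modelled with plain arrays. The sketch below mirrors the insertf32x4 shape (16 lanes, 4-lane insert); the function name is hypothetical and imm8 is taken as an ordinary argument for illustration.

```rust
/// Scalar model: copy `a` to tmp, overwrite one 128-bit chunk (4 f32 lanes)
/// with `b` at the position chosen by the low 2 bits of `imm8`, then apply
/// zeromask `k` lane by lane.
fn maskz_insertf32x4_model(k: u16, a: [f32; 16], b: [f32; 4], imm8: usize) -> [f32; 16] {
    let mut tmp = a;
    let start = (imm8 & 3) * 4; // four possible 128-bit positions
    tmp[start..start + 4].copy_from_slice(&b);

    let mut dst = [0.0f32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = tmp[i];
        }
    }
    dst
}
```

Note that the mask applies to the final lanes of tmp, so an inserted lane is zeroed just like any other when its mask bit is clear.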
- _mm512_maskz_ ⚠load_ epi32 avx512f
- Load packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_maskz_ ⚠load_ epi64 avx512f
- Load packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_maskz_ ⚠load_ pd avx512f
- Load packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_maskz_ ⚠load_ ps avx512f
- Load packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_maskz_ ⚠loadu_ epi8 avx512bw
- Load packed 8-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠loadu_ epi16 avx512bw
- Load packed 16-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠loadu_ epi32 avx512f
- Load packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠loadu_ epi64 avx512f
- Load packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠loadu_ pd avx512f
- Load packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠loadu_ ps avx512f
- Load packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
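The difference between the aligned `load` and unaligned `loadu` variants comes down to the 64-byte requirement on mem_addr. The sketch below shows one way to obtain such a buffer in Rust and models the zeromask load on it; `Aligned64`, `is_aligned_64` and `maskz_load_epi32_model` are hypothetical names, and no intrinsic is called.

```rust
/// Wrapper whose storage is guaranteed to start on a 64-byte boundary.
#[repr(C, align(64))]
struct Aligned64([i32; 16]);

/// Check the precondition of the aligned load variants.
fn is_aligned_64<T>(p: *const T) -> bool {
    (p as usize) % 64 == 0
}

/// Scalar model of a zeromask load: lanes with a clear mask bit become 0.
fn maskz_load_epi32_model(k: u16, mem: &[i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = mem[i];
        }
    }
    dst
}
```

A plain `[i32; 16]` is only guaranteed 4-byte alignment, so it would be suitable for the `loadu` variants but not necessarily for `load`.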
- _mm512_maskz_ lzcnt_ epi32 avx512cd
- Count the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ lzcnt_ epi64 avx512cd
- Count the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ madd52hi_ epu64 avx512ifma
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ madd52lo_ epu64 avx512ifma
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
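The 52-bit multiply-add pair can be modelled per lane with 128-bit arithmetic. This is a scalar sketch with hypothetical function names; it assumes the final 64-bit addition wraps on overflow and leaves out the mask step.

```rust
const MASK52: u64 = (1u64 << 52) - 1;

/// 104-bit product of the low 52 bits of `b` and `c`.
fn product52(b: u64, c: u64) -> u128 {
    (b & MASK52) as u128 * (c & MASK52) as u128
}

/// Add the low 52 bits of the product to `a`.
fn madd52lo_lane(a: u64, b: u64, c: u64) -> u64 {
    a.wrapping_add((product52(b, c) as u64) & MASK52)
}

/// Add the high 52 bits (bits 103..52) of the product to `a`.
fn madd52hi_lane(a: u64, b: u64, c: u64) -> u64 {
    a.wrapping_add((product52(b, c) >> 52) as u64)
}
```

For b = 2^51 and c = 4 the product is 2^53, so the low half contributes 0 and the high half contributes 2.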
- _mm512_maskz_ madd_ epi16 avx512bw
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ maddubs_ epi16 avx512bw
- Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
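The two horizontal multiply-add forms differ in operand signedness and in how the sum is narrowed. The sketch below models one output element of each from its adjacent input pair; the names are hypothetical and the mask step is omitted.

```rust
/// One 32-bit output of the madd_epi16 form: two signed 16x16 products, summed.
/// The only sum that does not fit in i32 is (-32768 * -32768) * 2, which wraps.
fn madd_epi16_pair(a: [i16; 2], b: [i16; 2]) -> i32 {
    let p0 = a[0] as i32 * b[0] as i32;
    let p1 = a[1] as i32 * b[1] as i32;
    p0.wrapping_add(p1)
}

/// One 16-bit output of the maddubs_epi16 form: unsigned bytes from `a` times
/// signed bytes from `b`, summed with signed saturation to 16 bits.
fn maddubs_epi16_pair(a: [u8; 2], b: [i8; 2]) -> i16 {
    let p0 = a[0] as i32 * b[0] as i32;
    let p1 = a[1] as i32 * b[1] as i32;
    (p0 + p1).clamp(i16::MIN as i32, i16::MAX as i32) as i16
}
```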
- _mm512_maskz_ max_ epi8 avx512bw
- Compare packed signed 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ max_ epi16 avx512bw
- Compare packed signed 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ max_ epi32 avx512f
- Compare packed signed 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ max_ epi64 avx512f
- Compare packed signed 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ max_ epu8 avx512bw
- Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ max_ epu16 avx512bw
- Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ max_ epu32 avx512f
- Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ max_ epu64 avx512f
- Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ max_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ max_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ max_ round_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ max_ round_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ min_ epi8 avx512bw
- Compare packed signed 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ min_ epi16 avx512bw
- Compare packed signed 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ min_ epi32 avx512f
- Compare packed signed 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ min_ epi64 avx512f
- Compare packed signed 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ min_ epu8 avx512bw
- Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ min_ epu16 avx512bw
- Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ min_ epu32 avx512f
- Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ min_ epu64 avx512f
- Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ min_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ min_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ min_ round_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ min_ round_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ mov_ epi8 avx512bw
- Move packed 8-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mov_ epi16 avx512bw
- Move packed 16-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mov_ epi32 avx512f
- Move packed 32-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mov_ epi64 avx512f
- Move packed 64-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mov_ pd avx512f
- Move packed double-precision (64-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mov_ ps avx512f
- Move packed single-precision (32-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ movedup_ pd avx512f
- Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ movehdup_ ps avx512f
- Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ moveldup_ ps avx512f
- Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
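The even/odd duplication pattern is easy to see on plain arrays. This is a scalar sketch with hypothetical function names, including the zeromask step.

```rust
/// Model of the moveldup form: every lane takes the even-indexed element of its pair.
fn maskz_moveldup_ps_model(k: u16, a: [f32; 16]) -> [f32; 16] {
    std::array::from_fn(|i| if (k >> i) & 1 == 1 { a[i & !1] } else { 0.0 })
}

/// Model of the movehdup form: every lane takes the odd-indexed element of its pair.
fn maskz_movehdup_ps_model(k: u16, a: [f32; 16]) -> [f32; 16] {
    std::array::from_fn(|i| if (k >> i) & 1 == 1 { a[i | 1] } else { 0.0 })
}
```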
- _mm512_maskz_ mul_ epi32 avx512f
- Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mul_ epu32 avx512f
- Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
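Only the low 32 bits of each 64-bit element take part in these widening multiplies; the upper half is ignored. A scalar sketch of one lane, with hypothetical names:

```rust
/// Signed form: sign-extend the low 32 bits of each operand, then multiply.
fn mul_epi32_lane(a: i64, b: i64) -> i64 {
    (a as i32 as i64) * (b as i32 as i64)
}

/// Unsigned form: zero-extend the low 32 bits of each operand, then multiply.
fn mul_epu32_lane(a: u64, b: u64) -> u64 {
    (a as u32 as u64) * (b as u32 as u64)
}
```

The same bit pattern therefore gives different results in the two forms: low bits `0xFFFF_FFFF` are read as -1 by the signed version and as 4294967295 by the unsigned one.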
- _mm512_maskz_ mul_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mul_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mul_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mul_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mulhi_ epi16 avx512bw
- Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mulhi_ epu16 avx512bw
- Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mulhrs_ epi16 avx512bw
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
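The truncate, round, and select sequence reduces to two shifts and an add. A scalar sketch of one lane follows, using a hypothetical function name; it treats the inputs as Q15 fixed-point values only for the purpose of the example.

```rust
/// One lane of the mulhrs form: keep the 18 most significant bits of the
/// 32-bit product (shift right by 14), add 1 to round, then drop the lowest
/// bit, which leaves bits [16:1].
fn mulhrs_epi16_lane(a: i16, b: i16) -> i16 {
    let prod = a as i32 * b as i32;
    (((prod >> 14) + 1) >> 1) as i16
}
```

Read as Q15 numbers, `0x4000` is 0.5, so `mulhrs_epi16_lane(0x4000, 0x4000)` gives `0x2000`, that is 0.25.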
- _mm512_maskz_ mullo_ epi16 avx512bw
- Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mullo_ epi32 avx512f
- Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mullo_ epi64 avx512dq
- Multiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ multishift_ epi64_ epi8 avx512vbmi
- For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
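The byte-granular shift control can be modelled on a single 64-bit element. In this sketch (hypothetical name, mask step omitted) each control byte supplies a bit offset in its low 6 bits, and the selected 8 bits wrap around the end of the element, which is why a rotate is used.

```rust
/// One 64-bit element: `ctrl` comes from a, `data` from b.
/// Output byte j holds the 8 bits of `data` starting at bit (ctrl byte j & 63),
/// wrapping past bit 63 back to bit 0.
fn multishift_epi64_epi8_lane(ctrl: u64, data: u64) -> u64 {
    let mut out = 0u64;
    for j in 0..8 {
        let shift = ((ctrl >> (8 * j)) & 63) as u32;
        let byte = data.rotate_right(shift) & 0xFF;
        out |= byte << (8 * j);
    }
    out
}
```

Control bytes 0, 8, 16, ..., 56 reproduce the input unchanged, while a constant control byte broadcasts one unaligned 8-bit window to all eight positions.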
- _mm512_maskz_ or_ epi32 avx512f
- Compute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ or_ epi64 avx512f
- Compute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ or_ pd avx512dq
- Compute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ or_ ps avx512dq
- Compute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ packs_ epi16 avx512bw
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ packs_ epi32 avx512bw
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ packus_ epi16 avx512bw
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ packus_ epi32 avx512bw
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
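Signed and unsigned saturation differ only in the range the source value is clamped to. The sketch below models the per-element conversion for the 16-bit to 8-bit case with hypothetical names; it does not model how the results from a and b are ordered in dst.

```rust
/// Signed saturation: clamp a signed 16-bit value into the i8 range.
fn packs_i16_to_i8(x: i16) -> i8 {
    x.clamp(i8::MIN as i16, i8::MAX as i16) as i8
}

/// Unsigned saturation: clamp a signed 16-bit value into the u8 range.
fn packus_i16_to_u8(x: i16) -> u8 {
    x.clamp(0, u8::MAX as i16) as u8
}
```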
- _mm512_maskz_ permute_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permute_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutevar_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutevar_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutex2var_ epi8 avx512vbmi
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutex2var_ epi16 avx512bw
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutex2var_ epi32 avx512f
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutex2var_ epi64 avx512f
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutex2var_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutex2var_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
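The "selector and index" wording can be made concrete for the 32-bit case, where 16 lanes need a 4-bit index and one more bit to choose the source. This is a scalar sketch with a hypothetical name; it assumes the selector is the bit directly above the index bits.

```rust
/// Scalar model for 16 lanes of 32-bit integers: in each idx element, bits [3:0]
/// index a lane and bit 4 selects the source (0 = a, 1 = b). Higher bits are ignored.
fn maskz_permutex2var_epi32_model(
    k: u16,
    a: [i32; 16],
    idx: [i32; 16],
    b: [i32; 16],
) -> [i32; 16] {
    std::array::from_fn(|i| {
        if (k >> i) & 1 == 0 {
            return 0; // zeromask: lane cleared
        }
        let sel = idx[i] as usize;
        let lane = sel & 15;
        if sel & 16 != 0 { b[lane] } else { a[lane] }
    })
}
```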
- _mm512_maskz_ permutex_ epi64 avx512f
- Shuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutex_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutexvar_ epi8 avx512vbmi
- Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutexvar_ epi16 avx512bw
- Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutexvar_ epi32 avx512f
- Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutexvar_ epi64 avx512f
- Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutexvar_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ permutexvar_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ popcnt_ epi8 avx512bitalg
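- Count the number of logical 1 bits in each packed 8-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ popcnt_ epi16 avx512bitalg
- Count the number of logical 1 bits in each packed 16-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ popcnt_ epi32 avx512vpopcntdq
- Count the number of logical 1 bits in each packed 32-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ popcnt_ epi64 avx512vpopcntdq
- Count the number of logical 1 bits in each packed 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).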
- _mm512_maskz_ range_ pd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). The lower 2 bits of imm8 (bits [1:0]) specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The next 2 bits of imm8 (bits [3:2]) specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_maskz_ range_ ps avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). The lower 2 bits of imm8 (bits [1:0]) specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The next 2 bits of imm8 (bits [3:2]) specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_maskz_ range_ round_ pd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). The lower 2 bits of imm8 (bits [1:0]) specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The next 2 bits of imm8 (bits [3:2]) specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ range_ round_ ps avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). The lower 2 bits of imm8 (bits [1:0]) specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The next 2 bits of imm8 (bits [3:2]) specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
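The two control fields of the range operations can be decoded as in the following scalar sketch of one double-precision lane. It is a simplified model with a hypothetical name: it assumes the operation control sits in imm8 bits [1:0] and the sign control in bits [3:2], and it ignores NaN inputs and signed-zero subtleties.

```rust
/// Simplified model of one lane; NaN and signed-zero handling are not modelled.
fn range_pd_lane(a: f64, b: f64, imm8: u8) -> f64 {
    // Operation control: pick one of the two inputs.
    let tmp = match imm8 & 0b11 {
        0b00 => if a <= b { a } else { b },             // min
        0b01 => if a <= b { b } else { a },             // max
        0b10 => if a.abs() <= b.abs() { a } else { b }, // absolute min
        _ => if a.abs() <= b.abs() { b } else { a },    // absolute max
    };
    // Sign control: decide the sign of the selected value.
    match (imm8 >> 2) & 0b11 {
        0b00 => tmp.copysign(a), // sign from a
        0b01 => tmp,             // sign from compare result
        0b10 => tmp.abs(),       // clear sign bit
        _ => -tmp.abs(),         // set sign bit
    }
}
```

For a = -3.0 and b = 2.0, an imm8 of `0b0111` (absolute max, sign from compare result) selects -3.0, while `0b1011` (absolute max, clear sign bit) yields 3.0.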
- _mm512_maskz_ rcp14_ pd avx512f
- Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_maskz_ rcp14_ ps avx512f
- Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_maskz_ reduce_ pd avx512dq
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_maskz_ reduce_ ps avx512dq
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_maskz_ reduce_ round_ pd avx512dq
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_maskz_ reduce_ round_ ps avx512dq
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_maskz_ rol_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ rol_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ rolv_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ rolv_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ror_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ror_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ rorv_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ rorv_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
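Bit rotation maps directly onto Rust's integer methods, which makes a per-lane reference model short. The names below are hypothetical; the sketch assumes the per-element count is reduced modulo the element width, which `rotate_left` and `rotate_right` would do anyway.

```rust
/// One 32-bit lane of a variable left rotate.
fn rolv_epi32_lane(a: u32, count: u32) -> u32 {
    a.rotate_left(count % 32)
}

/// One 32-bit lane of a variable right rotate.
fn rorv_epi32_lane(a: u32, count: u32) -> u32 {
    a.rotate_right(count % 32)
}
```

Unlike a shift, no bits are lost: `rolv_epi32_lane(0x8000_0001, 1)` is `0x0000_0003` because the top bit re-enters at the bottom.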
- _mm512_maskz_ roundscale_ pd avx512f
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm512_maskz_ roundscale_ ps avx512f
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm512_maskz_ roundscale_ round_ pd avx512f
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm512_maskz_ roundscale_ round_ ps avx512f
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
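Rounding to a number of fraction bits amounts to scaling by a power of two, rounding to an integer, and scaling back. A minimal scalar sketch follows, assuming `m` fraction bits and a caller-supplied rounding function. The name `roundscale_model` and its signature are hypothetical, and special values (NaN, infinities, denormals) are not treated.

```rust
/// Scalar model of "round to m fraction bits" for one lane.
/// Hypothetical helper; `round` stands in for the selected rounding mode.
pub fn roundscale_model(x: f64, m: u32, round: fn(f64) -> f64) -> f64 {
    // 2^m is exactly representable for the small m used here.
    let scale = 2f64.powi(m as i32);
    round(x * scale) / scale
}
```

For example, `roundscale_model(1.37, 2, f64::floor)` keeps two fraction bits and rounds down, giving a multiple of 0.25.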
- _mm512_maskz_ rsqrt14_ pd avx512f
- Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_maskz_ rsqrt14_ ps avx512f
- Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_maskz_ scalef_ pd avx512f
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ scalef_ ps avx512f
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ scalef_ round_ pd avx512f
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ scalef_ round_ ps avx512f
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
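In Intel's description of the underlying operation, each scaled result is `a * 2^floor(b)`. The scalar sketch below models that for ordinary finite inputs only. `scalef_model` is a hypothetical name, and the hardware's handling of NaN, infinities, and denormals is not reproduced here.

```rust
/// Scalar model of one lane of a "scale by power of two" operation.
/// Assumption: finite inputs with floor(b) small enough to fit in an i32.
pub fn scalef_model(a: f64, b: f64) -> f64 {
    a * 2f64.powi(b.floor() as i32)
}
```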
- _mm512_maskz_ set1_ epi8 avx512bw
- Broadcast 8-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ set1_ epi16 avx512bw
- Broadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ set1_ epi32 avx512f
- Broadcast 32-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ set1_ epi64 avx512f
- Broadcast 64-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shldi_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shldi_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shldi_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shldv_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shldv_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shldv_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shrdi_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shrdi_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shrdi_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shrdv_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shrdv_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shrdv_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
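The concatenate-and-shift operations above can be sketched per lane with a double-width intermediate. The functions below model the 16-bit case in scalar Rust. Their names are hypothetical, and reducing the shift count modulo 16 is an assumption of this sketch.

```rust
/// Left variant: concatenate a (high half) and b (low half), shift left,
/// and keep the upper 16 bits. Hypothetical scalar model of one lane.
pub fn shld16_model(a: u16, b: u16, count: u32) -> u16 {
    let wide = ((a as u32) << 16) | (b as u32);
    ((wide << (count % 16)) >> 16) as u16
}

/// Right variant: concatenate b (high half) and a (low half), shift right,
/// and keep the lower 16 bits. Hypothetical scalar model of one lane.
pub fn shrd16_model(a: u16, b: u16, count: u32) -> u16 {
    let wide = ((b as u32) << 16) | (a as u32);
    (wide >> (count % 16)) as u16
}
```

Note the operand order: the left form places `a` in the high half, while the right form places `b` there, so bits shifted into the result always come from `b`.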
- _mm512_maskz_ shuffle_ epi8 avx512bw
- Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shuffle_ epi32 avx512f
- Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shuffle_ f32x4 avx512f
- Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shuffle_ f64x2 avx512f
- Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shuffle_ i32x4 avx512f
- Shuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shuffle_ i64x2 avx512f
- Shuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shuffle_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shuffle_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shufflehi_ epi16 avx512bw
- Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ shufflelo_ epi16 avx512bw
- Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
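For the in-lane 32-bit shuffle described above, each pair of bits in imm8 selects one of the four elements of the same 128-bit lane. The following scalar sketch illustrates that indexing without the mask step; `shuffle_epi32_model` is a hypothetical name and the array representation is an assumption for illustration.

```rust
/// Scalar model of an in-lane 32-bit shuffle over four 128-bit lanes.
/// Hypothetical helper; bits [2j+1:2j] of imm8 pick the source for slot j.
pub fn shuffle_epi32_model(a: [u32; 16], imm8: u8) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for lane in 0..4 {
        let base = lane * 4;
        for j in 0..4 {
            let sel = ((imm8 >> (2 * j)) & 0b11) as usize;
            // The selector never crosses the 128-bit lane boundary.
            dst[base + j] = a[base + sel];
        }
    }
    dst
}
```

A control of `0x1B` (binary `00 01 10 11`) reverses the four elements inside every lane, while `0x00` broadcasts each lane's first element.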
- _mm512_maskz_ sll_ epi16 avx512bw
- Shift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sll_ epi32 avx512f
- Shift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sll_ epi64 avx512f
- Shift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ slli_ epi16 avx512bw
- Shift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ slli_ epi32 avx512f
- Shift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ slli_ epi64 avx512f
- Shift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sllv_ epi16 avx512bw
- Shift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sllv_ epi32 avx512f
- Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sllv_ epi64 avx512f
- Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sqrt_ pd avx512f
- Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sqrt_ ps avx512f
- Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sqrt_ round_ pd avx512f
- Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sqrt_ round_ ps avx512f
- Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sra_ epi16 avx512bw
- Shift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sra_ epi32 avx512f
- Shift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sra_ epi64 avx512f
- Shift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srai_ epi16 avx512bw
- Shift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srai_ epi32 avx512f
- Shift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srai_ epi64 avx512f
- Shift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srav_ epi16 avx512bw
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srav_ epi32 avx512f
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srav_ epi64 avx512f
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srl_ epi16 avx512bw
- Shift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srl_ epi32 avx512f
- Shift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srl_ epi64 avx512f
- Shift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srli_ epi16 avx512bw
- Shift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srli_ epi32 avx512f
- Shift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srli_ epi64 avx512f
- Shift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srlv_ epi16 avx512bw
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srlv_ epi32 avx512f
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ srlv_ epi64 avx512f
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
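The difference between shifting in zeros and shifting in sign bits matters most when the count reaches or exceeds the element width. The scalar sketch below models one 32-bit lane of each right shift. The function names are hypothetical, and the out-of-range behavior (all zeros for the logical shift, all sign bits for the arithmetic shift) is stated here as an assumption about the hardware, not taken from the summaries above.

```rust
/// Logical right shift of one 32-bit lane: zeros are shifted in.
/// Hypothetical scalar model; counts above 31 produce 0.
pub fn srl_epi32_lane(x: u32, count: u64) -> u32 {
    if count > 31 { 0 } else { x >> count }
}

/// Arithmetic right shift of one 32-bit lane: sign bits are shifted in.
/// Hypothetical scalar model; counts above 31 behave like a shift by 31.
pub fn sra_epi32_lane(x: i32, count: u64) -> i32 {
    x >> count.min(31)
}
```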
- _mm512_maskz_ sub_ epi8 avx512bw
- Subtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sub_ epi16 avx512bw
- Subtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sub_ epi32 avx512f
- Subtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sub_ epi64 avx512f
- Subtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sub_ pd avx512f
- Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sub_ ps avx512f
- Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sub_ round_ pd avx512f
- Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sub_ round_ ps avx512f
- Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ subs_ epi8 avx512bw
- Subtract packed signed 8-bit integers in b from packed signed 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ subs_ epi16 avx512bw
- Subtract packed signed 16-bit integers in b from packed signed 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ subs_ epu8 avx512bw
- Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ subs_ epu16 avx512bw
- Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
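Saturating subtraction clamps at the type's bounds instead of wrapping. Rust's integer types expose the same per-element behavior through `saturating_sub`, so a single lane can be modeled directly. The wrapper names below are hypothetical and exist only for this illustration.

```rust
/// One signed 8-bit lane with saturation (clamps to -128..=127).
pub fn subs_epi8_lane(a: i8, b: i8) -> i8 {
    a.saturating_sub(b)
}

/// One unsigned 8-bit lane with saturation (clamps to 0..=255).
pub fn subs_epu8_lane(a: u8, b: u8) -> u8 {
    a.saturating_sub(b)
}

/// For contrast: the non-saturating form wraps around on overflow.
pub fn sub_epi8_lane(a: i8, b: i8) -> i8 {
    a.wrapping_sub(b)
}
```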
- _mm512_maskz_ ternarylogic_ epi32 avx512f
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in imm8. For each bit in each packed 32-bit integer, the corresponding bits from a, b, and c are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ternarylogic_ epi64 avx512f
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in imm8. For each bit in each packed 64-bit integer, the corresponding bits from a, b, and c are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).
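The ternary logic lookup can be made concrete with a scalar model in which imm8 acts as an 8-entry truth table. The sketch below handles one 32-bit element and omits the mask step. `ternarylogic32_model` is a hypothetical name, and the index ordering (a as the most significant bit, c as the least) is an assumption of this sketch.

```rust
/// Scalar model of bitwise ternary logic on one 32-bit element.
/// Hypothetical helper; imm8 is treated as a truth table indexed by (a, b, c).
pub fn ternarylogic32_model(a: u32, b: u32, c: u32, imm8: u8) -> u32 {
    let mut out = 0u32;
    for bit in 0..32 {
        // Build the 3-bit index from the bits of a, b, and c at this position.
        let idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1);
        let r = ((imm8 >> idx) & 1) as u32;
        out |= r << bit;
    }
    out
}
```

Under that ordering, `0x96` yields the three-way XOR `a ^ b ^ c`, and `0xCA` yields the bitwise select `(a & b) | (!a & c)`.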
- _mm512_maskz_ unpackhi_ epi8 avx512bw
- Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpackhi_ epi16 avx512bw
- Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpackhi_ epi32 avx512f
- Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpackhi_ epi64 avx512f
- Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpackhi_ pd avx512f
- Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpackhi_ ps avx512f
- Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpacklo_ epi8 avx512bw
- Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpacklo_ epi16 avx512bw
- Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpacklo_ epi32 avx512f
- Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpacklo_ epi64 avx512f
- Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpacklo_ pd avx512f
- Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ unpacklo_ ps avx512f
- Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
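Unpack-and-interleave works independently inside each 128-bit lane. The scalar sketch below models the 32-bit case without the mask step, with a flag choosing the low or high half of each lane. The name `unpack_epi32_model` and the `high` parameter are hypothetical and introduced only for this example.

```rust
/// Scalar model of unpack-and-interleave for 32-bit elements.
/// Hypothetical helper; `high` selects the upper two elements of each lane.
pub fn unpack_epi32_model(a: [u32; 16], b: [u32; 16], high: bool) -> [u32; 16] {
    let mut dst = [0u32; 16];
    let off = if high { 2 } else { 0 };
    for lane in 0..4 {
        let base = lane * 4;
        // Elements alternate a, b, a, b within the lane.
        dst[base] = a[base + off];
        dst[base + 1] = b[base + off];
        dst[base + 2] = a[base + off + 1];
        dst[base + 3] = b[base + off + 1];
    }
    dst
}
```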
- _mm512_maskz_ xor_ epi32 avx512f
- Compute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ xor_ epi64 avx512f
- Compute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ xor_ pd avx512dq
- Compute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ xor_ ps avx512dq
- Compute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_max_ epi8 avx512bw
- Compare packed signed 8-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ epi16 avx512bw
- Compare packed signed 16-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ epi32 avx512f
- Compare packed signed 32-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ epi64 avx512f
- Compare packed signed 64-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ epu8 avx512bw
- Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ epu16 avx512bw
- Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ epu32 avx512f
- Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ epu64 avx512f
- Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst.
- _mm512_max_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst.
- _mm512_max_ round_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_max_ round_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_min_ epi8 avx512bw
- Compare packed signed 8-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ epi16 avx512bw
- Compare packed signed 16-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ epi32 avx512f
- Compare packed signed 32-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ epi64 avx512f
- Compare packed signed 64-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ epu8 avx512bw
- Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ epu16 avx512bw
- Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ epu32 avx512f
- Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ epu64 avx512f
- Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst.
- _mm512_min_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst.
- _mm512_min_ round_ pd avx512f
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_min_ round_ ps avx512f
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_movedup_ pd avx512f
- Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst.
- _mm512_movehdup_ ps avx512f
- Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst.
- _mm512_moveldup_ ps avx512f
- Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst.
- _mm512_movepi8_ mask avx512bw
- Set each bit of mask register k based on the most significant bit of the corresponding packed 8-bit integer in a.
- _mm512_movepi16_ mask avx512bw
- Set each bit of mask register k based on the most significant bit of the corresponding packed 16-bit integer in a.
- _mm512_movepi32_ mask avx512dq
- Set each bit of mask register k based on the most significant bit of the corresponding packed 32-bit integer in a.
- _mm512_movepi64_ mask avx512dq
- Set each bit of mask register k based on the most significant bit of the corresponding packed 64-bit integer in a.
- _mm512_movm_ epi8 avx512bw
- Set each packed 8-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm512_movm_ epi16 avx512bw
- Set each packed 16-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm512_movm_ epi32 avx512dq
- Set each packed 32-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm512_movm_ epi64 avx512dq
- Set each packed 64-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
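The two conversions above are inverses of each other when every element is all ones or all zeros. A scalar sketch for sixteen 32-bit elements follows; both function names are hypothetical stand-ins used only to illustrate the semantics.

```rust
/// Collect the most significant bit of each 32-bit element into a 16-bit mask.
pub fn movepi32_mask_model(a: [u32; 16]) -> u16 {
    let mut k = 0u16;
    for i in 0..16 {
        if a[i] >> 31 == 1 {
            k |= 1 << i;
        }
    }
    k
}

/// Expand each mask bit into an all-ones or all-zeros 32-bit element.
pub fn movm_epi32_model(k: u16) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = u32::MAX;
        }
    }
    dst
}
```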
- _mm512_mul_ epi32 avx512f
- Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst.
- _mm512_mul_ epu32 avx512f
- Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst.
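Both widening multiplies ignore the upper half of each 64-bit element and differ only in how the low 32 bits are interpreted. A scalar sketch of one element makes that explicit; the function names are hypothetical.

```rust
/// Signed form: sign-extend the low 32 bits of each operand, then multiply.
pub fn mul_epi32_lane(a: i64, b: i64) -> i64 {
    (a as i32 as i64) * (b as i32 as i64)
}

/// Unsigned form: zero-extend the low 32 bits of each operand, then multiply.
pub fn mul_epu32_lane(a: u64, b: u64) -> u64 {
    (a as u32 as u64) * (b as u32 as u64)
}
```

The same bit pattern can therefore give different products: low bits of `0xFFFF_FFFF` are read as -1 by the signed form and as 4294967295 by the unsigned form.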
- _mm512_mul_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mul_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mul_ round_ pd avx512f
- Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mul_ round_ ps avx512f
- Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mulhi_ epi16 avx512bw
- Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst.
- _mm512_mulhi_ epu16 avx512bw
- Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst.
- _mm512_mulhrs_ epi16 avx512bw
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst.
- _mm512_mullo_ epi16 avx512bw
- Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst.
- _mm512_mullo_ epi32 avx512f
- Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst.
- _mm512_mullo_ epi64 avx512dq
- Multiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst.
- _mm512_mullox_ epi64 avx512f
- Multiplies elements in packed 64-bit integer vectors a and b together, storing the lower 64 bits of the result in dst.
- _mm512_multishift_ epi64_ epi8 avx512vbmi
- For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst.
- _mm512_or_ epi32 avx512f
- Compute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst.
- _mm512_or_ epi64 avx512f
- Compute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst.
- _mm512_or_ pd avx512dq
- Compute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst.
- _mm512_or_ ps avx512dq
- Compute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst.
- _mm512_or_ si512 avx512f
- Compute the bitwise OR of 512 bits (representing integer data) in a and b, and store the result in dst.
- _mm512_packs_ epi16 avx512bw
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst.
- _mm512_packs_ epi32 avx512bw
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst.
- _mm512_packus_ epi16 avx512bw
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst.
- _mm512_packus_ epi32 avx512bw
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst.
- _mm512_permute_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst.
- _mm512_permute_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst.
- _mm512_permutevar_ epi32 avx512f
- Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst. Note that this intrinsic shuffles across 128-bit lanes, unlike past intrinsics that use the permutevar name. This intrinsic is identical to _mm512_permutexvar_epi32, and it is recommended that you use that intrinsic name.
- _mm512_permutevar_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst.
- _mm512_permutevar_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst.
- _mm512_permutex2var_ epi8 avx512vbmi
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ epi16 avx512bw
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ epi32 avx512f
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ epi64 avx512f
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex_ epi64 avx512f
- Shuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst.
- _mm512_permutex_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst.
- _mm512_permutexvar_ epi8 avx512vbmi
- Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ epi16 avx512bw
- Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ epi32 avx512f
- Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ epi64 avx512f
- Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_popcnt_ epi8 avx512bitalg
- For each packed 8-bit integer, map the value to the number of logical 1 bits.
- _mm512_popcnt_ epi16 avx512bitalg
- For each packed 16-bit integer, map the value to the number of logical 1 bits.
- _mm512_popcnt_ epi32 avx512vpopcntdq
- For each packed 32-bit integer, map the value to the number of logical 1 bits.
- _mm512_popcnt_ epi64 avx512vpopcntdq
- For each packed 64-bit integer, map the value to the number of logical 1 bits.
- _mm512_range_ pd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. Bits [1:0] of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_range_ ps avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. Bits [1:0] of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_range_ round_ pd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. Bits [1:0] of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_range_ round_ ps avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. Bits [1:0] of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_rcp14_ pd avx512f
- Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm512_rcp14_ ps avx512f
- Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm512_reduce_ add_ epi32 avx512f
- Reduce the packed 32-bit integers in a by addition. Returns the sum of all elements in a.
- _mm512_reduce_ add_ epi64 avx512f
- Reduce the packed 64-bit integers in a by addition. Returns the sum of all elements in a.
- _mm512_reduce_ add_ pd avx512f
- Reduce the packed double-precision (64-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm512_reduce_ add_ ps avx512f
- Reduce the packed single-precision (32-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm512_reduce_ and_ epi32 avx512f
- Reduce the packed 32-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm512_reduce_ and_ epi64 avx512f
- Reduce the packed 64-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm512_reduce_ max_ epi32 avx512f
- Reduce the packed signed 32-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ max_ epi64 avx512f
- Reduce the packed signed 64-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ max_ epu32 avx512f
- Reduce the packed unsigned 32-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ max_ epu64 avx512f
- Reduce the packed unsigned 64-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ max_ pd avx512f
- Reduce the packed double-precision (64-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ max_ ps avx512f
- Reduce the packed single-precision (32-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ min_ epi32 avx512f
- Reduce the packed signed 32-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ min_ epi64 avx512f
- Reduce the packed signed 64-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ min_ epu32 avx512f
- Reduce the packed unsigned 32-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ min_ epu64 avx512f
- Reduce the packed unsigned 64-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ min_ pd avx512f
- Reduce the packed double-precision (64-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ min_ ps avx512f
- Reduce the packed single-precision (32-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ mul_ epi32 avx512f
- Reduce the packed 32-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm512_reduce_ mul_ epi64 avx512f
- Reduce the packed 64-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm512_reduce_ mul_ pd avx512f
- Reduce the packed double-precision (64-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
- _mm512_reduce_ mul_ ps avx512f
- Reduce the packed single-precision (32-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
- _mm512_reduce_ or_ epi32 avx512f
- Reduce the packed 32-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm512_reduce_ or_ epi64 avx512f
- Reduce the packed 64-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm512_reduce_ pd avx512dq
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_reduce_ ps avx512dq
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_reduce_ round_ pd avx512dq
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_reduce_ round_ ps avx512dq
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm512_rol_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm512_rol_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm512_rolv_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm512_rolv_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm512_ror_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm512_ror_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm512_rorv_ epi32 avx512f
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm512_rorv_ epi64 avx512f
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm512_roundscale_ pd avx512f
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm512_roundscale_ ps avx512f
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm512_roundscale_ round_ pd avx512f
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm512_roundscale_ round_ ps avx512f
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm512_rsqrt14_ pd avx512f
- Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm512_rsqrt14_ ps avx512f
- Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm512_sad_ epu8 avx512bw
- Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce eight unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in dst.
- _mm512_scalef_ pd avx512f
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_scalef_ ps avx512f
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_scalef_ round_ pd avx512f
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_scalef_ round_ ps avx512f
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_set1_ epi8 avx512f
- Broadcast 8-bit integer a to all elements of dst.
- _mm512_set1_ epi16 avx512f
- Broadcast the low packed 16-bit integer from a to all elements of dst.
- _mm512_set1_ epi32 avx512f
- Broadcast 32-bit integer a to all elements of dst.
- _mm512_set1_ epi64 avx512f
- Broadcast 64-bit integer a to all elements of dst.
- _mm512_set1_ pd avx512f
- Broadcast 64-bit float a to all elements of dst.
- _mm512_set1_ ps avx512f
- Broadcast 32-bit float a to all elements of dst.
- _mm512_set4_ epi32 avx512f
- Set packed 32-bit integers in dst with the repeated 4 element sequence.
- _mm512_set4_ epi64 avx512f
- Set packed 64-bit integers in dst with the repeated 4 element sequence.
- _mm512_set4_ pd avx512f
- Set packed double-precision (64-bit) floating-point elements in dst with the repeated 4 element sequence.
- _mm512_set4_ ps avx512f
- Set packed single-precision (32-bit) floating-point elements in dst with the repeated 4 element sequence.
- _mm512_set_ epi8 avx512f
- Set packed 8-bit integers in dst with the supplied values.
- _mm512_set_ epi16 avx512f
- Set packed 16-bit integers in dst with the supplied values.
- _mm512_set_ epi32 avx512f
- Set packed 32-bit integers in dst with the supplied values.
- _mm512_set_ epi64 avx512f
- Set packed 64-bit integers in dst with the supplied values.
- _mm512_set_ pd avx512f
- Set packed double-precision (64-bit) floating-point elements in dst with the supplied values.
- _mm512_set_ ps avx512f
- Set packed single-precision (32-bit) floating-point elements in dst with the supplied values.
- _mm512_setr4_ epi32 avx512f
- Set packed 32-bit integers in dst with the repeated 4 element sequence in reverse order.
- _mm512_setr4_ epi64 avx512f
- Set packed 64-bit integers in dst with the repeated 4 element sequence in reverse order.
- _mm512_setr4_ pd avx512f
- Set packed double-precision (64-bit) floating-point elements in dst with the repeated 4 element sequence in reverse order.
- _mm512_setr4_ ps avx512f
- Set packed single-precision (32-bit) floating-point elements in dst with the repeated 4 element sequence in reverse order.
- _mm512_setr_ epi32 avx512f
- Set packed 32-bit integers in dst with the supplied values in reverse order.
- _mm512_setr_ epi64 avx512f
- Set packed 64-bit integers in dst with the supplied values in reverse order.
- _mm512_setr_ pd avx512f
- Set packed double-precision (64-bit) floating-point elements in dst with the supplied values in reverse order.
- _mm512_setr_ ps avx512f
- Set packed single-precision (32-bit) floating-point elements in dst with the supplied values in reverse order.
- _mm512_setzero avx512f
- Return vector of type __m512 with all elements set to zero.
- _mm512_setzero_ epi32 avx512f
- Return vector of type __m512i with all elements set to zero.
- _mm512_setzero_ pd avx512f
- Return vector of type __m512d with all elements set to zero.
- _mm512_setzero_ ps avx512f
- Return vector of type __m512 with all elements set to zero.
- _mm512_setzero_ si512 avx512f
- Return vector of type __m512i with all elements set to zero.
- _mm512_shldi_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst.
- _mm512_shldi_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst.
- _mm512_shldi_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst.
- _mm512_shldv_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst.
- _mm512_shldv_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst.
- _mm512_shldv_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst.
- _mm512_shrdi_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst.
- _mm512_shrdi_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst.
- _mm512_shrdi_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst.
- _mm512_shrdv_ epi16 avx512vbmi2
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst.
- _mm512_shrdv_ epi32 avx512vbmi2
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst.
- _mm512_shrdv_ epi64 avx512vbmi2
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst.
- _mm512_shuffle_ epi8 avx512bw
- Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and store the results in dst.
- _mm512_shuffle_ epi32 avx512f
- Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst.
- _mm512_shuffle_ f32x4 avx512f
- Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
- _mm512_shuffle_ f64x2 avx512f
- Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
- _mm512_shuffle_ i32x4 avx512f
- Shuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst.
- _mm512_shuffle_ i64x2 avx512f
- Shuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst.
- _mm512_shuffle_ pd avx512f
- Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst.
- _mm512_shuffle_ ps avx512f
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst.
- _mm512_shufflehi_ epi16 avx512bw
- Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst.
- _mm512_shufflelo_ epi16 avx512bw
- Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst.
- _mm512_sll_ epi16 avx512bw
- Shift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst.
- _mm512_sll_ epi32 avx512f
- Shift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst.
- _mm512_sll_ epi64 avx512f
- Shift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst.
- _mm512_slli_ epi16 avx512bw
- Shift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
- _mm512_slli_ epi32 avx512f
- Shift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
- _mm512_slli_ epi64 avx512f
- Shift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
- _mm512_sllv_ epi16 avx512bw
- Shift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_sllv_ epi32 avx512f
- Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_sllv_ epi64 avx512f
- Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_sqrt_ pd avx512f
- Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
- _mm512_sqrt_ ps avx512f
- Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
- _mm512_sqrt_ round_ pd avx512f
- Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
- _mm512_sqrt_ round_ ps avx512f
- Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
- _mm512_sra_ epi16 avx512bw
- Shift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst.
- _mm512_sra_ epi32 avx512f
- Shift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst.
- _mm512_sra_ epi64 avx512f
- Shift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst.
- _mm512_srai_ epi16 avx512bw
- Shift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
- _mm512_srai_ epi32 avx512f
- Shift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
- _mm512_srai_ epi64 avx512f
- Shift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
- _mm512_srav_ epi16 avx512bw
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm512_srav_ epi32 avx512f
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm512_srav_ epi64 avx512f
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm512_srl_ epi16 avx512bw
- Shift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst.
- _mm512_srl_ epi32 avx512f
- Shift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst.
- _mm512_srl_ epi64 avx512f
- Shift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst.
- _mm512_srli_ epi16 avx512bw
- Shift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
- _mm512_srli_ epi32 avx512f
- Shift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
- _mm512_srli_ epi64 avx512f
- Shift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
- _mm512_srlv_ epi16 avx512bw
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_srlv_ epi32 avx512f
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_srlv_ epi64 avx512f
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_store_ ⚠epi32 avx512f
- Store 512-bits (composed of 16 packed 32-bit integers) from a into memory. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_store_ ⚠epi64 avx512f
- Store 512-bits (composed of 8 packed 64-bit integers) from a into memory. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_store_ ⚠pd avx512f
- Store 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_store_ ⚠ps avx512f
- Store 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_store_ ⚠si512 avx512f
- Store 512-bits of integer data from a into memory. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_storeu_ ⚠epi8 avx512bw
- Store 512-bits (composed of 64 packed 8-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠epi16 avx512bw
- Store 512-bits (composed of 32 packed 16-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠epi32 avx512f
- Store 512-bits (composed of 16 packed 32-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠epi64 avx512f
- Store 512-bits (composed of 8 packed 64-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠pd avx512f
- Store 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠ps avx512f
- Store 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠si512 avx512f
- Store 512-bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_stream_ ⚠load_ si512 avx512f
- Load 512-bits of integer data from memory into dst using a non-temporal memory hint. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm512_stream_ ⚠pd avx512f
- Store 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_stream_ ⚠ps avx512f
- Store 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_stream_ ⚠si512 avx512f
- Store 512-bits of integer data from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_sub_ epi8 avx512bw
- Subtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst.
- _mm512_sub_ epi16 avx512bw
- Subtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst.
- _mm512_sub_ epi32 avx512f
- Subtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst.
- _mm512_sub_ epi64 avx512f
- Subtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst.
- _mm512_sub_ pd avx512f
- Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
- _mm512_sub_ ps avx512f
- Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
- _mm512_sub_ round_ pd avx512f
- Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
- _mm512_sub_ round_ ps avx512f
- Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
- _mm512_subs_ epi8 avx512bw
- Subtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst.
- _mm512_subs_ epi16 avx512bw
- Subtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst.
- _mm512_subs_ epu8 avx512bw
- Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst.
- _mm512_subs_ epu16 avx512bw
- Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst.
- _mm512_ternarylogic_ epi32 avx512f
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst.
- _mm512_ternarylogic_ epi64 avx512f
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst.
- _mm512_test_ epi8_ mask avx512bw
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm512_test_ epi16_ mask avx512bw
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm512_test_ epi32_ mask avx512f
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm512_test_ epi64_ mask avx512f
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm512_testn_ epi8_ mask avx512bw
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm512_testn_ epi16_ mask avx512bw
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm512_testn_ epi32_ mask avx512f
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm512_testn_ epi64_ mask avx512f
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm512_undefined avx512f
- Return vector of type __m512 with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm512_undefined_ epi32 avx512f
- Return vector of type __m512i with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm512_undefined_ pd avx512f
- Returns vector of type __m512d with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm512_undefined_ ps avx512f
- Returns vector of type __m512 with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm512_unpackhi_ epi8 avx512bw
- Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpackhi_ epi16 avx512bw
- Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpackhi_ epi32 avx512f
- Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpackhi_ epi64 avx512f
- Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpackhi_ pd avx512f
- Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpackhi_ ps avx512f
- Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ epi8 avx512bw
- Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ epi16 avx512bw
- Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ epi32 avx512f
- Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ epi64 avx512f
- Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ pd avx512f
- Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ ps avx512f
- Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_xor_ epi32 avx512f
- Compute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst.
- _mm512_xor_ epi64 avx512f
- Compute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst.
- _mm512_xor_ pd avx512dq
- Compute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst.
- _mm512_xor_ ps avx512dq
- Compute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst.
- _mm512_xor_ si512 avx512f
- Compute the bitwise XOR of 512 bits (representing integer data) in a and b, and store the result in dst.
- _mm512_zextpd128_ pd512 avx512f
- Cast vector of type __m128d to type __m512d; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_zextpd256_ pd512 avx512f
- Cast vector of type __m256d to type __m512d; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_zextps128_ ps512 avx512f
- Cast vector of type __m128 to type __m512; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_zextps256_ ps512 avx512f
- Cast vector of type __m256 to type __m512; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_zextsi128_ si512 avx512f
- Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_zextsi256_ si512 avx512f
- Cast vector of type __m256i to type __m512i; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_abs_ epi8 ssse3
- Computes the absolute value of packed 8-bit signed integers in a and returns the unsigned results.
- _mm_abs_ epi16 ssse3
- Computes the absolute value of each of the packed 16-bit signed integers in a and returns the 16-bit unsigned integers.
- _mm_abs_ epi32 ssse3
- Computes the absolute value of each of the packed 32-bit signed integers in a and returns the 32-bit unsigned integers.
- _mm_abs_ epi64 avx512f and avx512vl
- Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst.
- _mm_add_ epi8 sse2
- Adds packed 8-bit integers in a and b.
- _mm_add_ epi16 sse2
- Adds packed 16-bit integers in a and b.
- _mm_add_ epi32 sse2
- Adds packed 32-bit integers in a and b.
- _mm_add_ epi64 sse2
- Adds packed 64-bit integers in a and b.
- _mm_add_ pd sse2
- Adds packed double-precision (64-bit) floating-point elements in a and b.
- _mm_add_ ps sse
- Adds packed single-precision (32-bit) floating-point elements in a and b.
- _mm_add_ round_ sd avx512f
- Add the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_add_ round_ ss avx512f
- Add the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_add_ sd sse2
- Returns a new vector with the low element of a replaced by the sum of the low elements of a and b.
- _mm_add_ ss sse
- Adds the first component of a and b; the other components are copied from a.
- _mm_adds_ epi8 sse2
- Adds packed 8-bit integers in a and b using saturation.
- _mm_adds_ epi16 sse2
- Adds packed 16-bit integers in a and b using saturation.
- _mm_adds_ epu8 sse2
- Adds packed unsigned 8-bit integers in a and b using saturation.
- _mm_adds_ epu16 sse2
- Adds packed unsigned 16-bit integers in a and b using saturation.
- _mm_addsub_ pd sse3
- Alternately add and subtract packed double-precision (64-bit) floating-point elements in a to/from packed elements in b.
- _mm_addsub_ ps sse3
- Alternately add and subtract packed single-precision (32-bit) floating-point elements in a to/from packed elements in b.
- _mm_aesdec128kl_ ⚠u8 kl
- Decrypt 10 rounds of unsigned 8-bit integers in input using 128-bit AES key specified in the 384-bit key handle handle. Store the resulting unsigned 8-bit integers into the corresponding elements of output. Returns 0 if the operation was successful, and 1 if the operation failed due to a handle violation.
- _mm_aesdec256kl_ ⚠u8 kl
- Decrypt 14 rounds of unsigned 8-bit integers in input using 256-bit AES key specified in the 512-bit key handle handle. Store the resulting unsigned 8-bit integers into the corresponding elements of output. Returns 0 if the operation was successful, and 1 if the operation failed due to a handle violation.
- _mm_aesdec_ si128 aes
- Performs one round of an AES decryption flow on data (state) in a.
- _mm_aesdeclast_ si128 aes
- Performs the last round of an AES decryption flow on data (state) in a.
- _mm_aesdecwide128kl_ ⚠u8 widekl
- Decrypt 10 rounds of 8 groups of unsigned 8-bit integers in input using 128-bit AES key specified in the 384-bit key handle handle. Store the resulting unsigned 8-bit integers into the corresponding elements of output. Returns 0 if the operation was successful, and 1 if the operation failed due to a handle violation.
- _mm_aesdecwide256kl_ ⚠u8 widekl
- Decrypt 14 rounds of 8 groups of unsigned 8-bit integers in input using 256-bit AES key specified in the 512-bit key handle handle. Store the resulting unsigned 8-bit integers into the corresponding elements of output. Returns 0 if the operation was successful, and 1 if the operation failed due to a handle violation.
- _mm_aesenc128kl_ ⚠u8 kl
- Encrypt 10 rounds of unsigned 8-bit integers in input using 128-bit AES key specified in the 384-bit key handle handle. Store the resulting unsigned 8-bit integers into the corresponding elements of output. Returns 0 if the operation was successful, and 1 if the operation failed due to a handle violation.
- _mm_aesenc256kl_ ⚠u8 kl
- Encrypt 14 rounds of unsigned 8-bit integers in input using 256-bit AES key specified in the 512-bit key handle handle. Store the resulting unsigned 8-bit integers into the corresponding elements of output. Returns 0 if the operation was successful, and 1 if the operation failed due to a handle violation.
- _mm_aesenc_ si128 aes
- Performs one round of an AES encryption flow on data (state) in a.
- _mm_aesenclast_ si128 aes
- Performs the last round of an AES encryption flow on data (state) in a.
- _mm_aesencwide128kl_ ⚠u8 widekl
- Encrypt 10 rounds of 8 groups of unsigned 8-bit integers in input using 128-bit AES key specified in the 384-bit key handle handle. Store the resulting unsigned 8-bit integers into the corresponding elements of output. Returns 0 if the operation was successful, and 1 if the operation failed due to a handle violation.
- _mm_aesencwide256kl_ ⚠u8 widekl
- Encrypt 14 rounds of 8 groups of unsigned 8-bit integers in input using 256-bit AES key specified in the 512-bit key handle handle. Store the resulting unsigned 8-bit integers into the corresponding elements of output. Returns 0 if the operation was successful, and 1 if the operation failed due to a handle violation.
- _mm_aesimc_ si128 aes
- Performs the InvMixColumns transformation on a.
- _mm_aeskeygenassist_ si128 aes
- Assist in expanding the AES cipher key.
- _mm_alignr_ epi8 ssse3
- Concatenate 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by n bytes, and return the low 16 bytes.
- _mm_alignr_ epi32 avx512f and avx512vl
- Concatenate a and b into a 32-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 16 bytes (4 elements) in dst.
- _mm_alignr_ epi64 avx512f and avx512vl
- Concatenate a and b into a 32-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 16 bytes (2 elements) in dst.
- _mm_and_ pd sse2
- Computes the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b.
- _mm_and_ ps sse
- Bitwise AND of packed single-precision (32-bit) floating-point elements.
- _mm_and_ si128 sse2
- Computes the bitwise AND of 128 bits (representing integer data) in a and b.
- _mm_andnot_ pd sse2
- Computes the bitwise NOT of a and then AND with b.
- _mm_andnot_ ps sse
- Bitwise AND-NOT of packed single-precision (32-bit) floating-point elements.
- _mm_andnot_ si128 sse2
- Computes the bitwise NOT of 128 bits (representing integer data) in a and then AND with b.
- _mm_avg_ epu8 sse2
- Averages packed unsigned 8-bit integers in a and b.
- _mm_avg_ epu16 sse2
- Averages packed unsigned 16-bit integers in a and b.
- _mm_bitshuffle_ epi64_ mask avx512bitalg and avx512vl
- Considers the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm_blend_ epi16 sse4.1
- Blend packed 16-bit integers from a and b using the mask IMM8.
- _mm_blend_ epi32 avx2
- Blends packed 32-bit integers from a and b using control mask IMM4.
- _mm_blend_ pd sse4.1
- Blend packed double-precision (64-bit) floating-point elements from a and b using control mask IMM2.
- _mm_blend_ ps sse4.1
- Blend packed single-precision (32-bit) floating-point elements from a and b using mask IMM4.
- _mm_blendv_ epi8 sse4.1
- Blend packed 8-bit integers from a and b using mask.
- _mm_blendv_ pd sse4.1
- Blend packed double-precision (64-bit) floating-point elements from a and b using mask.
- _mm_blendv_ ps sse4.1
- Blend packed single-precision (32-bit) floating-point elements from a and b using mask.
- _mm_broadcast_ i32x2 avx512dq and avx512vl
- Broadcasts the lower 2 packed 32-bit integers from a to all elements of dst.
- _mm_broadcast_ ss avx
- Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
- _mm_broadcastb_ epi8 avx2
- Broadcasts the low packed 8-bit integer from a to all elements of the 128-bit returned value.
- _mm_broadcastd_ epi32 avx2
- Broadcasts the low packed 32-bit integer from a to all elements of the 128-bit returned value.
- _mm_broadcastmb_ epi64 avx512cd and avx512vl
- Broadcast the low 8-bits from input mask k to all 64-bit elements of dst.
- _mm_broadcastmw_ epi32 avx512cd and avx512vl
- Broadcast the low 16-bits from input mask k to all 32-bit elements of dst.
- _mm_broadcastq_ epi64 avx2
- Broadcasts the low packed 64-bit integer from a to all elements of the 128-bit returned value.
- _mm_broadcastsd_ pd avx2
- Broadcasts the low double-precision (64-bit) floating-point element from a to all elements of the 128-bit returned value.
- _mm_broadcastsi128_ si256 avx2
- Broadcasts 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value.
- _mm_broadcastss_ ps avx2
- Broadcasts the low single-precision (32-bit) floating-point element from a to all elements of the 128-bit returned value.
- _mm_broadcastw_ epi16 avx2
- Broadcasts the low packed 16-bit integer from a to all elements of the 128-bit returned value.
- _mm_bslli_ si128 sse2
- Shifts a left by IMM8 bytes while shifting in zeros.
- _mm_bsrli_ si128 sse2
- Shifts a right by IMM8 bytes while shifting in zeros.
- _mm_castpd_ ps sse2
- Casts a 128-bit floating-point vector of [2 x double] into a 128-bit floating-point vector of [4 x float].
- _mm_castpd_ si128 sse2
- Casts a 128-bit floating-point vector of [2 x double] into a 128-bit integer vector.
- _mm_castps_ pd sse2
- Casts a 128-bit floating-point vector of [4 x float] into a 128-bit floating-point vector of [2 x double].
- _mm_castps_ si128 sse2
- Casts a 128-bit floating-point vector of [4 x float] into a 128-bit integer vector.
- _mm_castsi128_ pd sse2
- Casts a 128-bit integer vector into a 128-bit floating-point vector of [2 x double].
- _mm_castsi128_ ps sse2
- Casts a 128-bit integer vector into a 128-bit floating-point vector of [4 x float].
- _mm_ceil_ pd sse4.1
- Round the packed double-precision (64-bit) floating-point elements in a up to an integer value, and store the results as packed double-precision floating-point elements.
- _mm_ceil_ ps sse4.1
- Round the packed single-precision (32-bit) floating-point elements in a up to an integer value, and store the results as packed single-precision floating-point elements.
- _mm_ceil_ sd sse4.1
- Round the lower double-precision (64-bit) floating-point element in b up to an integer value, store the result as a double-precision floating-point element in the lower element of the intrinsic result, and copy the upper element from a to the upper element of the intrinsic result.
- _mm_ceil_ ss sse4.1
- Round the lower single-precision (32-bit) floating-point element in b up to an integer value, store the result as a single-precision floating-point element in the lower element of the intrinsic result, and copy the upper 3 packed elements from a to the upper elements of the intrinsic result.
- _mm_clflush ⚠sse2
- Invalidates and flushes the cache line that contains p from all levels of the cache hierarchy.
- _mm_clmulepi64_ si128 pclmulqdq
- Performs a carry-less multiplication of two 64-bit polynomials over the finite field GF(2).
- _mm_cmp_ epi8_ mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ epi16_ mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ epi32_ mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ epi64_ mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ epu8_ mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ epu16_ mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ epu32_ mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ epu64_ mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ pd avx
- Compares packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
- _mm_cmp_ pd_ mask avx512f and avx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ps avx
- Compares packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
- _mm_cmp_ ps_ mask avx512f and avx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ round_ sd_ mask avx512f
- Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cmp_ round_ ss_ mask avx512f
- Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cmp_ sd avx
- Compares the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by IMM5, stores the result in the lower element of returned vector, and copies the upper element from a to the upper element of returned vector.
- _mm_cmp_ sd_ mask avx512f
- Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
- _mm_cmp_ ss avx
- Compares the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by IMM5, stores the result in the lower element of returned vector, and copies the upper 3 packed elements from a to the upper elements of returned vector.
- _mm_cmp_ ss_ mask avx512f
- Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
- _mm_cmpeq_ epi8 sse2
- Compares packed 8-bit integers in a and b for equality.
- _mm_cmpeq_ epi8_ mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ epi16 sse2
- Compares packed 16-bit integers in a and b for equality.
- _mm_cmpeq_ epi32 sse2
- Compares packed 32-bit integers in a and b for equality.
- _mm_cmpeq_ epi64 sse4.1
- Compares packed 64-bit integers in a and b for equality.
- _mm_cmpeq_ epi16_ mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ epi32_ mask avx512f and avx512vl
- Compare packed 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ epi64_ mask avx512f and avx512vl
- Compare packed 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ epu8_ mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ epu16_ mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ epu32_ mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ epu64_ mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ pd sse2
- Compares corresponding elements in a and b for equality.
- _mm_cmpeq_ ps sse
- Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input elements were equal, or 0 otherwise.
- _mm_cmpeq_ sd sse2
- Returns a new vector with the low element of a replaced by the equality comparison of the lower elements of a and b.
- _mm_cmpeq_ ss sse
- Compares the lowest f32 of both inputs for equality. The lowest 32 bits of the result will be 0xffffffff if the two inputs are equal, or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpestra sse4.2
- Compares packed strings in a and b with lengths la and lb using the control in IMM8, and returns 1 if b did not contain a null character and the resulting mask was zero, and 0 otherwise.
- _mm_cmpestrc sse4.2
- Compares packed strings in a and b with lengths la and lb using the control in IMM8, and returns 1 if the resulting mask was non-zero, and 0 otherwise.
- _mm_cmpestri sse4.2
- Compares packed strings a and b with lengths la and lb using the control in IMM8 and returns the generated index. Similar to _mm_cmpistri with the exception that _mm_cmpistri implicitly determines the length of a and b.
- _mm_cmpestrm sse4.2
- Compares packed strings in a and b with lengths la and lb using the control in IMM8, and returns the generated mask.
- _mm_cmpestro sse4.2
- Compares packed strings in a and b with lengths la and lb using the control in IMM8, and returns bit 0 of the resulting bit mask.
- _mm_cmpestrs sse4.2
- Compares packed strings in a and b with lengths la and lb using the control in IMM8, and returns 1 if any character in a was null, and 0 otherwise.
- _mm_cmpestrz sse4.2
- Compares packed strings in a and b with lengths la and lb using the control in IMM8, and returns 1 if any character in b was null, and 0 otherwise.
- _mm_cmpge_ epi8_ mask avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ epi16_ mask avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ epi32_ mask avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ epi64_ mask avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ epu8_ mask avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ epu16_ mask avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ epu32_ mask avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ epu64_ mask avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ pd sse2
- Compares corresponding elements in a and b for greater-than-or-equal.
- _mm_cmpge_ ps sse
- Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is greater than or equal to the corresponding element in b, or 0 otherwise.
- _mm_cmpge_ sd sse2
- Returns a new vector with the low element of a replaced by the greater-than-or-equal comparison of the lower elements of a and b.
- _mm_cmpge_ ss sse
- Compares the lowest f32 of both inputs for greater than or equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is greater than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpgt_ epi8 sse2
- Compares packed 8-bit integers in aandbfor greater-than.
- _mm_cmpgt_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ epi16 sse2
- Compares packed 16-bit integers in aandbfor greater-than.
- _mm_cmpgt_ epi32 sse2
- Compares packed 32-bit integers in aandbfor greater-than.
- _mm_cmpgt_ epi64 sse4.2
- Compares packed 64-bit integers in aandbfor greater-than, return the results.
- _mm_cmpgt_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ epi32_ mask avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ epi64_ mask avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ pd sse2
- Compares corresponding elements in a and b for greater-than.
- _mm_cmpgt_ ps sse
- Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is greater than the corresponding element in b, or 0 otherwise.
- _mm_cmpgt_ sd sse2
- Returns a new vector with the low element of a replaced by the greater-than comparison of the lower elements of a and b.
- _mm_cmpgt_ ss sse
- Compares the lowest f32 of both inputs for greater than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is greater than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpistra sse4.2
- Compares packed strings with implicit lengths in a and b using the control in IMM8, and returns 1 if b did not contain a null character and the resulting mask was zero, and 0 otherwise.
- _mm_cmpistrc sse4.2
- Compares packed strings with implicit lengths in a and b using the control in IMM8, and returns 1 if the resulting mask was non-zero, and 0 otherwise.
- _mm_cmpistri sse4.2
- Compares packed strings with implicit lengths in a and b using the control in IMM8, and returns the generated index. Similar to _mm_cmpestri, with the exception that _mm_cmpestri requires the lengths of a and b to be explicitly specified.
- _mm_cmpistrm sse4.2
- Compares packed strings with implicit lengths in a and b using the control in IMM8, and returns the generated mask.
- _mm_cmpistro sse4.2
- Compares packed strings with implicit lengths in a and b using the control in IMM8, and returns bit 0 of the resulting bit mask.
- _mm_cmpistrs sse4.2
- Compares packed strings with implicit lengths in a and b using the control in IMM8, and returns 1 if any character in a was null, and 0 otherwise.
- _mm_cmpistrz sse4.2
- Compares packed strings with implicit lengths in a and b using the control in IMM8, and returns 1 if any character in b was null, and 0 otherwise.
- _mm_cmple_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ epi32_ mask avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ epi64_ mask avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ pd sse2
- Compares corresponding elements in a and b for less-than-or-equal.
- _mm_cmple_ ps sse
- Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is less than or equal to the corresponding element in b, or 0 otherwise.
- _mm_cmple_ sd sse2
- Returns a new vector with the low element of a replaced by the less-than-or-equal comparison of the lower elements of a and b.
- _mm_cmple_ ss sse
- Compares the lowest f32 of both inputs for less than or equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is less than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmplt_ epi8 sse2
- Compares packed 8-bit integers in a and b for less-than.
- _mm_cmplt_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ epi16 sse2
- Compares packed 16-bit integers in a and b for less-than.
- _mm_cmplt_ epi32 sse2
- Compares packed 32-bit integers in a and b for less-than.
- _mm_cmplt_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ epi32_ mask avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ epi64_ mask avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ pd sse2
- Compares corresponding elements in a and b for less-than.
- _mm_cmplt_ ps sse
- Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is less than the corresponding element in b, or 0 otherwise.
- _mm_cmplt_ sd sse2
- Returns a new vector with the low element of a replaced by the less-than comparison of the lower elements of a and b.
- _mm_cmplt_ ss sse
- Compares the lowest f32 of both inputs for less than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is less than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpneq_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ epi32_ mask avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ epi64_ mask avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ pd sse2
- Compares corresponding elements in a and b for not-equal.
- _mm_cmpneq_ ps sse
- Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input elements are not equal, or 0 otherwise.
- _mm_cmpneq_ sd sse2
- Returns a new vector with the low element of a replaced by the not-equal comparison of the lower elements of a and b.
- _mm_cmpneq_ ss sse
- Compares the lowest f32 of both inputs for inequality. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpnge_ pd sse2
- Compares corresponding elements in a and b for not-greater-than-or-equal.
- _mm_cmpnge_ ps sse
- Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not greater than or equal to the corresponding element in b, or 0 otherwise.
- _mm_cmpnge_ sd sse2
- Returns a new vector with the low element of a replaced by the not-greater-than-or-equal comparison of the lower elements of a and b.
- _mm_cmpnge_ ss sse
- Compares the lowest f32 of both inputs for not-greater-than-or-equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not greater than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpngt_ pd sse2
- Compares corresponding elements in a and b for not-greater-than.
- _mm_cmpngt_ ps sse
- Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not greater than the corresponding element in b, or 0 otherwise.
- _mm_cmpngt_ sd sse2
- Returns a new vector with the low element of a replaced by the not-greater-than comparison of the lower elements of a and b.
- _mm_cmpngt_ ss sse
- Compares the lowest f32 of both inputs for not-greater-than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not greater than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpnle_ pd sse2
- Compares corresponding elements in a and b for not-less-than-or-equal.
- _mm_cmpnle_ ps sse
- Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not less than or equal to the corresponding element in b, or 0 otherwise.
- _mm_cmpnle_ sd sse2
- Returns a new vector with the low element of a replaced by the not-less-than-or-equal comparison of the lower elements of a and b.
- _mm_cmpnle_ ss sse
- Compares the lowest f32 of both inputs for not-less-than-or-equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not less than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpnlt_ pd sse2
- Compares corresponding elements in a and b for not-less-than.
- _mm_cmpnlt_ ps sse
- Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not less than the corresponding element in b, or 0 otherwise.
- _mm_cmpnlt_ sd sse2
- Returns a new vector with the low element of a replaced by the not-less-than comparison of the lower elements of a and b.
- _mm_cmpnlt_ ss sse
- Compares the lowest f32 of both inputs for not-less-than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not less than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpord_ pd sse2
- Compares corresponding elements in a and b to see if neither is NaN.
- _mm_cmpord_ ps sse
- Compares each of the four floats in a to the corresponding element in b. Returns four floats that have one of two possible bit patterns. The element in the output vector will be 0xffffffff if the input elements in a and b are ordered (i.e., neither of them is a NaN), or 0 otherwise.
- _mm_cmpord_ sd sse2
- Returns a new vector with the low element of a replaced by the result of comparing both of the lower elements of a and b to NaN. If neither is NaN, 0xFFFFFFFFFFFFFFFF is used, and 0 otherwise.
- _mm_cmpord_ ss sse
- Checks if the lowest f32 of both inputs are ordered. The lowest 32 bits of the result will be 0xffffffff if neither a.extract(0) nor b.extract(0) is a NaN, or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpunord_ pd sse2
- Compares corresponding elements in a and b to see if either is NaN.
- _mm_cmpunord_ ps sse
- Compares each of the four floats in a to the corresponding element in b. Returns four floats that have one of two possible bit patterns. The element in the output vector will be 0xffffffff if the input elements in a and b are unordered (i.e., at least one of them is a NaN), or 0 otherwise.
- _mm_cmpunord_ sd sse2
- Returns a new vector with the low element of a replaced by the result of comparing both of the lower elements of a and b to NaN. If either is NaN, 0xFFFFFFFFFFFFFFFF is used, and 0 otherwise.
- _mm_cmpunord_ ss sse
- Checks if the lowest f32 of both inputs are unordered. The lowest 32 bits of the result will be 0xffffffff if either a.extract(0) or b.extract(0) is a NaN, or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_comi_ round_ sd avx512f
- Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_comi_ round_ ss avx512f
- Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_comieq_ sd sse2
- Compares the lower element of a and b for equality.
- _mm_comieq_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are equal, or 0 otherwise.
- _mm_comige_ sd sse2
- Compares the lower element of a and b for greater-than-or-equal.
- _mm_comige_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than or equal to the one from b, or 0 otherwise.
- _mm_comigt_ sd sse2
- Compares the lower element of a and b for greater-than.
- _mm_comigt_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than the one from b, or 0 otherwise.
- _mm_comile_ sd sse2
- Compares the lower element of a and b for less-than-or-equal.
- _mm_comile_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than or equal to the one from b, or 0 otherwise.
- _mm_comilt_ sd sse2
- Compares the lower element of a and b for less-than.
- _mm_comilt_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than the one from b, or 0 otherwise.
- _mm_comineq_ sd sse2
- Compares the lower element of a and b for not-equal.
- _mm_comineq_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are not equal, or 0 otherwise.
- _mm_conflict_ epi32 avx512cdandavx512vl
- Test each 32-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
- _mm_conflict_ epi64 avx512cdandavx512vl
- Test each 64-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
- _mm_crc32_ u8 sse4.2
- Starting with the initial value in crc, return the accumulated CRC32-C value for unsigned 8-bit integer v.
- _mm_crc32_ u16 sse4.2
- Starting with the initial value in crc, return the accumulated CRC32-C value for unsigned 16-bit integer v.
- _mm_crc32_ u32 sse4.2
- Starting with the initial value in crc, return the accumulated CRC32-C value for unsigned 32-bit integer v.
- _mm_cvt_ roundi32_ ss avx512f
- Convert the signed 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvt_ roundsd_ i32 avx512f
- Convert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter.
- _mm_cvt_ roundsd_ si32 avx512f
- Convert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter.
- _mm_cvt_ roundsd_ ss avx512f
- Convert the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the rounding[3:0] parameter.
- _mm_cvt_ roundsd_ u32 avx512f
- Convert the lower double-precision (64-bit) floating-point element in a to an unsigned 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter.
- _mm_cvt_ roundsi32_ ss avx512f
- Convert the signed 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvt_ roundss_ i32 avx512f
- Convert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter.
- _mm_cvt_ roundss_ sd avx512f
- Convert the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvt_ roundss_ si32 avx512f
- Convert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter.
- _mm_cvt_ roundss_ u32 avx512f
- Convert the lower single-precision (32-bit) floating-point element in a to an unsigned 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter.
- _mm_cvt_ roundu32_ ss avx512f
- Convert the unsigned 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the rounding[3:0] parameter.
- _mm_cvt_ si2ss sse
- Alias for _mm_cvtsi32_ss.
- _mm_cvt_ ss2si sse
- Alias for _mm_cvtss_si32.
- _mm_cvtepi8_ epi16 sse4.1
- Sign-extends packed 8-bit integers in a to packed 16-bit integers.
- _mm_cvtepi8_ epi32 sse4.1
- Sign-extends packed 8-bit integers in a to packed 32-bit integers.
- _mm_cvtepi8_ epi64 sse4.1
- Sign-extends packed 8-bit integers in the low 2 bytes of a to packed 64-bit integers.
- _mm_cvtepi16_ epi8 avx512bwandavx512vl
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm_cvtepi16_ epi32 sse4.1
- Sign-extends packed 16-bit integers in a to packed 32-bit integers.
- _mm_cvtepi16_ epi64 sse4.1
- Sign-extends packed 16-bit integers in a to packed 64-bit integers.
- _mm_cvtepi32_ epi8 avx512fandavx512vl
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm_cvtepi32_ epi16 avx512fandavx512vl
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm_cvtepi32_ epi64 sse4.1
- Sign-extends packed 32-bit integers in a to packed 64-bit integers.
- _mm_cvtepi32_ pd sse2
- Converts the lower two packed 32-bit integers in a to packed double-precision (64-bit) floating-point elements.
- _mm_cvtepi32_ ps sse2
- Converts packed 32-bit integers in a to packed single-precision (32-bit) floating-point elements.
- _mm_cvtepi64_ epi8 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm_cvtepi64_ epi16 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm_cvtepi64_ epi32 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm_cvtepi64_ pd avx512dqandavx512vl
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm_cvtepi64_ ps avx512dqandavx512vl
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtepu8_ epi16 sse4.1
- Zero-extends packed unsigned 8-bit integers in a to packed 16-bit integers.
- _mm_cvtepu8_ epi32 sse4.1
- Zero-extends packed unsigned 8-bit integers in a to packed 32-bit integers.
- _mm_cvtepu8_ epi64 sse4.1
- Zero-extends packed unsigned 8-bit integers in a to packed 64-bit integers.
- _mm_cvtepu16_ epi32 sse4.1
- Zero-extends packed unsigned 16-bit integers in a to packed 32-bit integers.
- _mm_cvtepu16_ epi64 sse4.1
- Zero-extends packed unsigned 16-bit integers in a to packed 64-bit integers.
- _mm_cvtepu32_ epi64 sse4.1
- Zero-extends packed unsigned 32-bit integers in a to packed 64-bit integers.
- _mm_cvtepu32_ pd avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm_cvtepu64_ pd avx512dqandavx512vl
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm_cvtepu64_ ps avx512dqandavx512vl
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvti32_ sd avx512f
- Convert the signed 32-bit integer b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvti32_ ss avx512f
- Convert the signed 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvtne2ps_ pbh avx512bf16andavx512vl
- Convert packed single-precision (32-bit) floating-point elements in two 128-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 128-bit wide vector.
- _mm_cvtneebf16_ ⚠ps avxneconvert
- Convert packed BF16 (16-bit) floating-point even-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtneobf16_ ⚠ps avxneconvert
- Convert packed BF16 (16-bit) floating-point odd-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtneps_ avx_ pbh avxneconvert
- Convert packed single precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtneps_ pbh avx512bf16andavx512vl
- Converts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtpbh_ ps avx512bf16andavx512vl
- Converts packed BF16 (16-bit) floating-point elements in a to single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtpd_ epi32 sse2
- Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers.
- _mm_cvtpd_ epi64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm_cvtpd_ epu32 avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm_cvtpd_ epu64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm_cvtpd_ ps sse2
- Converts packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements.
- _mm_cvtph_ ps f16c
- Converts the 4 x 16-bit half-precision float values in the lowest 64 bits of the 128-bit vector a into 4 x 32-bit float values stored in a 128-bit wide vector.
- _mm_cvtps_ epi32 sse2
- Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.
- _mm_cvtps_ epi64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm_cvtps_ epu32 avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm_cvtps_ epu64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm_cvtps_ pd sse2
- Converts packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements.
- _mm_cvtps_ ph f16c
- Converts the 4 x 32-bit float values in the 128-bit vector a into 4 x 16-bit half-precision float values stored in the lowest 64 bits of a 128-bit vector.
- _mm_cvtsd_ f64 sse2
- Returns the lower double-precision (64-bit) floating-point element of a.
- _mm_cvtsd_ i32 avx512f
- Convert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_cvtsd_ si32 sse2
- Converts the lower double-precision (64-bit) floating-point element in a to a 32-bit integer.
- _mm_cvtsd_ ss sse2
- Converts the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, stores the result in the lower element of the return value, and copies the upper element from a to the upper element of the return value.
- _mm_cvtsd_ u32 avx512f
- Convert the lower double-precision (64-bit) floating-point element in a to an unsigned 32-bit integer, and store the result in dst.
- _mm_cvtsepi16_ epi8 avx512bwandavx512vl
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsepi32_ epi8 avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsepi32_ epi16 avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsepi64_ epi8 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsepi64_ epi16 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsepi64_ epi32 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsi32_ sd sse2
- Returns a with its lower element replaced by b after converting it to an f64.
- _mm_cvtsi32_ si128 sse2
- Returns a vector whose lowest element is a and all higher elements are 0.
- _mm_cvtsi32_ ss sse
- Converts a 32-bit integer to a 32-bit float. The result vector is the input vector a with the lowest 32-bit float replaced by the converted integer.
- _mm_cvtsi128_ si32 sse2
- Returns the lowest element of a.
- _mm_cvtss_ f32 sse
- Extracts the lowest 32 bit float from the input vector.
- _mm_cvtss_ i32 avx512f
- Convert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_cvtss_ sd sse2
- Converts the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, stores the result in the lower element of the return value, and copies the upper element from a to the upper element of the return value.
- _mm_cvtss_ si32 sse
- Converts the lowest 32 bit float in the input vector to a 32 bit integer.
- _mm_cvtss_ u32 avx512f
- Convert the lower single-precision (32-bit) floating-point element in a to an unsigned 32-bit integer, and store the result in dst.
- _mm_cvtt_ roundsd_ i32 avx512f
- Convert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ roundsd_ si32 avx512f
- Convert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ roundsd_ u32 avx512f
- Convert the lower double-precision (64-bit) floating-point element in a to an unsigned 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ roundss_ i32 avx512f
- Convert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ roundss_ si32 avx512f
- Convert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ roundss_ u32 avx512f
- Convert the lower single-precision (32-bit) floating-point element in a to an unsigned 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ss2si sse
- Alias for _mm_cvttss_si32.
- _mm_cvttpd_ epi32 sse2
- Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation.
- _mm_cvttpd_ epi64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm_cvttpd_ epu32 avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm_cvttpd_ epu64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm_cvttps_ epi32 sse2
- Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.
- _mm_cvttps_ epi64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm_cvttps_ epu32 avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm_cvttps_ epu64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm_cvttsd_ i32 avx512f
- Convert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_cvttsd_ si32 sse2
- Converts the lower double-precision (64-bit) floating-point element in a to a 32-bit integer with truncation.
- _mm_cvttsd_ u32 avx512f
- Convert the lower double-precision (64-bit) floating-point element in a to an unsigned 32-bit integer with truncation, and store the result in dst.
- _mm_cvttss_ i32 avx512f
- Convert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_cvttss_ si32 sse
- Converts the lowest 32 bit float in the input vector to a 32 bit integer with truncation.
- _mm_cvttss_ u32 avx512f
- Convert the lower single-precision (32-bit) floating-point element in a to an unsigned 32-bit integer with truncation, and store the result in dst.
- _mm_cvtu32_ sd avx512f
- Convert the unsigned 32-bit integer b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvtu32_ ss avx512f
- Convert the unsigned 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvtusepi16_ epi8 avx512bwandavx512vl
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtusepi32_ epi8 avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtusepi32_ epi16 avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtusepi64_ epi8 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtusepi64_ epi16 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtusepi64_ epi32 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst.
- _mm_dbsad_ epu8 avx512bwandavx512vl
- Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst. Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm_div_ pd sse2
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b.
- _mm_div_ ps sse
- Divides packed single-precision (32-bit) floating-point elements in a by packed elements in b.
- _mm_div_ round_ sd avx512f
- Divide the lower double-precision (64-bit) floating-point element in a by the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_div_ round_ ss avx512f
- Divide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_div_ sd sse2
- Returns a new vector with the low element of a replaced by the result of dividing the lower element of a by the lower element of b.
- _mm_div_ ss sse
- Divides the first component of a by the first component of b; the other components are copied from a.
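The operand order of _mm_div_ss is easy to get backwards, so here is a small sketch showing that lane 0 of the result is a[0] / b[0] while lanes 1 to 3 come from a. The helper names div_low_sse and div_low are hypothetical, and the run-time SSE check with a scalar fallback is an assumption made so the example runs anywhere.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: applies `_mm_div_ss` to two 4-lane inputs.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse")]
unsafe fn div_low_sse(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        // Only lane 0 is divided; lanes 1..=3 are taken from `va`.
        let r = _mm_div_ss(va, vb);
        _mm_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

/// Hypothetical safe wrapper with a scalar fallback of the same shape.
fn div_low(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse") {
            // SAFETY: SSE support was checked at run time just above.
            return unsafe { div_low_sse(a, b) };
        }
    }
    [a[0] / b[0], a[1], a[2], a[3]]
}

fn main() {
    // Lane 0 is 8.0 / 2.0; the remaining lanes are copied from the first argument.
    println!("{:?}", div_low([8.0, 1.0, 2.0, 3.0], [2.0, 10.0, 20.0, 30.0]));
}
```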
- _mm_dp_ pd sse4.1
- Returns the dot product of two __m128d vectors.
- _mm_dp_ ps sse4.1
- Returns the dot product of two __m128 vectors.
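The one-line summary of _mm_dp_ps hides the role of its IMM8 control, so the sketch below spells out one common choice. It assumes IMM8 = 0xF1: the high nibble selects all four lane products and the low nibble writes their sum to lane 0 only. The helper names dot4_sse41 and dot4, and the scalar fallback, are hypothetical.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: 4-element dot product via `_mm_dp_ps`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.1")]
unsafe fn dot4_sse41(a: [f32; 4], b: [f32; 4]) -> f32 {
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        // 0xF1: multiply all four lanes (0xF), store the sum in lane 0 (0x1).
        let r = _mm_dp_ps::<0xF1>(va, vb);
        // Read lane 0 back as a scalar.
        _mm_cvtss_f32(r)
    }
}

/// Hypothetical safe wrapper with a scalar fallback.
fn dot4(a: [f32; 4], b: [f32; 4]) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse4.1") {
            // SAFETY: SSE4.1 support was checked at run time just above.
            return unsafe { dot4_sse41(a, b) };
        }
    }
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    // 1*5 + 2*6 + 3*7 + 4*8
    println!("{}", dot4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]));
}
```

Other IMM8 values mask out lanes from the products or broadcast the sum to several output lanes.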
- _mm_dpbf16_ ps avx512bf16andavx512vl
- Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst.
- _mm_dpbssd_ epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpbssds_ epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_dpbsud_ epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpbsuds_ epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_dpbusd_ avx_ epi32 avxvnni
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpbusd_ epi32 avx512vnniandavx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpbusds_ avx_ epi32 avxvnni
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm_dpbusds_ epi32 avx512vnniandavx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm_dpbuud_ epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpbuuds_ epi32 avxvnniint8
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_dpwssd_ avx_ epi32 avxvnni
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpwssd_ epi32 avx512vnniandavx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpwssds_ avx_ epi32 avxvnni
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm_dpwssds_ epi32 avx512vnniandavx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm_dpwsud_ epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpwsuds_ epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_dpwusd_ epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding signed 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpwusds_ epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding signed 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_dpwuud_ epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpwuuds_ epi32 avxvnniint16
- Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_encodekey128_ ⚠u32 kl
- Wraps a 128-bit AES key into a 384-bit key handle and stores it in handle. Returns the control parameter used to create the IWKey.
- _mm_encodekey256_ ⚠u32 kl
- Wraps a 256-bit AES key into a 512-bit key handle and stores it in handle. Returns the control parameter used to create the IWKey.
- _mm_extract_ epi8 sse4.1
- Extracts an 8-bit integer from a, selected with IMM8. Returns a 32-bit integer containing the zero-extended integer data.
- _mm_extract_ epi16 sse2
- Returns the imm8 element of a.
- _mm_extract_ epi32 sse4.1
- Extracts a 32-bit integer from a, selected with IMM8.
- _mm_extract_ ps sse4.1
- Extracts a single-precision (32-bit) floating-point element from a, selected with IMM8. The returned i32 stores the float’s bit-pattern, and may be converted back to a floating-point number by reinterpreting the bits (for example with f32::from_bits) rather than by a numeric cast.
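Because _mm_extract_ps returns the raw bit pattern as an i32, a short sketch of recovering the float may help. It reads lane 2 and reinterprets the bits with f32::from_bits. The helper names lane2_sse41 and lane2, the fixed lane index, and the scalar fallback are hypothetical choices for illustration.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: reads lane 2 of a 4-lane vector via `_mm_extract_ps`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.1")]
unsafe fn lane2_sse41(v: [f32; 4]) -> f32 {
    unsafe {
        let a = _mm_loadu_ps(v.as_ptr());
        // The intrinsic yields the IEEE 754 bit pattern, typed as i32.
        let bits: i32 = _mm_extract_ps::<2>(a);
        // Reinterpret the bits; `bits as f32` would convert numerically instead.
        f32::from_bits(bits as u32)
    }
}

/// Hypothetical safe wrapper with a plain indexing fallback.
fn lane2(v: [f32; 4]) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse4.1") {
            // SAFETY: SSE4.1 support was checked at run time just above.
            return unsafe { lane2_sse41(v) };
        }
    }
    v[2]
}

fn main() {
    println!("{}", lane2([1.5, -2.25, 3.75, 4.0]));
}
```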
- _mm_extract_ si64 sse4a
- Extracts the bit range specified by y from the lower 64 bits of x.
- _mm_extracti_ si64 sse4a
- Extracts the specified bits from the lower 64 bits of the 128-bit integer vector operand at the
index idx and of the length len.
- _mm_fixupimm_ pd avx512fandavx512vl
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm_fixupimm_ ps avx512fandavx512vl
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm_fixupimm_ round_ sd avx512f
- Fix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_fixupimm_ round_ ss avx512f
- Fix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_fixupimm_ sd avx512f
- Fix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
- _mm_fixupimm_ ss avx512f
- Fix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
- _mm_floor_ pd sse4.1
- Rounds the packed double-precision (64-bit) floating-point elements in a down to an integer value, and stores the results as packed double-precision floating-point elements.
- _mm_floor_ ps sse4.1
- Rounds the packed single-precision (32-bit) floating-point elements in a down to an integer value, and stores the results as packed single-precision floating-point elements.
- _mm_floor_ sd sse4.1
- Rounds the lower double-precision (64-bit) floating-point element in b down to an integer value, stores the result as a double-precision floating-point element in the lower element of the intrinsic result, and copies the upper element from a to the upper element of the intrinsic result.
- _mm_floor_ ss sse4.1
- Rounds the lower single-precision (32-bit) floating-point element in b down to an integer value, stores the result as a single-precision floating-point element in the lower element of the intrinsic result, and copies the upper 3 packed elements from a to the upper elements of the intrinsic result.
- _mm_fmadd_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
- _mm_fmadd_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
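To make the a * b + c shape of the fused multiply-add family concrete, the following sketch calls _mm_fmadd_ps on four lanes. The helper names fmadd4_fma and fmadd4 are hypothetical, and the run-time FMA check with an f32::mul_add fallback is an assumption made so the example also runs on CPUs without FMA.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: lane-wise a * b + c with a single rounding step.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "fma")]
unsafe fn fmadd4_fma(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        let vc = _mm_loadu_ps(c.as_ptr());
        // The product a * b is the "intermediate result" that c is added to.
        let r = _mm_fmadd_ps(va, vb, vc);
        _mm_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

/// Hypothetical safe wrapper; the fallback uses the scalar fused `mul_add`.
fn fmadd4(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("fma") {
            // SAFETY: FMA support was checked at run time just above.
            return unsafe { fmadd4_fma(a, b, c) };
        }
    }
    std::array::from_fn(|i| a[i].mul_add(b[i], c[i]))
}

fn main() {
    let r = fmadd4([1.0, 2.0, 3.0, 4.0], [2.0; 4], [1.0; 4]);
    println!("{:?}", r);
}
```

The fmsub, fnmadd and fnmsub variants differ only in the signs applied to the product and to c.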
- _mm_fmadd_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_fmadd_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_fmadd_ sd fma
- Multiplies the lower double-precision (64-bit) floating-point elements in
a and b, and adds the intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
- _mm_fmadd_ ss fma
- Multiplies the lower single-precision (32-bit) floating-point elements in
a and b, and adds the intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
- _mm_fmaddsub_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result.
- _mm_fmaddsub_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result.
- _mm_fmsub_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
- _mm_fmsub_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
- _mm_fmsub_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_fmsub_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_fmsub_ sd fma
- Multiplies the lower double-precision (64-bit) floating-point elements in
a and b, and subtracts the lower element in c from the intermediate result. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
- _mm_fmsub_ ss fma
- Multiplies the lower single-precision (32-bit) floating-point elements in
a and b, and subtracts the lower element in c from the intermediate result. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
- _mm_fmsubadd_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result.
- _mm_fmsubadd_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result.
- _mm_fnmadd_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
- _mm_fnmadd_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
- _mm_fnmadd_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_fnmadd_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_fnmadd_ sd fma
- Multiplies the lower double-precision (64-bit) floating-point elements in
a and b, and adds the negated intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
- _mm_fnmadd_ ss fma
- Multiplies the lower single-precision (32-bit) floating-point elements in
a and b, and adds the negated intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
- _mm_fnmsub_ pd fma
- Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
- _mm_fnmsub_ ps fma
- Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
- _mm_fnmsub_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_fnmsub_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, subtract the lower element in c from the negated intermediate result, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_fnmsub_ sd fma
- Multiplies the lower double-precision (64-bit) floating-point elements in
a and b, and subtracts the lower element in c from the negated intermediate result. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
- _mm_fnmsub_ ss fma
- Multiplies the lower single-precision (32-bit) floating-point elements in
a and b, and subtracts the lower element in c from the negated intermediate result. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
- _mm_fpclass_ pd_ mask avx512dqandavx512vl
- Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm_fpclass_ ps_ mask avx512dqandavx512vl
- Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm_fpclass_ sd_ mask avx512dq
- Test the lower double-precision (64-bit) floating-point element in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm_fpclass_ ss_ mask avx512dq
- Test the lower single-precision (32-bit) floating-point element in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm_getcsr ⚠Deprecated sse
- Gets the unsigned 32-bit value of the MXCSR control and status register.
- _mm_getexp_ pd avx512fandavx512vl
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_getexp_ ps avx512fandavx512vl
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_getexp_ round_ sd avx512f
- Convert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getexp_ round_ ss avx512f
- Convert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getexp_ sd avx512f
- Convert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_getexp_ ss avx512f
- Convert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_getmant_ pd avx512fandavx512vl
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_getmant_ ps avx512fandavx512vl
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_getmant_ round_ sd avx512f
- Normalize the mantissas of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getmant_ round_ ss avx512f
- Normalize the mantissas of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getmant_ sd avx512f
- Normalize the mantissas of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getmant_ ss avx512f
- Normalize the mantissas of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_gf2p8affine_ epi64_ epi8 gfni
- Performs an affine transformation on the packed bytes in x. That is computes a*x+b over the Galois Field 2^8 for each packed byte with a being a 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_gf2p8affineinv_ epi64_ epi8 gfni
- Performs an affine transformation on the inverted packed bytes in x. That is computes a*inv(x)+b over the Galois Field 2^8 for each packed byte with a being a 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_gf2p8mul_ epi8 gfni
- Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
- _mm_hadd_ epi16 ssse3
- Horizontally adds the adjacent pairs of values contained in 2 packed
128-bit vectors of [8 x i16].
- _mm_hadd_ epi32 ssse3
- Horizontally adds the adjacent pairs of values contained in 2 packed
128-bit vectors of [4 x i32].
- _mm_hadd_ pd sse3
- Horizontally adds adjacent pairs of double-precision (64-bit)
floating-point elements in a and b, and packs the results.
- _mm_hadd_ ps sse3
- Horizontally adds adjacent pairs of single-precision (32-bit)
floating-point elements in a and b, and packs the results.
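The lane order produced by the horizontal adds is not obvious from the summary, so here is a small sketch with _mm_hadd_ps: the two pair sums from a fill the low half of the result and the two pair sums from b fill the high half. The helper names hadd4_sse3 and hadd4, and the scalar fallback, are hypothetical.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: horizontal pairwise add of two 4-lane vectors.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse3")]
unsafe fn hadd4_sse3(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        // Result lanes: [a0 + a1, a2 + a3, b0 + b1, b2 + b3].
        let r = _mm_hadd_ps(va, vb);
        _mm_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

/// Hypothetical safe wrapper with a scalar fallback in the same lane order.
fn hadd4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse3") {
            // SAFETY: SSE3 support was checked at run time just above.
            return unsafe { hadd4_sse3(a, b) };
        }
    }
    [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]
}

fn main() {
    println!("{:?}", hadd4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]));
}
```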
- _mm_hadds_ epi16 ssse3
- Horizontally adds the adjacent pairs of values contained in 2 packed
128-bit vectors of [8 x i16]. Positive sums greater than 7FFFh are saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h.
- _mm_hsub_ epi16 ssse3
- Horizontally subtracts the adjacent pairs of values contained in 2
packed 128-bit vectors of [8 x i16].
- _mm_hsub_ epi32 ssse3
- Horizontally subtracts the adjacent pairs of values contained in 2
packed 128-bit vectors of [4 x i32].
- _mm_hsub_ pd sse3
- Horizontally subtracts adjacent pairs of double-precision (64-bit)
floating-point elements in a and b, and packs the results.
- _mm_hsub_ ps sse3
- Horizontally subtracts adjacent pairs of single-precision (32-bit)
floating-point elements in a and b, and packs the results.
- _mm_hsubs_ epi16 ssse3
- Horizontally subtracts the adjacent pairs of values contained in 2
packed 128-bit vectors of [8 x i16]. Positive differences greater than 7FFFh are saturated to 7FFFh. Negative differences less than 8000h are saturated to 8000h.
- _mm_i32gather_ ⚠epi32 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm_i32gather_ ⚠epi64 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm_i32gather_ ⚠pd avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm_i32gather_ ⚠ps avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
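The relationship between offsets and scale can be sketched as below. With a scale of 4, each offset counts whole i32 elements, because the byte address read is the base pointer plus offset * 4. The wrapper names gather_i32x4 and gather_i32x4_avx2 are hypothetical, and the bounds check is an assumption added here because the intrinsic itself performs none.

```rust
/// Gathers four `i32` values from `data` at the element indices in `idx`.
/// Hypothetical wrapper: uses `_mm_i32gather_epi32` when AVX2 is detected,
/// otherwise plain indexing.
fn gather_i32x4(data: &[i32], idx: [i32; 4]) -> [i32; 4] {
    // The intrinsic does no bounds checking, so validate every index first.
    for &i in &idx {
        assert!(i >= 0 && (i as usize) < data.len(), "index out of bounds");
    }
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was detected and all indices were checked above.
            return unsafe { gather_i32x4_avx2(data, idx) };
        }
    }
    [
        data[idx[0] as usize],
        data[idx[1] as usize],
        data[idx[2] as usize],
        data[idx[3] as usize],
    ]
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn gather_i32x4_avx2(data: &[i32], idx: [i32; 4]) -> [i32; 4] {
    use std::arch::x86_64::*;
    let mut out = [0i32; 4];
    // SAFETY: the caller guarantees that every index is within `data`.
    unsafe {
        let offsets = _mm_loadu_si128(idx.as_ptr() as *const __m128i);
        // SCALE = 4: each offset is multiplied by 4 bytes, the size of an i32.
        let gathered = _mm_i32gather_epi32::<4>(data.as_ptr(), offsets);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, gathered);
    }
    out
}
```

A scale of 1 would instead treat the offsets as raw byte offsets, which is rarely what is wanted when indexing a typed slice.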
- _mm_i32scatter_ ⚠epi32 avx512fandavx512vl
- Stores 4 32-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm_i32scatter_ ⚠epi64 avx512fandavx512vl
- Stores 2 64-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm_i32scatter_ ⚠pd avx512fandavx512vl
- Stores 2 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm_i32scatter_ ⚠ps avx512fandavx512vl
- Stores 4 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm_i64gather_ ⚠epi32 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm_i64gather_ ⚠epi64 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm_i64gather_ ⚠pd avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm_i64gather_ ⚠ps avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm_i64scatter_ ⚠epi32 avx512fandavx512vl
- Stores 2 32-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm_i64scatter_ ⚠epi64 avx512fandavx512vl
- Stores 2 64-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm_i64scatter_ ⚠pd avx512fandavx512vl
- Stores 2 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm_i64scatter_ ⚠ps avx512fandavx512vl
- Stores 2 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm_insert_ epi8 sse4.1
- Returns a copy of a with the 8-bit integer from i inserted at a location specified by IMM8.
- _mm_insert_ epi16 sse2
- Returns a new vector where the imm8 element of a is replaced with i.
- _mm_insert_ epi32 sse4.1
- Returns a copy of a with the 32-bit integer from i inserted at a location specified by IMM8.
- _mm_insert_ ps sse4.1
- Selects a single value in b to store at some position in a, then zeroes elements according to IMM8.
- _mm_insert_ si64 sse4a
- Inserts the [length:0] bits of y into x at index.
- _mm_inserti_ si64 sse4a
- Inserts the len least-significant bits from the lower 64 bits of the 128-bit integer vector operand y into the lower 64 bits of the 128-bit integer vector operand x at the index idx and of the length len.
- _mm_lddqu_ ⚠si128 sse3
- Loads 128-bits of integer data from unaligned memory.
This intrinsic may perform better than _mm_loadu_si128 when the data crosses a cache line boundary.
- _mm_lfence ⚠sse2
- Performs a serializing operation on all load-from-memory instructions that were issued prior to this instruction.
- _mm_load1_ ⚠pd sse2
- Loads a double-precision (64-bit) floating-point element from memory into both elements of returned vector.
- _mm_load1_ ⚠ps sse
- Construct a __m128 by duplicating the value read from p into all elements.
- _mm_load_ ⚠epi32 avx512fandavx512vl
- Load 128-bits (composed of 4 packed 32-bit integers) from memory into dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_load_ ⚠epi64 avx512fandavx512vl
- Load 128-bits (composed of 2 packed 64-bit integers) from memory into dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_load_ ⚠pd sse2
- Loads 128-bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory into the returned vector.
mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_load_ ⚠pd1 sse2
- Loads a double-precision (64-bit) floating-point element from memory into both elements of returned vector.
- _mm_load_ ⚠ps sse
- Loads four f32 values from aligned memory into a __m128. If the pointer is not aligned to a 128-bit boundary (16 bytes) a general protection fault will be triggered (fatal program crash).
- _mm_load_ ⚠ps1 sse
- Alias for _mm_load1_ps
- _mm_load_ ⚠sd sse2
- Loads a 64-bit double-precision value to the low element of a 128-bit vector and clears the upper element.
- _mm_load_ ⚠si128 sse2
- Loads 128-bits of integer data from memory into a new vector.
- _mm_load_ ⚠ss sse
- Construct a __m128 with the lowest element read from p and the other elements set to zero.
- _mm_loaddup_ ⚠pd sse3
- Loads a double-precision (64-bit) floating-point element from memory into both elements of return vector.
- _mm_loadh_ ⚠pd sse2
- Loads a double-precision value into the high-order bits of a 128-bit
vector of [2 x double]. The low-order bits are copied from the low-order bits of the first operand.
- _mm_loadiwkey ⚠kl
- Load internal wrapping key (IWKey). The 32-bit unsigned integer control specifies IWKey’s KeySource and whether backing up the key is permitted. IWKey’s 256-bit encryption key is loaded from key_lo and key_hi.
- _mm_loadl_ ⚠epi64 sse2
- Loads 64-bit integer from memory into first element of returned vector.
- _mm_loadl_ ⚠pd sse2
- Loads a double-precision value into the low-order bits of a 128-bit
vector of [2 x double]. The high-order bits are copied from the high-order bits of the first operand.
- _mm_loadr_ ⚠pd sse2
- Loads 2 double-precision (64-bit) floating-point elements from memory into
the returned vector in reverse order. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_loadr_ ⚠ps sse
- Loads four f32 values from aligned memory into a __m128 in reverse order.
- _mm_loadu_ ⚠epi8 avx512bwandavx512vl
- Load 128-bits (composed of 16 packed 8-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm_loadu_ ⚠epi16 avx512bwandavx512vl
- Load 128-bits (composed of 8 packed 16-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm_loadu_ ⚠epi32 avx512fandavx512vl
- Load 128-bits (composed of 4 packed 32-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm_loadu_ ⚠epi64 avx512fandavx512vl
- Load 128-bits (composed of 2 packed 64-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm_loadu_ ⚠pd sse2
- Loads 128-bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory into the returned vector.
mem_addr does not need to be aligned on any particular boundary.
- _mm_loadu_ ⚠ps sse
- Loads four f32 values from memory into a __m128. There are no restrictions on memory alignment. For aligned memory _mm_load_ps may be faster.
- _mm_loadu_ ⚠si16 sse2
- Loads unaligned 16-bits of integer data from memory into new vector.
- _mm_loadu_ ⚠si32 sse2
- Loads unaligned 32-bits of integer data from memory into new vector.
- _mm_loadu_ ⚠si64 sse2
- Loads unaligned 64-bits of integer data from memory into new vector.
- _mm_loadu_ ⚠si128 sse2
- Loads 128-bits of integer data from memory into a new vector.
- _mm_lzcnt_ epi32 avx512cdandavx512vl
- Counts the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst.
- _mm_lzcnt_ epi64 avx512cdandavx512vl
- Counts the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst.
- _mm_madd52hi_ avx_ epu64 avxifma
- Multiply packed unsigned 52-bit integers in each 64-bit element of
b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm_madd52hi_ epu64 avx512ifmaandavx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of
b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm_madd52lo_ avx_ epu64 avxifma
- Multiply packed unsigned 52-bit integers in each 64-bit element of
b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm_madd52lo_ epu64 avx512ifmaandavx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of
b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm_madd_ epi16 sse2
- Multiplies and then horizontally adds signed 16-bit integers in a and b.
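A small sketch of the multiply-then-add-pairs behavior: each pair of adjacent 16-bit products is summed into one 32-bit lane. The wrapper name madd_i16x8 is hypothetical. The x86_64 version relies on SSE2 being part of the x86_64 baseline, and a scalar version with the same semantics is provided for other targets as an assumption for portability.

```rust
/// Multiplies corresponding `i16` lanes and sums each adjacent pair of
/// products into an `i32` lane. Hypothetical wrapper around `_mm_madd_epi16`.
#[cfg(target_arch = "x86_64")]
fn madd_i16x8(a: [i16; 8], b: [i16; 8]) -> [i32; 4] {
    use std::arch::x86_64::*;
    let mut out = [0i32; 4];
    // SAFETY: SSE2 is always available on x86_64, and the unaligned
    // load/store variants accept any pointer to 16 readable/writable bytes.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let sums = _mm_madd_epi16(va, vb);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sums);
    }
    out
}

/// Scalar equivalent for targets without the intrinsic.
#[cfg(not(target_arch = "x86_64"))]
fn madd_i16x8(a: [i16; 8], b: [i16; 8]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for lane in 0..4 {
        let lo = a[2 * lane] as i32 * b[2 * lane] as i32;
        let hi = a[2 * lane + 1] as i32 * b[2 * lane + 1] as i32;
        out[lane] = lo.wrapping_add(hi);
    }
    out
}
```

For a = [1, 2, 3, 4, 5, 6, 7, 8] and b = [1, 1, 1, 1, 2, 2, 2, 2], the result is [3, 7, 22, 30]. This shape is what makes the operation useful as a building block for dot products over 16-bit data.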
- _mm_maddubs_ epi16 ssse3
- Multiplies corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, adds pairs of contiguous products with signed saturation, and writes the 16-bit sums to the corresponding bits in the destination.
- _mm_mask2_ permutex2var_ epi8 avx512vbmiandavx512vl
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm_mask2_ permutex2var_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm_mask2_ permutex2var_ epi32 avx512fandavx512vl
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm_mask2_ permutex2var_ epi64 avx512fandavx512vl
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm_mask2_ permutex2var_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set)
- _mm_mask2_ permutex2var_ ps avx512fandavx512vl
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm_mask3_ fmadd_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fmadd_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fmadd_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ fmadd_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ fmadd_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ fmadd_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ fmaddsub_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fmaddsub_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fmsub_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fmsub_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fmsub_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ fmsub_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ fmsub_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ fmsub_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ fmsubadd_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fmsubadd_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fnmadd_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fnmadd_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fnmadd_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ fnmadd_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ fnmadd_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ fnmadd_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ fnmsub_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fnmsub_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fnmsub_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ fnmsub_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ fnmsub_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ fnmsub_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask_ abs_ epi8 avx512bwandavx512vl
- Compute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set)
- _mm_mask_ abs_ epi16 avx512bwandavx512vl
- Compute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ abs_ epi32 avx512fandavx512vl
- Compute the absolute value of packed signed 32-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ abs_ epi64 avx512fandavx512vl
- Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ add_ epi8 avx512bwandavx512vl
- Add packed 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ add_ epi16 avx512bwandavx512vl
- Add packed 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ add_ epi32 avx512fandavx512vl
- Add packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ add_ epi64 avx512fandavx512vl
- Add packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
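The writemask behavior shared by the masked operations in this listing can be modeled in scalar code. The sketch below mirrors the description of the masked 32-bit add on four lanes: where a bit of k is set, the lane receives a + b, and where it is clear, the lane is copied from src. The function name mask_add_model is hypothetical, and the code does not call any AVX-512 intrinsic.

```rust
/// Scalar model of a writemask add on four 32-bit lanes.
/// `mask_add_model` is a hypothetical name used only for illustration.
fn mask_add_model(src: [i32; 4], k: u8, a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    // Lanes whose mask bit is clear keep the value from `src`.
    let mut dst = src;
    for lane in 0..4 {
        // Bit `lane` of `k` controls lane `lane` of the result.
        if (k >> lane) & 1 == 1 {
            // Packed integer adds wrap around on overflow.
            dst[lane] = a[lane].wrapping_add(b[lane]);
        }
    }
    dst
}
```

With src = [9, 9, 9, 9], k = 0b0101, a = [1, 2, 3, 4] and b = [10, 20, 30, 40], the model returns [11, 9, 33, 9]. The mask3 variants follow the same pattern but take the unselected lanes from c rather than from a separate src operand.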
- _mm_mask_ add_ pd avx512fandavx512vl
- Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ add_ ps avx512fandavx512vl
- Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ add_ round_ sd avx512f
- Add the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ add_ round_ ss avx512f
- Add the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ add_ sd avx512f
- Add the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ add_ ss avx512f
- Add the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ adds_ epi8 avx512bwandavx512vl
- Add packed signed 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ adds_ epi16 avx512bwandavx512vl
- Add packed signed 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ adds_ epu8 avx512bwandavx512vl
- Add packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ adds_ epu16 avx512bwandavx512vl
- Add packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ alignr_ epi8 avx512bwandavx512vl
- Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ alignr_ epi32 avx512fandavx512vl
- Concatenate a and b into a 32-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 16 bytes (4 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ alignr_ epi64 avx512fandavx512vl
- Concatenate a and b into a 32-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 16 bytes (2 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ and_ epi32 avx512fandavx512vl
- Performs element-by-element bitwise AND between packed 32-bit integer elements of a and b, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ and_ epi64 avx512fandavx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ and_ pd avx512dqandavx512vl
- Compute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ and_ ps avx512dqandavx512vl
- Compute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ andnot_ epi32 avx512fandavx512vl
- Compute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ andnot_ epi64 avx512fandavx512vl
- Compute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ andnot_ pd avx512dqandavx512vl
- Compute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ andnot_ ps avx512dqandavx512vl
- Compute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ avg_ epu8 avx512bwandavx512vl
- Average packed unsigned 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ avg_ epu16 avx512bwandavx512vl
- Average packed unsigned 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ bitshuffle_ epi64_ mask avx512bitalgandavx512vl
- Considers the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm_mask_ blend_ epi8 avx512bwandavx512vl
- Blend packed 8-bit integers from a and b using control mask k, and store the results in dst.
- _mm_mask_ blend_ epi16 avx512bwandavx512vl
- Blend packed 16-bit integers from a and b using control mask k, and store the results in dst.
- _mm_mask_ blend_ epi32 avx512fandavx512vl
- Blend packed 32-bit integers from a and b using control mask k, and store the results in dst.
- _mm_mask_ blend_ epi64 avx512fandavx512vl
- Blend packed 64-bit integers from a and b using control mask k, and store the results in dst.
- _mm_mask_ blend_ pd avx512fandavx512vl
- Blend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm_mask_ blend_ ps avx512fandavx512vl
- Blend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm_mask_ broadcast_ i32x2 avx512dqandavx512vl
- Broadcasts the lower 2 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ broadcastb_ epi8 avx512bwandavx512vl
- Broadcast the low packed 8-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ broadcastd_ epi32 avx512fandavx512vl
- Broadcast the low packed 32-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ broadcastq_ epi64 avx512fandavx512vl
- Broadcast the low packed 64-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ broadcastss_ ps avx512fandavx512vl
- Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ broadcastw_ epi16 avx512bwandavx512vl
- Broadcast the low packed 16-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cmp_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmp_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmp_ epi32_ mask avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmp_ epi64_ mask avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmp_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmp_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmp_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmp_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmp_ pd_ mask avx512fandavx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmp_ ps_ mask avx512fandavx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmp_ round_ sd_ mask avx512f
- Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1 (the element is zeroed out when mask bit 0 is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ cmp_ round_ ss_ mask avx512f
- Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1 (the element is zeroed out when mask bit 0 is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ cmp_ sd_ mask avx512f
- Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1 (the element is zeroed out when mask bit 0 is not set).
- _mm_mask_ cmp_ ss_ mask avx512f
- Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1 (the element is zeroed out when mask bit 0 is not set).
- _mm_mask_ cmpeq_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpeq_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpeq_ epi32_ mask avx512fandavx512vl
- Compare packed 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpeq_ epi64_ mask avx512fandavx512vl
- Compare packed 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpeq_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpeq_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpeq_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpeq_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpge_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpge_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpge_ epi32_ mask avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpge_ epi64_ mask avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpge_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpge_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpge_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpge_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpgt_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpgt_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpgt_ epi32_ mask avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpgt_ epi64_ mask avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpgt_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpgt_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpgt_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpgt_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmple_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmple_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmple_ epi32_ mask avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmple_ epi64_ mask avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmple_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmple_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmple_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmple_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmplt_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmplt_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmplt_ epi32_ mask avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmplt_ epi64_ mask avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmplt_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmplt_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmplt_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmplt_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpneq_ epi8_ mask avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpneq_ epi16_ mask avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpneq_ epi32_ mask avx512fandavx512vl
- Compare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpneq_ epi64_ mask avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpneq_ epu8_ mask avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpneq_ epu16_ mask avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpneq_ epu32_ mask avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ cmpneq_ epu64_ mask avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
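The masked comparison entries above all share one pattern: compare lane by lane, then clear every result bit whose bit in zeromask k1 is not set. A minimal scalar sketch of that pattern follows. The name `mask_cmp_epi32_model` and the `pred` callback are hypothetical stand-ins for the fixed predicates (`cmpeq`, `cmplt`, and so on), and four signed 32-bit lanes are assumed.

```rust
/// Scalar model (hypothetical helper): bit `i` of the result is set only when
/// bit `i` of `k1` is set AND `pred(a[i], b[i])` holds.
fn mask_cmp_epi32_model(
    k1: u8,
    a: [i32; 4],
    b: [i32; 4],
    pred: fn(i32, i32) -> bool,
) -> u8 {
    let mut k = 0u8;
    for i in 0..4 {
        if (k1 >> i) & 1 == 1 && pred(a[i], b[i]) {
            k |= 1 << i;
        }
    }
    k
}
```

With `k1 = 0b1011`, `a = [1, 5, 3, -1]`, `b = [2, 4, 9, 0]` and a less-than predicate, lanes 0, 2 and 3 compare true, but lane 2 is cleared by the zeromask, so the model returns `0b1001`.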
- _mm_mask_ compress_ epi8 avx512vbmi2andavx512vl
- Contiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ compress_ epi16 avx512vbmi2andavx512vl
- Contiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ compress_ epi32 avx512fandavx512vl
- Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ compress_ epi64 avx512fandavx512vl
- Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ compress_ pd avx512fandavx512vl
- Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ compress_ ps avx512fandavx512vl
- Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
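The compress entries above can be pictured with a scalar model. This is a sketch in plain Rust, not the intrinsic: `mask_compress_epi32_model` is a hypothetical name and four 32-bit lanes are assumed. Active lanes of `a` are packed into the low lanes of the result, and the lanes above them keep the value `src` holds at the same position.

```rust
/// Scalar model (hypothetical helper): pack the lanes of `a` whose mask bit is
/// set into the low lanes of the result; the remaining upper lanes pass
/// through from `src`.
fn mask_compress_epi32_model(src: [i32; 4], k: u8, a: [i32; 4]) -> [i32; 4] {
    let mut dst = src; // start from src so untouched upper lanes pass through
    let mut next = 0; // next free low lane in dst
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[next] = a[i];
            next += 1;
        }
    }
    dst
}
```

For instance, with `src = [-1, -2, -3, -4]`, `k = 0b1010` and `a = [10, 20, 30, 40]`, the two active lanes (20 and 40) land in lanes 0 and 1, and lanes 2 and 3 keep -3 and -4.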
- _mm_mask_ ⚠compressstoreu_ epi8 avx512vbmi2andavx512vl
- Contiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠compressstoreu_ epi16 avx512vbmi2andavx512vl
- Contiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠compressstoreu_ epi32 avx512fandavx512vl
- Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠compressstoreu_ epi64 avx512fandavx512vl
- Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠compressstoreu_ pd avx512fandavx512vl
- Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠compressstoreu_ ps avx512fandavx512vl
- Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ conflict_ epi32 avx512cdandavx512vl
- Test each 32-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm_mask_ conflict_ epi64 avx512cdandavx512vl
- Test each 64-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
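The conflict-detection wording above is easier to follow with a small model. The sketch below is plain Rust with a hypothetical name, `mask_conflict_epi32_model`, and assumes four 32-bit lanes. For each active lane `i`, bit `j` of the result lane is set when an earlier lane `j < i` holds the same value; inactive lanes are copied from `src`.

```rust
/// Scalar model (hypothetical helper): lane `i` of the result is a bit vector
/// marking which lower-indexed lanes of `a` equal `a[i]`. Lanes whose mask
/// bit is clear are copied from `src`.
fn mask_conflict_epi32_model(src: [u32; 4], k: u8, a: [u32; 4]) -> [u32; 4] {
    let mut dst = src;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            let mut bits = 0u32;
            for j in 0..i {
                if a[j] == a[i] {
                    bits |= 1 << j;
                }
            }
            dst[i] = bits;
        }
    }
    dst
}
```

With `a = [7, 3, 7, 7]` and all mask bits set, lane 2 matches lane 0 (result `0b001`) and lane 3 matches lanes 0 and 2 (result `0b101`), while lanes 0 and 1 have no earlier duplicates.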
- _mm_mask_ cvt_ roundps_ ph avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm_mask_ cvt_ roundsd_ ss avx512f
- Convert the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the rounding[3:0] parameter.
- _mm_mask_ cvt_ roundss_ sd avx512f
- Convert the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ cvtepi8_ epi16 avx512bwandavx512vl
- Sign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi8_ epi32 avx512fandavx512vl
- Sign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi8_ epi64 avx512fandavx512vl
- Sign extend packed 8-bit integers in the low 2 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi16_ epi8 avx512bwandavx512vl
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi16_ epi32 avx512fandavx512vl
- Sign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi16_ epi64 avx512fandavx512vl
- Sign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi16_ storeu_ epi8 avx512bwandavx512vl
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ cvtepi32_ epi8 avx512fandavx512vl
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi32_ epi16 avx512fandavx512vl
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi32_ epi64 avx512fandavx512vl
- Sign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi32_ pd avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi32_ ps avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi32_ storeu_ epi8 avx512fandavx512vl
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtepi32_ storeu_ epi16 avx512fandavx512vl
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ cvtepi64_ epi8 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi64_ epi16 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi64_ epi32 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepi64_ pd avx512dqandavx512vl
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvtepi64_ ps avx512dqandavx512vl
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvtepi64_ storeu_ epi8 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtepi64_ storeu_ epi16 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtepi64_ storeu_ epi32 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ cvtepu8_ epi16 avx512bwandavx512vl
- Zero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepu8_ epi32 avx512fandavx512vl
- Zero extend packed unsigned 8-bit integers in the low 4 bytes of a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepu8_ epi64 avx512fandavx512vl
- Zero extend packed unsigned 8-bit integers in the low 2 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepu16_ epi32 avx512fandavx512vl
- Zero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepu16_ epi64 avx512fandavx512vl
- Zero extend packed unsigned 16-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepu32_ epi64 avx512fandavx512vl
- Zero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepu32_ pd avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtepu64_ pd avx512dqandavx512vl
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvtepu64_ ps avx512dqandavx512vl
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvtne2ps_ pbh avx512bf16andavx512vl
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtneps_ pbh avx512bf16andavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtpbh_ ps avx512bf16andavx512vl
- Convert packed BF16 (16-bit) floating-point elements in a to single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtpd_ epi32 avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtpd_ epi64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvtpd_ epu32 avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtpd_ epu64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvtpd_ ps avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtph_ ps avx512fandavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtps_ epi32 avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtps_ epi64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvtps_ epu32 avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtps_ epu64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvtps_ ph avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm_mask_ cvtsd_ ss avx512f
- Convert the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ cvtsepi16_ epi8 avx512bwandavx512vl
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtsepi16_ storeu_ epi8 avx512bwandavx512vl
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ cvtsepi32_ epi8 avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtsepi32_ epi16 avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtsepi32_ storeu_ epi8 avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtsepi32_ storeu_ epi16 avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ cvtsepi64_ epi8 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtsepi64_ epi16 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtsepi64_ epi32 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtsepi64_ storeu_ epi8 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtsepi64_ storeu_ epi16 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtsepi64_ storeu_ epi32 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
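The masked, signed-saturating narrowing conversions above share one pattern: each selected lane is clamped to the range of the narrower type, and each unselected lane keeps the value from src. The following is a minimal scalar sketch of that pattern for four 32-bit lanes narrowed to 8 bits. It is a plain-Rust model for illustration, not the intrinsic itself; `mask_narrow_i32_to_i8` is a hypothetical helper name, and lanes are represented as arrays rather than `__m128i`.

```rust
/// Scalar model of a masked, signed-saturating narrowing conversion
/// (the pattern described for `_mm_mask_cvtsepi32_epi8`), on 4 lanes.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn mask_narrow_i32_to_i8(src: [i8; 4], k: u8, a: [i32; 4]) -> [i8; 4] {
    // Start from src so that unselected lanes keep their old value.
    let mut dst = src;
    for i in 0..4 {
        // Bit i of the writemask selects lane i.
        if (k >> i) & 1 == 1 {
            // Signed saturation: clamp into the i8 range before narrowing.
            dst[i] = a[i].clamp(i8::MIN as i32, i8::MAX as i32) as i8;
        }
    }
    dst
}
```

With mask `0b0101`, lanes 0 and 2 are converted and saturated, while lanes 1 and 3 are taken from src.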
- _mm_mask_ cvtss_ sd avx512f
- Convert the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ cvttpd_ epi32 avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvttpd_ epi64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvttpd_ epu32 avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvttpd_ epu64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvttps_ epi32 avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
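As a rough illustration of "with truncation", the sketch below models the masked conversion on four lanes in plain Rust: selected lanes are rounded toward zero, unselected lanes come from src. `mask_cvtt_f32_to_i32` is a hypothetical helper, and the model only claims to match for inputs that fit in an `i32`; out-of-range and NaN inputs are handled differently by the hardware than by Rust's `as` cast and are not modeled here.

```rust
/// Scalar model of a masked float-to-int conversion with truncation
/// (the pattern described for `_mm_mask_cvttps_epi32`), on 4 lanes.
/// Hypothetical helper; only meaningful for inputs representable as i32.
fn mask_cvtt_f32_to_i32(src: [i32; 4], k: u8, a: [f32; 4]) -> [i32; 4] {
    let mut dst = src;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // Truncation means rounding toward zero, regardless of sign.
            dst[i] = a[i].trunc() as i32;
        }
    }
    dst
}
```

Note that truncation differs from the non-`t` conversions, which round according to the current rounding mode.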
- _mm_mask_ cvttps_ epi64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvttps_ epu32 avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvttps_ epu64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ cvtusepi16_ epi8 avx512bwandavx512vl
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtusepi16_ storeu_ epi8 avx512bwandavx512vl
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ cvtusepi32_ epi8 avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtusepi32_ epi16 avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtusepi32_ storeu_ epi8 avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtusepi32_ storeu_ epi16 avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ cvtusepi64_ epi8 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtusepi64_ epi16 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtusepi64_ epi32 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtusepi64_ storeu_ epi8 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtusepi64_ storeu_ epi16 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtusepi64_ storeu_ epi32 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ dbsad_ epu8 avx512bwandavx512vl
- Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm_mask_ div_ pd avx512fandavx512vl
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ div_ ps avx512fandavx512vl
- Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ div_ round_ sd avx512f
- Divide the lower double-precision (64-bit) floating-point element in a by the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ div_ round_ ss avx512f
- Divide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ div_ sd avx512f
- Divide the lower double-precision (64-bit) floating-point element in a by the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ div_ ss avx512f
- Divide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ dpbf16_ ps avx512bf16andavx512vl
- Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ dpbusd_ epi32 avx512vnniandavx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ dpbusds_ epi32 avx512vnniandavx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ dpwssd_ epi32 avx512vnniandavx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ dpwssds_ epi32 avx512vnniandavx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
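The integer dot-product entries above differ mainly in how the final accumulation into the 32-bit lane behaves: the plain form wraps, the `s` form saturates. The sketch below models the unsigned-by-signed 8-bit case in scalar Rust for four 32-bit lanes. All function names (`dpbusd_lane`, `dpbusds_lane`, `mask_dpbusd`) are hypothetical helpers, and the wrapping behavior of the non-saturating form is stated here as an assumption of the model.

```rust
/// One 32-bit lane: sum of four u8 * i8 products added to src,
/// with wrapping accumulation (assumed for the non-saturating form).
fn dpbusd_lane(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut acc = src;
    for i in 0..4 {
        // Each product fits in a signed 16-bit intermediate.
        let prod = a[i] as i32 * b[i] as i32;
        acc = acc.wrapping_add(prod);
    }
    acc
}

/// Same lane computation, but the final sum saturates to the i32 range.
fn dpbusds_lane(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let sum: i64 = (0..4).map(|i| a[i] as i64 * b[i] as i64).sum::<i64>() + src as i64;
    sum.clamp(i32::MIN as i64, i32::MAX as i64) as i32
}

/// Writemask wrapper: unselected lanes are copied from src.
fn mask_dpbusd(src: [i32; 4], k: u8, a: [[u8; 4]; 4], b: [[i8; 4]; 4]) -> [i32; 4] {
    let mut dst = src;
    for lane in 0..4 {
        if (k >> lane) & 1 == 1 {
            dst[lane] = dpbusd_lane(src[lane], a[lane], b[lane]);
        }
    }
    dst
}
```

The two lane functions agree whenever the exact sum fits in an `i32`; they differ only near the limits of the 32-bit range.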
- _mm_mask_ expand_ epi8 avx512vbmi2andavx512vl
- Load contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ expand_ epi16 avx512vbmi2andavx512vl
- Load contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ expand_ epi32 avx512fandavx512vl
- Load contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ expand_ epi64 avx512fandavx512vl
- Load contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ expand_ pd avx512fandavx512vl
- Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ expand_ ps avx512fandavx512vl
- Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
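The expand operations above can be read as "scatter the first n elements of a into the lanes selected by k". A minimal scalar sketch for four 32-bit lanes follows; `mask_expand_i32` is a hypothetical helper used only to illustrate the lane selection, with arrays standing in for vector registers.

```rust
/// Scalar model of a masked expand (the pattern described for
/// `_mm_mask_expand_epi32`), on 4 lanes. Hypothetical helper.
fn mask_expand_i32(src: [i32; 4], k: u8, a: [i32; 4]) -> [i32; 4] {
    let mut dst = src;
    // Index of the next contiguous element to take from a.
    let mut next = 0;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // Selected lanes consume elements of a in order.
            dst[i] = a[next];
            next += 1;
        }
        // Unselected lanes keep the value from src.
    }
    dst
}
```

With mask `0b1010`, the first two elements of a land in lanes 1 and 3, and lanes 0 and 2 keep their src values.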
- _mm_mask_ ⚠expandloadu_ epi8 avx512vbmi2andavx512vl
- Load contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expandloadu_ epi16 avx512vbmi2andavx512vl
- Load contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expandloadu_ epi32 avx512fandavx512vl
- Load contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expandloadu_ epi64 avx512fandavx512vl
- Load contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expandloadu_ pd avx512fandavx512vl
- Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expandloadu_ ps avx512fandavx512vl
- Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ fixupimm_ pd avx512fandavx512vl
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm_mask_ fixupimm_ ps avx512fandavx512vl
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm_mask_ fixupimm_ round_ sd avx512f
- Fix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ fixupimm_ round_ ss avx512f
- Fix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ fixupimm_ sd avx512f
- Fix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
- _mm_mask_ fixupimm_ ss avx512f
- Fix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
- _mm_mask_ fmadd_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ fmadd_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
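Unlike most `_mask_` entries in this list, the fused multiply-add family takes its fallback value from a rather than from a separate src operand. The sketch below models that for two double-precision lanes in scalar Rust, using `f64::mul_add` for the fused operation; `mask_fmadd_f64` is a hypothetical helper name and arrays stand in for `__m128d`.

```rust
/// Scalar model of a masked fused multiply-add (the pattern described
/// for `_mm_mask_fmadd_pd`), on 2 lanes. Hypothetical helper.
fn mask_fmadd_f64(a: [f64; 2], k: u8, b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    // Unselected lanes are copied from a, so start from a.
    let mut dst = a;
    for i in 0..2 {
        if (k >> i) & 1 == 1 {
            // a * b + c with a single rounding step.
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```

The same shape covers fmsub, fnmadd, and fnmsub by negating c, the product, or both.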
- _mm_mask_ fmadd_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ fmadd_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ fmadd_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ fmadd_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ fmaddsub_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ fmaddsub_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
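The add/subtract alternation in fmaddsub is easy to get backwards, so a small scalar sketch may help. In this model, even-indexed lanes subtract c and odd-indexed lanes add c, which is the lane convention assumed here for the fmaddsub form (fmsubadd would be the mirror image). `mask_fmaddsub_f32` is a hypothetical helper, not the intrinsic.

```rust
/// Scalar model of a masked fmaddsub (the pattern described for
/// `_mm_mask_fmaddsub_ps`), on 4 lanes. Hypothetical helper.
/// Assumption: even lanes compute a*b - c, odd lanes compute a*b + c.
fn mask_fmaddsub_f32(a: [f32; 4], k: u8, b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    // Unselected lanes are copied from a.
    let mut dst = a;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // Flip the sign of c on even lanes to turn the add into a subtract.
            let addend = if i % 2 == 0 { -c[i] } else { c[i] };
            dst[i] = a[i].mul_add(b[i], addend);
        }
    }
    dst
}
```

With all mask bits set, the lanes alternate between `a*b - c` and `a*b + c`, starting with the subtraction in lane 0.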
- _mm_mask_ fmsub_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ fmsub_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ fmsub_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ fmsub_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ fmsub_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ fmsub_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ fmsubadd_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ fmsubadd_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ fnmadd_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ fnmadd_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ fnmadd_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ fnmadd_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ fnmadd_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ fnmadd_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ fnmsub_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ fnmsub_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ fnmsub_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ fnmsub_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ fnmsub_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ fnmsub_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ fpclass_ pd_ mask avx512dqandavx512vl
- Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm_mask_ fpclass_ ps_ mask avx512dqandavx512vl
- Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm_mask_ fpclass_ sd_ mask avx512dq
- Test the lower double-precision (64-bit) floating-point element in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm_mask_ fpclass_ ss_ mask avx512dq
- Test the lower single-precision (32-bit) floating-point element in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm_mask_ getexp_ pd avx512fandavx512vl
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_mask_ getexp_ ps avx512fandavx512vl
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
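To make the floor(log2(x)) description concrete, the sketch below reads the unbiased exponent straight out of the IEEE 754 bit pattern of an `f32` and returns it as a float, then applies the writemask on four lanes. It is a scalar model under a stated restriction: it only handles normal, finite, nonzero inputs, and does not model zeros, denormals, infinities, or NaNs. `getexp_f32` and `mask_getexp_f32` are hypothetical helper names.

```rust
/// Unbiased exponent of a normal, finite, nonzero f32, returned as f32.
/// The sign bit is ignored, so this corresponds to floor(log2(|x|)).
/// Hypothetical helper; special values are not modeled.
fn getexp_f32(x: f32) -> f32 {
    // Bits 23..=30 hold the biased exponent; the bias for f32 is 127.
    let biased = ((x.to_bits() >> 23) & 0xff) as i32;
    (biased - 127) as f32
}

/// Writemask wrapper on 4 lanes: unselected lanes are copied from src.
fn mask_getexp_f32(src: [f32; 4], k: u8, a: [f32; 4]) -> [f32; 4] {
    let mut dst = src;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = getexp_f32(a[i]);
        }
    }
    dst
}
```

For example, 10.0 is 1.25 × 2^3, so its extracted exponent is 3.0, the same as for 8.0.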
- _mm_mask_ getexp_ round_ sd avx512f
- Convert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ getexp_ round_ ss avx512f
- Convert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ getexp_ sd avx512f
- Convert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_mask_ getexp_ ss avx512f
- Convert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
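The masked getexp behaviour described above can be sketched in portable scalar Rust. This is a minimal model, not the intrinsic itself: the name `mask_getexp_ps_model` and the array-based lanes are hypothetical, and only normal, finite inputs are handled (zeros, denormals, infinities and NaNs are left out).

```rust
/// Scalar model of a masked getexp over four f32 lanes.
/// Hypothetical helper for illustration; not part of core::arch.
/// Assumes every selected input is a normal, finite number.
fn mask_getexp_ps_model(src: [f32; 4], k: u8, a: [f32; 4]) -> [f32; 4] {
    // Lanes whose mask bit is clear keep the value from `src`.
    let mut dst = src;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // For a normal f32, the unbiased exponent field is floor(log2(|x|)).
            let biased = ((a[i].to_bits() >> 23) & 0xff) as i32;
            dst[i] = (biased - 127) as f32;
        }
    }
    dst
}
```

With `k = 0b0101`, lanes 0 and 2 receive the exponent as a float and lanes 1 and 3 are copied from `src`.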
- _mm_mask_ getmant_ pd avx512fandavx512vl
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_mask_ getmant_ ps avx512fandavx512vl
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_mask_ getmant_ round_ sd avx512f
- Normalize the mantissas of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ getmant_ round_ ss avx512f
- Normalize the mantissas of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ getmant_ sd avx512f
- Normalize the mantissas of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ getmant_ ss avx512f
- Normalize the mantissas of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
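As a rough illustration of the getmant family, the sketch below models only the `[1, 2)` interval with the sign either taken from the source or forced to zero. The function names are hypothetical, the other intervals and the NaN sign mode are not modeled, and only normal, finite inputs are assumed.

```rust
/// Normalize a normal, finite f32 to the interval [1, 2).
/// Hypothetical scalar model; `sign_zero` stands in for the "sign = 0" mode,
/// otherwise the sign of the source is kept.
fn getmant_1_2_model(x: f32, sign_zero: bool) -> f32 {
    let bits = x.to_bits();
    let sign = if sign_zero { 0 } else { bits & 0x8000_0000 };
    let fraction = bits & 0x007f_ffff;
    // Exponent field 127 (0x3f80_0000) places the significand in [1, 2).
    f32::from_bits(sign | 0x3f80_0000 | fraction)
}

/// Masked four-lane variant: lanes with a clear mask bit are copied from `src`.
fn mask_getmant_ps_model(src: [f32; 4], k: u8, a: [f32; 4], sign_zero: bool) -> [f32; 4] {
    let mut dst = src;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = getmant_1_2_model(a[i], sign_zero);
        }
    }
    dst
}
```

For example, 12.0 is 1.5 * 2^3, so its normalized mantissa in this model is 1.5.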
- _mm_mask_ gf2p8affine_ epi64_ epi8 gfniandavx512bwandavx512vl
- Performs an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_mask_ gf2p8affineinv_ epi64_ epi8 gfniandavx512bwandavx512vl
- Performs an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_mask_ gf2p8mul_ epi8 gfniandavx512bwandavx512vl
- Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
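The field arithmetic behind the GFNI entries can be sketched per byte in scalar Rust. The helpers below are hypothetical models: `gf2p8mul_model` multiplies two bytes modulo x^8 + x^4 + x^3 + x + 1, and `gf2p8inv_model` computes the inverse as x^254 (with the inverse of 0 being 0). The bit-matrix step of the affine variants is not modeled.

```rust
/// Multiply two bytes in GF(2^8) with reduction polynomial
/// x^8 + x^4 + x^3 + x + 1 (0x11B). Hypothetical scalar model.
fn gf2p8mul_model(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    for _ in 0..8 {
        if b & 1 != 0 {
            product ^= a; // addition in GF(2^8) is XOR
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1b; // reduce by the low byte of the polynomial
        }
        b >>= 1;
    }
    product
}

/// Multiplicative inverse via x^254; maps 0 to 0.
fn gf2p8inv_model(x: u8) -> u8 {
    let mut result = 1u8;
    for _ in 0..254 {
        result = gf2p8mul_model(result, x);
    }
    result
}

/// Masked 16-lane multiply: lanes with a clear mask bit are copied from `src`.
fn mask_gf2p8mul_epi8_model(src: [u8; 16], k: u16, a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = gf2p8mul_model(a[i], b[i]);
        }
    }
    dst
}
```

The repeated-multiplication inverse is chosen for clarity, not speed; a table or square-and-multiply would be the usual choice in practice.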
- _mm_mask_ ⚠i32gather_ epi32 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src in that position is used instead.
- _mm_mask_ ⚠i32gather_ epi64 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src in that position is used instead.
- _mm_mask_ ⚠i32gather_ pd avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src in that position is used instead.
- _mm_mask_ ⚠i32gather_ ps avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src in that position is used instead.
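A scalar sketch of a masked 32-bit gather may help make the addressing concrete. It assumes the AVX2 convention in which a lane is loaded from memory when the highest bit of its mask element is set and is otherwise taken from src. The name `mask_i32gather_epi32_model`, the byte-slice memory, and the restriction to non-negative offsets are assumptions of this example.

```rust
/// Scalar model of a masked gather of four i32 values.
/// Hypothetical helper; `mem` stands in for the memory behind the base pointer,
/// and offsets are assumed to be non-negative and in bounds.
fn mask_i32gather_epi32_model(
    src: [i32; 4],
    mem: &[u8],
    offsets: [i32; 4],
    mask: [i32; 4],
    scale: usize, // expected to be 1, 2, 4 or 8
) -> [i32; 4] {
    let mut dst = src;
    for i in 0..4 {
        // A negative i32 has its highest bit set, which selects a memory load.
        if mask[i] < 0 {
            let start = offsets[i] as usize * scale;
            dst[i] = i32::from_le_bytes([
                mem[start],
                mem[start + 1],
                mem[start + 2],
                mem[start + 3],
            ]);
        }
    }
    dst
}
```

The byte offset of each lane is `offsets[i] * scale`, so a scale of 4 indexes the memory as an array of 32-bit values.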
- _mm_mask_ ⚠i32scatter_ epi32 avx512fandavx512vl
- Stores 4 32-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i32scatter_ epi64 avx512fandavx512vl
- Stores 2 64-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i32scatter_ pd avx512fandavx512vl
- Stores 2 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i32scatter_ ps avx512fandavx512vl
- Stores 4 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
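The masked scatter entries above can be modeled in scalar Rust as a loop of conditional stores. The helper name `mask_i32scatter_epi32_model` and the byte-slice stand-in for memory at base_addr are hypothetical; indices are assumed to be non-negative and in bounds.

```rust
/// Scalar model of a masked scatter of four i32 values.
/// Hypothetical helper; `mem` stands in for the memory starting at base_addr.
fn mask_i32scatter_epi32_model(
    mem: &mut [u8],
    k: u8,
    vindex: [i32; 4],
    a: [i32; 4],
    scale: usize, // expected to be 1, 2, 4 or 8
) {
    for i in 0..4 {
        // Lanes whose mask bit is clear are not written to memory.
        if (k >> i) & 1 == 1 {
            let start = vindex[i] as usize * scale;
            mem[start..start + 4].copy_from_slice(&a[i].to_le_bytes());
        }
    }
}
```

With `k = 0b1011`, lane 2 is skipped and the bytes at its target address are left unchanged.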
- _mm_mask_ ⚠i64gather_ epi32 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src in that position is used instead.
- _mm_mask_ ⚠i64gather_ epi64 avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src in that position is used instead.
- _mm_mask_ ⚠i64gather_ pd avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src in that position is used instead.
- _mm_mask_ ⚠i64gather_ ps avx2
- Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src in that position is used instead.
- _mm_mask_ ⚠i64scatter_ epi32 avx512fandavx512vl
- Stores 2 32-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i64scatter_ epi64 avx512fandavx512vl
- Stores 2 64-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i64scatter_ pd avx512fandavx512vl
- Stores 2 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i64scatter_ ps avx512fandavx512vl
- Stores 2 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠load_ epi32 avx512fandavx512vl
- Load packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠load_ epi64 avx512fandavx512vl
- Load packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠load_ pd avx512fandavx512vl
- Load packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠load_ ps avx512fandavx512vl
- Load packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠load_ sd avx512f
- Load a double-precision (64-bit) floating-point element from memory into the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and set the upper element of dst to zero. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠load_ ss avx512f
- Load a single-precision (32-bit) floating-point element from memory into the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and set the upper 3 packed elements of dst to zero. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠loadu_ epi8 avx512bwandavx512vl
- Load packed 8-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠loadu_ epi16 avx512bwandavx512vl
- Load packed 16-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠loadu_ epi32 avx512fandavx512vl
- Load packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠loadu_ epi64 avx512fandavx512vl
- Load packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠loadu_ pd avx512fandavx512vl
- Load packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠loadu_ ps avx512fandavx512vl
- Load packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ lzcnt_ epi32 avx512cdandavx512vl
- Count the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ lzcnt_ epi64 avx512cdandavx512vl
- Count the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
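A scalar sketch of the masked leading-zero count, using the standard `u32::leading_zeros` method for each lane. The helper name `mask_lzcnt_epi32_model` and the array lanes are hypothetical.

```rust
/// Scalar model of a masked leading-zero count over four 32-bit lanes.
/// Hypothetical helper for illustration; not part of core::arch.
fn mask_lzcnt_epi32_model(src: [u32; 4], k: u8, a: [u32; 4]) -> [u32; 4] {
    let mut dst = src;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // A zero input yields 32, the full lane width.
            dst[i] = a[i].leading_zeros();
        }
    }
    dst
}
```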
- _mm_mask_ madd52hi_ epu64 avx512ifmaandavx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ madd52lo_ epu64 avx512ifmaandavx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
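The 52-bit multiply-add can be sketched per lane with a `u128` intermediate. The helper names below are hypothetical, and the masked wrapper assumes that lanes with a clear mask bit keep the value from a; the 64-bit addition is assumed to wrap.

```rust
const MASK52: u64 = (1 << 52) - 1;

/// Add the low 52 bits of the 104-bit product of b and c to a (hypothetical model).
fn madd52lo_model(a: u64, b: u64, c: u64) -> u64 {
    let product = (b & MASK52) as u128 * (c & MASK52) as u128;
    a.wrapping_add((product as u64) & MASK52)
}

/// Add the high 52 bits of the 104-bit product of b and c to a (hypothetical model).
fn madd52hi_model(a: u64, b: u64, c: u64) -> u64 {
    let product = (b & MASK52) as u128 * (c & MASK52) as u128;
    a.wrapping_add((product >> 52) as u64)
}

/// Masked two-lane variant of the low-half operation.
/// Assumes unselected lanes keep the value from `a`.
fn mask_madd52lo_epu64_model(a: [u64; 2], k: u8, b: [u64; 2], c: [u64; 2]) -> [u64; 2] {
    let mut dst = a;
    for i in 0..2 {
        if (k >> i) & 1 == 1 {
            dst[i] = madd52lo_model(a[i], b[i], c[i]);
        }
    }
    dst
}
```

Bits 52 to 63 of each element of b and c are ignored by the multiplication in this model.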
- _mm_mask_ madd_ epi16 avx512bwandavx512vl
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ maddubs_ epi16 avx512bwandavx512vl
- Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
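The two horizontal multiply-add entries above differ in their operand types and in how the pair sum is narrowed. The scalar sketch below models one output lane of each; the helper names are hypothetical.

```rust
/// One output lane of the 16-bit multiply-add: two signed 16-bit products
/// summed into a 32-bit result (hypothetical scalar model).
fn madd_pair_model(a: [i16; 2], b: [i16; 2]) -> i32 {
    let p0 = a[0] as i32 * b[0] as i32;
    let p1 = a[1] as i32 * b[1] as i32;
    // The sum only overflows when all four inputs are i16::MIN; it then wraps.
    p0.wrapping_add(p1)
}

/// One output lane of the unsigned-by-signed 8-bit multiply-add:
/// the pair sum is saturated to the signed 16-bit range.
fn maddubs_pair_model(a: [u8; 2], b: [i8; 2]) -> i16 {
    let sum = a[0] as i32 * b[0] as i32 + a[1] as i32 * b[1] as i32;
    sum.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}
```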
- _mm_mask_ max_ epi8 avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ max_ epi16 avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ max_ epi32 avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ max_ epi64 avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ max_ epu8 avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ max_ epu16 avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ max_ epu32 avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ max_ epu64 avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ max_ pd avx512fandavx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ max_ ps avx512fandavx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ max_ round_ sd avx512f
- Compare the lower double-precision (64-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ max_ round_ ss avx512f
- Compare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ max_ sd avx512f
- Compare the lower double-precision (64-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ max_ ss avx512f
- Compare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ min_ epi8 avx512bwandavx512vl
- Compare packed signed 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ min_ epi16 avx512bwandavx512vl
- Compare packed signed 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ min_ epi32 avx512fandavx512vl
- Compare packed signed 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ min_ epi64 avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ min_ epu8 avx512bwandavx512vl
- Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ min_ epu16 avx512bwandavx512vl
- Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ min_ epu32 avx512fandavx512vl
- Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ min_ epu64 avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ min_ pd avx512fandavx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ min_ ps avx512fandavx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ min_ round_ sd avx512f
- Compare the lower double-precision (64-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ min_ round_ ss avx512f
- Compare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ min_ sd avx512f
- Compare the lower double-precision (64-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ min_ ss avx512f
- Compare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ mov_ epi8 avx512bwandavx512vl
- Move packed 8-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mov_ epi16 avx512bwandavx512vl
- Move packed 16-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mov_ epi32 avx512fandavx512vl
- Move packed 32-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mov_ epi64 avx512fandavx512vl
- Move packed 64-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mov_ pd avx512fandavx512vl
- Move packed double-precision (64-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mov_ ps avx512fandavx512vl
- Move packed single-precision (32-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
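The writemask rule shared by most entries in this list is easiest to see in the masked move, which is a pure per-lane select. A minimal scalar sketch, with a hypothetical helper name and arrays standing in for vectors:

```rust
/// Scalar model of a masked move over four 32-bit lanes.
/// Bit i of `k` selects lane i from `a`; otherwise the lane comes from `src`.
/// Hypothetical helper for illustration; not part of core::arch.
fn mask_mov_epi32_model(src: [i32; 4], k: u8, a: [i32; 4]) -> [i32; 4] {
    let mut dst = src;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i];
        }
    }
    dst
}
```

Bit 0 of the mask corresponds to the lowest lane, so `k = 0b0110` selects the two middle lanes.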
- _mm_mask_ move_ sd avx512f
- Move the lower double-precision (64-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ move_ ss avx512f
- Move the lower single-precision (32-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ movedup_ pd avx512fandavx512vl
- Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ movehdup_ ps avx512fandavx512vl
- Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ moveldup_ ps avx512fandavx512vl
- Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
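The three duplicate operations above differ only in which source lanes they copy. A scalar sketch with hypothetical helper names:

```rust
/// Duplicate even-indexed f32 lanes: [a0, a0, a2, a2] (hypothetical model).
fn moveldup_ps_model(a: [f32; 4]) -> [f32; 4] {
    [a[0], a[0], a[2], a[2]]
}

/// Duplicate odd-indexed f32 lanes: [a1, a1, a3, a3] (hypothetical model).
fn movehdup_ps_model(a: [f32; 4]) -> [f32; 4] {
    [a[1], a[1], a[3], a[3]]
}

/// Duplicate the even-indexed f64 lane: [a0, a0] (hypothetical model).
fn movedup_pd_model(a: [f64; 2]) -> [f64; 2] {
    [a[0], a[0]]
}

/// Masked variant of the odd-lane duplicate: lanes with a clear mask bit come from `src`.
fn mask_movehdup_ps_model(src: [f32; 4], k: u8, a: [f32; 4]) -> [f32; 4] {
    let full = movehdup_ps_model(a);
    let mut dst = src;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = full[i];
        }
    }
    dst
}
```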
- _mm_mask_ mul_ epi32 avx512fandavx512vl
- Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mul_ epu32 avx512fandavx512vl
- Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
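The widening multiplies above read only the low 32 bits of each 64-bit element. One lane of each can be sketched as follows; the helper names are hypothetical.

```rust
/// One lane of the signed widening multiply: the low 32 bits of each input
/// are sign-extended and multiplied to a 64-bit result (hypothetical model).
fn mul_epi32_lane_model(a: i64, b: i64) -> i64 {
    (a as i32 as i64) * (b as i32 as i64)
}

/// One lane of the unsigned widening multiply: the low 32 bits of each input
/// are zero-extended and multiplied to a 64-bit result (hypothetical model).
fn mul_epu32_lane_model(a: u64, b: u64) -> u64 {
    (a as u32 as u64) * (b as u32 as u64)
}
```

The same bit pattern gives different results in the two variants: a low half of `0xFFFF_FFFF` is -1 when signed and 4294967295 when unsigned.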
- _mm_mask_ mul_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mul_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mul_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ mul_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ mul_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ mul_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ mulhi_ epi16 avx512bwandavx512vl
- Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mulhi_ epu16 avx512bwandavx512vl
- Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
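The high-half multiplies above can be modeled per lane by widening to 32 bits and shifting. The helper names are hypothetical.

```rust
/// High 16 bits of the signed 32-bit product (hypothetical scalar model).
fn mulhi_epi16_lane_model(a: i16, b: i16) -> i16 {
    ((a as i32 * b as i32) >> 16) as i16
}

/// High 16 bits of the unsigned 32-bit product (hypothetical scalar model).
fn mulhi_epu16_lane_model(a: u16, b: u16) -> u16 {
    ((a as u32 * b as u32) >> 16) as u16
}
```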
- _mm_mask_ mulhrs_ epi16 avx512bwandavx512vl
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
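The truncate, round and select sequence described above amounts to a rounded fixed-point multiply. A one-lane scalar sketch, with a hypothetical helper name:

```rust
/// One lane of the rounded high multiply: keep the 18 most significant bits
/// of the 32-bit product, add 1, and return bits [16:1] (hypothetical model).
fn mulhrs_epi16_lane_model(a: i16, b: i16) -> i16 {
    let product = a as i32 * b as i32;
    (((product >> 14) + 1) >> 1) as i16
}
```

Reading the inputs as Q15 fixed-point values, `0x4000` is 0.5, and 0.5 * 0.5 yields `0x2000`, which is 0.25.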
- _mm_mask_ mullo_ epi16 avx512bwandavx512vl
- Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mullo_ epi32 avx512fandavx512vl
- Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mullo_ epi64 avx512dqandavx512vl
- Multiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ multishift_ epi64_ epi8 avx512vbmiandavx512vl
- For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
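A minimal scalar sketch of the byte selection for one 64-bit element may help. It assumes each control byte is interpreted as a bit offset modulo 64 and that the 8 selected bits wrap around the end of the element; `multishift_qword` is a hypothetical helper, and the per-byte writemask is left out.

```rust
/// Model of one 64-bit element: `ctrl` is the element from `a`,
/// `data` the matching element from `b` (hypothetical helper).
fn multishift_qword(ctrl: u64, data: u64) -> u64 {
    let mut out = 0u64;
    for j in 0..8 {
        // Control byte j, reduced to a bit offset in 0..=63 (assumption).
        let offset = ((ctrl >> (8 * j)) & 63) as u32;
        // Rotating right brings the 8 bits starting at `offset` (with
        // wrap-around) into the low byte.
        let byte = data.rotate_right(offset) & 0xFF;
        out |= byte << (8 * j);
    }
    out
}
```

With control bytes 0, 8, 16, ..., 56 the element is reproduced unchanged; a control byte of 4 selects bits 4 through 11.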
- _mm_mask_ or_ epi32 avx512fandavx512vl
- Compute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ or_ epi64 avx512fandavx512vl
- Compute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ or_ pd avx512dqandavx512vl
- Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ or_ ps avx512dqandavx512vl
- Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ packs_ epi16 avx512bwandavx512vl
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ packs_ epi32 avx512bwandavx512vl
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ packus_ epi16 avx512bwandavx512vl
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ packus_ epi32 avx512bwandavx512vl
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
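The difference between signed and unsigned saturation in the pack operations above can be shown with a scalar model. This is only a sketch: `packs_epi16_model` and `packus_epi16_model` are hypothetical names, arrays stand in for 128-bit vectors, and the writemask step is omitted.

```rust
/// Signed saturation: each i16 is clamped into the i8 range.
fn packs_epi16_model(a: [i16; 8], b: [i16; 8]) -> [i8; 16] {
    let mut dst = [0i8; 16];
    for i in 0..8 {
        dst[i] = a[i].clamp(-128, 127) as i8; // lanes 0..8 come from a
        dst[i + 8] = b[i].clamp(-128, 127) as i8; // lanes 8..16 come from b
    }
    dst
}

/// Unsigned saturation: the inputs are still signed, but the result is
/// clamped into the u8 range, so negative values become 0.
fn packus_epi16_model(a: [i16; 8], b: [i16; 8]) -> [u8; 16] {
    let mut dst = [0u8; 16];
    for i in 0..8 {
        dst[i] = a[i].clamp(0, 255) as u8;
        dst[i + 8] = b[i].clamp(0, 255) as u8;
    }
    dst
}
```

For example, an input of 300 packs to 127 with signed saturation and to 255 with unsigned saturation, while -300 packs to -128 and 0 respectively.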
- _mm_mask_ permute_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ permute_ ps avx512fandavx512vl
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ permutevar_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ permutevar_ ps avx512fandavx512vl
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ permutex2var_ epi8 avx512vbmiandavx512vl
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ permutex2var_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ permutex2var_ epi32 avx512fandavx512vl
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ permutex2var_ epi64 avx512fandavx512vl
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ permutex2var_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ permutex2var_ ps avx512fandavx512vl
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
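For the two-source permutes above, each index carries both a source selector and an element index. The sketch below models the four-lane 32-bit case, assuming the low two index bits pick the element and the next bit picks the source (0 for a, 1 for b); `mask_permutex2var_model` is a hypothetical name and the code does not call the intrinsic.

```rust
/// Scalar model of a masked two-source permute over four 32-bit lanes.
/// Lanes whose mask bit is clear keep the value from `a`.
fn mask_permutex2var_model(a: [i32; 4], k: u8, idx: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut dst = a;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            let element = (idx[i] & 0b11) as usize; // index within the source
            let from_b = idx[i] & 0b100 != 0; // selector bit (assumed layout)
            dst[i] = if from_b { b[element] } else { a[element] };
        }
    }
    dst
}
```

Note that the fallback for inactive lanes is a itself, not a separate src argument, which matches the descriptions in this group.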
- _mm_mask_ permutexvar_ epi8 avx512vbmiandavx512vl
- Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ permutexvar_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ popcnt_ epi8 avx512bitalgandavx512vl
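- Count the number of logical 1 bits in each packed 8-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).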
- _mm_mask_ popcnt_ epi16 avx512bitalgandavx512vl
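- Count the number of logical 1 bits in each packed 16-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).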
- _mm_mask_ popcnt_ epi32 avx512vpopcntdqandavx512vl
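- Count the number of logical 1 bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).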
- _mm_mask_ popcnt_ epi64 avx512vpopcntdqandavx512vl
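- Count the number of logical 1 bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).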
- _mm_mask_ range_ pd avx512dqandavx512vl
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_mask_ range_ ps avx512dqandavx512vl
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_mask_ range_ round_ sd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ range_ round_ ss avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ range_ sd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_mask_ range_ ss avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
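The two control fields used by the range operations above can be decoded as in the following scalar sketch. It assumes the operation control sits in the two lowest bits of imm8 and the sign control in the two bits above them, and it deliberately ignores NaN and signed-zero handling; `range_lane` is a hypothetical helper, not part of the API.

```rust
/// Scalar model of one lane of a range operation (simplified sketch).
fn range_lane(a: f64, b: f64, imm8: u8) -> f64 {
    // Operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
    let picked = match imm8 & 0b11 {
        0b00 => if a <= b { a } else { b },
        0b01 => if a <= b { b } else { a },
        0b10 => if a.abs() <= b.abs() { a } else { b },
        _ => if a.abs() <= b.abs() { b } else { a },
    };
    // Sign control: 00 = sign from a, 01 = sign from compare result,
    // 10 = clear sign bit, 11 = set sign bit.
    match (imm8 >> 2) & 0b11 {
        0b00 => picked.copysign(a),
        0b01 => picked,
        0b10 => picked.abs(),
        _ => -picked.abs(),
    }
}
```

For instance, with a = -3.0 and b = 2.0, selecting "max" with "sign from a" yields -2.0: the magnitude comes from b and the sign from a.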
- _mm_mask_ rcp14_ pd avx512fandavx512vl
- Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ rcp14_ ps avx512fandavx512vl
- Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ rcp14_ sd avx512f
- Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ rcp14_ ss avx512f
- Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ reduce_ add_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm_mask_ reduce_ add_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm_mask_ reduce_ and_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm_mask_ reduce_ and_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm_mask_ reduce_ max_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm_mask_ reduce_ max_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm_mask_ reduce_ max_ epu8 avx512bwandavx512vl
- Reduce the packed unsigned 8-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm_mask_ reduce_ max_ epu16 avx512bwandavx512vl
- Reduce the packed unsigned 16-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm_mask_ reduce_ min_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm_mask_ reduce_ min_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm_mask_ reduce_ min_ epu8 avx512bwandavx512vl
- Reduce the packed unsigned 8-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm_mask_ reduce_ min_ epu16 avx512bwandavx512vl
- Reduce the packed unsigned 16-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm_mask_ reduce_ mul_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm_mask_ reduce_ mul_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm_mask_ reduce_ or_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
- _mm_mask_ reduce_ or_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
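One way to picture the masked reductions above is as a fold in which inactive lanes contribute the identity value of the operation. The sketch below is a scalar model over eight 16-bit lanes; `mask_reduce_model` is a hypothetical helper, and the identity values passed in the usage note are assumptions about how inactive lanes are neutralised, not a description of the actual implementation.

```rust
/// Fold the active lanes of `a` with `op`, starting from `identity`.
/// Lane i is active when bit i of `k` is set.
fn mask_reduce_model(k: u8, a: [i16; 8], identity: i16, op: fn(i16, i16) -> i16) -> i16 {
    let mut acc = identity;
    for (i, &lane) in a.iter().enumerate() {
        if (k >> i) & 1 == 1 {
            acc = op(acc, lane);
        }
    }
    acc
}
```

With a = [1, 2, 3, 4, 5, 6, 7, 8] and k = 0b0000_1010, only the lanes holding 2 and 4 are active. `mask_reduce_model(k, a, 0, |x, y| x.wrapping_add(y))` gives 6, the same call with identity -1 and `|x, y| x & y` gives 0, and identity `i16::MIN` with `|x, y| x.max(y)` gives 4.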
- _mm_mask_ reduce_ pd avx512dqandavx512vl
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ reduce_ ps avx512dqandavx512vl
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ reduce_ round_ sd avx512dq
- Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ reduce_ round_ ss avx512dq
- Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ reduce_ sd avx512dq
- Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ reduce_ ss avx512dq
- Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ rol_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ rol_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ rolv_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ rolv_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ror_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ror_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ rorv_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ rorv_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
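The variable rotates above can be modelled lane by lane with Rust's built-in rotate methods. The sketch assumes the rotate count is taken modulo the lane width; `mask_rolv_epi32_model` is a hypothetical name, and the four-element arrays stand in for a 128-bit vector of 32-bit lanes.

```rust
/// Scalar model of a masked per-lane left rotate over four 32-bit lanes.
fn mask_rolv_epi32_model(src: [u32; 4], k: u8, a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut dst = src;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // Only the low 5 bits of the count matter for a 32-bit lane
            // (assumption stated in the lead-in).
            dst[i] = a[i].rotate_left(b[i] & 31);
        }
    }
    dst
}
```

A right rotate follows the same shape with `rotate_right`; unlike a shift, bits leaving one end re-enter at the other, so `0x8000_0001` rotated left by 1 becomes `0x0000_0003`.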
- _mm_mask_ roundscale_ pd avx512fandavx512vl
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ roundscale_ ps avx512fandavx512vl
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ roundscale_ round_ sd avx512f
- Round the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ roundscale_ round_ ss avx512f
- Round the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ roundscale_ sd avx512f
- Round the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ roundscale_ ss avx512f
- Round the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ rsqrt14_ pd avx512fandavx512vl
- Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ rsqrt14_ ps avx512fandavx512vl
- Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ rsqrt14_ sd avx512f
- Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ rsqrt14_ ss avx512f
- Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ scalef_ pd avx512fandavx512vl
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ scalef_ ps avx512fandavx512vl
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ scalef_ round_ sd avx512f
- Scale the lower double-precision (64-bit) floating-point element in a using the value from the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ scalef_ round_ ss avx512f
- Scale the lower single-precision (32-bit) floating-point element in a using the value from the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ scalef_ sd avx512f
- Scale the lower double-precision (64-bit) floating-point element in a using the value from the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ scalef_ ss avx512f
- Scale the lower single-precision (32-bit) floating-point element in a using the value from the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
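The entries above say only that a is scaled "using values from b". A common reading, taken here as an assumption, is that each result is a multiplied by 2 raised to floor(b). The scalar sketch below models that for ordinary finite inputs only; `scalef_lane` is a hypothetical helper, and special cases such as NaN, infinities, and denormals are not modelled.

```rust
/// Scalar model of one scalef lane: a * 2^floor(b), finite inputs only.
fn scalef_lane(a: f64, b: f64) -> f64 {
    // floor(b) is converted to an integer exponent; powers of two are
    // exactly representable, so the scaling itself introduces no rounding
    // as long as the result stays in range.
    a * 2.0f64.powi(b.floor() as i32)
}
```

For example, `scalef_lane(3.0, 2.5)` scales 3.0 by 2^2, and `scalef_lane(1.0, -1.0)` halves its input.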
- _mm_mask_ set1_ epi8 avx512bwandavx512vl
- Broadcast 8-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ set1_ epi16 avx512bwandavx512vl
- Broadcast 16-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ set1_ epi32 avx512fandavx512vl
- Broadcast 32-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ set1_ epi64 avx512fandavx512vl
- Broadcast 64-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shldi_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shldi_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shldi_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shldv_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ shldv_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ shldv_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ shrdi_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shrdi_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shrdi_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shrdv_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ shrdv_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ shrdv_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
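The concatenate-and-shift operations above differ in which operand forms the upper half of the intermediate value. A scalar sketch for the 16-bit case, assuming the shift count is reduced to the lane width (4 bits for 16-bit lanes), could look like this; `shld16` and `shrd16` are hypothetical helpers and the writemask step is omitted.

```rust
/// Left variant: `a` is the upper half, `b` the lower half of the
/// intermediate value; the upper 16 bits of the shifted value are kept.
fn shld16(a: u16, b: u16, count: u32) -> u16 {
    let concat = ((a as u32) << 16) | (b as u32);
    ((concat << (count & 15)) >> 16) as u16
}

/// Right variant: `b` is the upper half, `a` the lower half of the
/// intermediate value; the lower 16 bits of the shifted value are kept.
fn shrd16(a: u16, b: u16, count: u32) -> u16 {
    let concat = ((b as u32) << 16) | (a as u32);
    (concat >> (count & 15)) as u16
}
```

In both cases bits from b are shifted into a: `shld16(0x1234, 0xABCD, 4)` gives `0x234A`, and `shrd16(0x1234, 0xABCD, 4)` gives `0xD123`.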
- _mm_mask_ shuffle_ epi8 avx512bwandavx512vl
- Shuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shuffle_ epi32 avx512fandavx512vl
- Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shuffle_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shuffle_ ps avx512fandavx512vl
- Shuffle single-precision (32-bit) floating-point elements in a using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shufflehi_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ shufflelo_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sll_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sll_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sll_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ slli_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ slli_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ slli_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sllv_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sllv_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sllv_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sqrt_ pd avx512fandavx512vl
- Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sqrt_ ps avx512fandavx512vl
- Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sqrt_ round_ sd avx512f
- Compute the square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ sqrt_ round_ ss avx512f
- Compute the square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ sqrt_ sd avx512f
- Compute the square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ sqrt_ ss avx512f
- Compute the square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ sra_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sra_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sra_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srai_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srai_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srai_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srav_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srav_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srav_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srl_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srl_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srl_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srli_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srli_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srli_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srlv_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srlv_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ srlv_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠store_ epi32 avx512fandavx512vl
- Store packed 32-bit integers from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠store_ epi64 avx512fandavx512vl
- Store packed 64-bit integers from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠store_ pd avx512fandavx512vl
- Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠store_ ps avx512fandavx512vl
- Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠store_ sd avx512f
- Store a double-precision (64-bit) floating-point element from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠store_ ss avx512f
- Store a single-precision (32-bit) floating-point element from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠storeu_ epi8 avx512bwandavx512vl
- Store packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠storeu_ epi16 avx512bwandavx512vl
- Store packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠storeu_ epi32 avx512fandavx512vl
- Store packed 32-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠storeu_ epi64 avx512fandavx512vl
- Store packed 64-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠storeu_ pd avx512fandavx512vl
- Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠storeu_ ps avx512fandavx512vl
- Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ sub_ epi8 avx512bwandavx512vl
- Subtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sub_ epi16 avx512bwandavx512vl
- Subtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sub_ epi32 avx512fandavx512vl
- Subtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sub_ epi64 avx512fandavx512vl
- Subtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sub_ pd avx512fandavx512vl
- Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sub_ ps avx512fandavx512vl
- Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sub_ round_ sd avx512f
- Subtract the lower double-precision (64-bit) floating-point element in b from the lower double-precision (64-bit) floating-point element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ sub_ round_ ss avx512f
- Subtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ sub_ sd avx512f
- Subtract the lower double-precision (64-bit) floating-point element in b from the lower double-precision (64-bit) floating-point element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ sub_ ss avx512f
- Subtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ subs_ epi8 avx512bwandavx512vl
- Subtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ subs_ epi16 avx512bwandavx512vl
- Subtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ subs_ epu8 avx512bwandavx512vl
- Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ subs_ epu16 avx512bwandavx512vl
- Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ternarylogic_ epi32 avx512fandavx512vl
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in imm8. For each bit in each packed 32-bit integer, the corresponding bits from src, a, and b are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 32-bit granularity (32-bit elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ternarylogic_ epi64 avx512fandavx512vl
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in imm8. For each bit in each packed 64-bit integer, the corresponding bits from src, a, and b are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 64-bit granularity (64-bit elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ test_ epi8_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm_mask_ test_ epi16_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm_mask_ test_ epi32_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm_mask_ test_ epi64_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm_mask_ testn_ epi8_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm_mask_ testn_ epi16_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm_mask_ testn_ epi32_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm_mask_ testn_ epi64_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm_mask_ unpackhi_ epi8 avx512bwandavx512vl
- Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpackhi_ epi16 avx512bwandavx512vl
- Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpackhi_ epi32 avx512fandavx512vl
- Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpackhi_ epi64 avx512fandavx512vl
- Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpackhi_ pd avx512fandavx512vl
- Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpackhi_ ps avx512fandavx512vl
- Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpacklo_ epi8 avx512bwandavx512vl
- Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpacklo_ epi16 avx512bwandavx512vl
- Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpacklo_ epi32 avx512fandavx512vl
- Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpacklo_ epi64 avx512fandavx512vl
- Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpacklo_ pd avx512fandavx512vl
- Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ unpacklo_ ps avx512fandavx512vl
- Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ xor_ epi32 avx512fandavx512vl
- Compute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ xor_ epi64 avx512fandavx512vl
- Compute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ xor_ pd avx512dqandavx512vl
- Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ xor_ ps avx512dqandavx512vl
- Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_maskload_ ⚠epi32 avx2
- Loads packed 32-bit integers from memory pointed by mem_addr using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
- _mm_maskload_ ⚠epi64 avx2
- Loads packed 64-bit integers from memory pointed by mem_addr using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
- _mm_maskload_ ⚠pd avx
- Loads packed double-precision (64-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
- _mm_maskload_ ⚠ps avx
- Loads packed single-precision (32-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
- _mm_maskmoveu_ ⚠si128 sse2
- Conditionally store 8-bit integer elements from a into memory using mask; the store is flagged as non-temporal (unlikely to be used again soon).
- _mm_maskstore_ ⚠epi32 avx2
- Stores packed 32-bit integers from a into memory pointed by mem_addr using mask (elements are not stored when the highest bit is not set in the corresponding element).
- _mm_maskstore_ ⚠epi64 avx2
- Stores packed 64-bit integers from a into memory pointed by mem_addr using mask (elements are not stored when the highest bit is not set in the corresponding element).
- _mm_maskstore_ ⚠pd avx
- Stores packed double-precision (64-bit) floating-point elements from a into memory using mask.
- _mm_maskstore_ ⚠ps avx
- Stores packed single-precision (32-bit) floating-point elements from a into memory using mask.
- _mm_maskz_ abs_ epi8 avx512bwandavx512vl
- Compute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ abs_ epi16 avx512bwandavx512vl
- Compute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ abs_ epi32 avx512fandavx512vl
- Compute the absolute value of packed signed 32-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ abs_ epi64 avx512fandavx512vl
- Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ add_ epi8 avx512bwandavx512vl
- Add packed 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ add_ epi16 avx512bwandavx512vl
- Add packed 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ add_ epi32 avx512fandavx512vl
- Add packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ add_ epi64 avx512fandavx512vl
- Add packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ add_ pd avx512fandavx512vl
- Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ add_ ps avx512fandavx512vl
- Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ add_ round_ sd avx512f
- Add the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ add_ round_ ss avx512f
- Add the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ add_ sd avx512f
- Add the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ add_ ss avx512f
- Add the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ adds_ epi8 avx512bwandavx512vl
- Add packed signed 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ adds_ epi16 avx512bwandavx512vl
- Add packed signed 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ adds_ epu8 avx512bwandavx512vl
- Add packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ adds_ epu16 avx512bwandavx512vl
- Add packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ alignr_ epi8 avx512bwandavx512vl
- Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ alignr_ epi32 avx512fandavx512vl
- Concatenate a and b into a 32-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 16 bytes (4 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ alignr_ epi64 avx512fandavx512vl
- Concatenate a and b into a 32-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 16 bytes (2 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ and_ epi32 avx512fandavx512vl
- Compute the bitwise AND of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ and_ epi64 avx512fandavx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ and_ pd avx512dqandavx512vl
- Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ and_ ps avx512dqandavx512vl
- Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ andnot_ epi32 avx512fandavx512vl
- Compute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ andnot_ epi64 avx512fandavx512vl
- Compute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ andnot_ pd avx512dqandavx512vl
- Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ andnot_ ps avx512dqandavx512vl
- Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ avg_ epu8 avx512bwandavx512vl
- Average packed unsigned 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ avg_ epu16 avx512bwandavx512vl
- Average packed unsigned 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ broadcast_ i32x2 avx512dqandavx512vl
- Broadcasts the lower 2 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ broadcastb_ epi8 avx512bwandavx512vl
- Broadcast the low packed 8-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ broadcastd_ epi32 avx512fandavx512vl
- Broadcast the low packed 32-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ broadcastq_ epi64 avx512fandavx512vl
- Broadcast the low packed 64-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ broadcastss_ ps avx512fandavx512vl
- Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ broadcastw_ epi16 avx512bwandavx512vl
- Broadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ compress_ epi8 avx512vbmi2andavx512vl
- Contiguously store the active 8-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm_maskz_ compress_ epi16 avx512vbmi2andavx512vl
- Contiguously store the active 16-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm_maskz_ compress_ epi32 avx512fandavx512vl
- Contiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm_maskz_ compress_ epi64 avx512fandavx512vl
- Contiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm_maskz_ compress_ pd avx512fandavx512vl
- Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm_maskz_ compress_ ps avx512fandavx512vl
- Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
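The compress entries above pack the selected lanes toward the low end of the result. A minimal scalar sketch, assuming four 32-bit lanes and a hypothetical helper name:

```rust
/// Hypothetical scalar model of a zero-masked compress over four 32-bit lanes.
/// Lanes of `a` whose mask bit is set are written contiguously starting at
/// lane 0 of the result; all remaining result lanes are zero.
fn maskz_compress_model(k: u8, a: [i32; 4]) -> [i32; 4] {
    let mut dst = [0i32; 4];
    let mut next = 0; // next free slot in dst
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[next] = a[i];
            next += 1;
        }
    }
    dst
}
```

For `k = 0b1010` and `a = [10, 20, 30, 40]`, lanes 1 and 3 are active, so the model returns `[20, 40, 0, 0]`.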
- _mm_maskz_ conflict_ epi32 avx512cdandavx512vl
- Test each 32-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm_maskz_ conflict_ epi64 avx512cdandavx512vl
- Test each 64-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
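The conflict-detection entries above can be read as follows: for each lane i, bit j of the result lane is set when an earlier lane j (j < i) holds the same value. The sketch below models that for four 32-bit lanes; the function name is hypothetical.

```rust
/// Hypothetical scalar model of zero-masked conflict detection
/// over four 32-bit lanes.
fn maskz_conflict_epi32_model(k: u8, a: [u32; 4]) -> [u32; 4] {
    let mut dst = [0u32; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 0 {
            continue; // masked-off lanes stay zero
        }
        // Compare against every lane closer to the least significant end.
        for j in 0..i {
            if a[j] == a[i] {
                dst[i] |= 1 << j;
            }
        }
    }
    dst
}
```

For `a = [5, 7, 5, 5]` with all mask bits set, lane 2 matches lane 0 (`0b001`) and lane 3 matches lanes 0 and 2 (`0b101`).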
- _mm_maskz_ cvt_ roundps_ ph avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm_maskz_ cvt_ roundsd_ ss avx512f
- Convert the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the rounding[3:0] parameter.
- _mm_maskz_ cvt_ roundss_ sd avx512f
- Convert the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ cvtepi8_ epi16 avx512bwandavx512vl
- Sign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi8_ epi32 avx512fandavx512vl
- Sign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi8_ epi64 avx512fandavx512vl
- Sign extend packed 8-bit integers in the low 2 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi16_ epi8 avx512bwandavx512vl
- Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi16_ epi32 avx512fandavx512vl
- Sign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi16_ epi64 avx512fandavx512vl
- Sign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi32_ epi8 avx512fandavx512vl
- Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi32_ epi16 avx512fandavx512vl
- Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi32_ epi64 avx512fandavx512vl
- Sign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi32_ pd avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi32_ ps avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi64_ epi8 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi64_ epi16 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi64_ epi32 avx512fandavx512vl
- Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi64_ pd avx512dqandavx512vl
- Convert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvtepi64_ ps avx512dqandavx512vl
- Convert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvtepu8_ epi16 avx512bwandavx512vl
- Zero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepu8_ epi32 avx512fandavx512vl
- Zero extend packed unsigned 8-bit integers in the low 4 bytes of a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepu8_ epi64 avx512fandavx512vl
- Zero extend packed unsigned 8-bit integers in the low 2 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepu16_ epi32 avx512fandavx512vl
- Zero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepu16_ epi64 avx512fandavx512vl
- Zero extend packed unsigned 16-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepu32_ epi64 avx512fandavx512vl
- Zero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
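The integer conversions above come in three flavours: sign extension, zero extension, and narrowing with truncation. Rust's `as` casts have the same per-element semantics, which the scalar sketch below relies on. All three function names are hypothetical, and byte arrays stand in for `__m128i`.

```rust
/// Sign extend the low 4 bytes of `a` to four 32-bit lanes (zero-masked).
fn maskz_cvtepi8_epi32_model(k: u8, a: [i8; 16]) -> [i32; 4] {
    let mut dst = [0i32; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] as i32; // i8 -> i32 sign extends
        }
    }
    dst
}

/// Zero extend the low 4 bytes of `a` to four 32-bit lanes (zero-masked).
fn maskz_cvtepu8_epi32_model(k: u8, a: [u8; 16]) -> [u32; 4] {
    let mut dst = [0u32; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] as u32; // u8 -> u32 zero extends
        }
    }
    dst
}

/// Truncate four 32-bit lanes to 8 bits each (zero-masked). The four results
/// occupy the low 4 bytes of the 16-byte result; the rest stay zero.
fn maskz_cvtepi32_epi8_model(k: u8, a: [i32; 4]) -> [i8; 16] {
    let mut dst = [0i8; 16];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] as i8; // keeps only the low 8 bits
        }
    }
    dst
}
```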
- _mm_maskz_ cvtepu32_ pd avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepu64_ pd avx512dqandavx512vl
- Convert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvtepu64_ ps avx512dqandavx512vl
- Convert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvtne2ps_ pbh avx512bf16andavx512vl
- Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtneps_ pbh avx512bf16andavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtpbh_ ps avx512bf16andavx512vl
- Convert packed BF16 (16-bit) floating-point elements in a to single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
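BF16 is the upper 16 bits of an IEEE single-precision value, so widening is an exact shift and narrowing is a rounding step. The sketch below models one lane with round-to-nearest-even; the helper names are hypothetical, and the model deliberately ignores hardware details such as denormal handling.

```rust
/// Widen one BF16 value (stored as raw bits) to f32. This is exact.
fn bf16_to_f32(x: u16) -> f32 {
    f32::from_bits((x as u32) << 16)
}

/// Narrow one f32 to BF16 bits with round-to-nearest-even.
/// Assumption: a simplified model, not a bit-exact copy of the instruction.
fn f32_to_bf16_rne(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep the sign and exponent, force a quiet NaN mantissa bit.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    let bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(bias) >> 16) as u16
}
```

A value exactly halfway between two BF16 neighbours, such as bits `0x3F80_8000`, rounds to the even neighbour `0x3F80`.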
- _mm_maskz_ cvtpd_ epi32 avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtpd_ epi64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvtpd_ epu32 avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtpd_ epu64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvtpd_ ps avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtph_ ps avx512fandavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtps_ epi32 avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtps_ epi64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvtps_ epu32 avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtps_ epu64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvtps_ ph avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm_maskz_ cvtsd_ ss avx512f
- Convert the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ cvtsepi16_ epi8 avx512bwandavx512vl
- Convert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtsepi32_ epi8 avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtsepi32_ epi16 avx512fandavx512vl
- Convert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtsepi64_ epi8 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtsepi64_ epi16 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtsepi64_ epi32 avx512fandavx512vl
- Convert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
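Signed saturation differs from truncation in that out-of-range values clamp to the nearest representable value instead of wrapping. A minimal scalar sketch for the 32-bit to 8-bit case, using a hypothetical helper name:

```rust
/// Hypothetical scalar model: narrow four signed 32-bit lanes to 8 bits with
/// signed saturation, zeroing lanes whose bit in `k` is clear.
fn maskz_cvtsepi32_epi8_model(k: u8, a: [i32; 4]) -> [i8; 4] {
    let mut dst = [0i8; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // Clamp into the i8 range first, so the cast cannot wrap.
            dst[i] = a[i].clamp(i8::MIN as i32, i8::MAX as i32) as i8;
        }
    }
    dst
}
```

Here 300 becomes 127 and -300 becomes -128, whereas a truncating conversion would keep only the low 8 bits.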
- _mm_maskz_ cvtss_ sd avx512f
- Convert the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ cvttpd_ epi32 avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvttpd_ epi64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvttpd_ epu32 avx512fandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvttpd_ epu64 avx512dqandavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvttps_ epi32 avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvttps_ epi64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ cvttps_ epu32 avx512fandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvttps_ epu64 avx512dqandavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
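The `cvt` and `cvtt` families differ only in rounding: `cvtt` always truncates toward zero, while `cvt` follows the current rounding mode. The sketch below assumes the default mode (round to nearest, ties to even) and in-range inputs. For out-of-range inputs the hardware returns a fixed "indefinite" value, whereas Rust's `as` cast saturates, so the model does not cover that case. All names are hypothetical.

```rust
/// Round to nearest, ties to even (assumed default rounding mode).
fn round_ties_even_model(x: f32) -> f32 {
    let f = x.floor();
    let diff = x - f;
    if diff < 0.5 {
        f
    } else if diff > 0.5 {
        f + 1.0
    } else if f % 2.0 == 0.0 {
        f // exact tie, floor is even
    } else {
        f + 1.0 // exact tie, floor is odd
    }
}

/// One lane of a `cvtps_epi32`-style conversion (in-range inputs only).
fn cvtps_epi32_lane(x: f32) -> i32 {
    round_ties_even_model(x) as i32
}

/// One lane of a `cvttps_epi32`-style conversion (in-range inputs only).
fn cvttps_epi32_lane(x: f32) -> i32 {
    x.trunc() as i32
}

/// Apply a per-lane conversion under a zeromask over four lanes.
fn maskz_convert_model(k: u8, a: [f32; 4], lane: fn(f32) -> i32) -> [i32; 4] {
    let mut dst = [0i32; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = lane(a[i]);
        }
    }
    dst
}
```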
- _mm_maskz_ cvtusepi16_ epi8 avx512bwandavx512vl
- Convert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtusepi32_ epi8 avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtusepi32_ epi16 avx512fandavx512vl
- Convert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtusepi64_ epi8 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtusepi64_ epi16 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtusepi64_ epi32 avx512fandavx512vl
- Convert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
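Unsigned saturation clamps only at the top of the range, since unsigned inputs cannot be negative. A scalar sketch for the 32-bit to 8-bit case, with a hypothetical helper name:

```rust
/// Hypothetical scalar model: narrow four unsigned 32-bit lanes to 8 bits
/// with unsigned saturation, zeroing lanes whose bit in `k` is clear.
fn maskz_cvtusepi32_epi8_model(k: u8, a: [u32; 4]) -> [u8; 4] {
    let mut dst = [0u8; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // Anything above 255 becomes 255.
            dst[i] = a[i].min(u8::MAX as u32) as u8;
        }
    }
    dst
}
```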
- _mm_maskz_ dbsad_ epu8 avx512bwandavx512vl
- Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm_maskz_ div_ pd avx512fandavx512vl
- Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ div_ ps avx512fandavx512vl
- Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ div_ round_ sd avx512f
- Divide the lower double-precision (64-bit) floating-point element in a by the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ div_ round_ ss avx512f
- Divide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ div_ sd avx512f
- Divide the lower double-precision (64-bit) floating-point element in a by the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ div_ ss avx512f
- Divide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
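The scalar (`_ss` / `_sd`) zeromask forms above treat the lanes asymmetrically: only mask bit 0 matters, and the upper lanes are always copied from a. A small sketch of the single-precision division case, using a hypothetical helper name:

```rust
/// Hypothetical scalar model of a zero-masked `div_ss`-style operation.
/// Lane 0 is `a[0] / b[0]` when bit 0 of `k` is set, else 0.0.
/// Lanes 1..=3 are copied from `a` regardless of the mask.
fn maskz_div_ss_model(k: u8, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let low = if k & 1 == 1 { a[0] / b[0] } else { 0.0 };
    [low, a[1], a[2], a[3]]
}
```

Note that zeroing applies to the low lane only; the upper lanes of `b` never influence the result.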
- _mm_maskz_ dpbf16_ ps avx512bf16andavx512vl
- Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ dpbusd_ epi32 avx512vnniandavx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ dpbusds_ epi32 avx512vnniandavx512vl
- Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ dpwssd_ epi32 avx512vnniandavx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ dpwssds_ epi32 avx512vnniandavx512vl
- Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
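The VNNI dot-product entries above work per 32-bit lane. The sketch below models one lane of the unsigned-by-signed byte form, once with wrapping accumulation and once with signed saturation of the final sum. Both function names are hypothetical; the zeromask would then be applied per lane as in the other entries.

```rust
/// One 32-bit lane of a `dpbusd`-style operation: four unsigned bytes from
/// `a` times four signed bytes from `b`, summed into `src` with wrapping.
fn dpbusd_lane_model(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut acc = src;
    for i in 0..4 {
        // Each product fits in 16 bits; widen to i32 before accumulating.
        acc = acc.wrapping_add(a[i] as i32 * b[i] as i32);
    }
    acc
}

/// Same lane with signed saturation of the final sum (`dpbusds`-style).
fn dpbusds_lane_model(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut sum = src as i64;
    for i in 0..4 {
        sum += a[i] as i64 * b[i] as i64;
    }
    sum.clamp(i32::MIN as i64, i32::MAX as i64) as i32
}
```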
- _mm_maskz_ expand_ epi8 avx512vbmi2andavx512vl
- Load contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ expand_ epi16 avx512vbmi2andavx512vl
- Load contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ expand_ epi32 avx512fandavx512vl
- Load contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ expand_ epi64 avx512fandavx512vl
- Load contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ expand_ pd avx512fandavx512vl
- Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ expand_ ps avx512fandavx512vl
- Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
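Expand is the inverse of compress: consecutive low lanes of a are scattered to the positions whose mask bit is set. A scalar sketch over four 32-bit lanes, with a hypothetical helper name:

```rust
/// Hypothetical scalar model of a zero-masked expand over four 32-bit lanes.
/// The n-th set bit of `k` receives `a[n]`; lanes with a clear bit are zero.
fn maskz_expand_model(k: u8, a: [i32; 4]) -> [i32; 4] {
    let mut dst = [0i32; 4];
    let mut next = 0; // next unread lane of a
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[next];
            next += 1;
        }
    }
    dst
}
```

For `k = 0b1010` and `a = [10, 20, 30, 40]`, the first two lanes of a land in positions 1 and 3, giving `[0, 10, 0, 20]`.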
- _mm_maskz_ ⚠expandloadu_ epi8 avx512vbmi2andavx512vl
- Load contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ epi16 avx512vbmi2andavx512vl
- Load contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ epi32 avx512fandavx512vl
- Load contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ epi64 avx512fandavx512vl
- Load contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ pd avx512fandavx512vl
- Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ ps avx512fandavx512vl
- Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
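The memory-source form consumes one contiguous element per set mask bit. The sketch below models that with a slice and makes the element count explicit. The bounds check and the `Option` return are properties of this hypothetical model only; the real intrinsics are `unsafe` and take a raw pointer.

```rust
/// Hypothetical safe model of a zero-masked expand-load of four 32-bit lanes.
/// Reads exactly `k.count_ones()` contiguous elements from `mem`
/// (only the low 4 mask bits are considered).
fn maskz_expandloadu_model(k: u8, mem: &[i32]) -> Option<[i32; 4]> {
    let needed = (k & 0x0F).count_ones() as usize;
    if mem.len() < needed {
        return None; // the model refuses to read past the slice
    }
    let mut dst = [0i32; 4];
    let mut next = 0;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = mem[next];
            next += 1;
        }
    }
    Some(dst)
}
```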
- _mm_maskz_ fixupimm_ pd avx512fandavx512vl
- Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm_maskz_ fixupimm_ ps avx512fandavx512vl
- Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm_maskz_ fixupimm_ round_ sd avx512f
- Fix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ fixupimm_ round_ ss avx512f
- Fix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ fixupimm_ sd avx512f
- Fix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
- _mm_maskz_ fixupimm_ ss avx512f
- Fix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
- _mm_maskz_ fmadd_ pd avx512fandavx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ fmadd_ ps avx512fandavx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
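A fused multiply-add computes `a * b + c` with a single rounding. Rust's `f32::mul_add` has the same per-element meaning, so it can stand in for one lane in a scalar sketch. The helper name below is hypothetical.

```rust
/// Hypothetical scalar model of a zero-masked packed fused multiply-add
/// over four single-precision lanes.
fn maskz_fmadd_ps_model(k: u8, a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // mul_add rounds once, after the multiply and the add.
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```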
- _mm_maskz_ fmadd_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ fmadd_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ fmadd_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_fmadd_ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
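The zeromask behaviour shared by the fmadd entries above can be modelled in plain Rust. The following is a scalar reference sketch, not the intrinsics themselves: the function names `maskz_fmadd_ps_model` and `maskz_fmadd_sd_model` are hypothetical, and fixed-size arrays stand in for the vector types (index 0 is the lowest lane).

```rust
/// Scalar model of a zero-masked packed fused multiply-add over four f32 lanes.
/// Lane `i` holds `a[i] * b[i] + c[i]` when bit `i` of `k` is set, else 0.0.
fn maskz_fmadd_ps_model(k: u8, a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // `mul_add` rounds once, like a fused multiply-add.
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}

/// Scalar model of the lower-element (`_sd`) form over two f64 lanes:
/// only mask bit 0 matters, and the upper lane is copied from `a`.
fn maskz_fmadd_sd_model(k: u8, a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    let lower = if k & 1 == 1 { a[0].mul_add(b[0], c[0]) } else { 0.0 };
    [lower, a[1]]
}
```

In the packed form every lane has its own mask bit; in the scalar form a cleared bit 0 zeroes only the lower lane while the upper lane still comes from a.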
- _mm_maskz_fmaddsub_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fmaddsub_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fmsub_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fmsub_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fmsub_round_sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_fmsub_round_ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_fmsub_sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_fmsub_ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_fmsubadd_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fmsubadd_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
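The fmaddsub and fmsubadd entries differ only in which lanes add c and which subtract it. The scalar sketch below illustrates the lane pattern under the assumption that index 0 is the lowest lane; the names `fmaddsub_model` and `fmsubadd_model` are hypothetical, and masking is left out to keep the alternation visible.

```rust
/// Scalar model of fmaddsub over four f32 lanes:
/// even lanes compute a*b - c, odd lanes compute a*b + c.
fn fmaddsub_model(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        let addend = if i % 2 == 0 { -c[i] } else { c[i] };
        dst[i] = a[i].mul_add(b[i], addend);
    }
    dst
}

/// Scalar model of fmsubadd over four f32 lanes:
/// even lanes compute a*b + c, odd lanes compute a*b - c.
fn fmsubadd_model(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut dst = [0.0f32; 4];
    for i in 0..4 {
        let addend = if i % 2 == 0 { c[i] } else { -c[i] };
        dst[i] = a[i].mul_add(b[i], addend);
    }
    dst
}
```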
- _mm_maskz_fnmadd_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fnmadd_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fnmadd_round_sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_fnmadd_round_ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_fnmadd_sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_fnmadd_ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_fnmsub_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fnmsub_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_fnmsub_round_sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_fnmsub_round_ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_fnmsub_sd avx512f
- Multiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_fnmsub_ss avx512f
- Multiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
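The four fused families (fmadd, fmsub, fnmadd, fnmsub) differ only in the signs applied to the product and to c. A minimal scalar sketch of one lane, with hypothetical function names, may help when choosing between them:

```rust
/// One-lane scalar models of the four fused multiply-add families.
/// Each uses `mul_add`, which rounds once like the hardware operation.
fn fmadd_lane(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c) // (a * b) + c
}

fn fmsub_lane(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, -c) // (a * b) - c
}

fn fnmadd_lane(a: f64, b: f64, c: f64) -> f64 {
    (-a).mul_add(b, c) // -(a * b) + c
}

fn fnmsub_lane(a: f64, b: f64, c: f64) -> f64 {
    (-a).mul_add(b, -c) // -(a * b) - c
}
```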
- _mm_maskz_getexp_pd avx512f and avx512vl
- Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_maskz_getexp_ps avx512f and avx512vl
- Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_maskz_getexp_round_sd avx512f
- Convert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_getexp_round_ss avx512f
- Convert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_getexp_sd avx512f
- Convert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_maskz_getexp_ss avx512f
- Convert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
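The floor(log2(x)) description of the getexp entries can be illustrated by reading the exponent field of an f64 directly. This is a scalar sketch with a hypothetical name, `getexp_model`; it covers only normal, finite inputs and returns `None` for zero, denormals, infinities, and NaN, whose handling is not described in this listing.

```rust
/// Scalar model of getexp for one normal, finite f64:
/// returns the unbiased exponent as a floating-point value,
/// which equals floor(log2(|x|)).
fn getexp_model(x: f64) -> Option<f64> {
    let bits = x.to_bits();
    // Bits 52..=62 hold the biased exponent; the sign bit is ignored.
    let biased = ((bits >> 52) & 0x7ff) as i64;
    if biased == 0 || biased == 0x7ff {
        // Zero/denormal or infinity/NaN: outside the scope of this sketch.
        return None;
    }
    Some((biased - 1023) as f64)
}
```

Reading the exponent bits avoids any rounding concerns that a call to a floating-point logarithm might introduce near exact powers of two.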
- _mm_maskz_getmant_pd avx512f and avx512vl
- Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_maskz_getmant_ps avx512f and avx512vl
- Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_maskz_getmant_round_sd avx512f
- Normalize the mantissa of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_getmant_round_ss avx512f
- Normalize the mantissa of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_getmant_sd avx512f
- Normalize the mantissa of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_maskz_getmant_ss avx512f
- Normalize the mantissa of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
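To make the interval and sign controls of the getmant entries concrete, here is a scalar sketch for a single f64. It is an illustration under stated assumptions, not the intrinsic: the names `Norm`, `Sign`, and `getmant_model` are hypothetical, only the [1, 2) and [0.5, 1) intervals and the src/zero sign modes are modelled, and only normal, finite inputs are handled.

```rust
/// Interval selection (hypothetical stand-in for two of the interv values).
enum Norm {
    OneToTwo,  // interval [1, 2)
    HalfToOne, // interval [0.5, 1)
}

/// Sign selection (hypothetical stand-in for two of the sc values).
enum Sign {
    Src,  // keep the sign of the input
    Zero, // force a positive result
}

/// Scalar model of getmant for one normal, finite f64.
/// Keeps the fraction bits and replaces the exponent so that the
/// magnitude falls in the requested interval.
fn getmant_model(x: f64, norm: Norm, sign: Sign) -> Option<f64> {
    let bits = x.to_bits();
    let biased = (bits >> 52) & 0x7ff;
    if biased == 0 || biased == 0x7ff {
        return None; // zero, denormal, infinity, NaN: not modelled here
    }
    let fraction = bits & ((1u64 << 52) - 1);
    let new_exp: u64 = match norm {
        Norm::OneToTwo => 1023,  // 2^0  -> [1, 2)
        Norm::HalfToOne => 1022, // 2^-1 -> [0.5, 1)
    };
    let sign_bit = match sign {
        Sign::Src => bits & (1u64 << 63),
        Sign::Zero => 0,
    };
    Some(f64::from_bits(sign_bit | (new_exp << 52) | fraction))
}
```

For example, 10.0 is 1.25 × 2^3, so the [1, 2) interval yields 1.25 and the [0.5, 1) interval yields 0.625.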
- _mm_maskz_gf2p8affine_epi64_epi8 gfni and avx512bw and avx512vl
- Performs an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_maskz_gf2p8affineinv_epi64_epi8 gfni and avx512bw and avx512vl
- Performs an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_maskz_gf2p8mul_epi8 gfni and avx512bw and avx512vl
- Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
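The per-byte multiplication described above can be sketched in scalar Rust with the usual shift-and-reduce loop. The function name `gf2p8mul_byte` is hypothetical; the constant 0x1B is the low byte of the reduction polynomial x^8 + x^4 + x^3 + x + 1 named in the entry.

```rust
/// Scalar model of one byte lane of a GF(2^8) multiplication,
/// reducing by x^8 + x^4 + x^3 + x + 1 (0x11B).
fn gf2p8mul_byte(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    for _ in 0..8 {
        // Add (XOR) the current multiple of `a` if the low bit of `b` is set.
        if b & 1 != 0 {
            product ^= a;
        }
        // Multiply `a` by x, reducing when the x^8 term would appear.
        let overflow = a & 0x80 != 0;
        a <<= 1;
        if overflow {
            a ^= 0x1B;
        }
        b >>= 1;
    }
    product
}
```

As a check, 0x57 times 0x83 gives 0xC1 in this field, and 0x53 and 0xCA are multiplicative inverses of each other.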
- _mm_maskz_load_epi32 ⚠ avx512f and avx512vl
- Load packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_load_epi64 ⚠ avx512f and avx512vl
- Load packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_load_pd ⚠ avx512f and avx512vl
- Load packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_load_ps ⚠ avx512f and avx512vl
- Load packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_load_sd ⚠ avx512f
- Load a double-precision (64-bit) floating-point element from memory into the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and set the upper element of dst to zero. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_load_ss ⚠ avx512f
- Load a single-precision (32-bit) floating-point element from memory into the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and set the upper 3 packed elements of dst to zero. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
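The aligned load entries above require mem_addr to sit on a 16-byte boundary. One way to obtain such an address in Rust is a wrapper type with an explicit alignment attribute. The sketch below is illustrative only: `Aligned16` and `is_aligned_to_16` are hypothetical names, and no intrinsic is called.

```rust
/// Wrapper that forces 16-byte alignment on its contents.
/// `repr(C)` keeps the single field at offset 0.
#[repr(C, align(16))]
struct Aligned16<T>(T);

/// Returns true when `ptr` is a multiple of 16, which is the
/// precondition stated for the aligned load entries.
fn is_aligned_to_16<T>(ptr: *const T) -> bool {
    (ptr as usize) % 16 == 0
}
```

A buffer declared as `Aligned16([1i32, 2, 3, 4])` satisfies the check for a pointer to its first element. A plain `[i32; 4]` is only guaranteed 4-byte alignment, which is the situation the loadu entries are meant for.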
- _mm_maskz_loadu_epi8 ⚠ avx512bw and avx512vl
- Load packed 8-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_loadu_epi16 ⚠ avx512bw and avx512vl
- Load packed 16-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_loadu_epi32 ⚠ avx512f and avx512vl
- Load packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_loadu_epi64 ⚠ avx512f and avx512vl
- Load packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_loadu_pd ⚠ avx512f and avx512vl
- Load packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_loadu_ps ⚠ avx512f and avx512vl
- Load packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_lzcnt_epi32 avx512cd and avx512vl
- Count the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_lzcnt_epi64 avx512cd and avx512vl
- Count the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
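The leading-zero count maps directly onto `u32::leading_zeros` from the standard library. The sketch below is a scalar model with a hypothetical name, `maskz_lzcnt_epi32_model`, using an array of four lanes in place of the vector type.

```rust
/// Scalar model of a zero-masked leading-zero count over four u32 lanes.
/// A lane whose mask bit is clear is written as 0.
fn maskz_lzcnt_epi32_model(k: u8, a: [u32; 4]) -> [u32; 4] {
    let mut dst = [0u32; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // An input of 0 has all 32 bits clear, so the count is 32.
            dst[i] = a[i].leading_zeros();
        }
    }
    dst
}
```

Note that a zeroed-out lane and a lane whose top bit is set both read as 0, so the mask must be consulted to tell them apart.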
- _mm_maskz_madd52hi_epu64 avx512ifma and avx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_madd52lo_epu64 avx512ifma and avx512vl
- Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
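The 52-bit multiply-accumulate entries can be modelled per element with a 128-bit intermediate. This is a scalar sketch with hypothetical names (`MASK52`, `madd52lo_model`, `madd52hi_model`); it assumes the 64-bit accumulation wraps on overflow and leaves out the mask.

```rust
/// Low 52 bits of a 64-bit element.
const MASK52: u64 = (1u64 << 52) - 1;

/// Scalar model of one element of madd52lo: add the low 52 bits
/// of the 104-bit product of b[51:0] and c[51:0] to `a`.
fn madd52lo_model(a: u64, b: u64, c: u64) -> u64 {
    let product = ((b & MASK52) as u128) * ((c & MASK52) as u128);
    a.wrapping_add((product as u64) & MASK52)
}

/// Scalar model of one element of madd52hi: add the high 52 bits
/// (bits 52..=103) of the same product to `a`.
fn madd52hi_model(a: u64, b: u64, c: u64) -> u64 {
    let product = ((b & MASK52) as u128) * ((c & MASK52) as u128);
    a.wrapping_add((product >> 52) as u64)
}
```

Chaining the lo and hi forms is the usual way to build wider multi-precision products out of 52-bit limbs.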
- _mm_maskz_madd_epi16 avx512bw and avx512vl
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_maddubs_epi16 avx512bw and avx512vl
- Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
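The two multiply-and-add-adjacent-pairs entries above can be sketched in scalar Rust. The names `madd_epi16_model` and `maddubs_pair_model` are hypothetical, the mask is omitted for brevity, and index 0 is assumed to be the lowest lane.

```rust
/// Scalar model of madd_epi16 over eight i16 lanes: multiply
/// lane-wise into i32, then add each adjacent pair.
fn madd_epi16_model(a: [i16; 8], b: [i16; 8]) -> [i32; 4] {
    let mut dst = [0i32; 4];
    for i in 0..4 {
        let even = (a[2 * i] as i32) * (b[2 * i] as i32);
        let odd = (a[2 * i + 1] as i32) * (b[2 * i + 1] as i32);
        // The pair sum is not saturated, so use wrapping arithmetic.
        dst[i] = even.wrapping_add(odd);
    }
    dst
}

/// Scalar model of one output lane of maddubs_epi16: unsigned bytes
/// from `a` times signed bytes from `b`, pair-summed with signed
/// saturation to the i16 range.
fn maddubs_pair_model(a: [u8; 2], b: [i8; 2]) -> i16 {
    let sum = (a[0] as i32) * (b[0] as i32) + (a[1] as i32) * (b[1] as i32);
    sum.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}
```

The saturation in the maddubs form matters in practice: two products of 255 × 127 already exceed the i16 range.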
- _mm_maskz_max_epi8 avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_max_epi16 avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_max_epi32 avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_max_epi64 avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_max_epu8 avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_max_epu16 avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_max_epu32 avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_max_epu64 avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
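The epi and epu forms of the integer max entries operate on the same bits but interpret them differently. The sketch below uses a small generic helper, `maskz_zip` (a hypothetical name), to apply a lane-wise operation under a zeromask so that the signed and unsigned results can be compared side by side.

```rust
/// Generic scalar model of a zero-masked lane-wise binary operation.
/// Lane `i` is `op(a[i], b[i])` when bit `i` of `k` is set, else the
/// default (zero) value. Intended for N <= 32 lanes.
fn maskz_zip<T: Copy + Default, const N: usize>(
    k: u32,
    a: [T; N],
    b: [T; N],
    op: impl Fn(T, T) -> T,
) -> [T; N] {
    let mut dst = [T::default(); N];
    for i in 0..N {
        if (k >> i) & 1 == 1 {
            dst[i] = op(a[i], b[i]);
        }
    }
    dst
}
```

With `std::cmp::max` as the operation, the byte 0xFF wins against 0x01 when the lanes are `u8` (255 > 1) but loses when the same bits are viewed as `i8` (-1 < 1). Passing `std::cmp::min` gives the corresponding model for the min entries.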
- _mm_maskz_max_pd avx512f and avx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_max_ps avx512f and avx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_max_round_sd avx512f
- Compare the lower double-precision (64-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_max_round_ss avx512f
- Compare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_max_sd avx512f
- Compare the lower double-precision (64-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_max_ss avx512f
- Compare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_min_epi8 avx512bw and avx512vl
- Compare packed signed 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_min_epi16 avx512bw and avx512vl
- Compare packed signed 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_min_epi32 avx512f and avx512vl
- Compare packed signed 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_min_epi64 avx512f and avx512vl
- Compare packed signed 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_min_epu8 avx512bw and avx512vl
- Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_min_epu16 avx512bw and avx512vl
- Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_min_epu32 avx512f and avx512vl
- Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_min_epu64 avx512f and avx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_min_pd avx512f and avx512vl
- Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_min_ps avx512f and avx512vl
- Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_min_round_sd avx512f
- Compare the lower double-precision (64-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_min_round_ss avx512f
- Compare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_min_sd avx512f
- Compare the lower double-precision (64-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_min_ss avx512f
- Compare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_mov_epi8 avx512bw and avx512vl
- Move packed 8-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mov_epi16 avx512bw and avx512vl
- Move packed 16-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mov_epi32 avx512f and avx512vl
- Move packed 32-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mov_epi64 avx512f and avx512vl
- Move packed 64-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mov_pd avx512f and avx512vl
- Move packed double-precision (64-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mov_ps avx512f and avx512vl
- Move packed single-precision (32-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_move_sd avx512f
- Move the lower double-precision (64-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_move_ss avx512f
- Move the lower single-precision (32-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_movedup_pd avx512f and avx512vl
- Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_movehdup_ps avx512f and avx512vl
- Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_moveldup_ps avx512f and avx512vl
- Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mul_epi32 avx512f and avx512vl
- Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mul_epu32 avx512f and avx512vl
- Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mul_pd avx512f and avx512vl
- Multiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mul_ps avx512f and avx512vl
- Multiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mul_round_sd avx512f
- Multiply the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_mul_round_ss avx512f
- Multiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_mul_sd avx512f
- Multiply the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_mul_ss avx512f
- Multiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_mulhi_epi16 avx512bw and avx512vl
- Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mulhi_epu16 avx512bw and avx512vl
- Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mulhrs_epi16 avx512bw and avx512vl
- Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mullo_epi16 avx512bw and avx512vl
- Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mullo_epi32 avx512f and avx512vl
- Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_mullo_epi64 avx512dq and avx512vl
- Multiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_multishift_epi64_epi8 avx512vbmi and avx512vl
- For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_or_epi32 avx512f and avx512vl
- Compute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_or_epi64 avx512f and avx512vl
- Compute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_or_pd avx512dq and avx512vl
- Compute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_or_ps avx512dq and avx512vl
- Compute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_packs_epi16 avx512bw and avx512vl
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_packs_epi32 avx512bw and avx512vl
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_packus_epi16 avx512bw and avx512vl
- Convert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_packus_epi32 avx512bw and avx512vl
- Convert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permute_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permute_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permutevar_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permutevar_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permutex2var_epi8 avx512vbmi and avx512vl
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permutex2var_epi16 avx512bw and avx512vl
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permutex2var_epi32 avx512f and avx512vl
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permutex2var_epi64 avx512f and avx512vl
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permutex2var_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permutex2var_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permutexvar_epi8 avx512vbmi and avx512vl
- Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_permutexvar_epi16 avx512bw and avx512vl
- Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
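- _mm_maskz_popcnt_epi8 avx512bitalg and avx512vl
- Count the number of logical 1 bits in each packed 8-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_popcnt_epi16 avx512bitalg and avx512vl
- Count the number of logical 1 bits in each packed 16-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_popcnt_epi32 avx512vpopcntdq and avx512vl
- Count the number of logical 1 bits in each packed 32-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_popcnt_epi64 avx512vpopcntdq and avx512vl
- Count the number of logical 1 bits in each packed 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).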
- _mm_maskz_range_pd avx512dq and avx512vl
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_maskz_range_ps avx512dq and avx512vl
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_maskz_range_round_sd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_range_round_ss avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_range_sd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_maskz_range_ss avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Bits [1:0] of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Bits [3:2] of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_maskz_rcp14_pd avx512f and avx512vl
- Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_rcp14_ps avx512f and avx512vl
- Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_rcp14_sd avx512f
- Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_rcp14_ss avx512f
- Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_reduce_pd avx512dq and avx512vl
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter.
- _mm_maskz_reduce_ps avx512dq and avx512vl
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter.
- _mm_maskz_reduce_round_sd avx512dq
- Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter.
- _mm_maskz_reduce_round_ss avx512dq
- Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the imm8 parameter.
- _mm_maskz_reduce_sd avx512dq
- Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter.
- _mm_maskz_reduce_ss avx512dq
- Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the imm8 parameter.
- _mm_maskz_rol_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_rol_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_rolv_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_rolv_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ror_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ror_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_rorv_epi32 avx512f and avx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_rorv_epi64 avx512f and avx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_roundscale_pd avx512f and avx512vl
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter.
- _mm_maskz_roundscale_ps avx512f and avx512vl
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter.
- _mm_maskz_roundscale_round_sd avx512f
- Round the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8[2:0] parameter.
- _mm_maskz_roundscale_round_ss avx512f
- Round the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the imm8[2:0] parameter.
- _mm_maskz_roundscale_sd avx512f
- Round the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8[2:0] parameter.
- _mm_maskz_roundscale_ss avx512f
- Round the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the imm8[2:0] parameter.
- _mm_maskz_rsqrt14_pd avx512f and avx512vl
- Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_rsqrt14_ps avx512f and avx512vl
- Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_rsqrt14_sd avx512f
- Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_rsqrt14_ss avx512f
- Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_scalef_pd avx512f and avx512vl
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_scalef_ps avx512f and avx512vl
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_scalef_round_sd avx512f
- Scale the lower double-precision (64-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_scalef_round_ss avx512f
- Scale the lower single-precision (32-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_scalef_sd avx512f
- Scale the lower double-precision (64-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_scalef_ss avx512f
- Scale the lower single-precision (32-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_set1_epi8 avx512bw and avx512vl
- Broadcast 8-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_set1_epi16 avx512bw and avx512vl
- Broadcast 16-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_set1_epi32 avx512f and avx512vl
- Broadcast 32-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_set1_epi64 avx512f and avx512vl
- Broadcast 64-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shldi_epi16 avx512vbmi2 and avx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shldi_epi32 avx512vbmi2 and avx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shldi_epi64 avx512vbmi2 and avx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shldv_epi16 avx512vbmi2 and avx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shldv_epi32 avx512vbmi2 and avx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shldv_epi64 avx512vbmi2 and avx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shrdi_epi16 avx512vbmi2 and avx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shrdi_epi32 avx512vbmi2 and avx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shrdi_epi64 avx512vbmi2 and avx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shrdv_epi16 avx512vbmi2 and avx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shrdv_epi32 avx512vbmi2 and avx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shrdv_epi64 avx512vbmi2 and avx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64 bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shuffle_epi8 avx512bw and avx512vl
- Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shuffle_epi32 avx512f and avx512vl
- Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shuffle_pd avx512f and avx512vl
- Shuffle double-precision (64-bit) floating-point elements in a and b within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_shuffle_ps avx512f and avx512vl
- Shuffle single-precision (32-bit) floating-point elements in a and b using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ shufflehi_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ shufflelo_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sll_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sll_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sll_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ slli_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ slli_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ slli_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sllv_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sllv_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sllv_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sqrt_ pd avx512fandavx512vl
- Compute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sqrt_ ps avx512fandavx512vl
- Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sqrt_ round_ sd avx512f
- Compute the square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ sqrt_ round_ ss avx512f
- Compute the square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ sqrt_ sd avx512f
- Compute the square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ sqrt_ ss avx512f
- Compute the square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ sra_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sra_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sra_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srai_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srai_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srai_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srav_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srav_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srav_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srl_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srl_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srl_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srli_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srli_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srli_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srlv_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srlv_ epi32 avx512fandavx512vl
- Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ srlv_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sub_ epi8 avx512bwandavx512vl
- Subtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sub_ epi16 avx512bwandavx512vl
- Subtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sub_ epi32 avx512fandavx512vl
- Subtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
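The zeromask convention shared by the `_mm_maskz_*` entries can be modeled in plain Rust. The sketch below is a scalar emulation of the behavior described for `_mm_maskz_sub_epi32`, not a call to the intrinsic itself (which needs AVX-512F and AVX-512VL hardware). The function name `maskz_sub_epi32_model` is hypothetical, and wrapping subtraction is assumed for the integer lanes.

```rust
/// Scalar model of a zeromasked subtraction over four 32-bit lanes.
/// Lane i holds a[i] - b[i] (wrapping) when bit i of k is set, and 0 otherwise.
fn maskz_sub_epi32_model(k: u8, a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    // Start from all zeros: lanes whose mask bit is clear stay zeroed.
    let mut dst = [0i32; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].wrapping_sub(b[i]);
        }
    }
    dst
}
```

With k = 0b0101, only lanes 0 and 2 receive a difference; lanes 1 and 3 are zeroed regardless of the inputs.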
- _mm_maskz_ sub_ epi64 avx512fandavx512vl
- Subtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sub_ pd avx512fandavx512vl
- Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sub_ ps avx512fandavx512vl
- Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sub_ round_ sd avx512f
- Subtract the lower double-precision (64-bit) floating-point element in b from the lower double-precision (64-bit) floating-point element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ sub_ round_ ss avx512f
- Subtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ sub_ sd avx512f
- Subtract the lower double-precision (64-bit) floating-point element in b from the lower double-precision (64-bit) floating-point element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ sub_ ss avx512f
- Subtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ subs_ epi8 avx512bwandavx512vl
- Subtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ subs_ epi16 avx512bwandavx512vl
- Subtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ subs_ epu8 avx512bwandavx512vl
- Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ subs_ epu16 avx512bwandavx512vl
- Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ternarylogic_ epi32 avx512fandavx512vl
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ternarylogic_ epi64 avx512fandavx512vl
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).
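The imm8 lookup described for the ternarylogic entries can be sketched as a scalar model. This is an illustration only: `ternary_logic_u32` is a hypothetical helper operating on one 32-bit element without any mask, and it assumes the index is formed with the bit from a as the most significant bit and the bit from c as the least significant.

```rust
/// Scalar model of the ternary-logic lookup for one 32-bit element.
/// For each bit position, the bits of a, b, and c form a 3-bit index
/// (a is the high bit, c the low bit) that selects one bit of imm8.
fn ternary_logic_u32(a: u32, b: u32, c: u32, imm8: u8) -> u32 {
    let mut out = 0u32;
    for bit in 0..32 {
        let idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1);
        out |= (((imm8 as u32) >> idx) & 1) << bit;
    }
    out
}
```

Under that index convention, imm8 = 0x96 yields a three-way XOR and imm8 = 0xCA yields a bitwise select (b where a is set, c elsewhere).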
- _mm_maskz_ unpackhi_ epi8 avx512bwandavx512vl
- Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpackhi_ epi16 avx512bwandavx512vl
- Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpackhi_ epi32 avx512fandavx512vl
- Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpackhi_ epi64 avx512fandavx512vl
- Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpackhi_ pd avx512fandavx512vl
- Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpackhi_ ps avx512fandavx512vl
- Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpacklo_ epi8 avx512bwandavx512vl
- Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpacklo_ epi16 avx512bwandavx512vl
- Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpacklo_ epi32 avx512fandavx512vl
- Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpacklo_ epi64 avx512fandavx512vl
- Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpacklo_ pd avx512fandavx512vl
- Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ unpacklo_ ps avx512fandavx512vl
- Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ xor_ epi32 avx512fandavx512vl
- Compute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ xor_ epi64 avx512fandavx512vl
- Compute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ xor_ pd avx512dqandavx512vl
- Compute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ xor_ ps avx512dqandavx512vl
- Compute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_max_ epi8 sse4.1
- Compares packed 8-bit integers in a and b and returns packed maximum values in dst.
- _mm_max_ epi16 sse2
- Compares packed 16-bit integers in a and b, and returns the packed maximum values.
- _mm_max_ epi32 sse4.1
- Compares packed 32-bit integers in a and b, and returns packed maximum values.
- _mm_max_ epi64 avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b, and store packed maximum values in dst.
- _mm_max_ epu8 sse2
- Compares packed unsigned 8-bit integers in a and b, and returns the packed maximum values.
- _mm_max_ epu16 sse4.1
- Compares packed unsigned 16-bit integers in a and b, and returns packed maximum values.
- _mm_max_ epu32 sse4.1
- Compares packed unsigned 32-bit integers in a and b, and returns packed maximum values.
- _mm_max_ epu64 avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst.
- _mm_max_ pd sse2
- Returns a new vector with the maximum values from corresponding elements in a and b.
- _mm_max_ ps sse
- Compares packed single-precision (32-bit) floating-point elements in a and b, and returns the corresponding maximum values.
- _mm_max_ round_ sd avx512f
- Compare the lower double-precision (64-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_max_ round_ ss avx512f
- Compare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_max_ sd sse2
- Returns a new vector with the low element of a replaced by the maximum of the lower elements of a and b.
- _mm_max_ ss sse
- Compares the first single-precision (32-bit) floating-point element of a and b, and returns the maximum value in the first element of the return value; the other elements are copied from a.
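As an illustration of the packed maximum entries, the sketch below applies `_mm_max_epi16` to two arrays of eight 16-bit integers. It assumes an x86_64 target, where SSE2 is part of the baseline, and uses the `std::arch::x86_64` module rather than the 32-bit `x86` module. The wrapper name `max_i16x8` and the portable fallback are hypothetical additions for this example.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_max_epi16, _mm_storeu_si128};

/// Lane-wise signed maximum of eight 16-bit integers (SSE2 path).
#[cfg(target_arch = "x86_64")]
fn max_i16x8(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    // SAFETY: SSE2 is always available on x86_64, and the unaligned
    // load/store intrinsics only touch the 16 bytes of each array.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_max_epi16(va, vb));
    }
    out
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn max_i16x8(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        out[i] = a[i].max(b[i]);
    }
    out
}
```

The same load, operate, store shape applies to `_mm_min_epi16` and the unsigned 8-bit variants, which are also SSE2.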
- _mm_mfence ⚠sse2
- Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction.
- _mm_min_ epi8 sse4.1
- Compares packed 8-bit integers in a and b and returns packed minimum values in dst.
- _mm_min_ epi16 sse2
- Compares packed 16-bit integers in a and b, and returns the packed minimum values.
- _mm_min_ epi32 sse4.1
- Compares packed 32-bit integers in a and b, and returns packed minimum values.
- _mm_min_ epi64 avx512fandavx512vl
- Compare packed signed 64-bit integers in a and b, and store packed minimum values in dst.
- _mm_min_ epu8 sse2
- Compares packed unsigned 8-bit integers in a and b, and returns the packed minimum values.
- _mm_min_ epu16 sse4.1
- Compares packed unsigned 16-bit integers in a and b, and returns packed minimum values.
- _mm_min_ epu32 sse4.1
- Compares packed unsigned 32-bit integers in a and b, and returns packed minimum values.
- _mm_min_ epu64 avx512fandavx512vl
- Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst.
- _mm_min_ pd sse2
- Returns a new vector with the minimum values from corresponding elements in a and b.
- _mm_min_ ps sse
- Compares packed single-precision (32-bit) floating-point elements in a and b, and returns the corresponding minimum values.
- _mm_min_ round_ sd avx512f
- Compare the lower double-precision (64-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_min_ round_ ss avx512f
- Compare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_min_ sd sse2
- Returns a new vector with the low element of a replaced by the minimum of the lower elements of a and b.
- _mm_min_ ss sse
- Compares the first single-precision (32-bit) floating-point element of a and b, and returns the minimum value in the first element of the return value; the other elements are copied from a.
- _mm_minpos_ epu16 sse4.1
- Finds the minimum unsigned 16-bit element in the 128-bit __m128i vector, returning a vector containing its value in its first position, and its index in its second position; all other elements are set to zero.
- _mm_mmask_ ⚠i32gather_ epi32 avx512fandavx512vl
- Loads 4 32-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i32gather_ epi64 avx512fandavx512vl
- Loads 2 64-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i32gather_ pd avx512fandavx512vl
- Loads 2 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i32gather_ ps avx512fandavx512vl
- Loads 4 single-precision (32-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i64gather_ epi32 avx512fandavx512vl
- Loads 2 32-bit integer elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i64gather_ epi64 avx512fandavx512vl
- Loads 2 64-bit integer elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i64gather_ pd avx512fandavx512vl
- Loads 2 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i64gather_ ps avx512fandavx512vl
- Loads 2 single-precision (32-bit) floating-point elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_move_ epi64 sse2
- Returns a vector where the low element is extracted from a and its upper element is zero.
- _mm_move_ sd sse2
- Constructs a 128-bit floating-point vector of [2 x double]. The lower 64 bits are set to the lower 64 bits of the second parameter. The upper 64 bits are set to the upper 64 bits of the first parameter.
- _mm_move_ ss sse
- Returns a __m128 with the first component from b and the remaining components from a.
- _mm_movedup_ pd sse3
- Duplicate the low double-precision (64-bit) floating-point element from a.
- _mm_movehdup_ ps sse3
- Duplicate odd-indexed single-precision (32-bit) floating-point elements from a.
- _mm_movehl_ ps sse
- Combine higher half of a and b. The higher half of b occupies the lower half of result.
- _mm_moveldup_ ps sse3
- Duplicate even-indexed single-precision (32-bit) floating-point elements from a.
- _mm_movelh_ ps sse
- Combine lower half of a and b. The lower half of b occupies the higher half of result.
- _mm_movemask_ epi8 sse2
- Returns a mask of the most significant bit of each element in a.
- _mm_movemask_ pd sse2
- Returns a mask of the most significant bit of each element in a.
- _mm_movemask_ ps sse
- Returns a mask of the most significant bit of each element in a.
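A small sketch of the movemask idea, using `_mm_movemask_epi8` to collect the top bit of each of 16 bytes into an integer. It assumes an x86_64 target (SSE2 baseline) and the `std::arch::x86_64` module; the wrapper name `high_bit_mask` and the scalar fallback are hypothetical.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};

/// Bit i of the result is the most significant bit of bytes[i] (SSE2 path).
#[cfg(target_arch = "x86_64")]
fn high_bit_mask(bytes: [u8; 16]) -> u16 {
    // SAFETY: SSE2 is always available on x86_64, and the unaligned load
    // reads exactly the 16 bytes of the array.
    unsafe {
        let v = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
        // Only the low 16 bits of the returned i32 can be set.
        _mm_movemask_epi8(v) as u16
    }
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn high_bit_mask(bytes: [u8; 16]) -> u16 {
    let mut mask = 0u16;
    for (i, byte) in bytes.iter().enumerate() {
        mask |= ((*byte >> 7) as u16) << i;
    }
    mask
}
```

A common use is to follow a packed byte comparison with this mask and then count or scan its set bits.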
- _mm_movepi8_ mask avx512bwandavx512vl
- Set each bit of mask register k based on the most significant bit of the corresponding packed 8-bit integer in a.
- _mm_movepi16_ mask avx512bwandavx512vl
- Set each bit of mask register k based on the most significant bit of the corresponding packed 16-bit integer in a.
- _mm_movepi32_ mask avx512dqandavx512vl
- Set each bit of mask register k based on the most significant bit of the corresponding packed 32-bit integer in a.
- _mm_movepi64_ mask avx512dqandavx512vl
- Set each bit of mask register k based on the most significant bit of the corresponding packed 64-bit integer in a.
- _mm_movm_ epi8 avx512bwandavx512vl
- Set each packed 8-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm_movm_ epi16 avx512bwandavx512vl
- Set each packed 16-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm_movm_ epi32 avx512dqandavx512vl
- Set each packed 32-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm_movm_ epi64 avx512dqandavx512vl
- Set each packed 64-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm_mpsadbw_ epu8 sse4.1
- Subtracts 8-bit unsigned integer values and computes the absolute values of the differences to the corresponding bits in the destination. Then sums of the absolute differences are returned according to the bit fields in the immediate operand.
- _mm_mul_ epi32 sse4.1
- Multiplies the low 32-bit integers from each packed 64-bit element in a and b, and returns the signed 64-bit result.
- _mm_mul_ epu32 sse2
- Multiplies the low unsigned 32-bit integers from each packed 64-bit element in a and b.
- _mm_mul_ pd sse2
- Multiplies packed double-precision (64-bit) floating-point elements in a and b.
- _mm_mul_ ps sse
- Multiplies packed single-precision (32-bit) floating-point elements in a and b.
- _mm_mul_ round_ sd avx512f
- Multiply the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_mul_ round_ ss avx512f
- Multiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mul_ sd sse2
- Returns a new vector with the low element of a replaced by multiplying the low elements of a and b.
- _mm_mul_ ss sse
- Multiplies the first component of a and b; the other components are copied from a.
- _mm_mulhi_ epi16 sse2
- Multiplies the packed 16-bit integers in a and b, returning the high 16 bits of each intermediate 32-bit product.
- _mm_mulhi_ epu16 sse2
- Multiplies the packed unsigned 16-bit integers in a and b, returning the high 16 bits of each intermediate 32-bit product.
- _mm_mulhrs_ epi16 ssse3
- Multiplies packed 16-bit signed integer values, truncates the 32-bit product to the 18 most significant bits by right-shifting, rounds the truncated value by adding 1, and writes bits [16:1] to the destination.
- _mm_mullo_ epi16 sse2
- Multiplies the packed 16-bit integers in a and b, returning the low 16 bits of each intermediate 32-bit product.
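The high and low halves produced by `_mm_mulhi_epi16` and `_mm_mullo_epi16` can be recombined into full 32-bit products. The sketch below does this for the lowest four lanes by interleaving the two results with `_mm_unpacklo_epi16`. It assumes an x86_64 target (SSE2 baseline) and the `std::arch::x86_64` module; the helper name `widening_mul_low4` and the scalar fallback are hypothetical.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    __m128i, _mm_loadu_si128, _mm_mulhi_epi16, _mm_mullo_epi16, _mm_storeu_si128,
    _mm_unpacklo_epi16,
};

/// Full signed 32-bit products of the lowest four 16-bit lanes (SSE2 path).
#[cfg(target_arch = "x86_64")]
fn widening_mul_low4(a: [i16; 8], b: [i16; 8]) -> [i32; 4] {
    let mut out = [0i32; 4];
    // SAFETY: SSE2 is always available on x86_64, and the unaligned
    // load/store intrinsics only touch the 16 bytes of each array.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let lo = _mm_mullo_epi16(va, vb); // low 16 bits of each product
        let hi = _mm_mulhi_epi16(va, vb); // high 16 bits of each product
        // Interleaving lo0, hi0, lo1, hi1, ... rebuilds each 32-bit product.
        let wide = _mm_unpacklo_epi16(lo, hi);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, wide);
    }
    out
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn widening_mul_low4(a: [i16; 8], b: [i16; 8]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = a[i] as i32 * b[i] as i32;
    }
    out
}
```

The upper four lanes can be obtained the same way with `_mm_unpackhi_epi16`.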
- _mm_mullo_ epi32 sse4.1
- Multiplies the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and returns the lowest 32 bits, whatever they might be, reinterpreted as a signed integer. While pmulld __m128i::splat(2), __m128i::splat(2) returns the obvious __m128i::splat(4), due to wrapping arithmetic pmulld __m128i::splat(i32::MAX), __m128i::splat(2) would return a negative number.
- _mm_mullo_ epi64 avx512dqandavx512vl
- Multiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst.
- _mm_multishift_ epi64_ epi8 avx512vbmiandavx512vl
- For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst.
- _mm_or_ epi32 avx512fandavx512vl
- Compute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst.
- _mm_or_ epi64 avx512fandavx512vl
- Compute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst.
- _mm_or_ pd sse2
- Computes the bitwise OR of a and b.
- _mm_or_ ps sse
- Bitwise OR of packed single-precision (32-bit) floating-point elements.
- _mm_or_ si128 sse2
- Computes the bitwise OR of 128 bits (representing integer data) in a and b.
- _mm_packs_ epi16 sse2
- Converts packed 16-bit integers from a and b to packed 8-bit integers using signed saturation.
- _mm_packs_ epi32 sse2
- Converts packed 32-bit integers from a and b to packed 16-bit integers using signed saturation.
- _mm_packus_ epi16 sse2
- Converts packed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation.
- _mm_packus_ epi32 sse4.1
- Converts packed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation.
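To make the saturation behavior of the pack entries concrete, the sketch below narrows two arrays of eight 16-bit integers into sixteen 8-bit integers with `_mm_packs_epi16`. It assumes an x86_64 target (SSE2 baseline) and the `std::arch::x86_64` module; the wrapper name `pack_i16_to_i8` and the scalar fallback are hypothetical.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_packs_epi16, _mm_storeu_si128};

/// Narrows a (low eight outputs) and b (high eight outputs) to i8,
/// clamping out-of-range values to i8::MIN / i8::MAX (SSE2 path).
#[cfg(target_arch = "x86_64")]
fn pack_i16_to_i8(a: [i16; 8], b: [i16; 8]) -> [i8; 16] {
    let mut out = [0i8; 16];
    // SAFETY: SSE2 is always available on x86_64, and the unaligned
    // load/store intrinsics only touch the 16 bytes of each array.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_packs_epi16(va, vb));
    }
    out
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn pack_i16_to_i8(a: [i16; 8], b: [i16; 8]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..8 {
        out[i] = a[i].clamp(-128, 127) as i8;
        out[i + 8] = b[i].clamp(-128, 127) as i8;
    }
    out
}
```

Values that already fit in 8 bits pass through unchanged; only out-of-range inputs are clamped.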
- _mm_pause ⚠
- Provides a hint to the processor that the code sequence is a spin-wait loop.
- _mm_permute_ pd avx
- Shuffles double-precision (64-bit) floating-point elements in a using the control in imm8.
- _mm_permute_ ps avx
- Shuffles single-precision (32-bit) floating-point elements in a using the control in imm8.
- _mm_permutevar_ pd avx
- Shuffles double-precision (64-bit) floating-point elements in ausing the control inb.
- _mm_permutevar_ ps avx
- Shuffles single-precision (32-bit) floating-point elements in ausing the control inb.
- _mm_permutex2var_ epi8 avx512vbmiandavx512vl
- Shuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ epi32 avx512fandavx512vl
- Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ epi64 avx512fandavx512vl
- Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ pd avx512fandavx512vl
- Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ ps avx512fandavx512vl
- Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutexvar_ epi8 avx512vbmiandavx512vl
- Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm_permutexvar_ epi16 avx512bwandavx512vl
- Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm_popcnt_ epi8 avx512bitalgandavx512vl
- For each packed 8-bit integer maps the value to the number of logical 1 bits.
- _mm_popcnt_ epi16 avx512bitalgandavx512vl
- For each packed 16-bit integer maps the value to the number of logical 1 bits.
- _mm_popcnt_ epi32 avx512vpopcntdqandavx512vl
- For each packed 32-bit integer maps the value to the number of logical 1 bits.
- _mm_popcnt_ epi64 avx512vpopcntdqandavx512vl
- For each packed 64-bit integer maps the value to the number of logical 1 bits.
- _mm_prefetch ⚠sse
- Fetch the cache line that contains address p using the given STRATEGY.
- _mm_range_ pd avx512dqandavx512vl
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_range_ ps avx512dqandavx512vl
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_range_ round_ sd avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_range_ round_ ss avx512dq
- Calculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_rcp14_ pd avx512fandavx512vl
- Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rcp14_ ps avx512fandavx512vl
- Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rcp14_ sd avx512f
- Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rcp14_ ss avx512f
- Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rcp_ ps sse
- Returns the approximate reciprocal of packed single-precision (32-bit)
floating-point elements in a.
- _mm_rcp_ ss sse
- Returns the approximate reciprocal of the first single-precision
(32-bit) floating-point element in a, the other elements are unchanged.
- _mm_reduce_ add_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by addition. Returns the sum of all elements in a.
- _mm_reduce_ add_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by addition. Returns the sum of all elements in a.
- _mm_reduce_ and_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm_reduce_ and_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm_reduce_ max_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_ max_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_ max_ epu8 avx512bwandavx512vl
- Reduce the packed unsigned 8-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_ max_ epu16 avx512bwandavx512vl
- Reduce the packed unsigned 16-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_ min_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_ min_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_ min_ epu8 avx512bwandavx512vl
- Reduce the packed unsigned 8-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_ min_ epu16 avx512bwandavx512vl
- Reduce the packed unsigned 16-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_ mul_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm_reduce_ mul_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm_reduce_ or_ epi8 avx512bwandavx512vl
- Reduce the packed 8-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm_reduce_ or_ epi16 avx512bwandavx512vl
- Reduce the packed 16-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm_reduce_ pd avx512dqandavx512vl
- Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_reduce_ ps avx512dqandavx512vl
- Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_reduce_ round_ sd avx512dq
- Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_reduce_ round_ ss avx512dq
- Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_reduce_ sd avx512dq
- Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_reduce_ ss avx512dq
- Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_rol_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm_rol_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm_rolv_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm_rolv_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm_ror_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm_ror_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm_rorv_ epi32 avx512fandavx512vl
- Rotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm_rorv_ epi64 avx512fandavx512vl
- Rotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm_round_ pd sse4.1
- Round the packed double-precision (64-bit) floating-point elements in a using the ROUNDING parameter, and store the results as packed double-precision floating-point elements. Rounding is done according to the rounding parameter, which can be one of:
- _mm_round_ ps sse4.1
- Round the packed single-precision (32-bit) floating-point elements in a using the ROUNDING parameter, and store the results as packed single-precision floating-point elements. Rounding is done according to the rounding parameter, which can be one of:
- _mm_round_ sd sse4.1
- Round the lower double-precision (64-bit) floating-point element in b using the ROUNDING parameter, store the result as a double-precision floating-point element in the lower element of the intrinsic result, and copy the upper element from a to the upper element of the intrinsic result. Rounding is done according to the rounding parameter, which can be one of:
- _mm_round_ ss sse4.1
- Round the lower single-precision (32-bit) floating-point element in b using the ROUNDING parameter, store the result as a single-precision floating-point element in the lower element of the intrinsic result, and copy the upper 3 packed elements from a to the upper elements of the intrinsic result. Rounding is done according to the rounding parameter, which can be one of:
- _mm_roundscale_ pd avx512fandavx512vl
- Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_roundscale_ ps avx512fandavx512vl
- Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_roundscale_ round_ sd avx512f
- Round the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_roundscale_ round_ ss avx512f
- Round the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_roundscale_ sd avx512f
- Round the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_roundscale_ ss avx512f
- Round the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_rsqrt14_ pd avx512fandavx512vl
- Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rsqrt14_ ps avx512fandavx512vl
- Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rsqrt14_ sd avx512f
- Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rsqrt14_ ss avx512f
- Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rsqrt_ ps sse
- Returns the approximate reciprocal square root of packed single-precision
(32-bit) floating-point elements in a.
- _mm_rsqrt_ ss sse
- Returns the approximate reciprocal square root of the first single-precision
(32-bit) floating-point element in a, the other elements are unchanged.
- _mm_sad_ epu8 sse2
- Sum the absolute differences of packed unsigned 8-bit integers.
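To make the shape of the _mm_sad_epu8 result concrete, here is a minimal sketch: the instruction produces one sum per group of eight bytes, stored in the low bits of each 64-bit lane. The helper name sad_u8, the use of std::arch::x86_64, and the scalar fallback for other targets are assumptions made for this example.

```rust
/// Returns the two sums of absolute differences: bytes 0..8 and bytes 8..16.
/// Hypothetical helper for illustration.
#[cfg(target_arch = "x86_64")]
fn sad_u8(a: [u8; 16], b: [u8; 16]) -> [u64; 2] {
    use std::arch::x86_64::*;
    // SAFETY: SSE2 is part of the x86_64 baseline; the loads are unaligned
    // and read exactly 16 bytes; __m128i and [u64; 2] have the same size.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        std::mem::transmute::<__m128i, [u64; 2]>(_mm_sad_epu8(va, vb))
    }
}

/// Scalar fallback (an assumption for non-x86_64 builds) with the same result.
#[cfg(not(target_arch = "x86_64"))]
fn sad_u8(a: [u8; 16], b: [u8; 16]) -> [u64; 2] {
    let mut out = [0u64; 2];
    for i in 0..16 {
        out[i / 8] += (a[i] as i16 - b[i] as i16).unsigned_abs() as u64;
    }
    out
}

fn main() {
    // Every byte differs by 3, so each half sums to 8 * 3.
    println!("{:?}", sad_u8([10; 16], [7; 16]));
}
```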
- _mm_scalef_ pd avx512fandavx512vl
- Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm_scalef_ ps avx512fandavx512vl
- Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm_scalef_ round_ sd avx512f
- Scale the lower double-precision (64-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_scalef_ round_ ss avx512f
- Scale the lower single-precision (32-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_scalef_ sd avx512f
- Scale the lower double-precision (64-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_scalef_ ss avx512f
- Scale the lower single-precision (32-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_set1_ epi8 sse2
- Broadcasts 8-bit integer a to all elements.
- _mm_set1_ epi16 sse2
- Broadcasts 16-bit integer a to all elements.
- _mm_set1_ epi32 sse2
- Broadcasts 32-bit integer a to all elements.
- _mm_set1_ epi64x sse2
- Broadcasts 64-bit integer a to all elements.
- _mm_set1_ pd sse2
- Broadcasts double-precision (64-bit) floating-point value a to all elements of the return value.
- _mm_set1_ ps sse
- Construct a __m128 with all elements set to a.
- _mm_set_ epi8 sse2
- Sets packed 8-bit integers with the supplied values.
- _mm_set_ epi16 sse2
- Sets packed 16-bit integers with the supplied values.
- _mm_set_ epi32 sse2
- Sets packed 32-bit integers with the supplied values.
- _mm_set_ epi64x sse2
- Sets packed 64-bit integers with the supplied values, from highest to lowest.
- _mm_set_ pd sse2
- Sets packed double-precision (64-bit) floating-point elements in the return value with the supplied values.
- _mm_set_ pd1 sse2
- Broadcasts double-precision (64-bit) floating-point value a to all elements of the return value.
- _mm_set_ ps sse
- Construct a __m128 from four floating-point values, highest to lowest.
- _mm_set_ ps1 sse
- Alias for _mm_set1_ps
- _mm_set_ sd sse2
- Copies double-precision (64-bit) floating-point element a to the lower element of the packed 64-bit return value.
- _mm_set_ ss sse
- Construct a __m128 with the lowest element set to a and the rest set to zero.
- _mm_setcsr ⚠Deprecated sse
- Sets the MXCSR register with the 32-bit unsigned integer value.
- _mm_setr_ epi8 sse2
- Sets packed 8-bit integers with the supplied values in reverse order.
- _mm_setr_ epi16 sse2
- Sets packed 16-bit integers with the supplied values in reverse order.
- _mm_setr_ epi32 sse2
- Sets packed 32-bit integers with the supplied values in reverse order.
- _mm_setr_ pd sse2
- Sets packed double-precision (64-bit) floating-point elements in the return value with the supplied values in reverse order.
- _mm_setr_ ps sse
- Construct a __m128 from four floating-point values, lowest to highest.
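The difference between the set and setr families above is only the argument order. The sketch below, using the 32-bit integer variants, shows that _mm_set_epi32 takes lanes from highest to lowest while _mm_setr_epi32 takes them from lowest to highest. The helper name set_vs_setr, the use of std::arch::x86_64, and the fallback for other targets are assumptions made for this example.

```rust
/// Builds the same vector twice and returns both as arrays in memory order
/// (lane 0 first). Hypothetical helper for illustration.
#[cfg(target_arch = "x86_64")]
fn set_vs_setr() -> ([i32; 4], [i32; 4]) {
    use std::arch::x86_64::*;
    // SAFETY: SSE2 is part of the x86_64 baseline, and __m128i has the same
    // size as [i32; 4], so the transmutes only reinterpret the 16 bytes.
    unsafe {
        let hi_to_lo = _mm_set_epi32(3, 2, 1, 0); // arguments: lane 3 .. lane 0
        let lo_to_hi = _mm_setr_epi32(0, 1, 2, 3); // arguments: lane 0 .. lane 3
        (
            std::mem::transmute::<__m128i, [i32; 4]>(hi_to_lo),
            std::mem::transmute::<__m128i, [i32; 4]>(lo_to_hi),
        )
    }
}

/// Fallback (an assumption for non-x86_64 builds) returning the same lanes.
#[cfg(not(target_arch = "x86_64"))]
fn set_vs_setr() -> ([i32; 4], [i32; 4]) {
    ([0, 1, 2, 3], [0, 1, 2, 3])
}

fn main() {
    let (from_set, from_setr) = set_vs_setr();
    println!("{:?} {:?}", from_set, from_setr);
}
```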
- _mm_setzero_ pd sse2
- Returns packed double-precision (64-bit) floating-point elements with all zeros.
- _mm_setzero_ ps sse
- Construct a __m128 with all elements initialized to zero.
- _mm_setzero_ si128 sse2
- Returns a vector with all elements set to zero.
- _mm_sfence ⚠sse
- Performs a serializing operation on all non-temporal (“streaming”) store instructions that were issued by the current thread prior to this instruction.
- _mm_sha1msg1_ epu32 sha
- Performs an intermediate calculation for the next four SHA1 message values
(unsigned 32-bit integers) using previous message values from a and b, and returns the result.
- _mm_sha1msg2_ epu32 sha
- Performs the final calculation for the next four SHA1 message values
(unsigned 32-bit integers) using the intermediate result in a and the previous message values in b, and returns the result.
- _mm_sha1nexte_ epu32 sha
- Calculates SHA1 state variable E after four rounds of operation from the
current SHA1 state variable a, adds that value to the scheduled values (unsigned 32-bit integers) in b, and returns the result.
- _mm_sha1rnds4_ epu32 sha
- Performs four rounds of SHA1 operation using an initial SHA1 state (A,B,C,D)
from a and a pre-computed sum of the next 4 round message values (unsigned 32-bit integers), and state variable E from b, and returns the updated SHA1 state (A,B,C,D). FUNC contains the logic functions and round constants.
- _mm_sha256msg1_ epu32 sha
- Performs an intermediate calculation for the next four SHA256 message values
(unsigned 32-bit integers) using previous message values from a and b, and returns the result.
- _mm_sha256msg2_ epu32 sha
- Performs the final calculation for the next four SHA256 message values
(unsigned 32-bit integers) using previous message values from a and b, and returns the result.
- _mm_sha256rnds2_ epu32 sha
- Performs 2 rounds of SHA256 operation using an initial SHA256 state
(C,D,G,H) from a, an initial SHA256 state (A,B,E,F) from b, and a pre-computed sum of the next 2 round message values (unsigned 32-bit integers) and the corresponding round constants from k, and stores the updated SHA256 state (A,B,E,F) in dst.
- _mm_shldi_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst.
- _mm_shldi_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst.
- _mm_shldi_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst.
- _mm_shldv_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst.
- _mm_shldv_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst.
- _mm_shldv_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst.
- _mm_shrdi_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst.
- _mm_shrdi_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst.
- _mm_shrdi_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst.
- _mm_shrdv_ epi16 avx512vbmi2andavx512vl
- Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst.
- _mm_shrdv_ epi32 avx512vbmi2andavx512vl
- Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst.
- _mm_shrdv_ epi64 avx512vbmi2andavx512vl
- Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst.
- _mm_shuffle_ epi8 ssse3
- Shuffles bytes from a according to the content of b.
- _mm_shuffle_ epi32 sse2
- Shuffles 32-bit integers in a using the control in IMM8.
- _mm_shuffle_ pd sse2
- Constructs a 128-bit floating-point vector of [2 x double] from two 128-bit vector parameters of [2 x double], using the immediate-value parameter as a specifier.
- _mm_shuffle_ ps sse
- Shuffles packed single-precision (32-bit) floating-point elements in a and b using MASK.
- _mm_shufflehi_ epi16 sse2
- Shuffles 16-bit integers in the high 64 bits of a using the control in IMM8.
- _mm_shufflelo_ epi16 sse2
- Shuffles 16-bit integers in the low 64 bits of a using the control in IMM8.
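A short sketch of how the IMM8 control of _mm_shuffle_epi32 can be read: each pair of control bits selects the source lane for one destination lane, starting with bits 1:0 for lane 0. The helper name reverse_lanes, the use of std::arch::x86_64, and the scalar fallback for other targets are assumptions made for this example.

```rust
/// Reverses the four 32-bit lanes of `v`. Hypothetical helper for illustration.
#[cfg(target_arch = "x86_64")]
fn reverse_lanes(v: [i32; 4]) -> [i32; 4] {
    use std::arch::x86_64::*;
    // SAFETY: SSE2 is part of the x86_64 baseline; the load is unaligned and
    // reads exactly 16 bytes; __m128i and [i32; 4] have the same size.
    unsafe {
        let a = _mm_loadu_si128(v.as_ptr() as *const __m128i);
        // Control 0b00_01_10_11: lane 0 <- 3, lane 1 <- 2, lane 2 <- 1, lane 3 <- 0.
        let r = _mm_shuffle_epi32::<0b00_01_10_11>(a);
        std::mem::transmute::<__m128i, [i32; 4]>(r)
    }
}

/// Scalar fallback (an assumption for non-x86_64 builds) with the same result.
#[cfg(not(target_arch = "x86_64"))]
fn reverse_lanes(v: [i32; 4]) -> [i32; 4] {
    [v[3], v[2], v[1], v[0]]
}

fn main() {
    println!("{:?}", reverse_lanes([10, 20, 30, 40]));
}
```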
- _mm_sign_ epi8 ssse3
- Negates packed 8-bit integers in a when the corresponding signed 8-bit integer in b is negative, and returns the results. Elements in the result are zeroed out when the corresponding element in b is zero.
- _mm_sign_ epi16 ssse3
- Negates packed 16-bit integers in a when the corresponding signed 16-bit integer in b is negative, and returns the results. Elements in the result are zeroed out when the corresponding element in b is zero.
- _mm_sign_ epi32 ssse3
- Negates packed 32-bit integers in a when the corresponding signed 32-bit integer in b is negative, and returns the results. Elements in the result are zeroed out when the corresponding element in b is zero.
- _mm_sll_ epi16 sse2
- Shifts packed 16-bit integers in a left by count while shifting in zeros.
- _mm_sll_ epi32 sse2
- Shifts packed 32-bit integers in a left by count while shifting in zeros.
- _mm_sll_ epi64 sse2
- Shifts packed 64-bit integers in a left by count while shifting in zeros.
- _mm_slli_ epi16 sse2
- Shifts packed 16-bit integers in a left by IMM8 while shifting in zeros.
- _mm_slli_ epi32 sse2
- Shifts packed 32-bit integers in a left by IMM8 while shifting in zeros.
- _mm_slli_ epi64 sse2
- Shifts packed 64-bit integers in a left by IMM8 while shifting in zeros.
- _mm_slli_ si128 sse2
- Shifts a left by IMM8 bytes while shifting in zeros.
- _mm_sllv_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm_sllv_ epi32 avx2
- Shifts packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and returns the result.
- _mm_sllv_ epi64 avx2
- Shifts packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and returns the result.
- _mm_sm3msg1_ epi32 sm3andavx
- This is one of the two SM3 message scheduling intrinsics. The intrinsic performs an initial calculation for the next four SM3 message words. The calculated results are stored in dst.
- _mm_sm3msg2_ epi32 sm3andavx
- This is one of the two SM3 message scheduling intrinsics. The intrinsic performs the final calculation for the next four SM3 message words. The calculated results are stored in dst.
- _mm_sm3rnds2_ epi32 sm3andavx
- The intrinsic performs two rounds of SM3 operation using the initial SM3 state (C, D, G, H) from a, the initial SM3 state (A, B, E, F) from b, and pre-computed words from c. a with initial SM3 state of (C, D, G, H) assumes input of non-rotated left variables from previous state. The updated SM3 state (A, B, E, F) is written to a. The imm8 should contain the even round number for the first of the two rounds computed by this instruction. The computation masks the imm8 value by ANDing it with 0x3E so that only even round numbers from 0 through 62 are used for this operation. The calculated results are stored in dst.
- _mm_sm4key4_ epi32 sm4andavx
- This intrinsic performs four rounds of SM4 key expansion. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in dst.
- _mm_sm4rnds4_ epi32 sm4andavx
- This intrinsic performs four rounds of SM4 encryption. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in dst.
- _mm_sqrt_ pd sse2
- Returns a new vector with the square root of each of the values in a.
- _mm_sqrt_ ps sse
- Returns the square root of packed single-precision (32-bit) floating-point
elements in a.
- _mm_sqrt_ round_ sd avx512f
- Compute the square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_sqrt_ round_ ss avx512f
- Compute the square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_sqrt_ sd sse2
- Returns a new vector with the low element of a replaced by the square root of the lower element of b.
- _mm_sqrt_ ss sse
- Returns the square root of the first single-precision (32-bit)
floating-point element in a, the other elements are unchanged.
- _mm_sra_ epi16 sse2
- Shifts packed 16-bit integers in a right by count while shifting in sign bits.
- _mm_sra_ epi32 sse2
- Shifts packed 32-bit integers in a right by count while shifting in sign bits.
- _mm_sra_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst.
- _mm_srai_ epi16 sse2
- Shifts packed 16-bit integers in a right by IMM8 while shifting in sign bits.
- _mm_srai_ epi32 sse2
- Shifts packed 32-bit integers in a right by IMM8 while shifting in sign bits.
- _mm_srai_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
- _mm_srav_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm_srav_ epi32 avx2
- Shifts packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits.
- _mm_srav_ epi64 avx512fandavx512vl
- Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm_srl_ epi16 sse2
- Shifts packed 16-bit integers in a right by count while shifting in zeros.
- _mm_srl_ epi32 sse2
- Shifts packed 32-bit integers in a right by count while shifting in zeros.
- _mm_srl_ epi64 sse2
- Shifts packed 64-bit integers in a right by count while shifting in zeros.
- _mm_srli_ epi16 sse2
- Shifts packed 16-bit integers in a right by IMM8 while shifting in zeros.
- _mm_srli_ epi32 sse2
- Shifts packed 32-bit integers in a right by IMM8 while shifting in zeros.
- _mm_srli_ epi64 sse2
- Shifts packed 64-bit integers in a right by IMM8 while shifting in zeros.
- _mm_srli_ si128 sse2
- Shifts a right by IMM8 bytes while shifting in zeros.
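The practical difference between "shifting in sign bits" and "shifting in zeros" in the right-shift entries above shows up only for negative lanes. The following sketch compares _mm_srai_epi32 and _mm_srli_epi32 on the same input. The helper name shift_right_by_2, the use of std::arch::x86_64, and the scalar fallback for other targets are assumptions made for this example.

```rust
/// Returns (arithmetic shift, logical shift) of each lane by 2 bits.
/// Hypothetical helper for illustration.
#[cfg(target_arch = "x86_64")]
fn shift_right_by_2(v: [i32; 4]) -> ([i32; 4], [i32; 4]) {
    use std::arch::x86_64::*;
    // SAFETY: SSE2 is part of the x86_64 baseline; the load is unaligned and
    // reads exactly 16 bytes; __m128i and [i32; 4] have the same size.
    unsafe {
        let a = _mm_loadu_si128(v.as_ptr() as *const __m128i);
        let arithmetic = _mm_srai_epi32::<2>(a); // copies the sign bit in
        let logical = _mm_srli_epi32::<2>(a); // shifts zeros in
        (
            std::mem::transmute::<__m128i, [i32; 4]>(arithmetic),
            std::mem::transmute::<__m128i, [i32; 4]>(logical),
        )
    }
}

/// Scalar fallback (an assumption for non-x86_64 builds) with the same result.
#[cfg(not(target_arch = "x86_64"))]
fn shift_right_by_2(v: [i32; 4]) -> ([i32; 4], [i32; 4]) {
    let mut arithmetic = [0i32; 4];
    let mut logical = [0i32; 4];
    for i in 0..4 {
        arithmetic[i] = v[i] >> 2;
        logical[i] = ((v[i] as u32) >> 2) as i32;
    }
    (arithmetic, logical)
}

fn main() {
    let (arithmetic, logical) = shift_right_by_2([-16, 16, -1, 0]);
    println!("{:?} {:?}", arithmetic, logical);
}
```

Non-negative lanes give the same result either way; a negative lane stays negative under the arithmetic shift and becomes a large positive value under the logical one.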
- _mm_srlv_ epi16 avx512bwandavx512vl
- Shift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm_srlv_ epi32 avx2
- Shifts packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros.
- _mm_srlv_ epi64 avx2
- Shifts packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros.
- _mm_store1_ ⚠pd sse2
- Stores the lower double-precision (64-bit) floating-point element from a into 2 contiguous elements in memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_store1_ ⚠ps sse
- Stores the lowest 32-bit float of a, repeated four times, into aligned memory.
- _mm_store_ ⚠epi32 avx512fandavx512vl
- Store 128-bits (composed of 4 packed 32-bit integers) from a into memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_store_ ⚠epi64 avx512fandavx512vl
- Store 128-bits (composed of 2 packed 64-bit integers) from a into memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_store_ ⚠pd sse2
- Stores 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_store_ ⚠pd1 sse2
- Stores the lower double-precision (64-bit) floating-point element from a into 2 contiguous elements in memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_store_ ⚠ps sse
- Stores four 32-bit floats into aligned memory.
- _mm_store_ ⚠ps1 sse
- Alias for _mm_store1_ps
- _mm_store_ ⚠sd sse2
- Stores the lower 64 bits of a 128-bit vector of [2 x double] to a memory location.
- _mm_store_ ⚠si128 sse2
- Stores 128-bits of integer data from a into memory.
- _mm_store_ ⚠ss sse
- Stores the lowest 32-bit float of a into memory.
- _mm_storeh_ ⚠pd sse2
- Stores the upper 64 bits of a 128-bit vector of [2 x double] to a memory location.
- _mm_storel_ ⚠epi64 sse2
- Stores the lower 64-bit integer a to a memory location.
- _mm_storel_ ⚠pd sse2
- Stores the lower 64 bits of a 128-bit vector of [2 x double] to a memory location.
- _mm_storer_ ⚠pd sse2
- Stores 2 double-precision (64-bit) floating-point elements from a into memory in reverse order. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_storer_ ⚠ps sse
- Stores four 32-bit floats into aligned memory in reverse order.
- _mm_storeu_ ⚠epi8 avx512bwandavx512vl
- Store 128-bits (composed of 16 packed 8-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm_storeu_ ⚠epi16 avx512bwandavx512vl
- Store 128-bits (composed of 8 packed 16-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm_storeu_ ⚠epi32 avx512fandavx512vl
- Store 128-bits (composed of 4 packed 32-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm_storeu_ ⚠epi64 avx512fandavx512vl
- Store 128-bits (composed of 2 packed 64-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm_storeu_ ⚠pd sse2
- Stores 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm_storeu_ ⚠ps sse
- Stores four 32-bit floats into memory. There are no restrictions on memory alignment. For aligned memory _mm_store_ps may be faster.
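The difference between the single-precision store variants above is easiest to see side by side. The following is a minimal sketch, assuming an x86 or x86_64 target with SSE available. The `Aligned` wrapper and the `store_variants` function are names invented for this example.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical 16-byte aligned buffer for the aligned store intrinsics.
#[repr(align(16))]
struct Aligned([f32; 4]);

/// Returns the memory written by `_mm_store_ps`, `_mm_storer_ps`,
/// `_mm_store1_ps` and `_mm_storeu_ps` for the vector (1.0, 2.0, 3.0, 4.0).
fn store_variants() -> ([f32; 4], [f32; 4], [f32; 4], [f32; 5]) {
    let mut plain = Aligned([0.0; 4]);
    let mut reversed = Aligned([0.0; 4]);
    let mut splat = Aligned([0.0; 4]);
    let mut unaligned = [0.0f32; 5];
    // SAFETY: assumes SSE is available. The three `Aligned` buffers are
    // 16-byte aligned, and every destination has room for four f32 values.
    unsafe {
        // `_mm_setr_ps` places its first argument in the lowest lane.
        let v = _mm_setr_ps(1.0, 2.0, 3.0, 4.0);
        _mm_store_ps(plain.0.as_mut_ptr(), v);
        _mm_storer_ps(reversed.0.as_mut_ptr(), v);
        _mm_store1_ps(splat.0.as_mut_ptr(), v);
        // Starting one element in, so the address is generally not 16-byte aligned.
        _mm_storeu_ps(unaligned.as_mut_ptr().add(1), v);
    }
    (plain.0, reversed.0, splat.0, unaligned)
}
```

Only the `_mm_storeu_ps` call tolerates the offset destination. Passing the same pointer to `_mm_store_ps` would violate its alignment requirement.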
- _mm_storeu_ ⚠si16 sse2
- Store 16-bit integer from the first element of a into memory.
- _mm_storeu_ ⚠si32 sse2
- Store 32-bit integer from the first element of a into memory.
- _mm_storeu_ ⚠si64 sse2
- Store 64-bit integer from the first element of a into memory.
- _mm_storeu_ ⚠si128 sse2
- Stores 128-bits of integer data from a into memory.
- _mm_stream_ ⚠load_ si128 sse4.1
- Load 128-bits of integer data from memory into dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm_stream_ ⚠pd sse2
- Stores a 128-bit floating point vector of [2 x double] to a 128-bit aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm_stream_ ⚠ps sse
- Stores a into the memory at mem_addr using a non-temporal memory hint.
- _mm_stream_ ⚠sd sse4a
- Non-temporal store of a.0 into p.
- _mm_stream_ ⚠si32 sse2
- Stores a 32-bit integer value in the specified memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm_stream_ ⚠si128 sse2
- Stores a 128-bit integer vector to a 128-bit aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm_stream_ ⚠ss sse4a
- Non-temporal store of a.0 into p.
- _mm_sub_ epi8 sse2
- Subtracts packed 8-bit integers in b from packed 8-bit integers in a.
- _mm_sub_ epi16 sse2
- Subtracts packed 16-bit integers in b from packed 16-bit integers in a.
- _mm_sub_ epi32 sse2
- Subtracts packed 32-bit integers in b from packed 32-bit integers in a.
- _mm_sub_ epi64 sse2
- Subtracts packed 64-bit integers in b from packed 64-bit integers in a.
- _mm_sub_ pd sse2
- Subtracts packed double-precision (64-bit) floating-point elements in b from a.
- _mm_sub_ ps sse
- Subtracts packed single-precision (32-bit) floating-point elements in b from a.
- _mm_sub_ round_ sd avx512f
- Subtract the lower double-precision (64-bit) floating-point element in b from the lower double-precision (64-bit) floating-point element in a, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_sub_ round_ ss avx512f
- Subtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_sub_ sd sse2
- Returns a new vector with the low element of a replaced by the result of subtracting the low element of b from the low element of a.
- _mm_sub_ ss sse
- Subtracts the first component of b from a; the other components are copied from a.
- _mm_subs_ epi8 sse2
- Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation.
- _mm_subs_ epi16 sse2
- Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation.
- _mm_subs_ epu8 sse2
- Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation.
- _mm_subs_ epu16 sse2
- Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation.
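To make the wrapping versus saturating distinction above concrete, here is a small sketch that runs `_mm_sub_epi8` and `_mm_subs_epu8` on the same inputs. The helper name `sub_u8` is hypothetical, and the code assumes an x86 or x86_64 target with SSE2.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: returns (wrapping result, unsigned saturating result)
/// of `a - b`, computed in every 8-bit lane and read back from lane 0.
fn sub_u8(a: u8, b: u8) -> (u8, u8) {
    let mut wrapped = [0u8; 16];
    let mut saturated = [0u8; 16];
    // SAFETY: assumes SSE2 is available (baseline on x86_64). Both output
    // arrays are 16 bytes long and the stores need no alignment.
    unsafe {
        let va = _mm_set1_epi8(a as i8);
        let vb = _mm_set1_epi8(b as i8);
        _mm_storeu_si128(wrapped.as_mut_ptr() as *mut __m128i, _mm_sub_epi8(va, vb));
        _mm_storeu_si128(saturated.as_mut_ptr() as *mut __m128i, _mm_subs_epu8(va, vb));
    }
    (wrapped[0], saturated[0])
}
```

For in-range results the two agree. When the subtraction would go below zero, the plain form wraps around like `u8::wrapping_sub`, while the unsigned saturating form clamps to zero like `u8::saturating_sub`.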
- _mm_ternarylogic_ epi32 avx512fandavx512vl
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst.
- _mm_ternarylogic_ epi64 avx512fandavx512vl
- Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst.
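The imm8 lookup described above can be modelled in plain scalar code. The sketch below is a software model of a single 32-bit lane, not a call to the AVX-512 intrinsic. The function name `ternarylogic_u32` is made up for illustration, and it takes the bit from a as the most significant bit of the 3-bit index, following the order in which the description lists the operands.

```rust
/// Scalar model of the per-bit truth-table lookup for one 32-bit lane.
/// `imm8` is read as an 8-entry truth table indexed by (a_bit, b_bit, c_bit).
fn ternarylogic_u32(a: u32, b: u32, c: u32, imm8: u8) -> u32 {
    let mut dst = 0u32;
    for bit in 0..32 {
        // Build the 3-bit index: a is the high bit, c is the low bit.
        let idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1);
        // Look up the result bit in the truth table and place it in dst.
        let out = ((imm8 as u32) >> idx) & 1;
        dst |= out << bit;
    }
    dst
}
```

Under this model, 0x96 yields the three-way XOR, 0xE8 the bitwise majority, and 0xCA a bitwise select that takes b where a is set and c elsewhere.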
- _mm_test_ all_ ones sse4.1
- Tests whether the specified bits in a 128-bit integer vector are all ones.
- _mm_test_ all_ zeros sse4.1
- Tests whether the specified bits in a 128-bit integer vector are all zeros.
- _mm_test_ epi8_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm_test_ epi16_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm_test_ epi32_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm_test_ epi64_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm_test_ mix_ ones_ zeros sse4.1
- Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones.
- _mm_testc_ pd avx
- Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
- _mm_testc_ ps avx
- Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
- _mm_testc_ si128 sse4.1
- Tests whether the specified bits in a 128-bit integer vector are all ones.
- _mm_testn_ epi8_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm_testn_ epi16_ mask avx512bwandavx512vl
- Compute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm_testn_ epi32_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm_testn_ epi64_ mask avx512fandavx512vl
- Compute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm_testnzc_ pd avx
- Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
- _mm_testnzc_ ps avx
- Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
- _mm_testnzc_ si128 sse4.1
- Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones.
- _mm_testz_ pd avx
- Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
- _mm_testz_ ps avx
- Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
- _mm_testz_ si128 sse4.1
- Tests whether the specified bits in a 128-bit integer vector are all zeros.
- _mm_tzcnt_ 32 bmi1
- Counts the number of trailing least significant zero bits.
- _mm_ucomieq_ sd sse2
- Compares the lower element of a and b for equality.
- _mm_ucomieq_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are equal, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomige_ sd sse2
- Compares the lower element of a and b for greater-than-or-equal.
- _mm_ucomige_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than or equal to the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomigt_ sd sse2
- Compares the lower element of a and b for greater-than.
- _mm_ucomigt_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomile_ sd sse2
- Compares the lower element of a and b for less-than-or-equal.
- _mm_ucomile_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than or equal to the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomilt_ sd sse2
- Compares the lower element of a and b for less-than.
- _mm_ucomilt_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomineq_ sd sse2
- Compares the lower element of a and b for not-equal.
- _mm_ucomineq_ ss sse
- Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are not equal, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_undefined_ pd sse2
- Returns vector of type __m128d with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm_undefined_ ps sse
- Returns vector of type __m128 with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm_undefined_ si128 sse2
- Returns vector of type __m128i with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm_unpackhi_ epi8 sse2
- Unpacks and interleaves 8-bit integers from the high half of a and b.
- _mm_unpackhi_ epi16 sse2
- Unpacks and interleaves 16-bit integers from the high half of a and b.
- _mm_unpackhi_ epi32 sse2
- Unpacks and interleaves 32-bit integers from the high half of a and b.
- _mm_unpackhi_ epi64 sse2
- Unpacks and interleaves 64-bit integers from the high half of a and b.
- _mm_unpackhi_ pd sse2
- The resulting __m128d element is composed of the high-order values of the two interleaved __m128d input elements.
- _mm_unpackhi_ ps sse
- Unpacks and interleaves single-precision (32-bit) floating-point elements from the higher half of a and b.
- _mm_unpacklo_ epi8 sse2
- Unpacks and interleaves 8-bit integers from the low half of a and b.
- _mm_unpacklo_ epi16 sse2
- Unpacks and interleaves 16-bit integers from the low half of a and b.
- _mm_unpacklo_ epi32 sse2
- Unpacks and interleaves 32-bit integers from the low half of a and b.
- _mm_unpacklo_ epi64 sse2
- Unpacks and interleaves 64-bit integers from the low half of a and b.
- _mm_unpacklo_ pd sse2
- The resulting __m128d element is composed of the low-order values of the two interleaved __m128d input elements.
- _mm_unpacklo_ ps sse
- Unpacks and interleaves single-precision (32-bit) floating-point elements from the lower half of a and b.
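The interleaving performed by the unpack intrinsics above can be sketched with the 32-bit integer variants. This is an illustrative example only, assuming an x86 or x86_64 target with SSE2. The function name `interleave` is hypothetical.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: returns the results of `_mm_unpacklo_epi32` and
/// `_mm_unpackhi_epi32` as arrays, lane 0 first.
fn interleave(a: [i32; 4], b: [i32; 4]) -> ([i32; 4], [i32; 4]) {
    let mut lo = [0i32; 4];
    let mut hi = [0i32; 4];
    // SAFETY: assumes SSE2 is available (baseline on x86_64). All arrays are
    // 16 bytes long and the unaligned loads/stores need no alignment.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        // Low halves: a0, b0, a1, b1.
        _mm_storeu_si128(lo.as_mut_ptr() as *mut __m128i, _mm_unpacklo_epi32(va, vb));
        // High halves: a2, b2, a3, b3.
        _mm_storeu_si128(hi.as_mut_ptr() as *mut __m128i, _mm_unpackhi_epi32(va, vb));
    }
    (lo, hi)
}
```

In both results the element from a comes first in each pair, followed by the matching element from b.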
- _mm_xor_ epi32 avx512fandavx512vl
- Compute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst.
- _mm_xor_ epi64 avx512fandavx512vl
- Compute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst.
- _mm_xor_ pd sse2
- Computes the bitwise XOR of a and b.
- _mm_xor_ ps sse
- Bitwise exclusive OR of packed single-precision (32-bit) floating-point elements.
- _mm_xor_ si128 sse2
- Computes the bitwise XOR of 128 bits (representing integer data) in a and b.
- _mulx_u32 bmi2
- Unsigned multiply without affecting flags.
- _pdep_u32 bmi2
- Scatter contiguous low order bits of a to the result at the positions specified by the mask.
- _pext_u32 bmi2
- Gathers the bits of x specified by the mask into the contiguous low order bit positions of the result.
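The scatter and gather behaviour of the two entries above can be modelled bit by bit in portable code. The sketch below is a software model, not a call to the BMI2 intrinsics. The names `pdep_model` and `pext_model` are invented for this example.

```rust
/// Software model of a 32-bit parallel bit extract: gathers the bits of `x`
/// selected by `mask` into the low-order bits of the result.
fn pext_model(x: u32, mask: u32) -> u32 {
    let mut result = 0u32;
    let mut k = 0u32; // next free low-order position in the result
    for bit in 0..32 {
        if (mask >> bit) & 1 == 1 {
            result |= ((x >> bit) & 1) << k;
            k += 1;
        }
    }
    result
}

/// Software model of a 32-bit parallel bit deposit: scatters the low-order
/// bits of `a` to the positions of the set bits in `mask`.
fn pdep_model(a: u32, mask: u32) -> u32 {
    let mut result = 0u32;
    let mut k = 0u32; // next low-order bit of `a` to consume
    for bit in 0..32 {
        if (mask >> bit) & 1 == 1 {
            result |= ((a >> k) & 1) << bit;
            k += 1;
        }
    }
    result
}
```

The two operations are inverses on the selected bits: depositing the output of an extract with the same mask reproduces `x & mask`.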
- _popcnt32 popcnt
- Counts the bits that are set.
- _rdrand16_step ⚠rdrand
- Read a hardware generated 16-bit random value and store the result in val. Returns 1 if a random value was generated, and 0 otherwise.
- _rdrand32_step ⚠rdrand
- Read a hardware generated 32-bit random value and store the result in val. Returns 1 if a random value was generated, and 0 otherwise.
- _rdseed16_step ⚠rdseed
- Read a 16-bit NIST SP800-90B and SP800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise.
- _rdseed32_step ⚠rdseed
- Read a 32-bit NIST SP800-90B and SP800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise.
- _rdtsc ⚠
- Reads the current value of the processor’s time-stamp counter.
- _store_mask8 ⚠avx512dq
- Store 8-bit mask to memory
- _store_mask16 ⚠avx512f
- Store 16-bit mask to memory
- _store_mask32 ⚠avx512bw
- Store 32-bit mask from a into memory.
- _store_mask64 ⚠avx512bw
- Store 64-bit mask from a into memory.
- _subborrow_u32 ⚠
- Subtracts unsigned 32-bit integer b from unsigned 32-bit integer a with unsigned 8-bit borrow-in c_in (carry or overflow flag), stores the unsigned 32-bit result in out, and returns the borrow-out (carry or overflow flag).
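A typical use of a borrow-propagating subtraction is chaining limbs of a wider integer. The following sketch subtracts two 64-bit values as pairs of 32-bit limbs with `_subborrow_u32`. The function name `sub_u64_via_limbs` is hypothetical, and the code assumes an x86 or x86_64 target.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: computes `a - b` limb by limb and reports whether
/// the subtraction borrowed out of the top limb.
fn sub_u64_via_limbs(a: u64, b: u64) -> (u64, bool) {
    let (a_lo, a_hi) = (a as u32, (a >> 32) as u32);
    let (b_lo, b_hi) = (b as u32, (b >> 32) as u32);
    let mut lo = 0u32;
    let mut hi = 0u32;
    // The `unsafe` block is needed on toolchains where the intrinsic is an
    // unsafe fn; the attribute silences the lint where it is not.
    #[allow(unused_unsafe)]
    let borrow_out = unsafe {
        // The low limbs start with no incoming borrow.
        let borrow = _subborrow_u32(0, a_lo, b_lo, &mut lo);
        // The high limbs consume the borrow produced by the low limbs.
        _subborrow_u32(borrow, a_hi, b_hi, &mut hi)
    };
    (((hi as u64) << 32) | lo as u64, borrow_out != 0)
}
```

The pair returned here should agree with `u64::overflowing_sub` on the same inputs.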
- _t1mskc_u32 ⚠tbm
- Clears all bits below the least significant zero of x and sets all other bits.
- _tzcnt_u16 bmi1
- Counts the number of trailing least significant zero bits.
- _tzcnt_u32 bmi1
- Counts the number of trailing least significant zero bits.
- _tzmsk_u32 ⚠tbm
- Sets all bits below the least significant one of x and clears all other bits.
- _xgetbv ⚠xsave
- Reads the contents of the extended control register XCR specified in xcr_no.
- _xrstor ⚠xsave
- Performs a full or partial restore of the enabled processor states using the state information stored in memory at mem_addr.
- _xrstors ⚠xsaveandxsaves
- Performs a full or partial restore of the enabled processor states using the state information stored in memory at mem_addr.
- _xsave ⚠xsave
- Performs a full or partial save of the enabled processor states to memory at mem_addr.
- _xsavec ⚠xsaveandxsavec
- Performs a full or partial save of the enabled processor states to memory at mem_addr.
- _xsaveopt ⚠xsaveandxsaveopt
- Performs a full or partial save of the enabled processor states to memory at mem_addr.
- _xsaves ⚠xsaveandxsaves
- Performs a full or partial save of the enabled processor states to memory at mem_addr.
- _xsetbv ⚠xsave
- Copies 64-bits from val to the extended control register (XCR) specified by a.
- _MM_SHUFFLE Experimental 
- A utility function for creating masks to use with Intel shuffle and permute intrinsics.
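Since `_MM_SHUFFLE` is marked Experimental, stable code often builds the mask by hand. The sketch below uses a hypothetical `mm_shuffle` const fn that follows the bit layout of Intel's C macro of the same name (two bits per selector, first argument in the highest position) and feeds it to `_mm_shuffle_epi32`. It assumes an x86 or x86_64 target with SSE2 and is not the library's own implementation.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical stand-in for `_MM_SHUFFLE`: packs four 2-bit lane selectors
/// into one immediate, with `z` selecting the source for the highest lane.
const fn mm_shuffle(z: u32, y: u32, x: u32, w: u32) -> i32 {
    ((z << 6) | (y << 4) | (x << 2) | w) as i32
}

/// Hypothetical helper: reverses the four 32-bit lanes of `a`.
fn reverse_lanes(a: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    // SAFETY: assumes SSE2 is available (baseline on x86_64). Both arrays
    // are 16 bytes long and the unaligned load/store need no alignment.
    unsafe {
        let v = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        // Lane 3 takes source lane 0, lane 2 takes 1, lane 1 takes 2, lane 0 takes 3.
        let r = _mm_shuffle_epi32::<{ mm_shuffle(0, 1, 2, 3) }>(v);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
    }
    out
}
```

With this layout, `mm_shuffle(3, 2, 1, 0)` is the identity mask 0xE4 and `mm_shuffle(0, 1, 2, 3)` is 0x1B.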
- _mm256_abs_ ph Experimental avx512fp16andavx512vl
- Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
- _mm256_add_ ph Experimental avx512fp16andavx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm256_bcstnebf16_ ⚠ps Experimental avxneconvert
- Convert scalar BF16 (16-bit) floating point element stored at memory locations starting at location a to single precision (32-bit) floating-point, broadcast it to packed single precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_bcstnesh_ ⚠ps Experimental avxneconvert
- Convert scalar half-precision (16-bit) floating-point element stored at memory locations starting at location a to a single-precision (32-bit) floating-point, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_castpd_ ph Experimental avx512fp16
- Cast vector of type __m256d to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph128_ ph256 Experimental avx512fp16
- Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm256_castph256_ ph128 Experimental avx512fp16
- Cast vector of type __m256h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_ pd Experimental avx512fp16
- Cast vector of type __m256h to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_ ps Experimental avx512fp16
- Cast vector of type __m256h to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_ si256 Experimental avx512fp16
- Cast vector of type __m256h to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castps_ ph Experimental avx512fp16
- Cast vector of type __m256 to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castsi256_ ph Experimental avx512fp16
- Cast vector of type __m256i to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_cmp_ ph_ mask Experimental avx512fp16andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_conj_ pch Experimental avx512fp16andavx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
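The complex layout used by the `pch` intrinsics above (real part in the even element, imaginary part in the odd element) can be illustrated with a scalar model. The sketch below uses f32 in place of half precision so that it builds without the experimental f16 support. The `Complex` type and the `conj`, `mul` and `cmul` functions are names made up for this example and only model the arithmetic for one element pair.

```rust
/// Scalar model of one complex number as stored in a pair of adjacent elements.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f32, // models vec.fp16[0]
    im: f32, // models vec.fp16[1]
}

/// Complex conjugate: the real part is kept and the imaginary part is negated.
fn conj(z: Complex) -> Complex {
    Complex { re: z.re, im: -z.im }
}

/// Ordinary complex product a * b.
fn mul(a: Complex, b: Complex) -> Complex {
    Complex {
        re: a.re * b.re - a.im * b.im,
        im: a.re * b.im + a.im * b.re,
    }
}

/// Product of a with the complex conjugate of b, as in the cmul description.
fn cmul(a: Complex, b: Complex) -> Complex {
    mul(a, conj(b))
}
```

For example, with a = 1 + 2i and b = 3 + 4i, the plain product is -5 + 10i while the conjugate product is 11 + 2i.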
- _mm256_cvtepi16_ ph Experimental avx512fp16andavx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepi32_ ph Experimental avx512fp16andavx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepi64_ ph Experimental avx512fp16andavx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtepu16_ ph Experimental avx512fp16andavx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu32_ ph Experimental avx512fp16andavx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu64_ ph Experimental avx512fp16andavx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtneeph_ ⚠ps Experimental avxneconvert
- Convert packed half-precision (16-bit) floating-point even-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtneoph_ ⚠ps Experimental avxneconvert
- Convert packed half-precision (16-bit) floating-point odd-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtpd_ ph Experimental avx512fp16andavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtph_ epi16 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm256_cvtph_ epi32 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm256_cvtph_ epi64 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm256_cvtph_ epu16 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm256_cvtph_ epu32 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm256_cvtph_ epu64 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm256_cvtph_ pd Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm256_cvtsh_ h Experimental avx512fp16
- Copy the lower half-precision (16-bit) floating-point element from a to dst.
- _mm256_cvttph_ epi16 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_ epi32 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_ epi64 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_ epu16 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_ epu32 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm256_cvttph_ epu64 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm256_cvtxph_ ps Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtxps_ ph Experimental avx512fp16andavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_div_ ph Experimental avx512fp16andavx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm256_fcmadd_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_fcmul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_fmadd_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_fmadd_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm256_fmaddsub_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm256_fmsub_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm256_fmsubadd_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm256_fmul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_fnmadd_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm256_fnmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
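- _mm256_fpclass_ph_mask Experimental avx512fp16 and avx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm8 can be a combination of: 0x01 (QNaN), 0x02 (positive zero), 0x04 (negative zero), 0x08 (positive infinity), 0x10 (negative infinity), 0x20 (denormal), 0x40 (negative), 0x80 (SNaN).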
- _mm256_getexp_ph Experimental avx512fp16 and avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst.
This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_getmant_ph Experimental avx512fp16 and avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_load_ph ⚠ Experimental avx512fp16 and avx512vl
- Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 32 bytes or a general-protection exception may be generated.
- _mm256_loadu_ph ⚠ Experimental avx512fp16 and avx512vl
- Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
- _mm256_mask3_fcmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask3_fmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from c when the corresponding
mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask3_fmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fmaddsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fmsubadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fnmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_fnmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask_add_ph Experimental avx512fp16 and avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_blend_ph Experimental avx512fp16 and avx512vl
- Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm256_mask_cmp_ph_mask Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_cmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_conj_pch Experimental avx512fp16 and avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k
(the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two
adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_cvtepi16_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepi32_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepi64_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_cvtepu16_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepu32_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtepu64_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_cvtpd_ph Experimental avx512fp16 and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_cvtph_epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtph_pd Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvttph_epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_cvtxph_ps Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_cvtxps_ph Experimental avx512fp16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_div_ph Experimental avx512fp16 and avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_fcmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_fcmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_fmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from a when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_fmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmaddsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmsubadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_fnmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fnmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_fpclass_ph_mask Experimental avx512fp16 and avx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of: 0x01 (QNaN), 0x02 (positive zero), 0x04 (negative zero), 0x08 (positive infinity), 0x10 (negative infinity), 0x20 (denormal), 0x40 (negative), 0x80 (SNaN).
- _mm256_mask_getexp_ph Experimental avx512fp16 and avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k
(elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates
floor(log2(x)) for each element.
- _mm256_mask_getmant_ph Experimental avx512fp16 and avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_mask_max_ph Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_mask_min_ph Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_mask_mul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_mul_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_rcp_ph Experimental avx512fp16 and avx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_mask_reduce_ph Experimental avx512fp16 and avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_roundscale_ph Experimental avx512fp16 and avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_rsqrt_ph Experimental avx512fp16 and avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst using writemask k (elements are copied from src when
the corresponding mask bit is not set).
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_mask_scalef_ph Experimental avx512fp16 and avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_sqrt_ph Experimental avx512fp16 and avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_sub_ph Experimental avx512fp16 and avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_maskz_add_ph Experimental avx512fp16 and avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_maskz_conj_pch Experimental avx512fp16 and avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k
(the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_maskz_cvtepi16_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi32_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepi64_ph Experimental avx512fp16 and avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_maskz_cvtepu16_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu32_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtepu64_ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_maskz_cvtpd_ph Experimental avx512fp16 and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_maskz_cvtph_epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtph_pd Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvttph_epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtxph_ps Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_cvtxps_ph Experimental avx512fp16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_div_ph Experimental avx512fp16 and avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fcmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is
zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_maskz_fcmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_maskz_fmadd_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_maskz_fmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmaddsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmsubadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_maskz_fnmadd_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fnmsub_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_getexp_ph Experimental avx512fp16 and avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask
k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates
floor(log2(x)) for each element.
- _mm256_maskz_getmant_ph Experimental avx512fp16 and avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_maskz_max_ph Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_maskz_min_ph Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_maskz_mul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_maskz_mul_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_rcp_ph Experimental avx512fp16 and avx512vl
- Compute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_maskz_reduce_ph Experimental avx512fp16 and avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_roundscale_ph Experimental avx512fp16 and avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_rsqrt_ph Experimental avx512fp16 and avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst using zeromask k (elements are zeroed out when the
corresponding mask bit is not set).
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_maskz_scalef_ph Experimental avx512fp16 and avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sqrt_ph Experimental avx512fp16 and avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_sub_ph Experimental avx512fp16 and avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_max_ph Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_min_ph Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_mul_pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mul_ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm256_permutex2var_ph Experimental avx512fp16 and avx512vl
- Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutexvar_ph Experimental avx512fp16 and avx512vl
- Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
- _mm256_rcp_ph Experimental avx512fp16 and avx512vl
- Compute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_reduce_add_ph Experimental avx512fp16 and avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm256_reduce_max_ph Experimental avx512fp16 and avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_min_ph Experimental avx512fp16 and avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_mul_ph Experimental avx512fp16 and avx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
- _mm256_reduce_ph Experimental avx512fp16 and avx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm256_roundscale_ph Experimental avx512fp16 and avx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm256_rsqrt_ph Experimental avx512fp16 and avx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_scalef_ph Experimental avx512fp16 and avx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm256_set1_ph Experimental avx512fp16
- Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm256_set_ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm256_setr_ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
- _mm256_setzero_ph Experimental avx512fp16 and avx512vl
- Return vector of type __m256h with all elements set to zero.
- _mm256_sqrt_ph Experimental avx512fp16 and avx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm256_store_ph ⚠ Experimental avx512fp16 and avx512vl
- Store 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 32 bytes or a general-protection exception may be generated.
- _mm256_storeu_ph ⚠ Experimental avx512fp16 and avx512vl
- Store 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
- _mm256_sub_ph Experimental avx512fp16 and avx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm256_undefined_ph Experimental avx512fp16 and avx512vl
- Return vector of type __m256h with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm256_zextph128_ph256 Experimental avx512fp16
- Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_abs_ph Experimental avx512fp16
- Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
- _mm512_add_ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_add_round_ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter.
- _mm512_castpd_ph Experimental avx512fp16
- Cast vector of type __m512d to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph128_ph512 Experimental avx512fp16
- Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_castph256_ph512 Experimental avx512fp16
- Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_castph512_ph128 Experimental avx512fp16
- Cast vector of type __m512h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph512_ph256 Experimental avx512fp16
- Cast vector of type __m512h to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph_pd Experimental avx512fp16
- Cast vector of type __m512h to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph_ps Experimental avx512fp16
- Cast vector of type __m512h to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph_si512 Experimental avx512fp16
- Cast vector of type __m512h to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps_ph Experimental avx512fp16
- Cast vector of type __m512 to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ph Experimental avx512fp16
- Cast vector of type __m512i to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_cmp_ph_mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_round_ph_mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmul_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_cmul_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_conj_pch Experimental avx512fp16
- Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number
is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_cvt_roundepi16_ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundepi32_ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundepi64_ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundepu16_ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundepu32_ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundepu64_ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundpd_ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_roundph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm512_cvt_roundph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvt_roundph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvt_roundph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm512_cvt_roundph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm512_cvt_roundph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm512_cvt_roundph_pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi16_ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi32_ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi64_ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu16_ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu32_ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu64_ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtpd_ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm512_cvtph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm512_cvtph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm512_cvtph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm512_cvtph_pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtsh_h Experimental avx512fp16
- Copy the lower half-precision (16-bit) floating-point element from a to dst.
- _mm512_cvtt_roundph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_roundph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_roundph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_roundph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_roundph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvtt_roundph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvttph_epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvttph_epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvtx_roundph_ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtx_roundps_ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtxph_ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtxps_ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_div_ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm512_div_round_ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst. Rounding is done according to the rounding parameter.
- _mm512_fcmadd_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed
of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fcmadd_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed
of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fcmul_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fcmul_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fmadd_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_fmadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmadd_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_fmadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmaddsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmaddsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmsubadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmul_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed
of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_fmul_round_pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed
of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter.
- _mm512_fnmadd_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm512_fnmadd_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm512_fnmsub_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fnmsub_round_ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fpclass_ph_mask Experimental avx512fp16
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm8 can combine several categories.
- _mm512_getexp_ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst.
This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_getexp_round_ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst.
This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_getmant_ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_getmant_round_ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_load_ ph ⚠ Experimental avx512fp16
- Load 512 bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 64 bytes or a general-protection exception may be generated.
- _mm512_loadu_ ph ⚠ Experimental avx512fp16
- Load 512 bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
- _mm512_mask3_ fcmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask3_ fcmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c using writemask k (the element is copied from c when the corresponding
mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask3_ fmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from c when the corresponding
mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask3_ fmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from c when the corresponding
mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask3_ fmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmaddsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmaddsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsubadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fmsubadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ fnmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask_ add_ ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ add_ round_ ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_mask_ blend_ ph Experimental avx512fp16
- Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm512_mask_ cmp_ ph_ mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in a mask vector using zeromask k (result bits are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmp_ round_ ph_ mask Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in a mask vector using zeromask k (result bits are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ cmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ cmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ conj_ pch Experimental avx512fp16
- Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k
(the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two
adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ cvt_ roundepi16_ ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundepi32_ ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundepi64_ ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundepu16_ ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundepu32_ ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundepu64_ ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundpd_ ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvt_ roundph_ pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi16_ ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi32_ ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtepi64_ ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu16_ ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu32_ ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtepu64_ ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtpd_ ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtph_ pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtt_ roundph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtt_ roundph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtt_ roundph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtt_ roundph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtt_ roundph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtt_ roundph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvttph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvttph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvttph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvttph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvttph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvttph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ cvtx_ roundph_ ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtx_ roundps_ ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtxph_ ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ cvtxps_ ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ div_ ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ div_ round_ ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_mask_ fcmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ fcmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ fcmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ fcmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ fmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from a when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask_ fmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from a when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask_ fmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmaddsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmaddsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsubadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmsubadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask_ fmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter.
- _mm512_mask_ fnmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fnmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ fpclass_ ph_ mask Experimental avx512fp16
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in a mask vector using zeromask k (result bits are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of several category flags.
- _mm512_mask_ getexp_ ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k
(elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates
floor(log2(x)) for each element.
- _mm512_mask_ getexp_ round_ ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k
(elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates
floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ getmant_ ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_mask_ getmant_ round_ ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ max_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_mask_ max_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_mask_ min_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_mask_ min_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_mask_ mul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask_ mul_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ mul_ round_ pch Experimental avx512fp16
- Multiply the packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask_ mul_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_mask_ rcp_ ph Experimental avx512fp16
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_mask_ reduce_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ reduce_ round_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ roundscale_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ roundscale_ round_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ rsqrt_ ph Experimental avx512fp16
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst using writemask k (elements are copied from src when
the corresponding mask bit is not set).
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_mask_ scalef_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ scalef_ round_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sqrt_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sqrt_ round_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_mask_ sub_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ sub_ round_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ add_ ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ add_ round_ ph Experimental avx512fp16
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ cmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ cmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ conj_ pch Experimental avx512fp16
- Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k
(the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ cvt_ roundepi16_ ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundepi32_ ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundepi64_ ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundepu16_ ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundepu32_ ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundepu64_ ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundpd_ ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvt_ roundph_ pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi16_ ph Experimental avx512fp16
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi32_ ph Experimental avx512fp16
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepi64_ ph Experimental avx512fp16
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu16_ ph Experimental avx512fp16
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu32_ ph Experimental avx512fp16
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtepu64_ ph Experimental avx512fp16
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtpd_ ph Experimental avx512fp16
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtph_ pd Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtt_ roundph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtt_ roundph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtt_ roundph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtt_ roundph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtt_ roundph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtt_ roundph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvttph_ epi16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvttph_ epi32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvttph_ epi64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvttph_ epu16 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvttph_ epu32 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvttph_ epu64 Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtx_ roundph_ ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtx_ roundps_ ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtxph_ ps Experimental avx512fp16
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ cvtxps_ ph Experimental avx512fp16
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ div_ ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ div_ round_ ph Experimental avx512fp16
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ fcmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is
zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ fcmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is
zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ fcmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ fcmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ fmadd_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_ fmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmadd_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_ fmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmaddsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmaddsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsubadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmsubadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fmul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_ fmul_ round_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter.
- _mm512_maskz_ fnmadd_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fnmadd_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fnmsub_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ fnmsub_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ getexp_ ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask
k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates
floor(log2(x)) for each element.
- _mm512_maskz_ getexp_ round_ ph Experimental avx512fp16
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask
k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates
floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ getmant_ ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_maskz_ getmant_ round_ ph Experimental avx512fp16
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ max_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_maskz_ max_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_maskz_ min_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_maskz_ min_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_maskz_ mul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_ mul_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ mul_ round_ pch Experimental avx512fp16
- Multiply the packed complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_ mul_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_maskz_ rcp_ ph Experimental avx512fp16
- Compute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_maskz_ reduce_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ reduce_ round_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ roundscale_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ roundscale_ round_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ rsqrt_ ph Experimental avx512fp16
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst using zeromask k (elements are zeroed out when the
corresponding mask bit is not set).
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_maskz_ scalef_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ scalef_ round_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sqrt_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sqrt_ round_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_maskz_ sub_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ sub_ round_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_max_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_max_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_min_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_min_ round_ ph Experimental avx512fp16
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
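The note that these max and min operations do not follow IEEE 754 for NaN or signed-zero inputs can be made concrete with a scalar model. The sketch below assumes the long-documented x86 rule (the second operand is returned whenever the comparison is false, which includes NaN inputs and two zeros of either sign) and assumes it carries over to the half-precision forms. f32 stands in for f16, and `x86_max_model` and `x86_min_model` are hypothetical names.

```rust
/// Scalar model of the x86-style maximum: returns `b` unless `a > b`.
/// Consequently a NaN in either position, or two zeros of any sign,
/// yields the second operand. f32 stands in for f16 (assumption).
fn x86_max_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Scalar model of the x86-style minimum: returns `b` unless `a < b`.
fn x86_min_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

By contrast, Rust's `f32::max(1.0, f32::NAN)` returns 1.0 regardless of argument order, whereas the model above returns NaN when NaN is the second operand.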
- _mm512_mul_ pch Experimental avx512fp16
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mul_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mul_ round_ pch Experimental avx512fp16
- Multiply the packed complex numbers in a and b, and store the results in dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1].
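The interleaved layout used by the `pch` multiplies (real part in the even element, imaginary part in the following odd element) can be sketched in scalar code. This is an illustrative model only: f32 stands in for f16, slices stand in for vector registers, and `mul_pch_model` is a hypothetical helper name.

```rust
/// Scalar model of a packed complex multiply over interleaved
/// (re, im) pairs: dst = a * b for each adjacent pair of elements.
/// f32 stands in for f16 (assumption of this sketch).
fn mul_pch_model(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    assert!(a.len() % 2 == 0, "elements must come in (re, im) pairs");
    let mut dst = Vec::with_capacity(a.len());
    for (x, y) in a.chunks_exact(2).zip(b.chunks_exact(2)) {
        let (ar, ai) = (x[0], x[1]);
        let (br, bi) = (y[0], y[1]);
        dst.push(ar * br - ai * bi); // real part
        dst.push(ar * bi + ai * br); // imaginary part
    }
    dst
}
```

With a = [1, 2] and b = [3, 4], the pair represents (1 + 2i)(3 + 4i) = -5 + 10i.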
- _mm512_mul_ round_ ph Experimental avx512fp16
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_permutex2var_ ph Experimental avx512fp16
- Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutexvar_ ph Experimental avx512fp16
- Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
- _mm512_rcp_ ph Experimental avx512fp16
- Compute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_reduce_ add_ ph Experimental avx512fp16
- Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm512_reduce_ max_ ph Experimental avx512fp16
- Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ min_ ph Experimental avx512fp16
- Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ ⚠mul_ ph Experimental avx512fp16
- Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
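The four horizontal reductions above collapse all elements into one value. A scalar sketch follows, using f32 in place of f16 and slices in place of vectors. The function names are hypothetical, and the order in which a real implementation combines elements is not stated in the text above, so results for inexact inputs may differ in the last bits.

```rust
/// Sum of all elements (scalar model; f32 stands in for f16).
fn reduce_add_model(a: &[f32]) -> f32 {
    a.iter().sum()
}

/// Product of all elements.
fn reduce_mul_model(a: &[f32]) -> f32 {
    a.iter().product()
}

/// Maximum of all elements (NaN handling is not modelled here).
fn reduce_max_model(a: &[f32]) -> f32 {
    a.iter().copied().fold(f32::NEG_INFINITY, f32::max)
}

/// Minimum of all elements (NaN handling is not modelled here).
fn reduce_min_model(a: &[f32]) -> f32 {
    a.iter().copied().fold(f32::INFINITY, f32::min)
}
```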
- _mm512_reduce_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm512_reduce_ round_ ph Experimental avx512fp16
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm512_roundscale_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm512_roundscale_ round_ ph Experimental avx512fp16
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_rsqrt_ ph Experimental avx512fp16
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_scalef_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_scalef_ round_ ph Experimental avx512fp16
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_set1_ ph Experimental avx512fp16
- Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm512_set_ ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm512_setr_ ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
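The difference between the `set1`, `set`, and `setr` constructors is only the element order. The sketch below assumes the usual Intel convention, in which `set` lists arguments from the highest element down to element 0 and `setr` lists them from element 0 upward. Arrays of f32 stand in for vectors of f16, index 0 is the lowest element, and the function names are hypothetical.

```rust
/// Model of `set1`: broadcast one value to every element.
fn set1_model<const N: usize>(a: f32) -> [f32; N] {
    [a; N]
}

/// Model of `set`: arguments are given highest element first,
/// so the last argument lands in element 0 (assumed convention).
fn set_model<const N: usize>(args: [f32; N]) -> [f32; N] {
    let mut dst = args;
    dst.reverse();
    dst
}

/// Model of `setr`: arguments are given lowest element first,
/// so the first argument lands in element 0.
fn setr_model<const N: usize>(args: [f32; N]) -> [f32; N] {
    args
}
```

Under this convention, `set_model([4.0, 3.0, 2.0, 1.0])` and `setr_model([1.0, 2.0, 3.0, 4.0])` describe the same vector.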
- _mm512_setzero_ ph Experimental avx512fp16
- Return vector of type __m512h with all elements set to zero.
- _mm512_sqrt_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm512_sqrt_ round_ ph Experimental avx512fp16
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_store_ ⚠ph Experimental avx512fp16
- Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 64 bytes or a general-protection exception may be generated.
- _mm512_storeu_ ⚠ph Experimental avx512fp16
- Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
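Because the aligned store requires a 64-byte-aligned address while the unaligned store does not, it can help to check alignment before choosing between them. The following is a minimal sketch that does not call the intrinsics themselves; `Aligned64` and `is_aligned_to_64` are hypothetical helpers, and u16 stands in for the raw bits of each f16 element.

```rust
/// A 64-byte-aligned buffer of 32 sixteen-bit lanes (512 bits in total).
#[repr(align(64))]
struct Aligned64([u16; 32]);

/// Returns true when `ptr` satisfies the 64-byte alignment that the
/// aligned store described above requires.
fn is_aligned_to_64(ptr: *const u16) -> bool {
    (ptr as usize) % 64 == 0
}
```

A pointer one element past the start of such a buffer is only 2-byte aligned, so it would be suitable for the unaligned store only.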
- _mm512_sub_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm512_sub_ round_ ph Experimental avx512fp16
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_undefined_ ph Experimental avx512fp16
- Return vector of type __m512h with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _mm512_zextph128_ ph512 Experimental avx512fp16
- Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_zextph256_ ph512 Experimental avx512fp16
- Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
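The zero-extending casts above keep the source elements in the low positions and zero everything above them. A scalar sketch of the 128-bit to 512-bit case, with u16 standing in for the raw bits of each f16 element and `zext_128_to_512_model` as a hypothetical name:

```rust
/// Model of zero extension from 8 lanes to 32 lanes: the low 8 lanes
/// are copied from `a`, the upper 24 lanes are zero.
fn zext_128_to_512_model(a: [u16; 8]) -> [u16; 32] {
    let mut dst = [0u16; 32];
    dst[..8].copy_from_slice(&a);
    dst
}
```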
- _mm_abs_ ph Experimental avx512fp16 and avx512vl
- Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the results in dst.
- _mm_add_ ph Experimental avx512fp16 and avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm_add_ round_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm_add_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
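The scalar (`sh`) forms operate on element 0 only and pass the remaining seven elements of a through unchanged. A sketch of that pattern for addition, with f32 standing in for f16 and `add_sh_model` as a hypothetical name:

```rust
/// Model of a scalar add on 8-lane vectors: lane 0 is a[0] + b[0],
/// lanes 1..8 are copied from `a`. f32 stands in for f16 (assumption).
fn add_sh_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a;
    dst[0] = a[0] + b[0];
    dst
}
```

Note that the upper elements of b never influence the result.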
- _mm_bcstnebf16_ ⚠ps Experimental avxneconvert
- Convert scalar BF16 (16-bit) floating point element stored at memory locations starting at location a to single precision (32-bit) floating-point, broadcast it to packed single precision (32-bit) floating-point elements, and store the results in dst.
- _mm_bcstnesh_ ⚠ps Experimental avxneconvert
- Convert scalar half-precision (16-bit) floating-point element stored at memory locations starting at location a to a single-precision (32-bit) floating-point, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_castpd_ ph Experimental avx512fp16
- Cast vector of type __m128d to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castph_ pd Experimental avx512fp16
- Cast vector of type __m128h to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castph_ ps Experimental avx512fp16
- Cast vector of type __m128h to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castph_ si128 Experimental avx512fp16
- Cast vector of type __m128h to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castps_ ph Experimental avx512fp16
- Cast vector of type __m128 to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castsi128_ ph Experimental avx512fp16
- Cast vector of type __m128i to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_cmp_ ph_ mask Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ round_ sh_ mask Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cmp_ sh_ mask Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
- _mm_cmul_ pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_cmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1],
- _mm_cmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1],
- _mm_comi_ round_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_comi_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
- _mm_comieq_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1).
- _mm_comige_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1).
- _mm_comigt_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1).
- _mm_comile_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1).
- _mm_comilt_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1).
- _mm_comineq_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1).
- _mm_conj_ pch Experimental avx512fp16 and avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex
number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines
the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
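The conjugate and conjugate-multiply operations on interleaved (re, im) pairs can be sketched in scalar form. This is an illustration under assumptions: f32 stands in for f16, slices stand in for vector registers, and `conj_pch_model` and `cmul_pch_model` are hypothetical names.

```rust
/// Model of the packed complex conjugate: negate every odd-indexed
/// (imaginary) element and keep the even-indexed (real) elements.
fn conj_pch_model(a: &[f32]) -> Vec<f32> {
    a.iter()
        .enumerate()
        .map(|(i, &x)| if i % 2 == 1 { -x } else { x })
        .collect()
}

/// Model of multiplying each complex number in `a` by the conjugate
/// of the matching complex number in `b`.
fn cmul_pch_model(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    assert!(a.len() % 2 == 0, "elements must come in (re, im) pairs");
    let mut dst = Vec::with_capacity(a.len());
    for (x, y) in a.chunks_exact(2).zip(b.chunks_exact(2)) {
        let (ar, ai) = (x[0], x[1]);
        let (br, bi) = (y[0], y[1]);
        dst.push(ar * br + ai * bi); // real part of a * conj(b)
        dst.push(ai * br - ar * bi); // imaginary part of a * conj(b)
    }
    dst
}
```

For a = 1 + 2i and b = 3 + 4i, the conjugate multiply gives (1 + 2i)(3 - 4i) = 11 + 2i.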
- _mm_cvt_ roundi32_ sh Experimental avx512fp16
- Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_ roundsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_ roundsh_ i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_cvt_ roundsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvt_ roundsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvt_ roundsh_ u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
- _mm_cvt_ roundss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_ roundu32_ sh Experimental avx512fp16
- Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtepi16_ ph Experimental avx512fp16 and avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtepi32_ ph Experimental avx512fp16 and avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm_cvtepi64_ ph Experimental avx512fp16 and avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
- _mm_cvtepu16_ ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtepu32_ ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm_cvtepu64_ ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
- _mm_cvti32_ sh Experimental avx512fp16
- Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtneeph_ ⚠ps Experimental avxneconvert
- Convert packed half-precision (16-bit) floating-point even-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtneoph_ ⚠ps Experimental avxneconvert
- Convert packed half-precision (16-bit) floating-point odd-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtness_ sbh Experimental avx512bf16 and avx512vl
- Convert a single-precision (32-bit) floating-point element in a to a BF16 (16-bit) floating-point element, and store the result in dst.
- _mm_cvtpd_ ph Experimental avx512fp16 and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
- _mm_cvtph_ epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm_cvtph_ epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm_cvtph_ epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm_cvtph_ epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm_cvtph_ epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm_cvtph_ epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm_cvtph_ pd Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm_cvtsbh_ ss Experimental avx512bf16 and avx512f
- Convert a single BF16 (16-bit) floating-point element in a to a single-precision (32-bit) floating-point element, and store the result in dst.
- _mm_cvtsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtsh_ h Experimental avx512fp16
- Copy the lower half-precision (16-bit) floating-point element from a to dst.
- _mm_cvtsh_ i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_cvtsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvtsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvtsh_ u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
- _mm_cvtsi16_ si128 Experimental avx512fp16
- Copy 16-bit integer a to the lower elements of dst, and zero the upper elements of dst.
- _mm_cvtsi128_ si16 Experimental avx512fp16
- Copy the lower 16-bit integer in a to dst.
- _mm_cvtss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtt_ roundsh_ i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_cvtt_ roundsh_ u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
- _mm_cvttph_ epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm_cvttph_ epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm_cvttph_ epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm_cvttph_ epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm_cvttph_ epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm_cvttph_ epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm_cvttsh_ i32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_cvttsh_ u32 Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
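The `cvtt` conversions truncate toward zero, while the plain `cvt` conversions round. The sketch below contrasts the two for in-range values. It assumes the non-truncating form uses round-to-nearest-even (the default x86 rounding mode, which the text above does not state), uses f32 in place of f16, and does not model out-of-range or NaN inputs. The function names are hypothetical.

```rust
/// Model of a rounding conversion, assuming round-to-nearest-even.
/// Only meaningful for finite inputs that fit in an i32.
fn cvt_nearest_even_model(x: f32) -> i32 {
    x.round_ties_even() as i32
}

/// Model of a truncating conversion: the fractional part is discarded.
/// Only meaningful for finite inputs that fit in an i32.
fn cvtt_model(x: f32) -> i32 {
    x.trunc() as i32
}
```

The two models disagree for inputs such as 3.5, which rounds to 4 but truncates to 3.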
- _mm_cvtu32_ sh Experimental avx512fp16
- Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtxph_ ps Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtxps_ ph Experimental avx512fp16 and avx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_div_ ph Experimental avx512fp16 and avx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm_div_ round_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm_div_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fcmadd_ pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed
of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
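The conjugate multiply-accumulate described above can be sketched in scalar Rust as `c + a * conj(b)` applied to each adjacent (real, imaginary) pair. This is only a reference model under stated assumptions: `Complex`, `fcmadd_model`, and `fcmadd_packed_model` are hypothetical names, `f32` stands in for the half-precision element type, and no claim is made about matching the hardware's intermediate rounding.

```rust
/// One complex number stored as two adjacent lanes: (real, imaginary).
/// Hypothetical type for illustration; `f32` stands in for `f16`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f32,
    im: f32,
}

/// Scalar model of `c + a * conj(b)`.
fn fcmadd_model(a: Complex, b: Complex, c: Complex) -> Complex {
    // a * conj(b) = (a.re*b.re + a.im*b.im) + i * (a.im*b.re - a.re*b.im)
    Complex {
        re: c.re + a.re * b.re + a.im * b.im,
        im: c.im + a.im * b.re - a.re * b.im,
    }
}

/// Packed model: 8 lanes hold 4 complex numbers (even lane = real part,
/// odd lane = imaginary part).
fn fcmadd_packed_model(a: &[f32; 8], b: &[f32; 8], c: &[f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..4 {
        let r = fcmadd_model(
            Complex { re: a[2 * i], im: a[2 * i + 1] },
            Complex { re: b[2 * i], im: b[2 * i + 1] },
            Complex { re: c[2 * i], im: c[2 * i + 1] },
        );
        dst[2 * i] = r.re;
        dst[2 * i + 1] = r.im;
    }
    dst
}
```

For example, with a = 1 + 2i, b = 3 + 4i, and c = 0.5 - 0.5i, the model gives 11.5 + 1.5i.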
- _mm_fcmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst,
and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fcmadd_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst,
and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fcmul_ pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fcmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fcmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fmadd_ pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fmadd_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm_fmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmadd_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the
upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmaddsub_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
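The alternating pattern can be sketched with a scalar reference model. The lane assignment used here (subtract c in even-indexed lanes, add c in odd-indexed lanes for fmaddsub, and the reverse for fmsubadd) is an assumption taken from the convention of the corresponding single-precision x86 instructions; the description above does not spell it out. `fmaddsub_model` and `fmsubadd_model` are hypothetical names, and `f32` stands in for the half-precision element type.

```rust
/// Scalar model of a packed fused multiply with alternating add/subtract.
/// Assumed convention: even lanes compute a*b - c, odd lanes compute a*b + c.
fn fmaddsub_model(a: &[f32; 8], b: &[f32; 8], c: &[f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        let signed_c = if i % 2 == 0 { -c[i] } else { c[i] };
        // `mul_add` rounds once, like a fused multiply-add.
        dst[i] = a[i].mul_add(b[i], signed_c);
    }
    dst
}

/// The mirrored operation: even lanes compute a*b + c, odd lanes a*b - c.
fn fmsubadd_model(a: &[f32; 8], b: &[f32; 8], c: &[f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        let signed_c = if i % 2 == 0 { c[i] } else { -c[i] };
        dst[i] = a[i].mul_add(b[i], signed_c);
    }
    dst
}
```

With every lane of a set to 2, b to 3, and c to 1, `fmaddsub_model` produces 5, 7, 5, 7, ... and `fmsubadd_model` produces 7, 5, 7, 5, ....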
- _mm_fmsub_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm_fmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmsubadd_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm_fmul_ pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed
of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fnmadd_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm_fnmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fnmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fnmsub_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm_fnmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fnmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fpclass_ ph_ mask Experimental avx512fp16 and avx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm_fpclass_ sh_ mask Experimental avx512fp16
- Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k. imm can be a combination of:
- _mm_getexp_ ph Experimental avx512fp16 and avx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst.
This intrinsic essentially calculates floor(log2(x)) for each element.
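To make the floor(log2(x)) relationship concrete, here is a minimal scalar sketch that reads the exponent straight from the floating-point bit pattern. It is a model under assumptions, not the intrinsic: `getexp_model` is a hypothetical name, `f32` stands in for the half-precision type, the sign bit is ignored, and zero, subnormal, infinite, and NaN inputs are deliberately left unmodelled (the function returns `None` for them).

```rust
/// Scalar model of "exponent as a floating-point number" for normal inputs.
/// Hypothetical helper; `f32` (8 exponent bits, bias 127) stands in for `f16`.
fn getexp_model(x: f32) -> Option<f32> {
    // Special cases are out of scope for this sketch.
    if !x.is_normal() {
        return None;
    }
    // Extract the biased exponent field and remove the bias.
    let biased = ((x.to_bits() >> 23) & 0xff) as i32;
    Some((biased - 127) as f32)
}
```

For normal inputs the unbiased exponent equals floor(log2(|x|)): 8.0 and 10.0 both map to 3.0, and 0.75 maps to -1.0.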
- _mm_getexp_ round_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision
(16-bit) floating-point number representing the integer exponent, store the result in the lower element
of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially
calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getexp_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision
(16-bit) floating-point number representing the integer exponent, store the result in the lower element
of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially
calculates floor(log2(x)) for the lower element.
- _mm_getmant_ ph Experimental avx512fp16 and avx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_getmant_ round_ sh Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store
the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper
elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getmant_ sh Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store
the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper
elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_load_ ⚠ph Experimental avx512fp16 and avx512vl
- Load 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 16 bytes or a general-protection exception may be generated.
- _mm_load_ ⚠sh Experimental avx512fp16
- Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector, and zero the upper elements.
- _mm_loadu_ ⚠ph Experimental avx512fp16 and avx512vl
- Load 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
- _mm_mask3_ fcmadd_ pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask3_ fcmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst using
writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper
6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask3_ fcmadd_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst using
writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper
6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask3_ fmadd_ pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from c when the corresponding
mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask3_ fmadd_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using writemask k (elements are copied from c when
mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst.
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements,
which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask3_ fmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ fmadd_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using writemask k (elements are copied from c when
mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst.
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements,
which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask3_ fmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ fmaddsub_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fmsub_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ fmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ fmsubadd_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fnmadd_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fnmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ fnmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ fnmsub_ ph Experimental avx512fp16 and avx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ fnmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ fnmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask_ add_ ph Experimental avx512fp16 and avx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
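The writemask behaviour shared by the `_mm_mask_*` entries can be sketched as a per-lane select: lane i takes the computed value when bit i of k is set and the value from src otherwise. The sketch below is a scalar model under assumptions: `mask_add_model` is a hypothetical name, `f32` stands in for the half-precision element type, and a `u8` plays the role of the 8-lane mask.

```rust
/// Scalar model of a writemasked packed add over 8 lanes.
/// Bit `i` of `k` set: lane `i` receives `a[i] + b[i]`.
/// Bit `i` of `k` clear: lane `i` is copied from `src`.
fn mask_add_model(src: &[f32; 8], k: u8, a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    let mut dst = *src; // start from src so unselected lanes pass through
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i] + b[i];
        }
    }
    dst
}
```

With k = 0b0000_0101 only lanes 0 and 2 are recomputed; every other lane keeps its src value.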
- _mm_mask_ add_ round_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_mask_ add_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
- _mm_mask_ blend_ ph Experimental avx512fp16 and avx512vl
- Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm_mask_ cmp_ ph_ mask Experimental avx512fp16 and avx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
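A masked comparison produces one result bit per lane, and the zeromask clears every bit whose input mask bit is not set. The following scalar sketch illustrates that shape only: `mask_cmp_model` is a hypothetical name, `f32` stands in for the half-precision element type, and a closure replaces the imm8 comparison operand, so the encoding of the real predicates is not modelled.

```rust
/// Scalar model of a zeromasked packed comparison over 8 lanes.
/// Bit `i` of the result is set only when bit `i` of `k1` is set
/// AND `pred(a[i], b[i])` holds; all other bits are zero.
fn mask_cmp_model(
    k1: u8,
    a: &[f32; 8],
    b: &[f32; 8],
    pred: impl Fn(f32, f32) -> bool,
) -> u8 {
    let mut k = 0u8;
    for i in 0..8 {
        if (k1 >> i) & 1 == 1 && pred(a[i], b[i]) {
            k |= 1 << i;
        }
    }
    k
}
```

Passing `|x, y| x < y` as the predicate gives a less-than comparison; lanes excluded by `k1` report 0 regardless of their contents.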
- _mm_mask_ cmp_ round_ sh_ mask Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ cmp_ sh_ mask Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1.
- _mm_mask_ cmul_ pch Experimental avx512fp16 and avx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ cmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ cmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ conj_ pch Experimental avx512fp16 and avx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k
(the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two
adjacent half-precision (16-bit) floating-point elements, which defines the complex number
complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ cvt_ roundsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ cvt_ roundsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ cvt_ roundsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ cvt_ roundss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ cvtepi16_ ph Experimental avx512fp16 and avx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_ cvtepi32_ ph Experimental avx512fp16 and avx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_mask_ cvtepi64_ ph Experimental avx512fp16 and avx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_mask_ cvtepu16_ ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_ cvtepu32_ ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_mask_ cvtepu64_ ph Experimental avx512fp16 and avx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_mask_ cvtpd_ ph Experimental avx512fp16 and avx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_mask_ cvtph_ epi16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtph_ epi32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtph_ epi64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtph_ epu16 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtph_ epu32 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtph_ epu64 Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtph_ pd Experimental avx512fp16 and avx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_ cvtsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ cvtsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ cvtsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ cvtss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ cvttph_ epi16 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvttph_ epi32 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvttph_ epi64 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvttph_ epu16 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvttph_ epu32 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvttph_ epu64 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ cvtxph_ ps Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_ cvtxps_ ph Experimental avx512fp16andavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_mask_ div_ ph Experimental avx512fp16andavx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ div_ round_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point element in a by the lower element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_mask_ div_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point element in a by the lower element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ fcmadd_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is
copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ fcmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst using
writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper
6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ fcmadd_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst using
writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper
6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ fcmul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ fcmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ fcmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ fmadd_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using writemask k (the element is copied from a when the corresponding
mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ fmadd_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ fmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using writemask k (elements are copied from a when
mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst.
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements,
which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ fmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ fmadd_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using writemask k (elements are copied from a when
mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst.
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements,
which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ fmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ fmaddsub_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ fmsub_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ fmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ fmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ fmsubadd_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ fmul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ fmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ fmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ fnmadd_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ fnmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ fnmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ fnmsub_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ fnmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ fnmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ fpclass_ ph_ mask Experimental avx512fp16andavx512vl
- Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in the returned mask vector, using the input mask as a zeromask (result bits are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of category flags.
- _mm_mask_ fpclass_ sh_ mask Experimental avx512fp16
- Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in the returned mask vector, using the input mask as a zeromask (the result bit is zeroed out when mask bit 0 is not set). imm8 can be a combination of category flags.
- _mm_mask_ getexp_ ph Experimental avx512fp16andavx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k
(elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates
floor(log2(x)) for each element.
- _mm_mask_ getexp_ round_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision
(16-bit) floating-point number representing the integer exponent, store the result in the lower element
of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7
packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ getexp_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision
(16-bit) floating-point number representing the integer exponent, store the result in the lower element
of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7
packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_mask_ getmant_ ph Experimental avx512fp16andavx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_mask_ getmant_ round_ sh Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store
the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set),
and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates
±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ getmant_ sh Experimental avx512fp16
- Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store
the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set),
and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates
±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_mask_ ⚠load_ sh Experimental avx512fp16
- Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using writemask k (the element is copied from src when mask bit 0 is not set), and zero the upper elements.
- _mm_mask_ max_ ph Experimental avx512fp16andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_mask_ max_ round_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_mask_ max_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_mask_ min_ ph Experimental avx512fp16andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_mask_ min_ round_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_mask_ min_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_mask_ move_ sh Experimental avx512fp16
- Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ mul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element
is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ mul_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ mul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using
writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed
elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ mul_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_mask_ mul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using
writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed
elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ mul_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ rcp_ ph Experimental avx512fp16andavx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_ rcp_ sh Experimental avx512fp16
- Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b,
store the result in the lower element of dst using writemask k (the element is copied from src when
mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_ reduce_ ph Experimental avx512fp16andavx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ reduce_ round_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ reduce_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ roundscale_ ph Experimental avx512fp16andavx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ roundscale_ round_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ roundscale_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ rsqrt_ ph Experimental avx512fp16andavx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst using writemask k (elements are copied from src when
the corresponding mask bit is not set).
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_ rsqrt_ sh Experimental avx512fp16
- Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point
element in b, store the result in the lower element of dst using writemask k (the element is copied from src
when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_ scalef_ ph Experimental avx512fp16andavx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ scalef_ round_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ scalef_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the lower element of b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ sqrt_ ph Experimental avx512fp16andavx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sqrt_ round_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_mask_ sqrt_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠store_ sh Experimental avx512fp16
- Store the lower half-precision (16-bit) floating-point element from a into memory using writemask k.
- _mm_mask_ sub_ ph Experimental avx512fp16andavx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ sub_ round_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_mask_ sub_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the lower element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ add_ ph Experimental avx512fp16andavx512vl
- Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ add_ round_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_maskz_ add_ sh Experimental avx512fp16
- Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ cmul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
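The conjugate multiply can be sketched as a scalar model. This is an illustration under assumptions, not the real intrinsic: `f32` stands in for the 16-bit halves, one mask bit is assumed to cover one complex pair, and `Complex`, `mul_conj`, and `maskz_cmul_model` are hypothetical names.

```rust
/// One complex lane as (re, im), i.e. vec.fp16[0] + i * vec.fp16[1].
/// `f32` stands in for the half-precision halves (assumption).
type Complex = (f32, f32);

/// a * conj(b) = (ar + i*ai) * (br - i*bi)
///             = (ar*br + ai*bi) + i*(ai*br - ar*bi)
fn mul_conj(a: Complex, b: Complex) -> Complex {
    let (ar, ai) = a;
    let (br, bi) = b;
    (ar * br + ai * bi, ai * br - ar * bi)
}

/// Zeromasked model over the four complex lanes of a 128-bit vector.
/// Assumes mask bit `lane` covers the whole (re, im) pair.
fn maskz_cmul_model(k: u8, a: [Complex; 4], b: [Complex; 4]) -> [Complex; 4] {
    let mut dst = [(0.0f32, 0.0f32); 4];
    for lane in 0..4 {
        if (k >> lane) & 1 == 1 {
            dst[lane] = mul_conj(a[lane], b[lane]);
        }
    }
    dst
}
```

For example, (1 + 2i) times the conjugate of (3 + 4i) is (1 + 2i)(3 - 4i) = 11 + 2i.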
- _mm_maskz_ cmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ cmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ conj_ pch Experimental avx512fp16andavx512vl
- Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k
(the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ cvt_ roundsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ cvt_ roundsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ cvt_ roundsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ cvt_ roundss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ cvtepi16_ ph Experimental avx512fp16andavx512vl
- Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepi32_ ph Experimental avx512fp16andavx512vl
- Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_maskz_ cvtepi64_ ph Experimental avx512fp16andavx512vl
- Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_maskz_ cvtepu16_ ph Experimental avx512fp16andavx512vl
- Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtepu32_ ph Experimental avx512fp16andavx512vl
- Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_maskz_ cvtepu64_ ph Experimental avx512fp16andavx512vl
- Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_maskz_ cvtpd_ ph Experimental avx512fp16andavx512vl
- Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_maskz_ cvtph_ epi16 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtph_ epi32 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtph_ epi64 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtph_ epu16 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtph_ epu32 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtph_ epu64 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtph_ pd Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtsd_ sh Experimental avx512fp16
- Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ cvtsh_ sd Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ cvtsh_ ss Experimental avx512fp16
- Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ cvtss_ sh Experimental avx512fp16
- Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ cvttph_ epi16 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvttph_ epi32 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvttph_ epi64 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvttph_ epu16 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvttph_ epu32 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvttph_ epu64 Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtxph_ ps Experimental avx512fp16andavx512vl
- Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ cvtxps_ ph Experimental avx512fp16andavx512vl
- Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_maskz_ div_ ph Experimental avx512fp16andavx512vl
- Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ div_ round_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point element in a by the lower element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_maskz_ div_ sh Experimental avx512fp16
- Divide the lower half-precision (16-bit) floating-point element in a by the lower element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ fcmadd_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate
to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is
zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ fcmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, store the result in the lower elements of dst using
zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements
from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit)
floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ fcmadd_ sch Experimental avx512fp16
- Multiply the lower complex number in a by the complex conjugate of the lower complex number in b,
accumulate to the lower complex number in c, and store the result in the lower elements of dst using
zeromask k (the element is zeroed out when the corresponding mask bit is not set), and copy the upper
6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ fcmul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and
store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ fcmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ fcmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b,
and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ fmadd_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c,
and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask
bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point
elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ fmadd_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ fmadd_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask
bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each
complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ fmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ fmadd_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and
store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask
bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each
complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which
defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ fmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ fmaddsub_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ fmsub_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ fmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ fmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ fmsubadd_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, alternately subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
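The lane pattern of the fmaddsub and fmsubadd entries can be sketched with a scalar model. It assumes the usual x86 convention that fmaddsub subtracts c in even lanes and adds it in odd lanes, with fmsubadd doing the opposite. `f32` stands in for the 16-bit lanes, and the function names are hypothetical.

```rust
/// Shared helper: `subtract_even` selects which lane parity subtracts `c`.
/// `f32` stands in for half-precision lanes (assumption for illustration).
fn maskz_alternating_fma(
    k: u8,
    a: [f32; 8],
    b: [f32; 8],
    c: [f32; 8],
    subtract_even: bool,
) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for lane in 0..8 {
        if (k >> lane) & 1 == 1 {
            let is_even = lane % 2 == 0;
            // Negate c in the lanes that subtract, keep it in the lanes that add.
            let signed_c = if is_even == subtract_even { -c[lane] } else { c[lane] };
            // mul_add rounds once, like a fused multiply-add.
            dst[lane] = a[lane].mul_add(b[lane], signed_c);
        }
    }
    dst
}

/// Assumed fmaddsub pattern: even lanes a*b - c, odd lanes a*b + c.
fn maskz_fmaddsub_model(k: u8, a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    maskz_alternating_fma(k, a, b, c, true)
}

/// Assumed fmsubadd pattern: even lanes a*b + c, odd lanes a*b - c.
fn maskz_fmsubadd_model(k: u8, a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    maskz_alternating_fma(k, a, b, c, false)
}
```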
- _mm_maskz_ fmul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ fmul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ fmul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ fnmadd_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ fnmadd_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ fnmadd_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ fnmsub_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ fnmsub_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ fnmsub_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
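The four fused multiply-add families differ only in the signs applied to the product and to c. A minimal scalar sketch, assuming the usual x86 FMA naming convention and using `f32::mul_add` as a stand-in for the half-precision operation (the `_model` function names are hypothetical):

```rust
/// fmadd:   a*b + c
fn fmadd_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// fmsub:   a*b - c
fn fmsub_model(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// fnmadd: -(a*b) + c, i.e. the product is subtracted from c
fn fnmadd_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// fnmsub: -(a*b) - c, i.e. c is subtracted from the negated product
fn fnmsub_model(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}
```

With a = 2, b = 3, c = 1 the four models give 7, 5, -5 and -7 respectively.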
- _mm_maskz_ getexp_ ph Experimental avx512fp16andavx512vl
- Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision
(16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask
k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates
floor(log2(x)) for each element.
- _mm_maskz_ getexp_ round_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision
(16-bit) floating-point number representing the integer exponent, store the result in the lower element
of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed
elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ getexp_ sh Experimental avx512fp16
- Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision
(16-bit) floating-point number representing the integer exponent, store the result in the lower element
of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed
elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
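The floor(log2(x)) behaviour of the getexp entries can be sketched by reading the exponent field directly. This is a simplified model, not the real intrinsic: it uses `f32` (8 exponent bits, bias 127) in place of half precision, and it only handles normal, finite, non-zero inputs. The name `getexp_model` is hypothetical.

```rust
/// Returns the unbiased binary exponent of `x` as a float, which equals
/// floor(log2(|x|)) for normal, finite, non-zero inputs.
/// Zero, subnormal, infinite and NaN inputs are NOT modelled here.
fn getexp_model(x: f32) -> f32 {
    // Bits 23..=30 of an f32 hold the biased exponent; the mask drops the sign bit.
    let biased = (x.to_bits() >> 23) & 0xff;
    (biased as i32 - 127) as f32
}
```

For example, 10.0 is 1.25 * 2^3, so the model returns 3.0; the sign of the input does not affect the result.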
- _mm_maskz_ getmant_ ph Experimental avx512fp16andavx512vl
- Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store
the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_maskz_ getmant_ round_ sh Experimental avx512fp16
- Normalize the mantissa of the lower half-precision (16-bit) floating-point element in b, store
the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set),
and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates
±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ getmant_ sh Experimental avx512fp16
- Normalize the mantissa of the lower half-precision (16-bit) floating-point element in b, store
the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set),
and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates
±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_maskz_ ⚠load_ sh Experimental avx512fp16
- Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using zeromask k (the element is zeroed out when mask bit 0 is not set), and zero the upper elements.
- _mm_maskz_ max_ ph Experimental avx512fp16andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_maskz_ max_ round_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_maskz_ max_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_maskz_ min_ ph Experimental avx512fp16andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_maskz_ min_ round_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_maskz_ min_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
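The note that max and min do not follow IEEE 754 for NaN and signed-zero inputs can be illustrated with a scalar sketch. It assumes the usual x86 rule for these instructions, where the second operand is returned whenever the comparison is false (which includes any NaN input and the case of two zeros). `f32` stands in for half precision, and the function names are hypothetical.

```rust
/// x86-style maximum: returns `a` only if `a > b`, otherwise `b`.
/// Any comparison involving NaN is false, so `b` is returned in that case.
fn x86_max_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// x86-style minimum: returns `a` only if `a < b`, otherwise `b`.
fn x86_min_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

Under this model the result depends on operand order: a NaN in the first operand is dropped, a NaN in the second operand is propagated, and comparing +0.0 with -0.0 returns whichever zero is the second operand. Rust's `f32::max`, by contrast, returns the non-NaN operand regardless of order.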
- _mm_maskz_ move_ sh Experimental avx512fp16
- Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ mul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element
is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent
half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ mul_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ mul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using
zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements
from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ mul_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_maskz_ mul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using
zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements
from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision
(16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ mul_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ rcp_ ph Experimental avx512fp16andavx512vl
- Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_ rcp_ sh Experimental avx512fp16
- Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b,
store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0
is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_ reduce_ ph Experimental avx512fp16andavx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ reduce_ round_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ reduce_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ roundscale_ ph Experimental avx512fp16andavx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ roundscale_ round_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ roundscale_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ rsqrt_ ph Experimental avx512fp16andavx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst using zeromask k (elements are zeroed out when the
corresponding mask bit is not set).
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_ rsqrt_ sh Experimental avx512fp16
- Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point
element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when
mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_ scalef_ ph Experimental avx512fp16andavx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ scalef_ round_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the value from b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ scalef_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the value from b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ sqrt_ ph Experimental avx512fp16andavx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sqrt_ round_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_maskz_ sqrt_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ sub_ ph Experimental avx512fp16andavx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ sub_ round_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the one in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_maskz_ sub_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the one in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_max_ ph Experimental avx512fp16andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_max_ round_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_max_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_min_ ph Experimental avx512fp16andavx512vl
- Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_min_ round_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_min_ sh Experimental avx512fp16andavx512vl
- Compare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
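The note that max and min do not follow IEEE 754 for NaN or signed-zero inputs can be illustrated with a scalar sketch. It assumes the usual x86 MAX/MIN rule (return the second operand unless the first compares strictly greater, or strictly less for min); f32 stands in for the 16-bit element and the function names are hypothetical.

```rust
/// Assumed x86-style maximum: the first operand wins only on a strict `>`.
/// Any comparison involving NaN is false, so the second operand is returned.
fn x86_max_model(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Assumed x86-style minimum, mirrored from the maximum.
fn x86_min_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}
```

Under this model the result depends on operand order: a NaN in the second position propagates while a NaN in the first position is dropped, and comparing +0.0 with -0.0 returns whichever zero was passed second. Rust's own `f32::max` returns the non-NaN operand regardless of its position.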
- _mm_move_ sh Experimental avx512fp16
- Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mul_ pch Experimental avx512fp16andavx512vl
- Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mul_ ph Experimental avx512fp16andavx512vl
- Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm_mul_ round_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst,
and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mul_ round_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_mul_ sch Experimental avx512fp16
- Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst,
and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is
composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex
number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mul_ sh Experimental avx512fp16
- Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
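The complex layout used by the pch and sch multiplies (complex = vec.fp16[0] + i * vec.fp16[1]) can be sketched as a scalar model. f32 stands in for the 16-bit lanes, the function names are hypothetical, and intermediate rounding on real hardware may differ from this plain arithmetic.

```rust
/// (a.0 + i*a.1) * (b.0 + i*b.1), written out term by term.
fn complex_mul(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// Packed model: 8 lanes hold 4 complex numbers as adjacent (re, im) pairs.
fn mul_pch_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..4 {
        let (re, im) = complex_mul((a[2 * i], a[2 * i + 1]), (b[2 * i], b[2 * i + 1]));
        dst[2 * i] = re;
        dst[2 * i + 1] = im;
    }
    dst
}

/// Scalar model: only the lowest complex number is multiplied;
/// the upper 6 lanes are copied from `a`.
fn mul_sch_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a;
    let (re, im) = complex_mul((a[0], a[1]), (b[0], b[1]));
    dst[0] = re;
    dst[1] = im;
    dst
}
```

This is why the sch forms copy 6 upper elements rather than 7: one complex number occupies two adjacent lanes.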
- _mm_permutex2var_ ph Experimental avx512fp16andavx512vl
- Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutexvar_ ph Experimental avx512fp16andavx512vl
- Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
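The two shuffles above can be modelled on plain arrays. This sketch assumes the usual encoding for 8-lane, 16-bit permutes: the low three bits of each index pick the lane, and for the two-source form bit 3 is the selector that chooses b over a. f32 stands in for the 16-bit lanes and the function names are hypothetical.

```rust
/// One-source shuffle: dst[i] = a[idx[i] mod 8].
fn permutexvar_model(idx: [u16; 8], a: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        dst[i] = a[(idx[i] & 0b111) as usize];
    }
    dst
}

/// Two-source shuffle: bit 3 of each index selects the source vector
/// (assumed encoding), bits 0..=2 select the lane within it.
fn permutex2var_model(a: [f32; 8], idx: [u16; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = [0.0f32; 8];
    for i in 0..8 {
        let lane = (idx[i] & 0b111) as usize;
        dst[i] = if idx[i] & 0b1000 != 0 { b[lane] } else { a[lane] };
    }
    dst
}
```

With idx = [7, 6, 5, 4, 3, 2, 1, 0] the one-source model reverses the vector; with idx = [0, 8, 1, 9, 2, 10, 3, 11] the two-source model interleaves the low halves of a and b.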
- _mm_rcp_ ph Experimental avx512fp16andavx512vl
- Compute the approximate reciprocal of packed 16-bit floating-point elements in a and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_rcp_ sh Experimental avx512fp16
- Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b,
store the result in the lower element of dst, and copy the upper 7 packed elements from a to the
upper elements of dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_reduce_ add_ ph Experimental avx512fp16andavx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm_reduce_ max_ ph Experimental avx512fp16andavx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_ min_ ph Experimental avx512fp16andavx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_ mul_ ph Experimental avx512fp16andavx512vl
- Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
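The four horizontal reductions above collapse a vector to a single value. A scalar sketch, with f32 standing in for the 16-bit lanes and hypothetical function names; the order in which real implementations combine lanes is not specified here, so floating-point rounding may differ from this left-to-right fold.

```rust
/// Sum of all lanes.
fn reduce_add_model(a: [f32; 8]) -> f32 {
    a.iter().sum()
}

/// Product of all lanes.
fn reduce_mul_model(a: [f32; 8]) -> f32 {
    a.iter().product()
}

/// Largest lane (NaN handling is deliberately left out of this model).
fn reduce_max_model(a: [f32; 8]) -> f32 {
    a.iter().copied().fold(a[0], |m, x| if x > m { x } else { m })
}

/// Smallest lane (NaN handling is deliberately left out of this model).
fn reduce_min_model(a: [f32; 8]) -> f32 {
    a.iter().copied().fold(a[0], |m, x| if x < m { x } else { m })
}
```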
- _mm_reduce_ ph Experimental avx512fp16andavx512vl
- Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm_reduce_ round_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_reduce_ sh Experimental avx512fp16
- Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_roundscale_ ph Experimental avx512fp16andavx512vl
- Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm_roundscale_ round_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_roundscale_ sh Experimental avx512fp16
- Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
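The roundscale and reduce entries above are two halves of one idea: roundscale keeps a given number of fraction bits, and reduce returns what was discarded. The sketch below models a single element and assumes the conventional imm8 layout (bits 7:4 hold the number of fraction bits M, bits 1:0 the rounding direction); bit 2, which defers to the current rounding mode, and bit 3 are ignored. f32 stands in for the 16-bit element and all names are hypothetical.

```rust
/// Round to the nearest integer, ties to even (written by hand so the
/// sketch does not depend on a recent standard library).
fn round_ties_even_model(x: f32) -> f32 {
    let r = x.round(); // `round` breaks ties away from zero
    if (x - x.trunc()).abs() == 0.5 && r % 2.0 != 0.0 {
        r - x.signum()
    } else {
        r
    }
}

/// roundscale on one element: round x to a multiple of 2^-M.
fn roundscale_model(x: f32, imm8: u8) -> f32 {
    let scale = 2f32.powi((imm8 >> 4) as i32); // 2^M
    let scaled = x * scale;
    let rounded = match imm8 & 0b11 {
        0 => round_ties_even_model(scaled), // nearest, ties to even
        1 => scaled.floor(),                // toward negative infinity
        2 => scaled.ceil(),                 // toward positive infinity
        _ => scaled.trunc(),                // toward zero
    };
    rounded / scale
}

/// reduce on one element: the part of x that roundscale throws away.
fn reduce_model(x: f32, imm8: u8) -> f32 {
    x - roundscale_model(x, imm8)
}
```

For example, with M = 1 and rounding down (imm8 = 0x11), 2.75 rounds to 2.5 and the reduced argument is 0.25.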
- _mm_rsqrt_ ph Experimental avx512fp16andavx512vl
- Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point
elements in a, and store the results in dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_rsqrt_ sh Experimental avx512fp16
- Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point
element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a
to the upper elements of dst.
The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_scalef_ ph Experimental avx512fp16andavx512vl
- Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm_scalef_ round_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the value from b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_scalef_ sh Experimental avx512fp16
- Scale the lower half-precision (16-bit) floating-point element in a using the value from b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
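What "scale a using values from b" means can be shown with a scalar sketch. It assumes the usual scalef definition, a * 2^floor(b), and leaves out the special cases for NaN, infinities and denormals. f32 stands in for the 16-bit element and the function names are hypothetical.

```rust
/// Assumed scalef rule on one element: a * 2^floor(b).
fn scalef_model(a: f32, b: f32) -> f32 {
    a * 2f32.powi(b.floor() as i32)
}

/// Scalar ("sh") form: only lane 0 is scaled; lanes 1..=7 come from `a`.
fn scalef_sh_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut dst = a;
    dst[0] = scalef_model(a[0], b[0]);
    dst
}
```

Because the exponent is floored, scaling 3.0 by 2.7 multiplies by 2^2, and scaling by -1.5 multiplies by 2^-2.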
- _mm_set1_ ph Experimental avx512fp16
- Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm_set_ ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm_set_ sh Experimental avx512fp16
- Copy the half-precision (16-bit) floating-point element a to the lower element of dst and zero the upper 7 elements.
- _mm_setr_ ph Experimental avx512fp16
- Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
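The difference between the set, setr, set1 and set_sh constructors is only where each argument lands. A sketch on plain arrays, assuming the usual Intel convention that set takes its arguments from the highest lane down to the lowest, while setr takes them lowest lane first. Index 0 is the lowest lane, f32 stands in for the 16-bit element, and the function names are hypothetical.

```rust
/// `set`: arguments are given highest lane first, so lane order is the
/// reverse of call order.
fn set_model(args_in_call_order: [f32; 8]) -> [f32; 8] {
    let mut lanes = args_in_call_order;
    lanes.reverse();
    lanes
}

/// `setr`: arguments are given lowest lane first, so nothing moves.
fn setr_model(args_in_call_order: [f32; 8]) -> [f32; 8] {
    args_in_call_order
}

/// `set1`: broadcast one value to every lane.
fn set1_model(a: f32) -> [f32; 8] {
    [a; 8]
}

/// `set_sh`: value in lane 0, the upper 7 lanes zeroed.
fn set_sh_model(a: f32) -> [f32; 8] {
    let mut lanes = [0.0f32; 8];
    lanes[0] = a;
    lanes
}
```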
- _mm_setzero_ ph Experimental avx512fp16andavx512vl
- Return vector of type __m128h with all elements set to zero.
- _mm_sqrt_ ph Experimental avx512fp16andavx512vl
- Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm_sqrt_ round_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_sqrt_ sh Experimental avx512fp16
- Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_store_ ⚠ph Experimental avx512fp16andavx512vl
- Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 16 bytes or a general-protection exception may be generated.
- _mm_store_ ⚠sh Experimental avx512fp16
- Store the lower half-precision (16-bit) floating-point element from a into memory.
- _mm_storeu_ ⚠ph Experimental avx512fp16andavx512vl
- Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
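The aligned store requires a 16-byte-aligned address, while the unaligned store does not. A small sketch of how a caller might check that requirement before choosing between them; it does not call the intrinsics, u16 stands in for the 16-bit element, and the type and function names are hypothetical.

```rust
/// A buffer whose start address is guaranteed to be a multiple of 16.
#[repr(align(16))]
struct Aligned16([u16; 8]);

/// True when `ptr` satisfies the 16-byte alignment the aligned store needs.
fn is_aligned_to_16(ptr: *const u16) -> bool {
    (ptr as usize) % 16 == 0
}

/// Name of the store a caller could safely pick for this address.
fn pick_store(ptr: *const u16) -> &'static str {
    if is_aligned_to_16(ptr) { "aligned" } else { "unaligned" }
}
```

A pointer to the start of an `Aligned16` passes the check; a pointer one element (2 bytes) further in does not, so only the unaligned store would be appropriate there.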
- _mm_sub_ ph Experimental avx512fp16andavx512vl
- Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm_sub_ round_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point element in b from the one in a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter.
- _mm_sub_ sh Experimental avx512fp16
- Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_ucomieq_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomige_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomigt_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomile_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomilt_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomineq_ sh Experimental avx512fp16
- Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_undefined_ ph Experimental avx512fp16andavx512vl
- Return vector of type __m128h with indeterminate elements. Despite using the word “undefined” (following Intel’s naming scheme), this non-deterministically picks some valid value and is not equivalent to mem::MaybeUninit. In practice, this is typically equivalent to mem::zeroed.
- _xabort⚠Experimental rtm
- Forces a restricted transactional memory (RTM) region to abort.
- _xabort_code Experimental 
- Retrieves the parameter passed to _xabort when _xbegin’s status has the _XABORT_EXPLICIT flag set.
- _xbegin⚠Experimental rtm
- Specifies the start of a restricted transactional memory (RTM) code region and returns a value indicating status.
- _xend⚠Experimental rtm
- Specifies the end of a restricted transactional memory (RTM) code region.
- _xtest⚠Experimental rtm
- Queries whether the processor is executing in a transactional region identified by restricted transactional memory (RTM) or hardware lock elision (HLE).
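How the abort code relates to the _xbegin status word can be sketched without executing any RTM instruction. The sketch assumes the conventional layout: bit 0 of the status is the explicit-abort flag and bits 31:24 carry the 8-bit value that was passed to _xabort. The constant and function names below are local stand-ins, not the real items.

```rust
/// Local stand-in for the explicit-abort flag (assumed to be bit 0).
const XABORT_EXPLICIT_MODEL: u32 = 1 << 0;

/// Extract the 8-bit abort code from the top byte of a status word.
fn xabort_code_model(status: u32) -> u32 {
    (status >> 24) & 0xFF
}

/// The code is only meaningful when the explicit flag is set.
fn explicit_abort_code(status: u32) -> Option<u8> {
    if status & XABORT_EXPLICIT_MODEL != 0 {
        Some(xabort_code_model(status) as u8)
    } else {
        None
    }
}
```

Wrapping the extraction in an `Option` makes the precondition explicit: a status without the flag yields `None` instead of a meaningless byte.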
Type Aliases§
- _MM_CMPINT_ ENUM 
- The _MM_CMPINT_ENUM type used to specify comparison operations in AVX-512 intrinsics.
- _MM_MANTISSA_ NORM_ ENUM 
- The _MM_MANTISSA_NORM_ENUM type used to specify mantissa normalized operations in AVX-512 intrinsics.
- _MM_MANTISSA_ SIGN_ ENUM 
- The _MM_MANTISSA_SIGN_ENUM type used to specify mantissa signed operations in AVX-512 intrinsics.
- _MM_PERM_ ENUM 
- The _MM_PERM_ENUM type used to specify shuffle operations in AVX-512 intrinsics.
- __mmask8 
- The __mmask8 type used in AVX-512 intrinsics, an 8-bit integer
- __mmask16 
- The __mmask16 type used in AVX-512 intrinsics, a 16-bit integer
- __mmask32 
- The __mmask32 type used in AVX-512 intrinsics, a 32-bit integer
- __mmask64 
- The __mmask64 type used in AVX-512 intrinsics, a 64-bit integer
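Since the mask types are described as plain integers, ordinary bit operations are enough to build and inspect them. A sketch using a local alias in place of __mmask8, with bit i corresponding to lane i; the alias and function names are hypothetical, and f32 stands in for whatever element type the lanes hold.

```rust
/// Local stand-in for an 8-bit mask type.
type Mask8 = u8;

/// Build a mask with bit i set when lane i satisfies the predicate.
fn mask_from_predicate(lanes: [f32; 8], pred: impl Fn(f32) -> bool) -> Mask8 {
    let mut k: Mask8 = 0;
    for (i, &x) in lanes.iter().enumerate() {
        if pred(x) {
            k |= 1 << i;
        }
    }
    k
}

/// Test a single lane's bit.
fn lane_is_set(k: Mask8, lane: u32) -> bool {
    (k >> lane) & 1 != 0
}
```

A mask built this way can be passed to any of the zeromask models above, and `count_ones` gives the number of active lanes.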