Module core::arch::x86_64 (since 1.27.0)
Platform-specific intrinsics for the x86_64 platform.
See the module documentation for more details.
Structs
__m512  Experimental 512-bit wide set of sixteen 
__m512d  Experimental 512-bit wide set of eight 
__m512i  Experimental 512-bit wide integer vector type, x86-specific 
CpuidResult  Result of the 
__m128  128-bit wide set of four 
__m128d  128-bit wide set of two 
__m128i  128-bit wide integer vector type, x86-specific 
__m256  256-bit wide set of eight 
__m256d  256-bit wide set of four 
__m256i  256-bit wide integer vector type, x86-specific 
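As a rough illustration of how the 128-bit types above are typically used, the sketch below adds two sets of four f32 values through an __m128. The helper name add_four is hypothetical. The intrinsics _mm_loadu_ps, _mm_add_ps, and _mm_storeu_ps are SSE intrinsics from this module that do not appear in the part of the listing shown here, and the code reaches them through the std::arch::x86_64 re-export. A plain-array fallback is assumed for other targets.

```rust
/// Adds two sets of four `f32` values lane by lane.
/// `add_four` is a hypothetical helper name, not part of `core::arch`.
#[cfg(target_arch = "x86_64")]
pub fn add_four(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{__m128, _mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    let mut out = [0.0f32; 4];
    // SAFETY: SSE is part of the x86_64 baseline, and the unaligned
    // load/store intrinsics only touch the four floats of each array.
    unsafe {
        let va: __m128 = _mm_loadu_ps(a.as_ptr());
        let vb: __m128 = _mm_loadu_ps(b.as_ptr());
        let sum: __m128 = _mm_add_ps(va, vb);
        _mm_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

/// Portable fallback (an assumption for non-x86_64 targets).
#[cfg(not(target_arch = "x86_64"))]
pub fn add_four(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling add_four([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]) adds lane 0 to lane 0, lane 1 to lane 1, and so on. Wider types such as __m256 follow the same load, operate, store pattern but generally need a runtime feature check first.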
Constants
_MM_CMPINT_EQ  Experimental Equal 
_MM_CMPINT_FALSE  Experimental False 
_MM_CMPINT_LE  Experimental Less-than-or-equal 
_MM_CMPINT_LT  Experimental Less-than 
_MM_CMPINT_NE  Experimental Not-equal 
_MM_CMPINT_NLE  Experimental Not less-than-or-equal 
_MM_CMPINT_NLT  Experimental Not less-than 
_MM_CMPINT_TRUE  Experimental True 
_MM_MANT_NORM_1_2  Experimental interval [1, 2) 
_MM_MANT_NORM_P5_1  Experimental interval [0.5, 1) 
_MM_MANT_NORM_P5_2  Experimental interval [0.5, 2) 
_MM_MANT_NORM_P75_1P5  Experimental interval [0.75, 1.5) 
_MM_MANT_SIGN_NAN  Experimental DEST = NaN if sign(SRC) = 1 
_MM_MANT_SIGN_SRC  Experimental sign = sign(SRC) 
_MM_MANT_SIGN_ZERO  Experimental sign = 0 
_MM_PERM_AAAA  Experimental 
_MM_PERM_AAAB  Experimental 
_MM_PERM_AAAC  Experimental 
_MM_PERM_AAAD  Experimental 
_MM_PERM_AABA  Experimental 
_MM_PERM_AABB  Experimental 
_MM_PERM_AABC  Experimental 
_MM_PERM_AABD  Experimental 
_MM_PERM_AACA  Experimental 
_MM_PERM_AACB  Experimental 
_MM_PERM_AACC  Experimental 
_MM_PERM_AACD  Experimental 
_MM_PERM_AADA  Experimental 
_MM_PERM_AADB  Experimental 
_MM_PERM_AADC  Experimental 
_MM_PERM_AADD  Experimental 
_MM_PERM_ABAA  Experimental 
_MM_PERM_ABAB  Experimental 
_MM_PERM_ABAC  Experimental 
_MM_PERM_ABAD  Experimental 
_MM_PERM_ABBA  Experimental 
_MM_PERM_ABBB  Experimental 
_MM_PERM_ABBC  Experimental 
_MM_PERM_ABBD  Experimental 
_MM_PERM_ABCA  Experimental 
_MM_PERM_ABCB  Experimental 
_MM_PERM_ABCC  Experimental 
_MM_PERM_ABCD  Experimental 
_MM_PERM_ABDA  Experimental 
_MM_PERM_ABDB  Experimental 
_MM_PERM_ABDC  Experimental 
_MM_PERM_ABDD  Experimental 
_MM_PERM_ACAA  Experimental 
_MM_PERM_ACAB  Experimental 
_MM_PERM_ACAC  Experimental 
_MM_PERM_ACAD  Experimental 
_MM_PERM_ACBA  Experimental 
_MM_PERM_ACBB  Experimental 
_MM_PERM_ACBC  Experimental 
_MM_PERM_ACBD  Experimental 
_MM_PERM_ACCA  Experimental 
_MM_PERM_ACCB  Experimental 
_MM_PERM_ACCC  Experimental 
_MM_PERM_ACCD  Experimental 
_MM_PERM_ACDA  Experimental 
_MM_PERM_ACDB  Experimental 
_MM_PERM_ACDC  Experimental 
_MM_PERM_ACDD  Experimental 
_MM_PERM_ADAA  Experimental 
_MM_PERM_ADAB  Experimental 
_MM_PERM_ADAC  Experimental 
_MM_PERM_ADAD  Experimental 
_MM_PERM_ADBA  Experimental 
_MM_PERM_ADBB  Experimental 
_MM_PERM_ADBC  Experimental 
_MM_PERM_ADBD  Experimental 
_MM_PERM_ADCA  Experimental 
_MM_PERM_ADCB  Experimental 
_MM_PERM_ADCC  Experimental 
_MM_PERM_ADCD  Experimental 
_MM_PERM_ADDA  Experimental 
_MM_PERM_ADDB  Experimental 
_MM_PERM_ADDC  Experimental 
_MM_PERM_ADDD  Experimental 
_MM_PERM_BAAA  Experimental 
_MM_PERM_BAAB  Experimental 
_MM_PERM_BAAC  Experimental 
_MM_PERM_BAAD  Experimental 
_MM_PERM_BABA  Experimental 
_MM_PERM_BABB  Experimental 
_MM_PERM_BABC  Experimental 
_MM_PERM_BABD  Experimental 
_MM_PERM_BACA  Experimental 
_MM_PERM_BACB  Experimental 
_MM_PERM_BACC  Experimental 
_MM_PERM_BACD  Experimental 
_MM_PERM_BADA  Experimental 
_MM_PERM_BADB  Experimental 
_MM_PERM_BADC  Experimental 
_MM_PERM_BADD  Experimental 
_MM_PERM_BBAA  Experimental 
_MM_PERM_BBAB  Experimental 
_MM_PERM_BBAC  Experimental 
_MM_PERM_BBAD  Experimental 
_MM_PERM_BBBA  Experimental 
_MM_PERM_BBBB  Experimental 
_MM_PERM_BBBC  Experimental 
_MM_PERM_BBBD  Experimental 
_MM_PERM_BBCA  Experimental 
_MM_PERM_BBCB  Experimental 
_MM_PERM_BBCC  Experimental 
_MM_PERM_BBCD  Experimental 
_MM_PERM_BBDA  Experimental 
_MM_PERM_BBDB  Experimental 
_MM_PERM_BBDC  Experimental 
_MM_PERM_BBDD  Experimental 
_MM_PERM_BCAA  Experimental 
_MM_PERM_BCAB  Experimental 
_MM_PERM_BCAC  Experimental 
_MM_PERM_BCAD  Experimental 
_MM_PERM_BCBA  Experimental 
_MM_PERM_BCBB  Experimental 
_MM_PERM_BCBC  Experimental 
_MM_PERM_BCBD  Experimental 
_MM_PERM_BCCA  Experimental 
_MM_PERM_BCCB  Experimental 
_MM_PERM_BCCC  Experimental 
_MM_PERM_BCCD  Experimental 
_MM_PERM_BCDA  Experimental 
_MM_PERM_BCDB  Experimental 
_MM_PERM_BCDC  Experimental 
_MM_PERM_BCDD  Experimental 
_MM_PERM_BDAA  Experimental 
_MM_PERM_BDAB  Experimental 
_MM_PERM_BDAC  Experimental 
_MM_PERM_BDAD  Experimental 
_MM_PERM_BDBA  Experimental 
_MM_PERM_BDBB  Experimental 
_MM_PERM_BDBC  Experimental 
_MM_PERM_BDBD  Experimental 
_MM_PERM_BDCA  Experimental 
_MM_PERM_BDCB  Experimental 
_MM_PERM_BDCC  Experimental 
_MM_PERM_BDCD  Experimental 
_MM_PERM_BDDA  Experimental 
_MM_PERM_BDDB  Experimental 
_MM_PERM_BDDC  Experimental 
_MM_PERM_BDDD  Experimental 
_MM_PERM_CAAA  Experimental 
_MM_PERM_CAAB  Experimental 
_MM_PERM_CAAC  Experimental 
_MM_PERM_CAAD  Experimental 
_MM_PERM_CABA  Experimental 
_MM_PERM_CABB  Experimental 
_MM_PERM_CABC  Experimental 
_MM_PERM_CABD  Experimental 
_MM_PERM_CACA  Experimental 
_MM_PERM_CACB  Experimental 
_MM_PERM_CACC  Experimental 
_MM_PERM_CACD  Experimental 
_MM_PERM_CADA  Experimental 
_MM_PERM_CADB  Experimental 
_MM_PERM_CADC  Experimental 
_MM_PERM_CADD  Experimental 
_MM_PERM_CBAA  Experimental 
_MM_PERM_CBAB  Experimental 
_MM_PERM_CBAC  Experimental 
_MM_PERM_CBAD  Experimental 
_MM_PERM_CBBA  Experimental 
_MM_PERM_CBBB  Experimental 
_MM_PERM_CBBC  Experimental 
_MM_PERM_CBBD  Experimental 
_MM_PERM_CBCA  Experimental 
_MM_PERM_CBCB  Experimental 
_MM_PERM_CBCC  Experimental 
_MM_PERM_CBCD  Experimental 
_MM_PERM_CBDA  Experimental 
_MM_PERM_CBDB  Experimental 
_MM_PERM_CBDC  Experimental 
_MM_PERM_CBDD  Experimental 
_MM_PERM_CCAA  Experimental 
_MM_PERM_CCAB  Experimental 
_MM_PERM_CCAC  Experimental 
_MM_PERM_CCAD  Experimental 
_MM_PERM_CCBA  Experimental 
_MM_PERM_CCBB  Experimental 
_MM_PERM_CCBC  Experimental 
_MM_PERM_CCBD  Experimental 
_MM_PERM_CCCA  Experimental 
_MM_PERM_CCCB  Experimental 
_MM_PERM_CCCC  Experimental 
_MM_PERM_CCCD  Experimental 
_MM_PERM_CCDA  Experimental 
_MM_PERM_CCDB  Experimental 
_MM_PERM_CCDC  Experimental 
_MM_PERM_CCDD  Experimental 
_MM_PERM_CDAA  Experimental 
_MM_PERM_CDAB  Experimental 
_MM_PERM_CDAC  Experimental 
_MM_PERM_CDAD  Experimental 
_MM_PERM_CDBA  Experimental 
_MM_PERM_CDBB  Experimental 
_MM_PERM_CDBC  Experimental 
_MM_PERM_CDBD  Experimental 
_MM_PERM_CDCA  Experimental 
_MM_PERM_CDCB  Experimental 
_MM_PERM_CDCC  Experimental 
_MM_PERM_CDCD  Experimental 
_MM_PERM_CDDA  Experimental 
_MM_PERM_CDDB  Experimental 
_MM_PERM_CDDC  Experimental 
_MM_PERM_CDDD  Experimental 
_MM_PERM_DAAA  Experimental 
_MM_PERM_DAAB  Experimental 
_MM_PERM_DAAC  Experimental 
_MM_PERM_DAAD  Experimental 
_MM_PERM_DABA  Experimental 
_MM_PERM_DABB  Experimental 
_MM_PERM_DABC  Experimental 
_MM_PERM_DABD  Experimental 
_MM_PERM_DACA  Experimental 
_MM_PERM_DACB  Experimental 
_MM_PERM_DACC  Experimental 
_MM_PERM_DACD  Experimental 
_MM_PERM_DADA  Experimental 
_MM_PERM_DADB  Experimental 
_MM_PERM_DADC  Experimental 
_MM_PERM_DADD  Experimental 
_MM_PERM_DBAA  Experimental 
_MM_PERM_DBAB  Experimental 
_MM_PERM_DBAC  Experimental 
_MM_PERM_DBAD  Experimental 
_MM_PERM_DBBA  Experimental 
_MM_PERM_DBBB  Experimental 
_MM_PERM_DBBC  Experimental 
_MM_PERM_DBBD  Experimental 
_MM_PERM_DBCA  Experimental 
_MM_PERM_DBCB  Experimental 
_MM_PERM_DBCC  Experimental 
_MM_PERM_DBCD  Experimental 
_MM_PERM_DBDA  Experimental 
_MM_PERM_DBDB  Experimental 
_MM_PERM_DBDC  Experimental 
_MM_PERM_DBDD  Experimental 
_MM_PERM_DCAA  Experimental 
_MM_PERM_DCAB  Experimental 
_MM_PERM_DCAC  Experimental 
_MM_PERM_DCAD  Experimental 
_MM_PERM_DCBA  Experimental 
_MM_PERM_DCBB  Experimental 
_MM_PERM_DCBC  Experimental 
_MM_PERM_DCBD  Experimental 
_MM_PERM_DCCA  Experimental 
_MM_PERM_DCCB  Experimental 
_MM_PERM_DCCC  Experimental 
_MM_PERM_DCCD  Experimental 
_MM_PERM_DCDA  Experimental 
_MM_PERM_DCDB  Experimental 
_MM_PERM_DCDC  Experimental 
_MM_PERM_DCDD  Experimental 
_MM_PERM_DDAA  Experimental 
_MM_PERM_DDAB  Experimental 
_MM_PERM_DDAC  Experimental 
_MM_PERM_DDAD  Experimental 
_MM_PERM_DDBA  Experimental 
_MM_PERM_DDBB  Experimental 
_MM_PERM_DDBC  Experimental 
_MM_PERM_DDBD  Experimental 
_MM_PERM_DDCA  Experimental 
_MM_PERM_DDCB  Experimental 
_MM_PERM_DDCC  Experimental 
_MM_PERM_DDCD  Experimental 
_MM_PERM_DDDA  Experimental 
_MM_PERM_DDDB  Experimental 
_MM_PERM_DDDC  Experimental 
_MM_PERM_DDDD  Experimental 
_XABORT_CAPACITY  Experimental Transaction abort due to the transaction using too much memory. 
_XABORT_CONFLICT  Experimental Transaction abort due to a memory conflict with another thread. 
_XABORT_DEBUG  Experimental Transaction abort due to a debug trap. 
_XABORT_EXPLICIT  Experimental Transaction explicitly aborted with xabort. The parameter passed to xabort is available with

_XABORT_NESTED  Experimental Transaction abort in an inner nested transaction. 
_XABORT_RETRY  Experimental Transaction retry is possible. 
_XBEGIN_STARTED  Experimental Transaction successfully started. 
_CMP_EQ_OQ  Equal (ordered, non-signaling) 
_CMP_EQ_OS  Equal (ordered, signaling) 
_CMP_EQ_UQ  Equal (unordered, non-signaling) 
_CMP_EQ_US  Equal (unordered, signaling) 
_CMP_FALSE_OQ  False (ordered, non-signaling) 
_CMP_FALSE_OS  False (ordered, signaling) 
_CMP_GE_OQ  Greater-than-or-equal (ordered, non-signaling) 
_CMP_GE_OS  Greater-than-or-equal (ordered, signaling) 
_CMP_GT_OQ  Greater-than (ordered, non-signaling) 
_CMP_GT_OS  Greater-than (ordered, signaling) 
_CMP_LE_OQ  Less-than-or-equal (ordered, non-signaling) 
_CMP_LE_OS  Less-than-or-equal (ordered, signaling) 
_CMP_LT_OQ  Less-than (ordered, non-signaling) 
_CMP_LT_OS  Less-than (ordered, signaling) 
_CMP_NEQ_OQ  Not-equal (ordered, non-signaling) 
_CMP_NEQ_OS  Not-equal (ordered, signaling) 
_CMP_NEQ_UQ  Not-equal (unordered, non-signaling) 
_CMP_NEQ_US  Not-equal (unordered, signaling) 
_CMP_NGE_UQ  Not-greater-than-or-equal (unordered, non-signaling) 
_CMP_NGE_US  Not-greater-than-or-equal (unordered, signaling) 
_CMP_NGT_UQ  Not-greater-than (unordered, non-signaling) 
_CMP_NGT_US  Not-greater-than (unordered, signaling) 
_CMP_NLE_UQ  Not-less-than-or-equal (unordered, non-signaling) 
_CMP_NLE_US  Not-less-than-or-equal (unordered, signaling) 
_CMP_NLT_UQ  Not-less-than (unordered, non-signaling) 
_CMP_NLT_US  Not-less-than (unordered, signaling) 
_CMP_ORD_Q  Ordered (non-signaling) 
_CMP_ORD_S  Ordered (signaling) 
_CMP_TRUE_UQ  True (unordered, non-signaling) 
_CMP_TRUE_US  True (unordered, signaling) 
_CMP_UNORD_Q  Unordered (non-signaling) 
_CMP_UNORD_S  Unordered (signaling) 
_MM_EXCEPT_DENORM  See 
_MM_EXCEPT_DIV_ZERO  See 
_MM_EXCEPT_INEXACT  See 
_MM_EXCEPT_INVALID  See 
_MM_EXCEPT_MASK  
_MM_EXCEPT_OVERFLOW  See 
_MM_EXCEPT_UNDERFLOW  See 
_MM_FLUSH_ZERO_MASK  
_MM_FLUSH_ZERO_OFF  See 
_MM_FLUSH_ZERO_ON  See 
_MM_FROUND_CEIL  round up and do not suppress exceptions 
_MM_FROUND_CUR_DIRECTION  use MXCSR.RC; see 
_MM_FROUND_FLOOR  round down and do not suppress exceptions 
_MM_FROUND_NEARBYINT  use MXCSR.RC and suppress exceptions; see 
_MM_FROUND_NINT  round to nearest and do not suppress exceptions 
_MM_FROUND_NO_EXC  suppress exceptions 
_MM_FROUND_RAISE_EXC  do not suppress exceptions 
_MM_FROUND_RINT  use MXCSR.RC and do not suppress exceptions; see

_MM_FROUND_TO_NEAREST_INT  round to nearest 
_MM_FROUND_TO_NEG_INF  round down 
_MM_FROUND_TO_POS_INF  round up 
_MM_FROUND_TO_ZERO  truncate 
_MM_FROUND_TRUNC  truncate and do not suppress exceptions 
_MM_HINT_NTA  See 
_MM_HINT_T0  See 
_MM_HINT_T1  See 
_MM_HINT_T2  See 
_MM_MASK_DENORM  See 
_MM_MASK_DIV_ZERO  See 
_MM_MASK_INEXACT  See 
_MM_MASK_INVALID  See 
_MM_MASK_MASK  
_MM_MASK_OVERFLOW  See 
_MM_MASK_UNDERFLOW  See 
_MM_ROUND_DOWN  See 
_MM_ROUND_MASK  
_MM_ROUND_NEAREST  See 
_MM_ROUND_TOWARD_ZERO  See 
_MM_ROUND_UP  See 
_SIDD_BIT_MASK  Mask only: return the bit mask 
_SIDD_CMP_EQUAL_ANY  For each character in 
_SIDD_CMP_EQUAL_EACH  The strings defined by 
_SIDD_CMP_EQUAL_ORDERED  Search for the defined substring in the target 
_SIDD_CMP_RANGES  For each character in 
_SIDD_LEAST_SIGNIFICANT  Index only: return the least significant bit (Default) 
_SIDD_MASKED_NEGATIVE_POLARITY  Negates results only before the end of the string 
_SIDD_MASKED_POSITIVE_POLARITY  Do not negate results before the end of the string 
_SIDD_MOST_SIGNIFICANT  Index only: return the most significant bit 
_SIDD_NEGATIVE_POLARITY  Negates results 
_SIDD_POSITIVE_POLARITY  Do not negate results (Default) 
_SIDD_SBYTE_OPS  String contains signed 8-bit characters 
_SIDD_SWORD_OPS  String contains signed 16-bit characters 
_SIDD_UBYTE_OPS  String contains unsigned 8-bit characters (Default) 
_SIDD_UNIT_MASK  Mask only: return the byte mask 
_SIDD_UWORD_OPS  String contains unsigned 16-bit characters 
_XCR_XFEATURE_ENABLED_MASK 
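The "ordered" and "unordered" qualifiers on the _CMP_* constants above describe what a comparison returns when either operand is NaN: ordered predicates yield false, unordered predicates yield true. The sketch below models a few predicates on a single pair of lanes in plain Rust. The helper names are hypothetical and no intrinsic is called. The signaling variants are not modeled separately, on the understanding that they differ in whether a quiet NaN raises the invalid-operation exception rather than in the returned result.

```rust
// Scalar models of a few `_CMP_*` predicates for one pair of lanes.
// All function names here are hypothetical helpers for illustration.

/// `_CMP_EQ_OQ`: equal; false if either operand is NaN (ordered).
pub fn eq_oq(a: f32, b: f32) -> bool {
    a == b
}

/// `_CMP_EQ_UQ`: equal; true if either operand is NaN (unordered).
pub fn eq_uq(a: f32, b: f32) -> bool {
    a.is_nan() || b.is_nan() || a == b
}

/// `_CMP_GT_OQ`: greater-than; false if either operand is NaN.
pub fn gt_oq(a: f32, b: f32) -> bool {
    a > b
}

/// `_CMP_NLE_UQ`: not-less-than-or-equal; true if either operand is NaN.
pub fn nle_uq(a: f32, b: f32) -> bool {
    !(a <= b)
}

/// `_CMP_ORD_Q`: true only when neither operand is NaN.
pub fn ord_q(a: f32, b: f32) -> bool {
    !a.is_nan() && !b.is_nan()
}
```

The model shows why not-less-than-or-equal is not interchangeable with greater-than: the two agree on ordinary numbers, but nle_uq(f32::NAN, 1.0) is true while gt_oq(f32::NAN, 1.0) is false.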

Functions
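Many functions in this section return or combine mask values instead of vectors. As a minimal sketch of that convention, the code below models a 16-lane signed less-than compare and the NOT-then-AND mask operation in plain Rust. The names cmplt_mask_model and kandn_model are hypothetical, the model assumes lane i corresponds to array index i and to bit i of the mask, and no AVX-512 intrinsic is called.

```rust
/// Scalar model of a 16-lane signed less-than compare producing a 16-bit mask.
/// Hypothetical helper: bit `i` of the result is set when `a[i] < b[i]`.
pub fn cmplt_mask_model(a: [i32; 16], b: [i32; 16]) -> u16 {
    let mut mask = 0u16;
    for i in 0..16 {
        if a[i] < b[i] {
            mask |= 1u16 << i;
        }
    }
    mask
}

/// Scalar model of a 16-bit mask "and-not": bitwise NOT of `a`, then AND with `b`.
/// Hypothetical helper mirroring the description of `_kandn_mask16`.
pub fn kandn_model(a: u16, b: u16) -> u16 {
    !a & b
}
```

A mask built this way can be combined with the k-register operations (AND, OR, XOR, and so on) before it is used to select lanes.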
_MM_SHUFFLE  Experimental A utility function for creating masks to use with Intel shuffle and permute intrinsics. 
_bittest^{⚠}  Experimental Returns the bit in position 
_bittest64^{⚠}  Experimental Returns the bit in position 
_bittestandcomplement^{⚠}  Experimental Returns the bit in position 
_bittestandcomplement64^{⚠}  Experimental Returns the bit in position 
_bittestandreset^{⚠}  Experimental Returns the bit in position 
_bittestandreset64^{⚠}  Experimental Returns the bit in position 
_bittestandset^{⚠}  Experimental Returns the bit in position 
_bittestandset64^{⚠}  Experimental Returns the bit in position 
_kand_mask16^{⚠}  Experimental avx512f Compute the bitwise AND of 16-bit masks a and b, and store the result in k. 
_kandn_mask16^{⚠}  Experimental avx512f Compute the bitwise NOT of 16-bit masks a and then AND with b, and store the result in k. 
_knot_mask16^{⚠}  Experimental avx512f Compute the bitwise NOT of 16-bit mask a, and store the result in k. 
_kor_mask16^{⚠}  Experimental avx512f Compute the bitwise OR of 16-bit masks a and b, and store the result in k. 
_kxnor_mask16^{⚠}  Experimental avx512f Compute the bitwise XNOR of 16-bit masks a and b, and store the result in k. 
_kxor_mask16^{⚠}  Experimental avx512f Compute the bitwise XOR of 16-bit masks a and b, and store the result in k. 
_mm256_cvtph_ps^{⚠}  Experimental f16c Converts the 8 x 16-bit half-precision float values in the 128-bit vector
_mm256_cvtps_ph^{⚠}  Experimental f16c Converts the 8 x 32-bit float values in the 256-bit vector 
_mm256_madd52hi_epu64^{⚠}  Experimental avx512ifma,avx512vl Multiply packed unsigned 52-bit integers in each 64-bit element of
_mm256_madd52lo_epu64^{⚠}  Experimental avx512ifma,avx512vl Multiply packed unsigned 52-bit integers in each 64-bit element of
_mm512_abs_epi32^{⚠}  Experimental avx512f Computes the absolute values of packed 32-bit integers in 
_mm512_abs_epi64^{⚠}  Experimental avx512f Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst. 
_mm512_abs_pd^{⚠}  Experimental avx512f Finds the absolute value of each packed double-precision (64-bit) floating-point element in v2, storing the results in dst. 
_mm512_abs_ps^{⚠}  Experimental avx512f Finds the absolute value of each packed single-precision (32-bit) floating-point element in v2, storing the results in dst. 
_mm512_add_epi32^{⚠}  Experimental avx512f Add packed 32-bit integers in a and b, and store the results in dst. 
_mm512_add_epi64^{⚠}  Experimental avx512f Add packed 64-bit integers in a and b, and store the results in dst. 
_mm512_add_pd^{⚠}  Experimental avx512f Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. 
_mm512_add_ps^{⚠}  Experimental avx512f Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. 
_mm512_add_round_pd^{⚠}  Experimental avx512f Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. 
_mm512_add_round_ps^{⚠}  Experimental avx512f Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. 
_mm512_and_epi32^{⚠}  Experimental avx512f Compute the bitwise AND of packed 32-bit integers in a and b, and store the results in dst. 
_mm512_and_epi64^{⚠}  Experimental avx512f Compute the bitwise AND of 512 bits (composed of packed 64-bit integers) in a and b, and store the results in dst. 
_mm512_and_si512^{⚠}  Experimental avx512f Compute the bitwise AND of 512 bits (representing integer data) in a and b, and store the result in dst. 
_mm512_andnot_epi32^{⚠}  Experimental avx512f Compute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst. 
_mm512_andnot_epi64^{⚠}  Experimental avx512f Compute the bitwise NOT of 512 bits (composed of packed 64-bit integers) in a and then AND with b, and store the results in dst. 
_mm512_andnot_si512^{⚠}  Experimental avx512f Compute the bitwise NOT of 512 bits (representing integer data) in a and then AND with b, and store the result in dst. 
_mm512_broadcast_f32x4^{⚠}  Experimental avx512f Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst. 
_mm512_broadcast_f64x4^{⚠}  Experimental avx512f Broadcast the 4 packed double-precision (64-bit) floating-point elements from a to all elements of dst. 
_mm512_broadcast_i32x4^{⚠}  Experimental avx512f Broadcast the 4 packed 32-bit integers from a to all elements of dst. 
_mm512_broadcast_i64x4^{⚠}  Experimental avx512f Broadcast the 4 packed 64-bit integers from a to all elements of dst. 
_mm512_broadcastd_epi32^{⚠}  Experimental avx512f Broadcast the low packed 32-bit integer from a to all elements of dst. 
_mm512_broadcastq_epi64^{⚠}  Experimental avx512f Broadcast the low packed 64-bit integer from a to all elements of dst. 
_mm512_broadcastsd_pd^{⚠}  Experimental avx512f Broadcast the low double-precision (64-bit) floating-point element from a to all elements of dst. 
_mm512_broadcastss_ps^{⚠}  Experimental avx512f Broadcast the low single-precision (32-bit) floating-point element from a to all elements of dst. 
_mm512_castpd128_pd512^{⚠}  Experimental avx512f Cast vector of type __m128d to type __m512d; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castpd256_pd512^{⚠}  Experimental avx512f Cast vector of type __m256d to type __m512d; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castpd512_pd128^{⚠}  Experimental avx512f Cast vector of type __m512d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castpd512_pd256^{⚠}  Experimental avx512f Cast vector of type __m512d to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castpd_ps^{⚠}  Experimental avx512f Cast vector of type __m512d to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castpd_si512^{⚠}  Experimental avx512f Cast vector of type __m512d to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps128_ps512^{⚠}  Experimental avx512f Cast vector of type __m128 to type __m512; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps256_ps512^{⚠}  Experimental avx512f Cast vector of type __m256 to type __m512; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps512_ps128^{⚠}  Experimental avx512f Cast vector of type __m512 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps512_ps256^{⚠}  Experimental avx512f Cast vector of type __m512 to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps_pd^{⚠}  Experimental avx512f Cast vector of type __m512 to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps_si512^{⚠}  Experimental avx512f Cast vector of type __m512 to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi128_si512^{⚠}  Experimental avx512f Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi256_si512^{⚠}  Experimental avx512f Cast vector of type __m256i to type __m512i; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi512_pd^{⚠}  Experimental avx512f Cast vector of type __m512i to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi512_ps^{⚠}  Experimental avx512f Cast vector of type __m512i to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi512_si128^{⚠}  Experimental avx512f Cast vector of type __m512i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi512_si256^{⚠}  Experimental avx512f Cast vector of type __m512i to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_cmp_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b based on the comparison operand specified by op. 
_mm512_cmp_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b based on the comparison operand specified by op. 
_mm512_cmp_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b based on the comparison operand specified by op. 
_mm512_cmp_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b based on the comparison operand specified by op. 
_mm512_cmp_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by op. 
_mm512_cmp_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by op. 
_mm512_cmp_round_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by op. 
_mm512_cmp_round_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by op. 
_mm512_cmpeq_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for equality, and store the results in a mask vector. 
_mm512_cmpeq_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for equality, and store the results in a mask vector. 
_mm512_cmpeq_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for equality, and store the results in a mask vector. 
_mm512_cmpeq_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for equality, and store the results in a mask vector. 
_mm512_cmpeq_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for equality, and store the results in a mask vector. 
_mm512_cmpeq_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for equality, and store the results in a mask vector. 
_mm512_cmpge_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in a mask vector. 
_mm512_cmpge_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in a mask vector. 
_mm512_cmpge_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in a mask vector. 
_mm512_cmpge_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in a mask vector. 
_mm512_cmpgt_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for greater-than, and store the results in a mask vector. 
_mm512_cmpgt_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for greater-than, and store the results in a mask vector. 
_mm512_cmpgt_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for greater-than, and store the results in a mask vector. 
_mm512_cmpgt_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for greater-than, and store the results in a mask vector. 
_mm512_cmple_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in a mask vector. 
_mm512_cmple_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in a mask vector. 
_mm512_cmple_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in a mask vector. 
_mm512_cmple_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in a mask vector. 
_mm512_cmple_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for less-than-or-equal, and store the results in a mask vector. 
_mm512_cmple_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for less-than-or-equal, and store the results in a mask vector. 
_mm512_cmplt_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for less-than, and store the results in a mask vector. 
_mm512_cmplt_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for less-than, and store the results in a mask vector. 
_mm512_cmplt_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for less-than, and store the results in a mask vector. 
_mm512_cmplt_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for less-than, and store the results in a mask vector. 
_mm512_cmplt_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for less-than, and store the results in a mask vector. 
_mm512_cmplt_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for less-than, and store the results in a mask vector. 
_mm512_cmpneq_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for inequality, and store the results in a mask vector. 
_mm512_cmpneq_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for inequality, and store the results in a mask vector. 
_mm512_cmpneq_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for inequality, and store the results in a mask vector. 
_mm512_cmpneq_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for inequality, and store the results in a mask vector. 
_mm512_cmpneq_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for inequality, and store the results in a mask vector. 
_mm512_cmpneq_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for inequality, and store the results in a mask vector. 
_mm512_cmpnle_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in a mask vector. 
_mm512_cmpnle_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in a mask vector. 
_mm512_cmpnlt_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for not-less-than, and store the results in a mask vector. 
_mm512_cmpnlt_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than, and store the results in a mask vector. 
_mm512_cmpord_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b to see if neither is NaN, and store the results in a mask vector. 
_mm512_cmpord_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b to see if neither is NaN, and store the results in a mask vector. 
_mm512_cmpunord_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b to see if either is NaN, and store the results in a mask vector. 
_mm512_cmpunord_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b to see if either is NaN, and store the results in a mask vector. 
_mm512_cvt_roundps_epi32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst. 
_mm512_cvt_roundps_epu32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst. 
_mm512_cvt_roundps_pd^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_cvtps_epi32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst. 
_mm512_cvtps_epu32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst. 
_mm512_cvtps_pd^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst. 
_mm512_cvtt_roundpd_epi32^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_cvtt_roundpd_epu32^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_cvtt_roundps_epi32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_cvtt_roundps_epu32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_cvttpd_epi32^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst. 
_mm512_cvttpd_epu32^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst. 
_mm512_cvttps_epi32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst. 
_mm512_cvttps_epu32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst. 
_mm512_div_pd^{⚠}  Experimental avx512f Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst. 
_mm512_div_ps^{⚠}  Experimental avx512f Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst. 
_mm512_div_round_pd^{⚠}  Experimentalavx512f Divide packed doubleprecision (64bit) floatingpoint elements in a by packed elements in b, and store the results in dst. 
_mm512_div_round_ps^{⚠}  Experimentalavx512f Divide packed singleprecision (32bit) floatingpoint elements in a by packed elements in b, and store the results in dst. 
_mm512_extractf32x4_ps^{⚠}  Experimentalavx512f Extract 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from a, selected with imm8, and store the result in dst. 
_mm512_fmadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst. 
_mm512_fmadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst. 
_mm512_fmadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst. 
_mm512_fmadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst. 
_mm512_fmaddsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst. 
_mm512_fmaddsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst. 
_mm512_fmaddsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst. 
_mm512_fmaddsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst. 
_mm512_fmsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst. 
_mm512_fmsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst. 
_mm512_fmsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst. 
_mm512_fmsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst. 
_mm512_fmsubadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst. 
_mm512_fmsubadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst. 
_mm512_fmsubadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst. 
_mm512_fmsubadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst. 
_mm512_fnmadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst. 
_mm512_fnmadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst. 
_mm512_fnmadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst. 
_mm512_fnmadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst. 
_mm512_fnmsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst. 
_mm512_fnmsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst. 
_mm512_fnmsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst. 
_mm512_fnmsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst. 
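The fused multiply-add entries above differ mainly in which terms are negated. As a rough scalar sketch (not the intrinsics themselves), the sign conventions can be modelled per lane with `f64::mul_add`. All helper names below are hypothetical, the `[f64; 8]` arrays merely stand in for `__m512d`, and the even/odd lane pattern reflects my reading of Intel's description of the alternating variants.

```rust
// Hypothetical scalar models of one lane of each variant (not the real intrinsics).
fn fmadd(a: f64, b: f64, c: f64) -> f64 { a.mul_add(b, c) }       //  (a * b) + c
fn fmsub(a: f64, b: f64, c: f64) -> f64 { a.mul_add(b, -c) }      //  (a * b) - c
fn fnmadd(a: f64, b: f64, c: f64) -> f64 { (-a).mul_add(b, c) }   // -(a * b) + c
fn fnmsub(a: f64, b: f64, c: f64) -> f64 { (-a).mul_add(b, -c) }  // -(a * b) - c

// Eight-lane model of fmaddsub: even lanes subtract c, odd lanes add c.
fn fmaddsub(a: [f64; 8], b: [f64; 8], c: [f64; 8]) -> [f64; 8] {
    let mut dst = [0.0; 8];
    for i in 0..8 {
        dst[i] = if i % 2 == 0 { fmsub(a[i], b[i], c[i]) } else { fmadd(a[i], b[i], c[i]) };
    }
    dst
}

// Eight-lane model of fmsubadd: even lanes add c, odd lanes subtract c.
fn fmsubadd(a: [f64; 8], b: [f64; 8], c: [f64; 8]) -> [f64; 8] {
    let mut dst = [0.0; 8];
    for i in 0..8 {
        dst[i] = if i % 2 == 0 { fmadd(a[i], b[i], c[i]) } else { fmsub(a[i], b[i], c[i]) };
    }
    dst
}
```

With a = 2, b = 3, c = 1 in every lane, the four basic variants give 7, 5, -5 and -7 respectively.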
_mm512_getexp_pd^{⚠}  Experimentalavx512f Convert the exponent of each packed doubleprecision (64bit) floatingpoint element in a to a doubleprecision (64bit) floatingpoint number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm512_getexp_ps^{⚠}  Experimentalavx512f Convert the exponent of each packed singleprecision (32bit) floatingpoint element in a to a singleprecision (32bit) floatingpoint number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm512_getexp_round_pd^{⚠}  Experimentalavx512f Convert the exponent of each packed doubleprecision (64bit) floatingpoint element in a to a doubleprecision (64bit) floatingpoint number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_getexp_round_ps^{⚠}  Experimentalavx512f Convert the exponent of each packed singleprecision (32bit) floatingpoint element in a to a singleprecision (32bit) floatingpoint number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
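To illustrate the floor(log2(x)) description of the getexp entries, here is a minimal scalar sketch that reads the exponent field of an `f64` directly. The function name `getexp_f64` is hypothetical, and the sketch assumes a normal, finite, nonzero input; zero, subnormal, infinite and NaN inputs are deliberately not modelled.

```rust
// Hypothetical scalar model of getexp for one normal, finite, nonzero f64 lane.
// Returns the unbiased exponent as a float, which equals floor(log2(|x|)).
fn getexp_f64(x: f64) -> f64 {
    // Bits 52..=62 hold the 11-bit biased exponent; the bias is 1023.
    let biased = ((x.to_bits() >> 52) & 0x7ff) as i64;
    (biased - 1023) as f64
}
```

For example, 10.0 is 1.25 * 2^3, so the sketch yields 3.0, and the sign of the input is ignored.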
_mm512_getmant_pd^{⚠}  Experimentalavx512f Normalize the mantissas of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). 
_mm512_getmant_ps^{⚠}  Experimentalavx512f Normalize the mantissas of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). 
_mm512_getmant_round_pd^{⚠}  Experimentalavx512f Normalize the mantissas of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_getmant_round_ps^{⚠}  Experimentalavx512f Normalize the mantissas of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
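The mantissa normalization described for the getmant entries can be sketched on a single `f64` by overwriting the exponent field while keeping the sign and fraction bits. This is only a scalar illustration under stated assumptions: the helper names are hypothetical, only the [1, 2) and [0.5, 1) intervals with the sign taken from the source are covered, and the input is assumed to be normal and finite.

```rust
// Hypothetical helper: keep sign and fraction bits, replace the biased exponent.
fn set_exponent(x: f64, biased: u64) -> f64 {
    let sign_and_fraction = x.to_bits() & 0x800f_ffff_ffff_ffff;
    f64::from_bits(sign_and_fraction | (biased << 52))
}

// Model of the [1, 2) interval with sign = sign(src): unbiased exponent 0.
fn getmant_norm_1_2(x: f64) -> f64 {
    set_exponent(x, 1023)
}

// Model of the [0.5, 1) interval with sign = sign(src): unbiased exponent -1.
fn getmant_norm_p5_1(x: f64) -> f64 {
    set_exponent(x, 1022)
}
```

Since 10.0 is 1.25 * 2^3, the first model returns 1.25 and the second returns 0.625.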
_mm512_i32gather_epi32^{⚠}  Experimentalavx512f Gather 32bit integers from memory using 32bit indices. 
_mm512_i32gather_epi64^{⚠}  Experimentalavx512f Gather 64bit integers from memory using 32bit indices. 
_mm512_i32gather_pd^{⚠}  Experimentalavx512f Gather doubleprecision (64bit) floatingpoint elements from memory using 32bit indices. 
_mm512_i32gather_ps^{⚠}  Experimentalavx512f Gather singleprecision (32bit) floatingpoint elements from memory using 32bit indices. 
_mm512_i32scatter_epi32^{⚠}  Experimentalavx512f Scatter 32bit integers from src into memory using 32bit indices. 
_mm512_i32scatter_epi64^{⚠}  Experimentalavx512f Scatter 64bit integers from src into memory using 32bit indices. 
_mm512_i32scatter_pd^{⚠}  Experimentalavx512f Scatter doubleprecision (64bit) floatingpoint elements from src into memory using 32bit indices. 
_mm512_i32scatter_ps^{⚠}  Experimentalavx512f Scatter singleprecision (32bit) floatingpoint elements from src into memory using 32bit indices. 
_mm512_i64gather_epi32^{⚠}  Experimentalavx512f Gather 32bit integers from memory using 64bit indices. 
_mm512_i64gather_epi64^{⚠}  Experimentalavx512f Gather 64bit integers from memory using 64bit indices. 
_mm512_i64gather_pd^{⚠}  Experimentalavx512f Gather doubleprecision (64bit) floatingpoint elements from memory using 64bit indices. 
_mm512_i64gather_ps^{⚠}  Experimentalavx512f Gather singleprecision (32bit) floatingpoint elements from memory using 64bit indices. 
_mm512_i64scatter_epi32^{⚠}  Experimentalavx512f Scatter 32bit integers from src into memory using 64bit indices. 
_mm512_i64scatter_epi64^{⚠}  Experimentalavx512f Scatter 64bit integers from src into memory using 64bit indices. 
_mm512_i64scatter_pd^{⚠}  Experimentalavx512f Scatter doubleprecision (64bit) floatingpoint elements from src into memory using 64bit indices. 
_mm512_i64scatter_ps^{⚠}  Experimentalavx512f Scatter singleprecision (32bit) floatingpoint elements from src into memory using 64bit indices. 
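Gather and scatter are mirror images: gather reads one element per index into consecutive lanes, and scatter writes consecutive lanes out to one location per index. The following safe scalar sketch shows the data movement only. It is an assumption-laden simplification: the function names are hypothetical, indices are treated as element positions in a slice rather than as offsets from a raw base pointer, and out-of-range indices panic instead of being undefined behaviour.

```rust
// Hypothetical scalar model of a 16-lane gather: dst[i] = mem[idx[i]].
fn gather_i32(mem: &[i32], idx: [i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for i in 0..16 {
        dst[i] = mem[idx[i] as usize];
    }
    dst
}

// Hypothetical scalar model of a 16-lane scatter: mem[idx[i]] = src[i].
fn scatter_i32(mem: &mut [i32], idx: [i32; 16], src: [i32; 16]) {
    for i in 0..16 {
        mem[idx[i] as usize] = src[i];
    }
}
```

Scattering a gathered vector back through the same indices restores exactly the elements that were read and leaves all other locations untouched.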
_mm512_insertf32x4^{⚠}  Experimentalavx512f Copy a to dst, then insert 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from b into dst at the location specified by imm8. 
_mm512_insertf64x4^{⚠}  Experimentalavx512f Copy a to dst, then insert 256 bits (composed of 4 packed doubleprecision (64bit) floatingpoint elements) from b into dst at the location specified by imm8. 
_mm512_inserti32x4^{⚠}  Experimentalavx512f Copy a to dst, then insert 128 bits (composed of 4 packed 32bit integers) from b into dst at the location specified by imm8. 
_mm512_inserti64x4^{⚠}  Experimentalavx512f Copy a to dst, then insert 256 bits (composed of 4 packed 64bit integers) from b into dst at the location specified by imm8. 
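The insert entries above all follow the same "copy a, then overwrite one block" pattern. A minimal scalar sketch for the 4 x 32-bit float case is shown below. The function name is hypothetical, arrays stand in for the vector types, and the use of only the low two bits of imm8 to pick one of four 128-bit blocks is my understanding of Intel's description rather than something stated in this listing.

```rust
// Hypothetical scalar model: copy `a`, then overwrite one 4-element block with `b`.
fn insertf32x4_model(a: [f32; 16], b: [f32; 4], imm8: usize) -> [f32; 16] {
    let mut dst = a;
    let block = imm8 & 0b11; // assumed: low two bits select one of four 128-bit blocks
    dst[block * 4..block * 4 + 4].copy_from_slice(&b);
    dst
}
```

With `imm8 = 2`, elements 8 through 11 come from `b` and every other element is taken from `a`.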
_mm512_kand^{⚠}  Experimentalavx512f Compute the bitwise AND of 16bit masks a and b, and store the result in k. 
_mm512_kandn^{⚠}  Experimentalavx512f Compute the bitwise NOT of 16bit masks a and then AND with b, and store the result in k. 
_mm512_kmov^{⚠}  Experimentalavx512f Copy 16bit mask a to k. 
_mm512_knot^{⚠}  Experimentalavx512f Compute the bitwise NOT of 16bit mask a, and store the result in k. 
_mm512_kor^{⚠}  Experimentalavx512f Compute the bitwise OR of 16bit masks a and b, and store the result in k. 
_mm512_kxnor^{⚠}  Experimentalavx512f Compute the bitwise XNOR of 16bit masks a and b, and store the result in k. 
_mm512_kxor^{⚠}  Experimentalavx512f Compute the bitwise XOR of 16bit masks a and b, and store the result in k. 
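Because a 16-bit mask is just sixteen independent bits, the mask operations above correspond to ordinary bitwise operators on `u16`. The sketch below uses hypothetical function names and is meant only to pin down the operand order, in particular that the and-not form negates a, not b.

```rust
// Hypothetical scalar models of the 16-bit mask operations.
fn kand(a: u16, b: u16) -> u16 { a & b }
fn kandn(a: u16, b: u16) -> u16 { !a & b } // NOT is applied to `a`, then AND with `b`
fn knot(a: u16) -> u16 { !a }
fn kor(a: u16, b: u16) -> u16 { a | b }
fn kxor(a: u16, b: u16) -> u16 { a ^ b }
fn kxnor(a: u16, b: u16) -> u16 { !(a ^ b) }
```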
_mm512_loadu_pd^{⚠}  Experimentalavx512f Loads 512 bits (composed of 8 packed doubleprecision (64bit) floatingpoint elements) from memory into result. 
_mm512_loadu_ps^{⚠}  Experimentalavx512f Loads 512 bits (composed of 16 packed singleprecision (32bit) floatingpoint elements) from memory into result. 
_mm512_madd52hi_epu64^{⚠}  Experimentalavx512ifma Multiply packed unsigned 52bit integers in each 64bit element of

_mm512_madd52lo_epu64^{⚠}  Experimentalavx512ifma Multiply packed unsigned 52bit integers in each 64bit element of

_mm512_mask2_permutex2var_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm512_mask2_permutex2var_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm512_mask2_permutex2var_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm512_mask2_permutex2var_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
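The "selector and index in idx" wording can be made concrete with a scalar sketch of the 32-bit integer case. This is an illustration under assumptions, not a definitive account: the function name is hypothetical, and the bit layout (low four bits of each idx element as the source position, bit 4 choosing between a and b) follows my reading of Intel's description for sixteen-element vectors.

```rust
// Hypothetical scalar model of the mask2 two-source permute on 16 x i32.
fn mask2_permutex2var_i32(a: [i32; 16], idx: [i32; 16], k: u16, b: [i32; 16]) -> [i32; 16] {
    // Lanes whose mask bit is clear keep the value from `idx`.
    let mut dst = idx;
    for j in 0..16 {
        if (k >> j) & 1 == 1 {
            let off = (idx[j] & 0xf) as usize; // assumed: bits 3:0 are the element index
            let from_b = idx[j] & 0x10 != 0;   // assumed: bit 4 selects the source vector
            dst[j] = if from_b { b[off] } else { a[off] };
        }
    }
    dst
}
```

An idx value of 16 therefore reads element 0 of b, while a lane whose mask bit is clear simply returns its own idx value.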
_mm512_mask3_fmadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmaddsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmaddsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmaddsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmaddsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsubadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsubadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsubadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsubadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
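What distinguishes the mask3 forms above is that unselected lanes fall back to c, the third operand, rather than to a separate src vector. A minimal scalar sketch for the fmadd case follows. The function name is hypothetical, the `[f64; 8]` arrays stand in for `__m512d`, and the 8-bit mask width is assumed from the eight double-precision lanes.

```rust
// Hypothetical scalar model of a mask3 fused multiply-add on 8 x f64.
fn mask3_fmadd_model(a: [f64; 8], b: [f64; 8], c: [f64; 8], k: u8) -> [f64; 8] {
    // Start from `c`: lanes with a clear mask bit keep the value from `c`.
    let mut dst = c;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].mul_add(b[i], c[i]); // (a * b) + c with a single rounding
        }
    }
    dst
}
```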
_mm512_mask_abs_epi32^{⚠}  Experimentalavx512f Computes the absolute value of packed 32bit integers in 
_mm512_mask_abs_epi64^{⚠}  Experimentalavx512f Compute the absolute value of packed signed 64bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_abs_pd^{⚠}  Experimentalavx512f Finds the absolute value of each packed doubleprecision (64bit) floatingpoint element in v2, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_abs_ps^{⚠}  Experimentalavx512f Finds the absolute value of each packed singleprecision (32bit) floatingpoint element in v2, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_epi32^{⚠}  Experimentalavx512f Add packed 32bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_epi64^{⚠}  Experimentalavx512f Add packed 64bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_pd^{⚠}  Experimentalavx512f Add packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_ps^{⚠}  Experimentalavx512f Add packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_round_pd^{⚠}  Experimentalavx512f Add packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_round_ps^{⚠}  Experimentalavx512f Add packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
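The writemask wording used by the `mask_` entries ("elements are copied from src when the corresponding mask bit is not set") can be sketched in scalar Rust for the 32-bit integer add. The function name is hypothetical and arrays stand in for `__m512i`. Wrapping addition is used here on the assumption that packed integer adds wrap on overflow.

```rust
// Hypothetical scalar model of a write-masked add on 16 x i32.
fn mask_add_i32(src: [i32; 16], k: u16, a: [i32; 16], b: [i32; 16]) -> [i32; 16] {
    // Lanes with a clear mask bit keep the value from `src`.
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].wrapping_add(b[i]);
        }
    }
    dst
}
```

Bit i of k controls lane i, so `k = 0x0003` updates only the two lowest lanes.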
_mm512_mask_and_epi32^{⚠}  Experimentalavx512f Performs elementbyelement bitwise AND between packed 32bit integer elements of v2 and v3, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_and_epi64^{⚠}  Experimentalavx512f Compute the bitwise AND of packed 64bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_andnot_epi32^{⚠}  Experimentalavx512f Compute the bitwise NOT of packed 32bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_andnot_epi64^{⚠}  Experimentalavx512f Compute the bitwise NOT of packed 64bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_blend_epi32^{⚠}  Experimentalavx512f Blend packed 32bit integers from a and b using control mask k, and store the results in dst. 
_mm512_mask_blend_epi64^{⚠}  Experimentalavx512f Blend packed 64bit integers from a and b using control mask k, and store the results in dst. 
_mm512_mask_blend_pd^{⚠}  Experimentalavx512f Blend packed doubleprecision (64bit) floatingpoint elements from a and b using control mask k, and store the results in dst. 
_mm512_mask_blend_ps^{⚠}  Experimentalavx512f Blend packed singleprecision (32bit) floatingpoint elements from a and b using control mask k, and store the results in dst. 
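A blend is a per-lane select driven by the control mask. In the scalar sketch below, the function name is hypothetical and the convention that a set bit picks the element from b (and a clear bit the element from a) is my understanding of Intel's description, since the listing itself does not state the polarity.

```rust
// Hypothetical scalar model of a mask blend on 16 x i32.
fn mask_blend_i32(k: u16, a: [i32; 16], b: [i32; 16]) -> [i32; 16] {
    let mut dst = a;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = b[i]; // assumed polarity: set bit selects `b`
        }
    }
    dst
}
```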
_mm512_mask_broadcast_f32x4^{⚠}  Experimentalavx512f Broadcast the 4 packed singleprecision (32bit) floatingpoint elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcast_f64x4^{⚠}  Experimentalavx512f Broadcast the 4 packed doubleprecision (64bit) floatingpoint elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcast_i32x4^{⚠}  Experimentalavx512f Broadcast the 4 packed 32bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcast_i64x4^{⚠}  Experimentalavx512f Broadcast the 4 packed 64bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcastd_epi32^{⚠}  Experimentalavx512f Broadcast the low packed 32bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcastq_epi64^{⚠}  Experimentalavx512f Broadcast the low packed 64bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcastsd_pd^{⚠}  Experimentalavx512f Broadcast the low doubleprecision (64bit) floatingpoint element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcastss_ps^{⚠}  Experimentalavx512f Broadcast the low singleprecision (32bit) floatingpoint element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_cmp_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b based on the comparison operand specified by op, using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmp_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b based on the comparison operand specified by op, using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmp_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b based on the comparison operand specified by op, using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmp_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b based on the comparison operand specified by op, using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmp_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by op, using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmp_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by op, using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmp_round_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by op, using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmp_round_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by op, using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpeq_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for equality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpeq_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for equality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpeq_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for equality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpeq_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for equality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpeq_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for equality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpeq_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for equality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpge_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpge_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpge_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpge_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpgt_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for greater-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpgt_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for greater-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpgt_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for greater-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpgt_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for greater-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmple_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmple_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmple_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmple_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmple_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for less-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmple_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for less-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmplt_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for less-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmplt_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for less-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmplt_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for less-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmplt_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for less-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmplt_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for less-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmplt_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for less-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpneq_epi32_mask^{⚠}  Experimental avx512f Compare packed signed 32-bit integers in a and b for inequality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpneq_epi64_mask^{⚠}  Experimental avx512f Compare packed signed 64-bit integers in a and b for inequality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpneq_epu32_mask^{⚠}  Experimental avx512f Compare packed unsigned 32-bit integers in a and b for inequality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpneq_epu64_mask^{⚠}  Experimental avx512f Compare packed unsigned 64-bit integers in a and b for inequality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpneq_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for inequality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpneq_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for inequality, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpnle_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpnle_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpnlt_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b for not-less-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpnlt_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than, and store the results in a mask vector k using zeromask m (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_cmpord_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b to see if neither is NaN, and store the results in a mask vector.
_mm512_mask_cmpord_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b to see if neither is NaN, and store the results in a mask vector.
_mm512_mask_cmpunord_pd_mask^{⚠}  Experimental avx512f Compare packed double-precision (64-bit) floating-point elements in a and b to see if either is NaN, and store the results in a mask vector.
_mm512_mask_cmpunord_ps_mask^{⚠}  Experimental avx512f Compare packed single-precision (32-bit) floating-point elements in a and b to see if either is NaN, and store the results in a mask vector.
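The zeromask behaviour of the masked compares above can be sketched with a scalar model in plain Rust. This is not the intrinsics themselves: the function names are hypothetical, arrays stand in for the 512-bit vectors, and applying the zeromask to the ordered compare is an assumption, since those entries do not spell it out. The model also shows why not-less-than is not the same predicate as greater-than-or-equal once NaN is involved.

```rust
// Scalar models of masked compares; hypothetical helpers, not the intrinsics.

/// Signed less-than over 16 lanes. Bit i of the result is set only when
/// bit i of the zeromask `m` is set and a[i] < b[i].
fn mask_cmplt_epi32_model(m: u16, a: [i32; 16], b: [i32; 16]) -> u16 {
    let mut k: u16 = 0;
    for lane in 0..16 {
        if (m >> lane) & 1 == 1 && a[lane] < b[lane] {
            k |= 1 << lane;
        }
    }
    k
}

/// Ordered compare: a lane passes when neither input is NaN.
/// Assumption: the zeromask `m` is applied as in the other compares.
fn mask_cmpord_ps_model(m: u16, a: [f32; 16], b: [f32; 16]) -> u16 {
    let mut k: u16 = 0;
    for lane in 0..16 {
        if (m >> lane) & 1 == 1 && !a[lane].is_nan() && !b[lane].is_nan() {
            k |= 1 << lane;
        }
    }
    k
}

/// Not-less-than: the negation of `<`, so a lane with a NaN input passes,
/// which is where it differs from greater-than-or-equal.
fn mask_cmpnlt_ps_model(m: u16, a: [f32; 16], b: [f32; 16]) -> u16 {
    let mut k: u16 = 0;
    for lane in 0..16 {
        if (m >> lane) & 1 == 1 && !(a[lane] < b[lane]) {
            k |= 1 << lane;
        }
    }
    k
}

fn main() {
    let a = [0i32; 16];
    let mut b = [1i32; 16];
    b[3] = -1; // lane 3 fails the less-than test
    println!("lt  = {:#06x}", mask_cmplt_epi32_model(0x00FF, a, b));

    let mut x = [1.0f32; 16];
    x[0] = f32::NAN; // lane 0 is unordered
    let y = [2.0f32; 16];
    println!("ord = {:#06x}", mask_cmpord_ps_model(0xFFFF, x, y));
    println!("nlt = {:#06x}", mask_cmpnlt_ps_model(0xFFFF, x, y));
}
```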
_mm512_mask_cvt_roundps_epi32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundps_epu32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundps_pd^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_cvtps_epi32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtps_epu32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtps_pd^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtt_roundpd_epi32^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_cvtt_roundpd_epu32^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_cvtt_roundps_epi32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_cvtt_roundps_epu32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_cvttpd_epi32^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttpd_epu32^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttps_epi32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvttps_epu32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
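As a rough illustration of "with truncation" combined with a writemask, here is a scalar model in plain Rust. The helper name is hypothetical, arrays stand in for the vector registers, and only in-range finite inputs are modeled; the hardware's result for out-of-range or NaN inputs is not reproduced here.

```rust
/// Scalar model of a writemasked truncating f64 -> i32 conversion over 8 lanes.
/// Hypothetical helper, not the intrinsic. Lanes whose bit in `k` is clear
/// keep the value from `src`.
fn mask_cvttpd_epi32_model(src: [i32; 8], k: u8, a: [f64; 8]) -> [i32; 8] {
    let mut dst = src;
    for lane in 0..8 {
        if (k >> lane) & 1 == 1 {
            // Rust's `as` cast rounds toward zero, matching truncation for
            // in-range values (out-of-range inputs are not modeled).
            dst[lane] = a[lane] as i32;
        }
    }
    dst
}

fn main() {
    let a = [2.7, -2.7, 0.5, -0.5, 1e3, -1.999, 3.0, 7.9];
    let src = [99; 8];
    // Lane 7 is masked off and therefore keeps 99 from `src`.
    println!("{:?}", mask_cvttpd_epi32_model(src, 0b0111_1111, a));
}
```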
_mm512_mask_div_pd^{⚠}  Experimental avx512f Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_div_ps^{⚠}  Experimental avx512f Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_div_round_pd^{⚠}  Experimental avx512f Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_div_round_ps^{⚠}  Experimental avx512f Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_fmadd_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmadd_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmadd_round_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmadd_round_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmaddsub_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmaddsub_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmaddsub_round_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmaddsub_round_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsub_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsub_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsub_round_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsub_round_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsubadd_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsubadd_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsubadd_round_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fmsubadd_round_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmadd_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmadd_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmadd_round_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmadd_round_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmsub_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmsub_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmsub_round_pd^{⚠}  Experimental avx512f Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_fnmsub_round_ps^{⚠}  Experimental avx512f Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
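The fused multiply-add family above differs from most masked operations in that masked-off lanes are copied from a rather than from a separate src. A scalar sketch in plain Rust, with hypothetical helper names and arrays in place of vectors, might look like this. The lane parity used for the add/subtract variant (even lanes subtract, odd lanes add) follows Intel's published pseudocode and is not stated in the entries themselves.

```rust
/// Scalar model of a writemasked fused multiply-add over 8 lanes.
/// Hypothetical helper, not the intrinsic. Masked-off lanes keep `a`.
fn mask_fmadd_pd_model(a: [f64; 8], k: u8, b: [f64; 8], c: [f64; 8]) -> [f64; 8] {
    let mut dst = a;
    for lane in 0..8 {
        if (k >> lane) & 1 == 1 {
            // mul_add computes a * b + c with a single rounding.
            dst[lane] = a[lane].mul_add(b[lane], c[lane]);
        }
    }
    dst
}

/// Scalar model of the add/subtract variant: even lanes compute a * b - c,
/// odd lanes compute a * b + c (parity taken from Intel's pseudocode).
fn mask_fmaddsub_pd_model(a: [f64; 8], k: u8, b: [f64; 8], c: [f64; 8]) -> [f64; 8] {
    let mut dst = a;
    for lane in 0..8 {
        if (k >> lane) & 1 == 1 {
            let addend = if lane % 2 == 0 { -c[lane] } else { c[lane] };
            dst[lane] = a[lane].mul_add(b[lane], addend);
        }
    }
    dst
}

fn main() {
    let (a, b, c) = ([2.0; 8], [3.0; 8], [1.0; 8]);
    // Only the low four lanes are computed; the rest keep the value from `a`.
    println!("{:?}", mask_fmadd_pd_model(a, 0b0000_1111, b, c));
    println!("{:?}", mask_fmaddsub_pd_model(a, 0xFF, b, c));
}
```

The remaining variants (fmsub, fnmadd, fnmsub, fmsubadd) change only the signs of the product and of c in the same pattern.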
_mm512_mask_getexp_pd^{⚠}  Experimental avx512f Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_mask_getexp_ps^{⚠}  Experimental avx512f Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
_mm512_mask_getexp_round_pd^{⚠}  Experimental avx512f Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_getexp_round_ps^{⚠}  Experimental avx512f Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_getmant_pd^{⚠}  Experimental avx512f Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1).
_mm512_mask_getmant_ps^{⚠}  Experimental avx512f Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1).
_mm512_mask_getmant_round_pd^{⚠}  Experimental avx512f Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
_mm512_mask_getmant_round_ps^{⚠}  Experimental avx512f Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
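The getexp and getmant entries above split a value into its exponent and its normalized mantissa. The following scalar sketch in plain Rust shows one case only: normalization to [1, 2) with the sign taken from the source, for normal finite inputs. The helper names are hypothetical, and the special results the hardware produces for zero, subnormal, infinite, and NaN inputs are deliberately left out.

```rust
/// floor(log2(|x|)) read from the IEEE 754 exponent field.
/// Assumes `x` is a normal finite f64 (special cases are not modeled).
fn getexp_model(x: f64) -> f64 {
    let biased = (x.to_bits() >> 52) & 0x7ff; // 11-bit biased exponent
    biased as f64 - 1023.0
}

/// Mantissa normalized to [1, 2), keeping the sign of `x`.
/// Assumes `x` is a normal finite f64.
fn getmant_model(x: f64) -> f64 {
    let sign_and_fraction = x.to_bits() & 0x800f_ffff_ffff_ffff;
    // Force the exponent field to the bias, i.e. a scale of 2^0.
    f64::from_bits(sign_and_fraction | (1023u64 << 52))
}

/// Apply `f` lane by lane under a writemask: lanes whose bit in `k` is clear
/// are copied from `src`. Hypothetical helper for illustration.
fn mask_map_pd_model(src: [f64; 8], k: u8, a: [f64; 8], f: fn(f64) -> f64) -> [f64; 8] {
    let mut dst = src;
    for lane in 0..8 {
        if (k >> lane) & 1 == 1 {
            dst[lane] = f(a[lane]);
        }
    }
    dst
}

fn main() {
    let a = [10.0, -6.0, 0.75, 1.0, 2.0, 3.0, 8.0, 0.3];
    let src = [-1.0; 8];
    let exps = mask_map_pd_model(src, 0b0111_1111, a, getexp_model);
    let mants = mask_map_pd_model(src, 0b0111_1111, a, getmant_model);
    println!("exponents: {:?}", exps);
    println!("mantissas: {:?}", mants);
    // For an unmasked lane the two parts recombine into the input.
    assert_eq!(mants[0] * exps[0].exp2(), a[0]);
}
```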
_mm512_mask_i32gather_epi32^{⚠}  Experimental avx512f Gather 32-bit integers from memory using 32-bit indices.
_mm512_mask_i32gather_epi64^{⚠}  Experimental avx512f Gather 64-bit integers from memory using 32-bit indices.
_mm512_mask_i32gather_pd^{⚠}  Experimental avx512f Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices.
_mm512_mask_i32gather_ps^{⚠}  Experimental avx512f Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices.
_mm512_mask_i32scatter_epi32^{⚠}  Experimental avx512f Scatter 32-bit integers from src into memory using 32-bit indices.
_mm512_mask_i32scatter_epi64^{⚠}  Experimental avx512f Scatter 64-bit integers from src into memory using 32-bit indices.
_mm512_mask_i32scatter_pd^{⚠}  Experimental avx512f Scatter double-precision (64-bit) floating-point elements from src into memory using 32-bit indices.
_mm512_mask_i32scatter_ps^{⚠}  Experimental avx512f Scatter single-precision (32-bit) floating-point elements from src into memory using 32-bit indices.
_mm512_mask_i64gather_epi32^{⚠}  Experimental avx512f Gather 32-bit integers from memory using 64-bit indices.
_mm512_mask_i64gather_epi64^{⚠}  Experimental avx512f Gather 64-bit integers from memory using 64-bit indices.
_mm512_mask_i64gather_pd^{⚠}  Experimental avx512f Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices.
_mm512_mask_i64gather_ps^{⚠}  Experimental avx512f Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices.
_mm512_mask_i64scatter_epi32^{⚠}  Experimental avx512f Scatter 32-bit integers from src into memory using 64-bit indices.
_mm512_mask_i64scatter_epi64^{⚠}  Experimental avx512f Scatter 64-bit integers from src into memory using 64-bit indices.
_mm512_mask_i64scatter_pd^{⚠}  Experimental avx512f Scatter double-precision (64-bit) floating-point elements from src into memory using 64-bit indices.
_mm512_mask_i64scatter_ps^{⚠}  Experimental avx512f Scatter single-precision (32-bit) floating-point elements from src into memory using 64-bit indices.
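The gather and scatter entries above are terse, so a scalar sketch of the access pattern may help. It is plain Rust with hypothetical helper names, and it makes several simplifying assumptions: memory is a slice, indices count whole elements (the real intrinsics take a base pointer and a byte scale), indices are non-negative and in bounds, and masked-off gather lanes keep the value from src.

```rust
/// Scalar model of a masked gather of 16 i32 lanes.
/// Hypothetical helper: dst[i] = mem[vindex[i]] where bit i of `k` is set,
/// otherwise dst[i] = src[i].
fn mask_i32gather_epi32_model(src: [i32; 16], k: u16, vindex: [i32; 16], mem: &[i32]) -> [i32; 16] {
    let mut dst = src;
    for lane in 0..16 {
        if (k >> lane) & 1 == 1 {
            dst[lane] = mem[vindex[lane] as usize];
        }
    }
    dst
}

/// Scalar model of a masked scatter of 16 i32 lanes.
/// Hypothetical helper: mem[vindex[i]] = src[i] where bit i of `k` is set;
/// other memory locations are left untouched.
fn mask_i32scatter_epi32_model(mem: &mut [i32], k: u16, vindex: [i32; 16], src: [i32; 16]) {
    for lane in 0..16 {
        if (k >> lane) & 1 == 1 {
            mem[vindex[lane] as usize] = src[lane];
        }
    }
}

fn main() {
    let mem: Vec<i32> = (0..32).map(|i| i * 10).collect();
    let mut reversed = [0i32; 16];
    for lane in 0..16 {
        reversed[lane] = 31 - lane as i32;
    }
    // Only lanes 0 and 1 are loaded; the others keep -1 from `src`.
    println!("{:?}", mask_i32gather_epi32_model([-1; 16], 0x0003, reversed, &mem));

    let mut out = vec![0i32; 32];
    let mut strided = [0i32; 16];
    let mut values = [0i32; 16];
    for lane in 0..16 {
        strided[lane] = 2 * lane as i32;
        values[lane] = lane as i32 + 1;
    }
    // Only lanes 0 and 15 are stored.
    mask_i32scatter_epi32_model(&mut out, 0x8001, strided, values);
    println!("{:?}", out);
}
```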
_mm512_mask_insertf32x4^{⚠}  Experimental avx512f Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_insertf64x4^{⚠}  Experimental avx512f Copy a to tmp, then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_inserti32x4^{⚠}  Experimental avx512f Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_inserti64x4^{⚠}  Experimental avx512f Copy a to tmp, then insert 256 bits (composed of 4 packed 64-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_max_epi32^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_max_epi64^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_max_epu32^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_max_epu64^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_max_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_max_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_max_round_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_mask_max_round_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_mask_min_epi32^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_min_epi64^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_min_epu32^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_min_epu64^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_min_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_min_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_min_round_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_mask_min_round_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_mask_movedup_pd^{⚠}  Experimentalavx512f Duplicate evenindexed doubleprecision (64bit) floatingpoint elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_movehdup_ps^{⚠}  Experimentalavx512f Duplicate oddindexed singleprecision (32bit) floatingpoint elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_moveldup_ps^{⚠}  Experimentalavx512f Duplicate evenindexed singleprecision (32bit) floatingpoint elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_mul_epi32^{⚠}  Experimentalavx512f Multiply the low signed 32bit integers from each packed 64bit element in a and b, and store the signed 64bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_mul_epu32^{⚠}  Experimentalavx512f Multiply the low unsigned 32bit integers from each packed 64bit element in a and b, and store the unsigned 64bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
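The "low 32 bits of each 64bit element" wording in the two entries above can be illustrated with a scalar sketch of a single lane. This is only a model of the arithmetic, not the intrinsic itself; the function names `mul_low_signed` and `mul_low_unsigned` are hypothetical and do not exist in `core::arch`.

```rust
/// Hypothetical scalar model of one 64-bit lane of a `mul_epi32`-style multiply:
/// only the low 32 bits of each operand are used, interpreted as signed,
/// and the full 64-bit signed product is returned.
fn mul_low_signed(a: i64, b: i64) -> i64 {
    (a as i32 as i64) * (b as i32 as i64)
}

/// Hypothetical scalar model of one 64-bit lane of a `mul_epu32`-style multiply:
/// only the low 32 bits of each operand are used, interpreted as unsigned,
/// and the full 64-bit unsigned product is returned.
fn mul_low_unsigned(a: u64, b: u64) -> u64 {
    (a as u32 as u64) * (b as u32 as u64)
}
```

In this model the upper 32 bits of each input lane are ignored, so `0xFFFF_FFFF` acts as -1 in the signed variant and as 4294967295 in the unsigned one.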
_mm512_mask_mul_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_mul_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_mul_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_mul_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_mullo_epi32^{⚠}  Experimentalavx512f Multiply the packed 32bit integers in a and b, producing intermediate 64bit integers, and store the low 32 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_mullox_epi64^{⚠}  Experimentalavx512f Multiplies elements in packed 64bit integer vectors a and b together, storing the lower 64 bits of the result in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_or_epi32^{⚠}  Experimentalavx512f Compute the bitwise OR of packed 32bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_or_epi64^{⚠}  Experimentalavx512f Compute the bitwise OR of packed 64bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_permute_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_permute_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_permutevar_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Note that this intrinsic shuffles across 128bit lanes, unlike past intrinsics that use the permutevar name. This intrinsic is identical to _mm512_mask_permutexvar_epi32, and it is recommended that you use that intrinsic name. 
_mm512_mask_permutevar_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a within 128bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_permutevar_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_permutex2var_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm512_mask_permutex2var_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm512_mask_permutex2var_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm512_mask_permutex2var_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm512_mask_permutex_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a within 256bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_permutex_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a within 256bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_permutexvar_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_permutexvar_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_permutexvar_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_permutexvar_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
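As a rough illustration of the cross-lane `permutexvar` entries above, the following scalar sketch models the 32bit case with sixteen lanes. It assumes that only the low 4 bits of each index are used to select a source lane; `permutexvar_model` is a hypothetical helper name, not part of `core::arch`.

```rust
/// Hypothetical scalar model of a masked 16-lane cross-lane shuffle.
/// For each lane `i` whose bit in `k` is set, the result takes
/// `a[idx[i] & 0xF]`; otherwise the lane is copied from `src`.
fn permutexvar_model(src: [i32; 16], k: u16, idx: [u32; 16], a: [i32; 16]) -> [i32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            // Assumption: only the low 4 bits of the index select a lane.
            dst[i] = a[(idx[i] & 0xF) as usize];
        }
    }
    dst
}
```

With an index vector of `15 - i` and a full mask, this model reverses the sixteen lanes of `a`.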
_mm512_mask_rcp14_pd^{⚠}  Experimentalavx512f Compute the approximate reciprocal of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm512_mask_rcp14_ps^{⚠}  Experimentalavx512f Compute the approximate reciprocal of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm512_mask_rol_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_rol_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_rolv_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_rolv_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_ror_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_ror_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_rorv_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_rorv_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
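The per-element rotate entries above can be sketched in scalar form. The model below covers the variable left rotate on sixteen 32bit lanes and assumes the rotate count is taken modulo 32; `mask_rolv_model` is a hypothetical name used only for this illustration.

```rust
/// Hypothetical scalar model of a masked variable left rotate on 16 lanes.
/// For each lane `i` whose bit in `k` is set, `a[i]` is rotated left by
/// `b[i]` bits; otherwise the lane is copied from `src`.
fn mask_rolv_model(src: [u32; 16], k: u16, a: [u32; 16], b: [u32; 16]) -> [u32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            // Assumption: the count is reduced modulo the element width.
            dst[i] = a[i].rotate_left(b[i] % 32);
        }
    }
    dst
}
```

A right rotate would follow the same shape with `rotate_right` in place of `rotate_left`.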
_mm512_mask_rsqrt14_pd^{⚠}  Experimentalavx512f Compute the approximate reciprocal square root of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm512_mask_rsqrt14_ps^{⚠}  Experimentalavx512f Compute the approximate reciprocal square root of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm512_mask_shuffle_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a within 128bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_shuffle_f32x4^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 4 singleprecision (32bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_shuffle_f64x2^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 2 doubleprecision (64bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_shuffle_i32x4^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 4 32bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_shuffle_i64x2^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 2 64bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_shuffle_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements within 128bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_shuffle_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sll_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sll_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_slli_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_slli_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sllv_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sllv_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sqrt_pd^{⚠}  Experimentalavx512f Compute the square root of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sqrt_ps^{⚠}  Experimentalavx512f Compute the square root of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sqrt_round_pd^{⚠}  Experimentalavx512f Compute the square root of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sqrt_round_ps^{⚠}  Experimentalavx512f Compute the square root of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sra_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sra_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_srai_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_srai_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_srav_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_srav_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_srl_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_srl_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_srli_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_srli_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_srlv_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_srlv_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sub_epi32^{⚠}  Experimentalavx512f Subtract packed 32bit integers in b from packed 32bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sub_epi64^{⚠}  Experimentalavx512f Subtract packed 64bit integers in b from packed 64bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sub_pd^{⚠}  Experimentalavx512f Subtract packed doubleprecision (64bit) floatingpoint elements in b from packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sub_ps^{⚠}  Experimentalavx512f Subtract packed singleprecision (32bit) floatingpoint elements in b from packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sub_round_pd^{⚠}  Experimentalavx512f Subtract packed doubleprecision (64bit) floatingpoint elements in b from packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_sub_round_ps^{⚠}  Experimentalavx512f Subtract packed singleprecision (32bit) floatingpoint elements in b from packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_unpackhi_epi32^{⚠}  Experimentalavx512f Unpack and interleave 32bit integers from the high half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_unpackhi_epi64^{⚠}  Experimentalavx512f Unpack and interleave 64bit integers from the high half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_unpackhi_pd^{⚠}  Experimentalavx512f Unpack and interleave doubleprecision (64bit) floatingpoint elements from the high half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_unpackhi_ps^{⚠}  Experimentalavx512f Unpack and interleave singleprecision (32bit) floatingpoint elements from the high half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_unpacklo_epi32^{⚠}  Experimentalavx512f Unpack and interleave 32bit integers from the low half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_unpacklo_epi64^{⚠}  Experimentalavx512f Unpack and interleave 64bit integers from the low half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_unpacklo_pd^{⚠}  Experimentalavx512f Unpack and interleave doubleprecision (64bit) floatingpoint elements from the low half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_unpacklo_ps^{⚠}  Experimentalavx512f Unpack and interleave singleprecision (32bit) floatingpoint elements from the low half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
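The "high half" and "low half of each 128bit lane" phrasing in the unpack entries above may be easier to follow with a scalar sketch. The model below shows the unmasked interleave for sixteen 32bit elements (four 128bit lanes of four elements each); the mask step would then be applied to its output. The names `unpacklo_epi32_model` and `unpackhi_epi32_model` are hypothetical.

```rust
/// Hypothetical scalar model of a 32-bit unpack-low on 16 elements:
/// within each group of four, interleave elements 0 and 1 of `a` and `b`.
fn unpacklo_epi32_model(a: [i32; 16], b: [i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for lane in 0..4 {
        let base = lane * 4;
        dst[base] = a[base];
        dst[base + 1] = b[base];
        dst[base + 2] = a[base + 1];
        dst[base + 3] = b[base + 1];
    }
    dst
}

/// Hypothetical scalar model of a 32-bit unpack-high on 16 elements:
/// within each group of four, interleave elements 2 and 3 of `a` and `b`.
fn unpackhi_epi32_model(a: [i32; 16], b: [i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for lane in 0..4 {
        let base = lane * 4;
        dst[base] = a[base + 2];
        dst[base + 1] = b[base + 2];
        dst[base + 2] = a[base + 3];
        dst[base + 3] = b[base + 3];
    }
    dst
}
```

Note that in this model elements never cross a group of four, which is what distinguishes these operations from the cross-lane permutes.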
_mm512_mask_xor_epi32^{⚠}  Experimentalavx512f Compute the bitwise XOR of packed 32bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_xor_epi64^{⚠}  Experimentalavx512f Compute the bitwise XOR of packed 64bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
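The `_mm512_mask_*` entries above use a writemask (unselected elements are copied from src), while the `_mm512_maskz_*` entries that follow use a zeromask (unselected elements are zeroed). A minimal scalar sketch of the two merge rules, for sixteen 32bit lanes, might look like the following; `merge_writemask` and `merge_zeromask` are hypothetical helper names and `computed` stands for the per-lane result of whatever operation is being masked.

```rust
/// Hypothetical scalar model of a 16-lane writemask merge: lane `i` takes
/// `computed[i]` when bit `i` of `k` is set, otherwise it keeps `src[i]`.
fn merge_writemask(src: [i32; 16], k: u16, computed: [i32; 16]) -> [i32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = computed[i];
        }
    }
    dst
}

/// Hypothetical scalar model of a 16-lane zeromask merge: lane `i` takes
/// `computed[i]` when bit `i` of `k` is set, otherwise it is zeroed.
fn merge_zeromask(k: u16, computed: [i32; 16]) -> [i32; 16] {
    merge_writemask([0i32; 16], k, computed)
}
```

In this sketch the zeromask form is simply the writemask form with an all-zero src, which matches how the two families of descriptions differ only in their final clause.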
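_mm512_maskz_abs_epi32^{⚠}  Experimentalavx512f Computes the absolute value of packed 32bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 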
_mm512_maskz_abs_epi64^{⚠}  Experimentalavx512f Compute the absolute value of packed signed 64bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_add_epi32^{⚠}  Experimentalavx512f Add packed 32bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_add_epi64^{⚠}  Experimentalavx512f Add packed 64bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_add_pd^{⚠}  Experimentalavx512f Add packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_add_ps^{⚠}  Experimentalavx512f Add packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_add_round_pd^{⚠}  Experimentalavx512f Add packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_add_round_ps^{⚠}  Experimentalavx512f Add packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_and_epi32^{⚠}  Experimentalavx512f Compute the bitwise AND of packed 32bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_and_epi64^{⚠}  Experimentalavx512f Compute the bitwise AND of packed 64bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_andnot_epi32^{⚠}  Experimentalavx512f Compute the bitwise NOT of packed 32bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_andnot_epi64^{⚠}  Experimentalavx512f Compute the bitwise NOT of packed 64bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_broadcast_f32x4^{⚠}  Experimentalavx512f Broadcast the 4 packed singleprecision (32bit) floatingpoint elements from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_broadcast_f64x4^{⚠}  Experimentalavx512f Broadcast the 4 packed doubleprecision (64bit) floatingpoint elements from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_broadcast_i32x4^{⚠}  Experimentalavx512f Broadcast the 4 packed 32bit integers from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_broadcast_i64x4^{⚠}  Experimentalavx512f Broadcast the 4 packed 64bit integers from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_broadcastd_epi32^{⚠}  Experimentalavx512f Broadcast the low packed 32bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_broadcastq_epi64^{⚠}  Experimentalavx512f Broadcast the low packed 64bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_broadcastsd_pd^{⚠}  Experimentalavx512f Broadcast the low doubleprecision (64bit) floatingpoint element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_broadcastss_ps^{⚠}  Experimentalavx512f Broadcast the low singleprecision (32bit) floatingpoint element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_cvt_roundps_epi32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_cvt_roundps_epu32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_cvt_roundps_pd^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed doubleprecision (64bit) floatingpoint elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_cvtps_epi32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_cvtps_epu32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_cvtps_pd^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed doubleprecision (64bit) floatingpoint elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_cvtt_roundpd_epi32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_cvtt_roundpd_epu32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_cvtt_roundps_epi32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_cvtt_roundps_epu32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_cvttpd_epi32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_cvttpd_epu32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_cvttps_epi32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_cvttps_epu32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_div_pd^{⚠}  Experimentalavx512f Divide packed doubleprecision (64bit) floatingpoint elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_div_ps^{⚠}  Experimentalavx512f Divide packed singleprecision (32bit) floatingpoint elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_div_round_pd^{⚠}  Experimentalavx512f Divide packed doubleprecision (64bit) floatingpoint elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_div_round_ps^{⚠}  Experimentalavx512f Divide packed singleprecision (32bit) floatingpoint elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmaddsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmaddsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmaddsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmaddsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmsubadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmsubadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmsubadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fmsubadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
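The difference between the fmaddsub and fmsubadd families is only which lanes add and which subtract. A scalar sketch of the lane pattern on eight doubleprecision lanes, assuming the usual convention that fmaddsub subtracts in even-indexed lanes and adds in odd-indexed lanes, with fmsubadd doing the opposite; the function names are hypothetical and `f64::mul_add` stands in for the fused operation.

```rust
/// Scalar model of a zero-masked fmaddsub on eight f64 lanes:
/// even lanes compute a*b - c, odd lanes compute a*b + c.
fn maskz_fmaddsub_pd_model(k: u8, a: [f64; 8], b: [f64; 8], c: [f64; 8]) -> [f64; 8] {
    let mut dst = [0.0f64; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let addend = if i % 2 == 0 { -c[i] } else { c[i] };
            dst[i] = a[i].mul_add(b[i], addend); // fused a*b + addend
        }
    }
    dst
}

/// Scalar model of a zero-masked fmsubadd on eight f64 lanes:
/// even lanes compute a*b + c, odd lanes compute a*b - c.
fn maskz_fmsubadd_pd_model(k: u8, a: [f64; 8], b: [f64; 8], c: [f64; 8]) -> [f64; 8] {
    let mut dst = [0.0f64; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            let addend = if i % 2 == 0 { c[i] } else { -c[i] };
            dst[i] = a[i].mul_add(b[i], addend);
        }
    }
    dst
}
```

With a = 2, b = 3, c = 1 in every lane, the first model alternates 5, 7, 5, 7 and the second alternates 7, 5, 7, 5 in the lanes selected by k.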
_mm512_maskz_fnmadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fnmadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fnmadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fnmadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fnmsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fnmsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fnmsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_fnmsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_getexp_pd^{⚠}  Experimentalavx512f Convert the exponent of each packed doubleprecision (64bit) floatingpoint element in a to a doubleprecision (64bit) floatingpoint number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm512_maskz_getexp_ps^{⚠}  Experimentalavx512f Convert the exponent of each packed singleprecision (32bit) floatingpoint element in a to a singleprecision (32bit) floatingpoint number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm512_maskz_getexp_round_pd^{⚠}  Experimentalavx512f Convert the exponent of each packed doubleprecision (64bit) floatingpoint element in a to a doubleprecision (64bit) floatingpoint number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_getexp_round_ps^{⚠}  Experimentalavx512f Convert the exponent of each packed singleprecision (32bit) floatingpoint element in a to a singleprecision (32bit) floatingpoint number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
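The floor(log2(x)) description of the getexp entries can be illustrated per element by reading the exponent field of the IEEE 754 encoding directly. This is a sketch for normal, finite, nonzero singleprecision inputs only; zero, denormal, infinite and NaN inputs are deliberately not modeled, and the name `getexp_model` is hypothetical.

```rust
/// Scalar model of getexp for one normal, finite, nonzero f32:
/// returns the unbiased exponent as a float, i.e. floor(log2(|x|)).
fn getexp_model(x: f32) -> f32 {
    // Bits 23..=30 hold the biased exponent; the bias for f32 is 127.
    let biased = ((x.to_bits() >> 23) & 0xff) as i32;
    (biased - 127) as f32
}
```

For example, 10.0 is 1.25 * 2^3, so the model returns 3.0, and the sign of the input does not affect the result.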
_mm512_maskz_getmant_pd^{⚠}  Experimentalavx512f Normalize the mantissas of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), or _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), or _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). 
_mm512_maskz_getmant_ps^{⚠}  Experimentalavx512f Normalize the mantissas of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), or _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), or _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). 
_mm512_maskz_getmant_round_pd^{⚠}  Experimentalavx512f Normalize the mantissas of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), or _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), or _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_getmant_round_ps^{⚠}  Experimentalavx512f Normalize the mantissas of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), or _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), or _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
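The mantissa normalization described for the getmant entries can be sketched per element by rewriting the exponent field while keeping the fraction bits. The sketch below covers only two of the four intervals ([1, 2) and [0.5, 1)) and two of the three sign controls (source sign and zero sign), and only normal, finite, nonzero singleprecision inputs; the enums `Interval` and `Sign` and the function `getmant_model` are hypothetical stand-ins, not the module's constants.

```rust
/// Hypothetical stand-ins for two of the interval selectors.
#[derive(Clone, Copy)]
enum Interval {
    Norm1To2,  // result magnitude in [1, 2)
    NormP5To1, // result magnitude in [0.5, 1)
}

/// Hypothetical stand-ins for two of the sign controls.
#[derive(Clone, Copy)]
enum Sign {
    Src,  // keep the sign of the input
    Zero, // force a positive result
}

/// Scalar model of getmant for one normal, finite, nonzero f32.
fn getmant_model(x: f32, interv: Interval, sc: Sign) -> f32 {
    let bits = x.to_bits();
    let fraction = bits & 0x007f_ffff; // 23 fraction bits, kept unchanged
    let sign = match sc {
        Sign::Src => bits & 0x8000_0000,
        Sign::Zero => 0,
    };
    // Biased exponent 127 encodes 2^0, 126 encodes 2^-1.
    let exponent: u32 = match interv {
        Interval::Norm1To2 => 127,
        Interval::NormP5To1 => 126,
    };
    f32::from_bits(sign | (exponent << 23) | fraction)
}
```

For an input of 10.0 (1.25 * 2^3) the model yields 1.25 for the [1, 2) interval and 0.625 for the [0.5, 1) interval.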
_mm512_maskz_insertf32x4^{⚠}  Experimentalavx512f Copy a to tmp, then insert 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_insertf64x4^{⚠}  Experimentalavx512f Copy a to tmp, then insert 256 bits (composed of 4 packed doubleprecision (64bit) floatingpoint elements) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_inserti32x4^{⚠}  Experimentalavx512f Copy a to tmp, then insert 128 bits (composed of 4 packed 32bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_inserti64x4^{⚠}  Experimentalavx512f Copy a to tmp, then insert 256 bits (composed of 4 packed 64bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_max_epi32^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_max_epi64^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_max_epu32^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_max_epu64^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_max_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_max_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_max_round_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_max_round_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_min_epi32^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_min_epi64^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_min_epu32^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_min_epu64^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_min_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_min_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_min_round_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_min_round_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_maskz_movedup_pd^{⚠}  Experimentalavx512f Duplicate evenindexed doubleprecision (64bit) floatingpoint elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_movehdup_ps^{⚠}  Experimentalavx512f Duplicate oddindexed singleprecision (32bit) floatingpoint elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_moveldup_ps^{⚠}  Experimentalavx512f Duplicate evenindexed singleprecision (32bit) floatingpoint elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_mul_epi32^{⚠}  Experimentalavx512f Multiply the low signed 32bit integers from each packed 64bit element in a and b, and store the signed 64bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_mul_epu32^{⚠}  Experimentalavx512f Multiply the low unsigned 32bit integers from each packed 64bit element in a and b, and store the unsigned 64bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
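The two widening multiplies above read only the low 32 bits of each 64bit element. A scalar sketch of that behavior on eight lanes, with hypothetical function names and plain arrays in place of the vector types:

```rust
/// Scalar model of a zero-masked signed widening multiply:
/// the low 32 bits of each 64-bit lane are sign-extended and multiplied.
fn maskz_mul_epi32_model(k: u8, a: [i64; 8], b: [i64; 8]) -> [i64; 8] {
    let mut dst = [0i64; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // `as i32` keeps the low 32 bits; `as i64` sign-extends them.
            dst[i] = (a[i] as i32 as i64) * (b[i] as i32 as i64);
        }
    }
    dst
}

/// Scalar model of a zero-masked unsigned widening multiply:
/// the low 32 bits of each 64-bit lane are zero-extended and multiplied.
fn maskz_mul_epu32_model(k: u8, a: [u64; 8], b: [u64; 8]) -> [u64; 8] {
    let mut dst = [0u64; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = (a[i] as u32 as u64) * (b[i] as u32 as u64);
        }
    }
    dst
}
```

The same low bit pattern 0xFFFF_FFFF is read as -1 by the signed model and as 4294967295 by the unsigned one, which is the practical difference between the two entries.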
_mm512_maskz_mul_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_mul_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_mul_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_mul_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_mullo_epi32^{⚠}  Experimentalavx512f Multiply the packed 32bit integers in a and b, producing intermediate 64bit integers, and store the low 32 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_or_epi32^{⚠}  Experimentalavx512f Compute the bitwise OR of packed 32bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_or_epi64^{⚠}  Experimentalavx512f Compute the bitwise OR of packed 64bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permute_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permute_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutevar_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a within 128bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutevar_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutex2var_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutex2var_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutex2var_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutex2var_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutex_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a within 256bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutex_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a within 256bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutexvar_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutexvar_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutexvar_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_permutexvar_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
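The "index" and "selector and index" wording of the permutexvar and permutex2var entries can be sketched on sixteen 32bit lanes as follows. This is a scalar model without the mask step, assuming the usual encoding in which the low four bits of each idx element pick a lane and, for the two-source form, bit 4 picks b over a; the function names are hypothetical.

```rust
/// Scalar model of a cross-lane shuffle of one source:
/// lane i of the result is a[idx[i] mod 16].
fn permutexvar_epi32_model(idx: [u32; 16], a: [i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for i in 0..16 {
        dst[i] = a[(idx[i] & 0xf) as usize];
    }
    dst
}

/// Scalar model of a cross-lane shuffle of two sources:
/// bit 4 of idx[i] selects b (set) or a (clear), bits 0..=3 select the lane.
fn permutex2var_epi32_model(a: [i32; 16], idx: [u32; 16], b: [i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for i in 0..16 {
        let lane = (idx[i] & 0xf) as usize;
        dst[i] = if idx[i] & 0x10 != 0 { b[lane] } else { a[lane] };
    }
    dst
}
```

An idx of 15, 14, ..., 0 reverses the vector in the one-source model, something the within-lane permute entries cannot express.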
_mm512_maskz_rcp14_pd^{⚠}  Experimentalavx512f Compute the approximate reciprocal of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm512_maskz_rcp14_ps^{⚠}  Experimentalavx512f Compute the approximate reciprocal of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm512_maskz_rol_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_rol_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_rolv_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_rolv_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_ror_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_ror_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_rorv_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_rorv_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_rsqrt14_pd^{⚠}  Experimentalavx512f Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm512_maskz_rsqrt14_ps^{⚠}  Experimentalavx512f Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm512_maskz_shuffle_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a within 128bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_shuffle_f32x4^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 4 singleprecision (32bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_shuffle_f64x2^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 2 doubleprecision (64bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_shuffle_i32x4^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 4 32bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_shuffle_i64x2^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 2 64bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_shuffle_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements within 128bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_shuffle_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sll_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sll_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_slli_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_slli_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sllv_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sllv_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sqrt_pd^{⚠}  Experimentalavx512f Compute the square root of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sqrt_ps^{⚠}  Experimentalavx512f Compute the square root of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sqrt_round_pd^{⚠}  Experimentalavx512f Compute the square root of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sqrt_round_ps^{⚠}  Experimentalavx512f Compute the square root of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sra_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sra_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_srai_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_srai_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_srav_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_srav_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_srl_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_srl_epi64^{⚠}  Experimentalavx512f Shift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_srli_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_srli_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_srlv_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_srlv_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sub_epi32^{⚠}  Experimentalavx512f Subtract packed 32bit integers in b from packed 32bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sub_epi64^{⚠}  Experimentalavx512f Subtract packed 64bit integers in b from packed 64bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sub_pd^{⚠}  Experimentalavx512f Subtract packed doubleprecision (64bit) floatingpoint elements in b from packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sub_ps^{⚠}  Experimentalavx512f Subtract packed singleprecision (32bit) floatingpoint elements in b from packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sub_round_pd^{⚠}  Experimentalavx512f Subtract packed doubleprecision (64bit) floatingpoint elements in b from packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_sub_round_ps^{⚠}  Experimentalavx512f Subtract packed singleprecision (32bit) floatingpoint elements in b from packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_unpackhi_epi32^{⚠}  Experimentalavx512f Unpack and interleave 32bit integers from the high half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_unpackhi_epi64^{⚠}  Experimentalavx512f Unpack and interleave 64bit integers from the high half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_unpackhi_pd^{⚠}  Experimentalavx512f Unpack and interleave doubleprecision (64bit) floatingpoint elements from the high half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_unpackhi_ps^{⚠}  Experimentalavx512f Unpack and interleave singleprecision (32bit) floatingpoint elements from the high half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_unpacklo_epi32^{⚠}  Experimentalavx512f Unpack and interleave 32bit integers from the low half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_unpacklo_epi64^{⚠}  Experimentalavx512f Unpack and interleave 64bit integers from the low half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_unpacklo_pd^{⚠}  Experimentalavx512f Unpack and interleave doubleprecision (64bit) floatingpoint elements from the low half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_unpacklo_ps^{⚠}  Experimentalavx512f Unpack and interleave singleprecision (32bit) floatingpoint elements from the low half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_xor_epi32^{⚠}  Experimentalavx512f Compute the bitwise XOR of packed 32bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_maskz_xor_epi64^{⚠}  Experimentalavx512f Compute the bitwise XOR of packed 64bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
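The zeromask behaviour shared by the `_mm512_maskz_*` entries above can be illustrated with a scalar model. This is a minimal sketch in plain Rust, not a call into `core::arch`: the helper names `apply_zeromask_epi32` and `maskz_sub_epi32_model` are hypothetical, and the mask is assumed to be a 16-bit integer with bit i controlling lane i.

```rust
/// Scalar model of a zeromask over sixteen 32-bit lanes.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn apply_zeromask_epi32(k: u16, result: [i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for lane in 0..16 {
        // Assumption: bit `lane` of `k` decides whether the lane keeps its value.
        if (k >> lane) & 1 == 1 {
            dst[lane] = result[lane];
        }
        // Otherwise the lane stays zero ("zeroed out when the mask bit is not set").
    }
    dst
}

/// Scalar model of the `_mm512_maskz_sub_epi32` summary:
/// subtract b from a lane by lane, then zero the masked-off lanes.
fn maskz_sub_epi32_model(k: u16, a: [i32; 16], b: [i32; 16]) -> [i32; 16] {
    let mut diff = [0i32; 16];
    for lane in 0..16 {
        // Wrapping subtraction mirrors fixed-width integer lanes.
        diff[lane] = a[lane].wrapping_sub(b[lane]);
    }
    apply_zeromask_epi32(k, diff)
}
```

With `k = 0b101`, only lanes 0 and 2 carry the difference; every other lane is zero regardless of the inputs.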
_mm512_max_epi32^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b, and store packed maximum values in dst. 
_mm512_max_epi64^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b, and store packed maximum values in dst. 
_mm512_max_epu32^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b, and store packed maximum values in dst. 
_mm512_max_epu64^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b, and store packed maximum values in dst. 
_mm512_max_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed maximum values in dst. 
_mm512_max_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed maximum values in dst. 
_mm512_max_round_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed maximum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_max_round_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed maximum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_min_epi32^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b, and store packed minimum values in dst. 
_mm512_min_epi64^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b, and store packed minimum values in dst. 
_mm512_min_epu32^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b, and store packed minimum values in dst. 
_mm512_min_epu64^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b, and store packed minimum values in dst. 
_mm512_min_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed minimum values in dst. 
_mm512_min_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed minimum values in dst. 
_mm512_min_round_pd^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed minimum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_min_round_ps^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed minimum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. 
_mm512_movedup_pd^{⚠}  Experimentalavx512f Duplicate evenindexed doubleprecision (64bit) floatingpoint elements from a, and store the results in dst. 
_mm512_movehdup_ps^{⚠}  Experimentalavx512f Duplicate oddindexed singleprecision (32bit) floatingpoint elements from a, and store the results in dst. 
_mm512_moveldup_ps^{⚠}  Experimentalavx512f Duplicate evenindexed singleprecision (32bit) floatingpoint elements from a, and store the results in dst. 
_mm512_mul_epi32^{⚠}  Experimentalavx512f Multiply the low signed 32bit integers from each packed 64bit element in a and b, and store the signed 64bit results in dst. 
_mm512_mul_epu32^{⚠}  Experimentalavx512f Multiply the low unsigned 32bit integers from each packed 64bit element in a and b, and store the unsigned 64bit results in dst. 
_mm512_mul_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst. 
_mm512_mul_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst. 
_mm512_mul_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst. 
_mm512_mul_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst. 
_mm512_mullo_epi32^{⚠}  Experimentalavx512f Multiply the packed 32bit integers in a and b, producing intermediate 64bit integers, and store the low 32 bits of the intermediate integers in dst. 
_mm512_mullox_epi64^{⚠}  Experimentalavx512f Multiplies elements in packed 64bit integer vectors a and b together, storing the lower 64 bits of the result in dst. 
_mm512_or_epi32^{⚠}  Experimentalavx512f Compute the bitwise OR of packed 32bit integers in a and b, and store the results in dst. 
_mm512_or_epi64^{⚠}  Experimentalavx512f Compute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst. 
_mm512_or_si512^{⚠}  Experimentalavx512f Compute the bitwise OR of 512 bits (representing integer data) in a and b, and store the result in dst. 
_mm512_permute_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst. 
_mm512_permute_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst. 
_mm512_permutevar_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a across lanes using the corresponding index in idx, and store the results in dst. Note that this intrinsic shuffles across 128bit lanes, unlike past intrinsics that use the permutevar name. This intrinsic is identical to _mm512_permutexvar_epi32, and it is recommended that you use that intrinsic name. 
_mm512_permutevar_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a within 128bit lanes using the control in b, and store the results in dst. 
_mm512_permutevar_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in b, and store the results in dst. 
_mm512_permutex2var_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst. 
_mm512_permutex2var_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst. 
_mm512_permutex2var_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst. 
_mm512_permutex2var_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst. 
_mm512_permutex_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a within 256bit lanes using the control in imm8, and store the results in dst. 
_mm512_permutex_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a within 256bit lanes using the control in imm8, and store the results in dst. 
_mm512_permutexvar_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a across lanes using the corresponding index in idx, and store the results in dst. 
_mm512_permutexvar_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a across lanes using the corresponding index in idx, and store the results in dst. 
_mm512_permutexvar_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst. 
_mm512_permutexvar_ps^{⚠}  Experimentalavx512f Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst. 
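The cross-lane shuffles in the `permutexvar` entries above amount to an indexed gather from a. The following is a scalar sketch under stated assumptions, not the intrinsic itself: `permutexvar_epi32_model` is a hypothetical name, and it assumes that only the low 4 bits of each index select one of the sixteen source elements.

```rust
/// Scalar model of the `_mm512_permutexvar_epi32` summary.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn permutexvar_epi32_model(idx: [u32; 16], a: [i32; 16]) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for lane in 0..16 {
        // Assumption: only the low 4 bits of the index are used,
        // so any index maps to one of the 16 source elements.
        let src = (idx[lane] & 0xF) as usize;
        dst[lane] = a[src];
    }
    dst
}
```

Passing the indices 15, 14, ..., 0 reverses the vector, which a shuffle confined to 128-bit lanes cannot do.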
_mm512_rcp14_pd^{⚠}  Experimentalavx512f Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14. 
_mm512_rcp14_ps^{⚠}  Experimentalavx512f Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14. 
_mm512_rol_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in imm8, and store the results in dst. 
_mm512_rol_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in imm8, and store the results in dst. 
_mm512_rolv_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst. 
_mm512_rolv_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst. 
_mm512_ror_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in imm8, and store the results in dst. 
_mm512_ror_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in imm8, and store the results in dst. 
_mm512_rorv_epi32^{⚠}  Experimentalavx512f Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst. 
_mm512_rorv_epi64^{⚠}  Experimentalavx512f Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst. 
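The per-element rotates listed above (`rolv` and `rorv`) can be modelled lane by lane with Rust's integer rotate methods. This is a minimal scalar sketch rather than a use of the intrinsics: the function names are hypothetical, and reducing the rotate count modulo 32 is an assumption about the hardware behaviour that the summaries do not state.

```rust
/// Scalar model of the `_mm512_rolv_epi32` summary.
/// Hypothetical helper for illustration; not part of `core::arch`.
fn rolv_epi32_model(a: [u32; 16], b: [u32; 16]) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for lane in 0..16 {
        // Assumption: the rotate count is taken modulo the lane width (32).
        dst[lane] = a[lane].rotate_left(b[lane] % 32);
    }
    dst
}

/// Scalar model of the `_mm512_rorv_epi32` summary (rotate right).
fn rorv_epi32_model(a: [u32; 16], b: [u32; 16]) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for lane in 0..16 {
        dst[lane] = a[lane].rotate_right(b[lane] % 32);
    }
    dst
}
```

Unlike a shift, a rotate discards no bits: the bit that leaves one end of the lane re-enters at the other.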
_mm512_rsqrt14_pd^{⚠}  Experimentalavx512f Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14. 
_mm512_rsqrt14_ps^{⚠}  Experimentalavx512f Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14. 
_mm512_set1_epi32^{⚠}  Experimentalavx512f Broadcast 32bit integer 
_mm512_set1_epi64^{⚠}  Experimentalavx512f Broadcast 64bit integer 
_mm512_set1_pd^{⚠}  Experimentalavx512f Broadcast 64bit float 
_mm512_set1_ps^{⚠}  Experimentalavx512f Broadcast 32bit float 
_mm512_set_epi32^{⚠}  Experimentalavx512f Sets packed 32bit integers in 
_mm512_set_epi64^{⚠}  Experimentalavx512f Sets packed 64bit integers in 
_mm512_set_pd^{⚠}  Experimentalavx512f Sets packed double-precision (64-bit) floating-point elements in 
_mm512_set_ps^{⚠}  Experimentalavx512f Sets packed single-precision (32-bit) floating-point elements in 
_mm512_setr_epi32^{⚠}  Experimentalavx512f Sets packed 32bit integers in 
_mm512_setr_epi64^{⚠}  Experimentalavx512f Sets packed 64bit integers in 
_mm512_setr_pd^{⚠}  Experimentalavx512f Sets packed double-precision (64-bit) floating-point elements in 
_mm512_setr_ps^{⚠}  Experimentalavx512f Sets packed single-precision (32-bit) floating-point elements in 
_mm512_setzero_pd^{⚠}  Experimentalavx512f Returns vector of type 
_mm512_setzero_ps^{⚠}  Experimentalavx512f Returns vector of type 
_mm512_setzero_si512^{⚠}  Experimentalavx512f Returns vector of type 
_mm512_shuffle_epi32^{⚠}  Experimentalavx512f Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst. 
_mm512_shuffle_f32x4^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 4 singleprecision (32bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst. 
_mm512_shuffle_f64x2^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 2 doubleprecision (64bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst. 
_mm512_shuffle_i32x4^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 4 32bit integers) selected by imm8 from a and b, and store the results in dst. 
_mm512_shuffle_i64x2^{⚠}  Experimentalavx512f Shuffle 128bits (composed of 2 64bit integers) selected by imm8 from a and b, and store the results in dst. 
_mm512_shuffle_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements within 128bit lanes using the control in imm8, and store the results in dst. 
_mm512_shuffle_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst. 
_mm512_sll_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a left by count while shifting in zeros, and store the results in dst. 
_mm512_sll_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a left by count while shifting in zeros, and store the results in dst. 
_mm512_slli_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a left by imm8 while shifting in zeros, and store the results in dst. 
_mm512_slli_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a left by imm8 while shifting in zeros, and store the results in dst. 
_mm512_sllv_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst. 
_mm512_sllv_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst. 
_mm512_sqrt_pd^{⚠}  Experimentalavx512f Compute the square root of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst. 
_mm512_sqrt_ps^{⚠}  Experimentalavx512f Compute the square root of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst. 
_mm512_sqrt_round_pd^{⚠}  Experimentalavx512f Compute the square root of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst. 
_mm512_sqrt_round_ps^{⚠}  Experimentalavx512f Compute the square root of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst. 
_mm512_sra_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by count while shifting in sign bits, and store the results in dst. 
_mm512_sra_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by count while shifting in sign bits, and store the results in dst. 
_mm512_srai_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by imm8 while shifting in sign bits, and store the results in dst. 
_mm512_srai_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by imm8 while shifting in sign bits, and store the results in dst. 
_mm512_srav_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst. 
_mm512_srav_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst. 
_mm512_srl_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by count while shifting in zeros, and store the results in dst. 
_mm512_srl_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by count while shifting in zeros, and store the results in dst. 
_mm512_srli_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by imm8 while shifting in zeros, and store the results in dst. 
_mm512_srli_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by imm8 while shifting in zeros, and store the results in dst. 
_mm512_srlv_epi32^{⚠}  Experimentalavx512f Shift packed 32bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst. 
_mm512_srlv_epi64^{⚠}  Experimentalavx512f Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst. 
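The difference between the logical right shifts (`srl*`, shifting in zeros) and the arithmetic right shifts (`sra*`, shifting in sign bits) listed above is easiest to see on a single lane. The sketch below is a scalar model with hypothetical function names. The handling of counts above 31 (all zeros for the logical shift, all sign bits for the arithmetic shift) is an assumption about the hardware behaviour that the summaries do not spell out.

```rust
/// One 32-bit lane of a logical right shift (zeros shifted in).
/// Hypothetical helper for illustration; not part of `core::arch`.
fn srl_lane(x: u32, count: u32) -> u32 {
    // Assumption: counts larger than 31 clear the lane entirely.
    if count > 31 { 0 } else { x >> count }
}

/// One 32-bit lane of an arithmetic right shift (sign bits shifted in).
fn sra_lane(x: i32, count: u32) -> i32 {
    // Assumption: counts larger than 31 fill the lane with the sign bit.
    // `>>` on a signed integer is an arithmetic shift in Rust.
    if count > 31 { x >> 31 } else { x >> count }
}
```

For example, `sra_lane(-16, 2)` yields `-4`, while `srl_lane(0x8000_0000, 4)` yields `0x0800_0000` because the vacated high bits are filled with zeros.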
_mm512_storeu_pd^{⚠}  Experimentalavx512f Stores 512bits (composed of 8 packed doubleprecision (64bit)
floatingpoint elements) from 
_mm512_sub_epi32^{⚠}  Experimentalavx512f Subtract packed 32bit integers in b from packed 32bit integers in a, and store the results in dst. 
_mm512_sub_epi64^{⚠}  Experimentalavx512f Subtract packed 64bit integers in b from packed 64bit integers in a, and store the results in dst. 
_mm512_sub_pd^{⚠}  Experimentalavx512f Subtract packed doubleprecision (64bit) floatingpoint elements in b from packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst. 
_mm512_sub_ps^{⚠}  Experimentalavx512f Subtract packed singleprecision (32bit) floatingpoint elements in b from packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst. 
_mm512_sub_round_pd^{⚠}  Experimentalavx512f Subtract packed doubleprecision (64bit) floatingpoint elements in b from packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst. 
_mm512_sub_round_ps^{⚠}  Experimentalavx512f Subtract packed singleprecision (32bit) floatingpoint elements in b from packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst. 
_mm512_undefined_pd^{⚠}  Experimentalavx512f Returns vector of type 
_mm512_undefined_ps^{⚠}  Experimentalavx512f Returns vector of type 
_mm512_unpackhi_epi32^{⚠}  Experimental avx512f Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst. 
_mm512_unpackhi_epi64^{⚠}  Experimental avx512f Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst. 
_mm512_unpackhi_pd^{⚠}  Experimental avx512f Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst. 
_mm512_unpackhi_ps^{⚠}  Experimental avx512f Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst. 
_mm512_unpacklo_epi32^{⚠}  Experimental avx512f Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst. 
_mm512_unpacklo_epi64^{⚠}  Experimental avx512f Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst. 
_mm512_unpacklo_pd^{⚠}  Experimental avx512f Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst. 
_mm512_unpacklo_ps^{⚠}  Experimental avx512f Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst. 
_mm512_xor_epi32^{⚠}  Experimental avx512f Compute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst. 
_mm512_xor_epi64^{⚠}  Experimental avx512f Compute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst. 
_mm512_xor_si512^{⚠}  Experimental avx512f Compute the bitwise XOR of 512 bits (representing integer data) in a and b, and store the result in dst. 
_mm_cmp_round_sd_mask^{⚠}  Experimental avx512f Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in a mask vector. 
_mm_cmp_round_ss_mask^{⚠}  Experimental avx512f Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in a mask vector. 
_mm_cmp_sd_mask^{⚠}  Experimental avx512f Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in a mask vector. 
_mm_cmp_ss_mask^{⚠}  Experimental avx512f Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in a mask vector. 
_mm_cvtph_ps^{⚠}  Experimental f16c Converts the 4 x 16-bit half-precision float values in the lowest 64 bits of
the 128-bit vector 
_mm_cvtps_ph^{⚠}  Experimentalf16c Converts the 4 x 32bit float values in the 128bit vector 
_mm_madd52hi_epu64^{⚠}  Experimentalavx512ifma,avx512vl Multiply packed unsigned 52bit integers in each 64bit element of

_mm_madd52lo_epu64^{⚠}  Experimentalavx512ifma,avx512vl Multiply packed unsigned 52bit integers in each 64bit element of

_mm_mask_cmp_round_sd_mask^{⚠}  Experimental avx512f Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in a mask vector using zeromask m (the element is zeroed out when mask bit 0 is not set). 
_mm_mask_cmp_round_ss_mask^{⚠}  Experimental avx512f Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in a mask vector using zeromask m (the element is zeroed out when mask bit 0 is not set). 
_mm_mask_cmp_sd_mask^{⚠}  Experimental avx512f Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in a mask vector using zeromask m (the element is zeroed out when mask bit 0 is not set). 
_mm_mask_cmp_ss_mask^{⚠}  Experimental avx512f Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in a mask vector using zeromask m (the element is zeroed out when mask bit 0 is not set). 
_xabort^{⚠}  Experimental rtm Forces a restricted transactional memory (RTM) region to abort. 
_xabort_code  Experimental Retrieves the parameter passed to 
_xbegin^{⚠}  Experimental rtm Specifies the start of a restricted transactional memory (RTM) code region and returns a value indicating status. 
_xend^{⚠}  Experimental rtm Specifies the end of a restricted transactional memory (RTM) code region. 
_xtest^{⚠}  Experimental rtm Queries whether the processor is executing in a transactional region identified by restricted transactional memory (RTM) or hardware lock elision (HLE). 
cmpxchg16b^{⚠}  Experimental cmpxchg16b Compares and exchanges 16 bytes (128 bits) of data atomically. 
has_cpuid  Experimental Does the host support the 
ud2^{⚠}  Experimental Generates the trap instruction 
_MM_GET_EXCEPTION_MASK^{⚠}  sse See 
_MM_GET_EXCEPTION_STATE^{⚠}  sse See 
_MM_GET_FLUSH_ZERO_MODE^{⚠}  sse See 
_MM_GET_ROUNDING_MODE^{⚠}  sse See 
_MM_SET_EXCEPTION_MASK^{⚠}  sse See 
_MM_SET_EXCEPTION_STATE^{⚠}  sse See 
_MM_SET_FLUSH_ZERO_MODE^{⚠}  sse See 
_MM_SET_ROUNDING_MODE^{⚠}  sse See 
_MM_TRANSPOSE4_PS^{⚠}  sse Transpose the 4x4 matrix formed by 4 rows of __m128 in place. 
__cpuid^{⚠}  See 
__cpuid_count^{⚠}  Returns the result of the 
__get_cpuid_max^{⚠}  Returns the highestsupported 
__rdtscp^{⚠}  Reads the current value of the processor’s timestamp counter and
the 
_addcarry_u32^{⚠}  Adds unsigned 32bit integers 
_addcarry_u64^{⚠}  Adds unsigned 64bit integers 
_addcarryx_u32^{⚠}  adx Adds unsigned 32bit integers 
_addcarryx_u64^{⚠}  adx Adds unsigned 64bit integers 
_andn_u32^{⚠}  bmi1 Bitwise logical 
_andn_u64^{⚠}  bmi1 Bitwise logical 
_bextr2_u32^{⚠}  bmi1 Extracts bits of 
_bextr2_u64^{⚠}  bmi1 Extracts bits of 
_bextr_u32^{⚠}  bmi1 Extracts bits in range [ 
_bextr_u64^{⚠}  bmi1 Extracts bits in range [ 
_blcfill_u32^{⚠}  tbm Clears all bits below the least significant zero bit of 
_blcfill_u64^{⚠}  tbm Clears all bits below the least significant zero bit of 
_blci_u32^{⚠}  tbm Sets all bits of 
_blci_u64^{⚠}  tbm Sets all bits of 
_blcic_u32^{⚠}  tbm Sets the least significant zero bit of 
_blcic_u64^{⚠}  tbm Sets the least significant zero bit of 
_blcmsk_u32^{⚠}  tbm Sets the least significant zero bit of 
_blcmsk_u64^{⚠}  tbm Sets the least significant zero bit of 
_blcs_u32^{⚠}  tbm Sets the least significant zero bit of 
_blcs_u64^{⚠}  tbm Sets the least significant zero bit of 
_blsfill_u32^{⚠}  tbm Sets all bits of 
_blsfill_u64^{⚠}  tbm Sets all bits of 
_blsi_u32^{⚠}  bmi1 Extracts lowest set isolated bit. 
_blsi_u64^{⚠}  bmi1 Extracts lowest set isolated bit. 
_blsic_u32^{⚠}  tbm Clears least significant bit and sets all other bits. 
_blsic_u64^{⚠}  tbm Clears least significant bit and sets all other bits. 
_blsmsk_u32^{⚠}  bmi1 Gets mask up to lowest set bit. 
_blsmsk_u64^{⚠}  bmi1 Gets mask up to lowest set bit. 
_blsr_u32^{⚠}  bmi1 Resets the lowest set bit of 
_blsr_u64^{⚠}  bmi1 Resets the lowest set bit of 
_bswap^{⚠}  Returns an integer with the reversed byte order of x 
_bswap64^{⚠}  Returns an integer with the reversed byte order of x 
_bzhi_u32^{⚠}  bmi2 Zeroes higher bits of 
_bzhi_u64^{⚠}  bmi2 Zeroes higher bits of 
_fxrstor^{⚠}  fxsr Restores the 
_fxrstor64^{⚠}  fxsr Restores the 
_fxsave^{⚠}  fxsr Saves the 
_fxsave64^{⚠}  fxsr Saves the 
_lzcnt_u32^{⚠}  lzcnt Counts the leading most significant zero bits. 
_lzcnt_u64^{⚠}  lzcnt Counts the leading most significant zero bits. 
_mm256_abs_epi8^{⚠}  avx2 Computes the absolute values of packed 8bit integers in 
_mm256_abs_epi16^{⚠}  avx2 Computes the absolute values of packed 16bit integers in 
_mm256_abs_epi32^{⚠}  avx2 Computes the absolute values of packed 32bit integers in 
_mm256_add_epi8^{⚠}  avx2 Adds packed 8bit integers in 
_mm256_add_epi16^{⚠}  avx2 Adds packed 16bit integers in 
_mm256_add_epi32^{⚠}  avx2 Adds packed 32bit integers in 
_mm256_add_epi64^{⚠}  avx2 Adds packed 64bit integers in 
_mm256_add_pd^{⚠}  avx Adds packed doubleprecision (64bit) floatingpoint elements
in 
_mm256_add_ps^{⚠}  avx Adds packed singleprecision (32bit) floatingpoint elements in 
_mm256_adds_epi8^{⚠}  avx2 Adds packed 8bit integers in 
_mm256_adds_epi16^{⚠}  avx2 Adds packed 16bit integers in 
_mm256_adds_epu8^{⚠}  avx2 Adds packed unsigned 8bit integers in 
_mm256_adds_epu16^{⚠}  avx2 Adds packed unsigned 16bit integers in 
_mm256_addsub_pd^{⚠}  avx Alternatively adds and subtracts packed doubleprecision (64bit)
floatingpoint elements in 
_mm256_addsub_ps^{⚠}  avx Alternatively adds and subtracts packed singleprecision (32bit)
floatingpoint elements in 
_mm256_alignr_epi8^{⚠}  avx2 Concatenates pairs of 16byte blocks in 
_mm256_and_pd^{⚠}  avx Computes the bitwise AND of packed double-precision (64-bit)
floating-point elements in 
_mm256_and_ps^{⚠}  avx Computes the bitwise AND of packed singleprecision (32bit) floatingpoint
elements in 
_mm256_and_si256^{⚠}  avx2 Computes the bitwise AND of 256 bits (representing integer data)
in 
_mm256_andnot_pd^{⚠}  avx Computes the bitwise NOT of packed doubleprecision (64bit) floatingpoint
elements in 
_mm256_andnot_ps^{⚠}  avx Computes the bitwise NOT of packed singleprecision (32bit) floatingpoint
elements in 
_mm256_andnot_si256^{⚠}  avx2 Computes the bitwise NOT of 256 bits (representing integer data)
in 
_mm256_avg_epu8^{⚠}  avx2 Averages packed unsigned 8bit integers in 
_mm256_avg_epu16^{⚠}  avx2 Averages packed unsigned 16bit integers in 
_mm256_blend_epi16^{⚠}  avx2 Blends packed 16bit integers from 
_mm256_blend_epi32^{⚠}  avx2 Blends packed 32bit integers from 
_mm256_blend_pd^{⚠}  avx Blends packed doubleprecision (64bit) floatingpoint elements from

_mm256_blend_ps^{⚠}  avx Blends packed singleprecision (32bit) floatingpoint elements from

_mm256_blendv_epi8^{⚠}  avx2 Blends packed 8bit integers from 
_mm256_blendv_pd^{⚠}  avx Blends packed doubleprecision (64bit) floatingpoint elements from

_mm256_blendv_ps^{⚠}  avx Blends packed singleprecision (32bit) floatingpoint elements from

_mm256_broadcast_pd^{⚠}  avx Broadcasts 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector. 
_mm256_broadcast_ps^{⚠}  avx Broadcasts 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector. 
_mm256_broadcast_sd^{⚠}  avx Broadcasts a double-precision (64-bit) floating-point element from memory to all elements of the returned vector. 
_mm256_broadcast_ss^{⚠}  avx Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector. 
_mm256_broadcastb_epi8^{⚠}  avx2 Broadcasts the low packed 8bit integer from 
_mm256_broadcastd_epi32^{⚠}  avx2 Broadcasts the low packed 32bit integer from 
_mm256_broadcastq_epi64^{⚠}  avx2 Broadcasts the low packed 64bit integer from 
_mm256_broadcastsd_pd^{⚠}  avx2 Broadcasts the low doubleprecision (64bit) floatingpoint element
from 
_mm256_broadcastsi128_si256^{⚠}  avx2 Broadcasts 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value. 
_mm256_broadcastss_ps^{⚠}  avx2 Broadcasts the low singleprecision (32bit) floatingpoint element
from 
_mm256_broadcastw_epi16^{⚠}  avx2 Broadcasts the low packed 16-bit integer from a to all elements of the 256-bit returned value. 
_mm256_bslli_epi128^{⚠}  avx2 Shifts 128bit lanes in 
_mm256_bsrli_epi128^{⚠}  avx2 Shifts 128bit lanes in 
_mm256_castpd128_pd256^{⚠}  avx Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. 
_mm256_castpd256_pd128^{⚠}  avx Casts vector of type __m256d to type __m128d. 
_mm256_castpd_ps^{⚠}  avx Casts vector of type __m256d to type __m256. 
_mm256_castpd_si256^{⚠}  avx Casts vector of type __m256d to type __m256i. 
_mm256_castps128_ps256^{⚠}  avx Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. 
_mm256_castps256_ps128^{⚠}  avx Casts vector of type __m256 to type __m128. 
_mm256_castps_pd^{⚠}  avx Casts vector of type __m256 to type __m256d. 
_mm256_castps_si256^{⚠}  avx Casts vector of type __m256 to type __m256i. 
_mm256_castsi128_si256^{⚠}  avx Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. 
_mm256_castsi256_pd^{⚠}  avx Casts vector of type __m256i to type __m256d. 
_mm256_castsi256_ps^{⚠}  avx Casts vector of type __m256i to type __m256. 
_mm256_castsi256_si128^{⚠}  avx Casts vector of type __m256i to type __m128i. 
_mm256_ceil_pd^{⚠}  avx Rounds packed doubleprecision (64bit) floating point elements in 
_mm256_ceil_ps^{⚠}  avx Rounds packed singleprecision (32bit) floating point elements in 
_mm256_cmp_pd^{⚠}  avx Compares packed doubleprecision (64bit) floatingpoint
elements in 
_mm256_cmp_ps^{⚠}  avx Compares packed singleprecision (32bit) floatingpoint
elements in 
_mm256_cmpeq_epi8^{⚠}  avx2 Compares packed 8bit integers in 
_mm256_cmpeq_epi16^{⚠}  avx2 Compares packed 16bit integers in 
_mm256_cmpeq_epi32^{⚠}  avx2 Compares packed 32bit integers in 
_mm256_cmpeq_epi64^{⚠}  avx2 Compares packed 64bit integers in 
_mm256_cmpgt_epi8^{⚠}  avx2 Compares packed 8bit integers in 
_mm256_cmpgt_epi16^{⚠}  avx2 Compares packed 16bit integers in 
_mm256_cmpgt_epi32^{⚠}  avx2 Compares packed 32bit integers in 
_mm256_cmpgt_epi64^{⚠}  avx2 Compares packed 64bit integers in 
_mm256_cvtepi8_epi16^{⚠}  avx2 Sign-extend 8-bit integers to 16-bit integers. 
_mm256_cvtepi8_epi32^{⚠}  avx2 Sign-extend 8-bit integers to 32-bit integers. 
_mm256_cvtepi8_epi64^{⚠}  avx2 Sign-extend 8-bit integers to 64-bit integers. 
_mm256_cvtepi16_epi32^{⚠}  avx2 Sign-extend 16-bit integers to 32-bit integers. 
_mm256_cvtepi16_epi64^{⚠}  avx2 Sign-extend 16-bit integers to 64-bit integers. 
_mm256_cvtepi32_epi64^{⚠}  avx2 Sign-extend 32-bit integers to 64-bit integers. 
_mm256_cvtepi32_pd^{⚠}  avx Converts packed 32bit integers in 
_mm256_cvtepi32_ps^{⚠}  avx Converts packed 32bit integers in 
_mm256_cvtepu8_epi16^{⚠}  avx2 Zeroextend unsigned 8bit integers in 
_mm256_cvtepu8_epi32^{⚠}  avx2 Zeroextend the lower eight unsigned 8bit integers in 
_mm256_cvtepu8_epi64^{⚠}  avx2 Zeroextend the lower four unsigned 8bit integers in 
_mm256_cvtepu16_epi32^{⚠}  avx2 Zero-extend packed unsigned 16-bit integers in 
_mm256_cvtepu16_epi64^{⚠}  avx2 Zeroextend the lower four unsigned 16bit integers in 
_mm256_cvtepu32_epi64^{⚠}  avx2 Zeroextend unsigned 32bit integers in 
_mm256_cvtpd_epi32^{⚠}  avx Converts packed doubleprecision (64bit) floatingpoint elements in 
_mm256_cvtpd_ps^{⚠}  avx Converts packed doubleprecision (64bit) floatingpoint elements in 
_mm256_cvtps_epi32^{⚠}  avx Converts packed singleprecision (32bit) floatingpoint elements in 
_mm256_cvtps_pd^{⚠}  avx Converts packed singleprecision (32bit) floatingpoint elements in 
_mm256_cvtsd_f64^{⚠}  avx2 Returns the first element of the input vector of 
_mm256_cvtsi256_si32^{⚠}  avx2 Returns the first element of the input vector of 
_mm256_cvtss_f32^{⚠}  avx Returns the first element of the input vector of 
_mm256_cvttpd_epi32^{⚠}  avx Converts packed doubleprecision (64bit) floatingpoint elements in 
_mm256_cvttps_epi32^{⚠}  avx Converts packed singleprecision (32bit) floatingpoint elements in 
_mm256_div_pd^{⚠}  avx Computes the division of each of the 4 packed 64bit floatingpoint elements
in 
_mm256_div_ps^{⚠}  avx Computes the division of each of the 8 packed 32bit floatingpoint elements
in 
_mm256_dp_ps^{⚠}  avx Conditionally multiplies the packed singleprecision (32bit) floatingpoint
elements in 
_mm256_extract_epi8^{⚠}  avx2 Extracts an 8bit integer from 
_mm256_extract_epi16^{⚠}  avx2 Extracts a 16bit integer from 
_mm256_extract_epi32^{⚠}  avx2 Extracts a 32bit integer from 
_mm256_extract_epi64^{⚠}  avx2 Extracts a 64bit integer from 
_mm256_extractf128_pd^{⚠}  avx Extracts 128 bits (composed of 2 packed doubleprecision (64bit)
floatingpoint elements) from 
_mm256_extractf128_ps^{⚠}  avx Extracts 128 bits (composed of 4 packed singleprecision (32bit)
floatingpoint elements) from 
_mm256_extractf128_si256^{⚠}  avx Extracts 128 bits (composed of integer data) from 
_mm256_extracti128_si256^{⚠}  avx2 Extracts 128 bits (of integer data) from 
_mm256_floor_pd^{⚠}  avx Rounds packed doubleprecision (64bit) floating point elements in 
_mm256_floor_ps^{⚠}  avx Rounds packed singleprecision (32bit) floating point elements in 
_mm256_fmadd_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm256_fmadd_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm256_fmaddsub_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm256_fmaddsub_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm256_fmsub_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm256_fmsub_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm256_fmsubadd_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm256_fmsubadd_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm256_fnmadd_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm256_fnmadd_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm256_fnmsub_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm256_fnmsub_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm256_hadd_epi16^{⚠}  avx2 Horizontally adds adjacent pairs of 16bit integers in 
_mm256_hadd_epi32^{⚠}  avx2 Horizontally adds adjacent pairs of 32bit integers in 
_mm256_hadd_pd^{⚠}  avx Horizontal addition of adjacent pairs in the two packed vectors
of 4 64bit floating points 
_mm256_hadd_ps^{⚠}  avx Horizontal addition of adjacent pairs in the two packed vectors
of 8 32bit floating points 
_mm256_hadds_epi16^{⚠}  avx2 Horizontally adds adjacent pairs of 16bit integers in 
_mm256_hsub_epi16^{⚠}  avx2 Horizontally subtract adjacent pairs of 16bit integers in 
_mm256_hsub_epi32^{⚠}  avx2 Horizontally subtract adjacent pairs of 32bit integers in 
_mm256_hsub_pd^{⚠}  avx Horizontal subtraction of adjacent pairs in the two packed vectors
of 4 64bit floating points 
_mm256_hsub_ps^{⚠}  avx Horizontal subtraction of adjacent pairs in the two packed vectors
of 8 32bit floating points 
_mm256_hsubs_epi16^{⚠}  avx2 Horizontally subtract adjacent pairs of 16bit integers in 
_mm256_i32gather_epi32^{⚠}  avx2 Returns values from 
_mm256_i32gather_epi64^{⚠}  avx2 Returns values from 
_mm256_i32gather_pd^{⚠}  avx2 Returns values from 
_mm256_i32gather_ps^{⚠}  avx2 Returns values from 
_mm256_i64gather_epi32^{⚠}  avx2 Returns values from 
_mm256_i64gather_epi64^{⚠}  avx2 Returns values from 
_mm256_i64gather_pd^{⚠}  avx2 Returns values from 
_mm256_i64gather_ps^{⚠}  avx2 Returns values from 
_mm256_insert_epi8^{⚠}  avx Copies 
_mm256_insert_epi16^{⚠}  avx Copies 
_mm256_insert_epi32^{⚠}  avx Copies 
_mm256_insert_epi64^{⚠}  avx Copies 
_mm256_insertf128_pd^{⚠}  avx Copies 
_mm256_insertf128_ps^{⚠}  avx Copies 
_mm256_insertf128_si256^{⚠}  avx Copies 
_mm256_inserti128_si256^{⚠}  avx2 Copies 
_mm256_lddqu_si256^{⚠}  avx Loads 256bits of integer data from unaligned memory into result.
This intrinsic may perform better than 
_mm256_load_pd^{⚠}  avx Loads 256bits (composed of 4 packed doubleprecision (64bit)
floatingpoint elements) from memory into result.

_mm256_load_ps^{⚠}  avx Loads 256bits (composed of 8 packed singleprecision (32bit)
floatingpoint elements) from memory into result.

_mm256_load_si256^{⚠}  avx Loads 256bits of integer data from memory into result.

_mm256_loadu2_m128^{⚠}  avx,sse Loads two 128bit values (composed of 4 packed singleprecision (32bit)
floatingpoint elements) from memory, and combine them into a 256bit
value.

_mm256_loadu2_m128d^{⚠}  avx,sse2 Loads two 128bit values (composed of 2 packed doubleprecision (64bit)
floatingpoint elements) from memory, and combine them into a 256bit
value.

_mm256_loadu2_m128i^{⚠}  avx,sse2 Loads two 128bit values (composed of integer data) from memory, and combine
them into a 256bit value.

_mm256_loadu_pd^{⚠}  avx Loads 256bits (composed of 4 packed doubleprecision (64bit)
floatingpoint elements) from memory into result.

_mm256_loadu_ps^{⚠}  avx Loads 256bits (composed of 8 packed singleprecision (32bit)
floatingpoint elements) from memory into result.

_mm256_loadu_si256^{⚠}  avx Loads 256bits of integer data from memory into result.

_mm256_madd_epi16^{⚠}  avx2 Multiplies packed signed 16bit integers in 
_mm256_maddubs_epi16^{⚠}  avx2 Vertically multiplies each unsigned 8bit integer from 
_mm256_mask_i32gather_epi32^{⚠}  avx2 Returns values from 
_mm256_mask_i32gather_epi64^{⚠}  avx2 Returns values from 
_mm256_mask_i32gather_pd^{⚠}  avx2 Returns values from 
_mm256_mask_i32gather_ps^{⚠}  avx2 Returns values from 
_mm256_mask_i64gather_epi32^{⚠}  avx2 Returns values from 
_mm256_mask_i64gather_epi64^{⚠}  avx2 Returns values from 
_mm256_mask_i64gather_pd^{⚠}  avx2 Returns values from 
_mm256_mask_i64gather_ps^{⚠}  avx2 Returns values from 
_mm256_maskload_epi32^{⚠}  avx2 Loads packed 32bit integers from memory pointed by 
_mm256_maskload_epi64^{⚠}  avx2 Loads packed 64bit integers from memory pointed by 
_mm256_maskload_pd^{⚠}  avx Loads packed doubleprecision (64bit) floatingpoint elements from memory
into result using 
_mm256_maskload_ps^{⚠}  avx Loads packed singleprecision (32bit) floatingpoint elements from memory
into result using 
_mm256_maskstore_epi32^{⚠}  avx2 Stores packed 32bit integers from 
_mm256_maskstore_epi64^{⚠}  avx2 Stores packed 64bit integers from 
_mm256_maskstore_pd^{⚠}  avx Stores packed doubleprecision (64bit) floatingpoint elements from 
_mm256_maskstore_ps^{⚠}  avx Stores packed singleprecision (32bit) floatingpoint elements from 
_mm256_max_epi8^{⚠}  avx2 Compares packed 8bit integers in 
_mm256_max_epi16^{⚠}  avx2 Compares packed 16bit integers in 
_mm256_max_epi32^{⚠}  avx2 Compares packed 32bit integers in 
_mm256_max_epu8^{⚠}  avx2 Compares packed unsigned 8bit integers in 
_mm256_max_epu16^{⚠}  avx2 Compares packed unsigned 16bit integers in 
_mm256_max_epu32^{⚠}  avx2 Compares packed unsigned 32bit integers in 
_mm256_max_pd^{⚠}  avx Compares packed doubleprecision (64bit) floatingpoint elements
in 
_mm256_max_ps^{⚠}  avx Compares packed singleprecision (32bit) floatingpoint elements in 
_mm256_min_epi8^{⚠}  avx2 Compares packed 8bit integers in 
_mm256_min_epi16^{⚠}  avx2 Compares packed 16bit integers in 
_mm256_min_epi32^{⚠}  avx2 Compares packed 32bit integers in 
_mm256_min_epu8^{⚠}  avx2 Compares packed unsigned 8bit integers in 
_mm256_min_epu16^{⚠}  avx2 Compares packed unsigned 16bit integers in 
_mm256_min_epu32^{⚠}  avx2 Compares packed unsigned 32bit integers in 
_mm256_min_pd^{⚠}  avx Compares packed doubleprecision (64bit) floatingpoint elements
in 
_mm256_min_ps^{⚠}  avx Compares packed singleprecision (32bit) floatingpoint elements in 
_mm256_movedup_pd^{⚠}  avx Duplicate evenindexed doubleprecision (64bit) floatingpoint elements
from 
_mm256_movehdup_ps^{⚠}  avx Duplicate oddindexed singleprecision (32bit) floatingpoint elements
from 
_mm256_moveldup_ps^{⚠}  avx Duplicate evenindexed singleprecision (32bit) floatingpoint elements
from 
_mm256_movemask_epi8^{⚠}  avx2 Creates mask from the most significant bit of each 8bit element in 
_mm256_movemask_pd^{⚠}  avx Sets each bit of the returned mask based on the most significant bit of the
corresponding packed doubleprecision (64bit) floatingpoint element in

_mm256_movemask_ps^{⚠}  avx Sets each bit of the returned mask based on the most significant bit of the
corresponding packed singleprecision (32bit) floatingpoint element in

_mm256_mpsadbw_epu8^{⚠}  avx2 Computes the sum of absolute differences (SADs) of quadruplets of unsigned
8bit integers in 
_mm256_mul_epi32^{⚠}  avx2 Multiplies the low 32bit integers from each packed 64bit element in

_mm256_mul_epu32^{⚠}  avx2 Multiplies the low unsigned 32bit integers from each packed 64bit
element in 
_mm256_mul_pd^{⚠}  avx Multiplies packed doubleprecision (64bit) floatingpoint elements
in 
_mm256_mul_ps^{⚠}  avx Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm256_mulhi_epi16^{⚠}  avx2 Multiplies the packed 16bit integers in 
_mm256_mulhi_epu16^{⚠}  avx2 Multiplies the packed unsigned 16bit integers in 
_mm256_mulhrs_epi16^{⚠}  avx2 Multiplies packed 16bit integers in 
_mm256_mullo_epi16^{⚠}  avx2 Multiplies the packed 16bit integers in 
_mm256_mullo_epi32^{⚠}  avx2 Multiplies the packed 32bit integers in 
_mm256_or_pd^{⚠}  avx Computes the bitwise OR of packed double-precision (64-bit) floating-point
elements in 
_mm256_or_ps^{⚠}  avx Computes the bitwise OR of packed single-precision (32-bit) floating-point
elements in 
_mm256_or_si256^{⚠}  avx2 Computes the bitwise OR of 256 bits (representing integer data) in 
_mm256_packs_epi16^{⚠}  avx2 Converts packed 16bit integers from 
_mm256_packs_epi32^{⚠}  avx2 Converts packed 32bit integers from 
_mm256_packus_epi16^{⚠}  avx2 Converts packed 16bit integers from 
_mm256_packus_epi32^{⚠}  avx2 Converts packed 32bit integers from 
_mm256_permute2f128_pd^{⚠}  avx Shuffles 256 bits (composed of 4 packed doubleprecision (64bit)
floatingpoint elements) selected by 
_mm256_permute2f128_ps^{⚠}  avx Shuffles 256 bits (composed of 8 packed singleprecision (32bit)
floatingpoint elements) selected by 
_mm256_permute2f128_si256^{⚠}  avx Shuffles 128bits (composed of integer data) selected by 
_mm256_permute2x128_si256^{⚠}  avx2 Shuffles 128bits of integer data selected by 
_mm256_permute4x64_epi64^{⚠}  avx2 Permutes 64bit integers from 
_mm256_permute4x64_pd^{⚠}  avx2 Shuffles 64bit floatingpoint elements in 
_mm256_permute_pd^{⚠}  avx Shuffles doubleprecision (64bit) floatingpoint elements in 
_mm256_permute_ps^{⚠}  avx Shuffles singleprecision (32bit) floatingpoint elements in 
_mm256_permutevar8x32_epi32^{⚠}  avx2 Permutes packed 32bit integers from 
_mm256_permutevar8x32_ps^{⚠}  avx2 Shuffles eight 32-bit floating-point elements in 
_mm256_permutevar_pd^{⚠}  avx Shuffles doubleprecision (64bit) floatingpoint elements in 
_mm256_permutevar_ps^{⚠}  avx Shuffles singleprecision (32bit) floatingpoint elements in 
_mm256_rcp_ps^{⚠}  avx Computes the approximate reciprocal of packed singleprecision (32bit)
floatingpoint elements in 
_mm256_round_pd^{⚠}  avx Rounds packed doubleprecision (64bit) floating point elements in 
_mm256_round_ps^{⚠}  avx Rounds packed singleprecision (32bit) floating point elements in 
_mm256_rsqrt_ps^{⚠}  avx Computes the approximate reciprocal square root of packed singleprecision
(32bit) floatingpoint elements in 
_mm256_sad_epu8^{⚠}  avx2 Computes the absolute differences of packed unsigned 8bit integers in 
_mm256_set1_epi8^{⚠}  avx Broadcasts 8bit integer 
_mm256_set1_epi16^{⚠}  avx Broadcasts 16bit integer 
_mm256_set1_epi32^{⚠}  avx Broadcasts 32bit integer 
_mm256_set1_epi64x^{⚠}  avx Broadcasts 64bit integer 
_mm256_set1_pd^{⚠}  avx Broadcasts doubleprecision (64bit) floatingpoint value 
_mm256_set1_ps^{⚠}  avx Broadcasts singleprecision (32bit) floatingpoint value 
_mm256_set_epi8^{⚠}  avx Sets packed 8-bit integers in returned vector with the supplied values in reverse order. 
_mm256_set_epi16^{⚠}  avx Sets packed 16-bit integers in returned vector with the supplied values. 
_mm256_set_epi32^{⚠}  avx Sets packed 32-bit integers in returned vector with the supplied values. 
_mm256_set_epi64x^{⚠}  avx Sets packed 64-bit integers in returned vector with the supplied values. 
_mm256_set_m128^{⚠}  avx Sets packed __m256 returned vector with the supplied values. 
_mm256_set_m128d^{⚠}  avx Sets packed __m256d returned vector with the supplied values. 
_mm256_set_m128i^{⚠}  avx Sets packed __m256i returned vector with the supplied values. 
_mm256_set_pd^{⚠}  avx Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values. 
_mm256_set_ps^{⚠}  avx Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values. 
_mm256_setr_epi8^{⚠}  avx Sets packed 8-bit integers in returned vector with the supplied values in reverse order. 
_mm256_setr_epi16^{⚠}  avx Sets packed 16-bit integers in returned vector with the supplied values in reverse order. 
_mm256_setr_epi32^{⚠}  avx Sets packed 32-bit integers in returned vector with the supplied values in reverse order. 
_mm256_setr_epi64x^{⚠}  avx Sets packed 64-bit integers in returned vector with the supplied values in reverse order. 
_mm256_setr_m128^{⚠}  avx Sets packed __m256 returned vector with the supplied values. 
_mm256_setr_m128d^{⚠}  avx Sets packed __m256d returned vector with the supplied values. 
_mm256_setr_m128i^{⚠}  avx Sets packed __m256i returned vector with the supplied values. 
_mm256_setr_pd^{⚠}  avx Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order. 
_mm256_setr_ps^{⚠}  avx Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order. 
_mm256_setzero_pd^{⚠}  avx Returns vector of type __m256d with all elements set to zero. 
_mm256_setzero_ps^{⚠}  avx Returns vector of type __m256 with all elements set to zero. 
_mm256_setzero_si256^{⚠}  avx Returns vector of type __m256i with all elements set to zero. 
_mm256_shuffle_epi8^{⚠}  avx2 Shuffles bytes from 
_mm256_shuffle_epi32^{⚠}  avx2 Shuffles 32bit integers in 128bit lanes of 
_mm256_shuffle_pd^{⚠}  avx Shuffles doubleprecision (64bit) floatingpoint elements within 128bit
lanes using the control in 
_mm256_shuffle_ps^{⚠}  avx Shuffles singleprecision (32bit) floatingpoint elements in 
_mm256_shufflehi_epi16^{⚠}  avx2 Shuffles 16bit integers in the high 64 bits of 128bit lanes of 
_mm256_shufflelo_epi16^{⚠}  avx2 Shuffles 16bit integers in the low 64 bits of 128bit lanes of 
_mm256_sign_epi8^{⚠}  avx2 Negates packed 8bit integers in 
_mm256_sign_epi16^{⚠}  avx2 Negates packed 16bit integers in 
_mm256_sign_epi32^{⚠}  avx2 Negates packed 32bit integers in 
_mm256_sll_epi16^{⚠}  avx2 Shifts packed 16bit integers in 
_mm256_sll_epi32^{⚠}  avx2 Shifts packed 32bit integers in 
_mm256_sll_epi64^{⚠}  avx2 Shifts packed 64bit integers in 
_mm256_slli_epi16^{⚠}  avx2 Shifts packed 16bit integers in 
_mm256_slli_epi32^{⚠}  avx2 Shifts packed 32bit integers in 
_mm256_slli_epi64^{⚠}  avx2 Shifts packed 64bit integers in 
_mm256_slli_si256^{⚠}  avx2 Shifts 128bit lanes in 
_mm256_sllv_epi32^{⚠}  avx2 Shifts packed 32bit integers in 
_mm256_sllv_epi64^{⚠}  avx2 Shifts packed 64bit integers in 
_mm256_sqrt_pd^{⚠}  avx Returns the square root of packed doubleprecision (64bit) floating point
elements in 
_mm256_sqrt_ps^{⚠}  avx Returns the square root of packed singleprecision (32bit) floating point
elements in 
_mm256_sra_epi16^{⚠}  avx2 Shifts packed 16bit integers in 
_mm256_sra_epi32^{⚠}  avx2 Shifts packed 32bit integers in 
_mm256_srai_epi16^{⚠}  avx2 Shifts packed 16bit integers in 
_mm256_srai_epi32^{⚠}  avx2 Shifts packed 32bit integers in 
_mm256_srav_epi32^{⚠}  avx2 Shifts packed 32bit integers in 
_mm256_srl_epi16^{⚠}  avx2 Shifts packed 16bit integers in 
_mm256_srl_epi32^{⚠}  avx2 Shifts packed 32bit integers in 
_mm256_srl_epi64^{⚠}  avx2 Shifts packed 64bit integers in 
_mm256_srli_epi16^{⚠}  avx2 Shifts packed 16bit integers in 
_mm256_srli_epi32^{⚠}  avx2 Shifts packed 32bit integers in 
_mm256_srli_epi64^{⚠}  avx2 Shifts packed 64bit integers in 
_mm256_srli_si256^{⚠}  avx2 Shifts 128bit lanes in 
_mm256_srlv_epi32^{⚠}  avx2 Shifts packed 32bit integers in 
_mm256_srlv_epi64^{⚠}  avx2 Shifts packed 64bit integers in 
_mm256_store_pd^{⚠}  avx Stores 256bits (composed of 4 packed doubleprecision (64bit)
floatingpoint elements) from 
_mm256_store_ps^{⚠}  avx Stores 256bits (composed of 8 packed singleprecision (32bit)
floatingpoint elements) from 
_mm256_store_si256^{⚠}  avx Stores 256bits of integer data from 
_mm256_storeu2_m128^{⚠}  avx,sse Stores the high and low 128bit halves (each composed of 4 packed
singleprecision (32bit) floatingpoint elements) from 
_mm256_storeu2_m128d^{⚠}  avx,sse2 Stores the high and low 128bit halves (each composed of 2 packed
doubleprecision (64bit) floatingpoint elements) from 
_mm256_storeu2_m128i^{⚠}  avx,sse2 Stores the high and low 128bit halves (each composed of integer data) from

_mm256_storeu_pd^{⚠}  avx Stores 256bits (composed of 4 packed doubleprecision (64bit)
floatingpoint elements) from 
_mm256_storeu_ps^{⚠}  avx Stores 256bits (composed of 8 packed singleprecision (32bit)
floatingpoint elements) from 
_mm256_storeu_si256^{⚠}  avx Stores 256bits of integer data from 
_mm256_stream_pd^{⚠}  avx Moves doubleprecision values from a 256bit vector of 
_mm256_stream_ps^{⚠}  avx Moves singleprecision floating point values from a 256bit vector
of 
_mm256_stream_si256^{⚠}  avx Moves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
_mm256_sub_epi8^{⚠}  avx2 Subtracts packed 8-bit integers in 
_mm256_sub_epi16^{⚠}  avx2 Subtracts packed 16-bit integers in 
_mm256_sub_epi32^{⚠}  avx2 Subtracts packed 32-bit integers in 
_mm256_sub_epi64^{⚠}  avx2 Subtracts packed 64-bit integers in 
_mm256_sub_pd^{⚠}  avx Subtracts packed double-precision (64-bit) floating-point elements in 
_mm256_sub_ps^{⚠}  avx Subtracts packed single-precision (32-bit) floating-point elements in 
_mm256_subs_epi8^{⚠}  avx2 Subtracts packed 8-bit integers in 
_mm256_subs_epi16^{⚠}  avx2 Subtracts packed 16-bit integers in 
_mm256_subs_epu8^{⚠}  avx2 Subtracts packed unsigned 8-bit integers in 
_mm256_subs_epu16^{⚠}  avx2 Subtracts packed unsigned 16-bit integers in 
_mm256_testc_pd^{⚠}  avx Computes the bitwise AND of 256 bits (representing doubleprecision (64bit)
floatingpoint elements) in 
_mm256_testc_ps^{⚠}  avx Computes the bitwise AND of 256 bits (representing singleprecision (32bit)
floatingpoint elements) in 
_mm256_testc_si256^{⚠}  avx Computes the bitwise AND of 256 bits (representing integer data) in 
_mm256_testnzc_pd^{⚠}  avx Computes the bitwise AND of 256 bits (representing doubleprecision (64bit)
floatingpoint elements) in 
_mm256_testnzc_ps^{⚠}  avx Computes the bitwise AND of 256 bits (representing singleprecision (32bit)
floatingpoint elements) in 
_mm256_testnzc_si256^{⚠}  avx Computes the bitwise AND of 256 bits (representing integer data) in 
_mm256_testz_pd^{⚠}  avx Computes the bitwise AND of 256 bits (representing doubleprecision (64bit)
floatingpoint elements) in 
_mm256_testz_ps^{⚠}  avx Computes the bitwise AND of 256 bits (representing singleprecision (32bit)
floatingpoint elements) in 
_mm256_testz_si256^{⚠}  avx Computes the bitwise AND of 256 bits (representing integer data) in 
_mm256_undefined_pd^{⚠}  avx Returns vector of type 
_mm256_undefined_ps^{⚠}  avx Returns vector of type 
_mm256_undefined_si256^{⚠}  avx Returns vector of type __m256i with undefined elements. 
_mm256_unpackhi_epi8^{⚠}  avx2 Unpacks and interleaves 8-bit integers from the high half of each
128-bit lane in 
_mm256_unpackhi_epi16^{⚠}  avx2 Unpacks and interleaves 16-bit integers from the high half of each
128-bit lane of 
_mm256_unpackhi_epi32^{⚠}  avx2 Unpacks and interleaves 32-bit integers from the high half of each
128-bit lane of 
_mm256_unpackhi_epi64^{⚠}  avx2 Unpacks and interleaves 64-bit integers from the high half of each
128-bit lane of 
_mm256_unpackhi_pd^{⚠}  avx Unpacks and interleaves double-precision (64-bit) floating-point elements
from the high half of each 128-bit lane in 
_mm256_unpackhi_ps^{⚠}  avx Unpacks and interleaves single-precision (32-bit) floating-point elements
from the high half of each 128-bit lane in 
_mm256_unpacklo_epi8^{⚠}  avx2 Unpacks and interleaves 8-bit integers from the low half of each
128-bit lane of 
_mm256_unpacklo_epi16^{⚠}  avx2 Unpacks and interleaves 16-bit integers from the low half of each
128-bit lane of 
_mm256_unpacklo_epi32^{⚠}  avx2 Unpacks and interleaves 32-bit integers from the low half of each
128-bit lane of 
_mm256_unpacklo_epi64^{⚠}  avx2 Unpacks and interleaves 64-bit integers from the low half of each
128-bit lane of 
_mm256_unpacklo_pd^{⚠}  avx Unpacks and interleaves double-precision (64-bit) floating-point elements
from the low half of each 128-bit lane in 
_mm256_unpacklo_ps^{⚠}  avx Unpacks and interleaves single-precision (32-bit) floating-point elements
from the low half of each 128-bit lane in 
_mm256_xor_pd^{⚠}  avx Computes the bitwise XOR of packed doubleprecision (64bit) floatingpoint
elements in 
_mm256_xor_ps^{⚠}  avx Computes the bitwise XOR of packed singleprecision (32bit) floatingpoint
elements in 
_mm256_xor_si256^{⚠}  avx2 Computes the bitwise XOR of 256 bits (representing integer data)
in 
_mm256_zeroall^{⚠}  avx Zeroes the contents of all XMM or YMM registers. 
_mm256_zeroupper^{⚠}  avx Zeroes the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified.
_mm256_zextpd128_pd256^{⚠}  avx,sse2 Constructs a 256bit floatingpoint vector of 
_mm256_zextps128_ps256^{⚠}  avx,sse Constructs a 256bit floatingpoint vector of 
_mm256_zextsi128_si256^{⚠}  avx,sse2 Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm512_storeu_ps^{⚠}  avx512f Stores 512bits (composed of 16 packed singleprecision (32bit)
floatingpoint elements) from 
_mm_abs_epi8^{⚠}  ssse3 Computes the absolute value of packed 8bit signed integers in 
_mm_abs_epi16^{⚠}  ssse3 Computes the absolute value of each of the packed 16bit signed integers in

_mm_abs_epi32^{⚠}  ssse3 Computes the absolute value of each of the packed 32bit signed integers in

_mm_add_epi8^{⚠}  sse2 Adds packed 8bit integers in 
_mm_add_epi16^{⚠}  sse2 Adds packed 16bit integers in 
_mm_add_epi32^{⚠}  sse2 Adds packed 32bit integers in 
_mm_add_epi64^{⚠}  sse2 Adds packed 64bit integers in 
_mm_add_pd^{⚠}  sse2 Adds packed doubleprecision (64bit) floatingpoint elements in 
_mm_add_ps^{⚠}  sse Adds __m128 vectors. 
_mm_add_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_add_ss^{⚠}  sse Adds the first component of 
_mm_adds_epi8^{⚠}  sse2 Adds packed 8bit integers in 
_mm_adds_epi16^{⚠}  sse2 Adds packed 16bit integers in 
_mm_adds_epu8^{⚠}  sse2 Adds packed unsigned 8bit integers in 
_mm_adds_epu16^{⚠}  sse2 Adds packed unsigned 16bit integers in 
_mm_addsub_pd^{⚠}  sse3 Alternately adds and subtracts packed double-precision (64-bit)
floating-point elements in 
_mm_addsub_ps^{⚠}  sse3 Alternately adds and subtracts packed single-precision (32-bit)
floating-point elements in 
_mm_aesdec_si128^{⚠}  aes Performs one round of an AES decryption flow on data (state) in 
_mm_aesdeclast_si128^{⚠}  aes Performs the last round of an AES decryption flow on data (state) in 
_mm_aesenc_si128^{⚠}  aes Performs one round of an AES encryption flow on data (state) in 
_mm_aesenclast_si128^{⚠}  aes Performs the last round of an AES encryption flow on data (state) in 
_mm_aesimc_si128^{⚠}  aes Performs the 
_mm_aeskeygenassist_si128^{⚠}  aes Assists in expanding the AES cipher key. 
_mm_alignr_epi8^{⚠}  ssse3 Concatenates 16-byte blocks in 
_mm_and_pd^{⚠}  sse2 Computes the bitwise AND of packed doubleprecision (64bit) floatingpoint
elements in 
_mm_and_ps^{⚠}  sse Bitwise AND of packed singleprecision (32bit) floatingpoint elements. 
_mm_and_si128^{⚠}  sse2 Computes the bitwise AND of 128 bits (representing integer data) in 
_mm_andnot_pd^{⚠}  sse2 Computes the bitwise NOT of 
_mm_andnot_ps^{⚠}  sse Bitwise ANDNOT of packed singleprecision (32bit) floatingpoint elements. 
_mm_andnot_si128^{⚠}  sse2 Computes the bitwise NOT of 128 bits (representing integer data) in 
_mm_avg_epu8^{⚠}  sse2 Averages packed unsigned 8bit integers in 
_mm_avg_epu16^{⚠}  sse2 Averages packed unsigned 16bit integers in 
_mm_blend_epi16^{⚠}  sse4.1 Blends packed 16-bit integers from 
_mm_blend_epi32^{⚠}  avx2 Blends packed 32-bit integers from 
_mm_blend_pd^{⚠}  sse4.1 Blends packed double-precision (64-bit) floating-point elements from 
_mm_blend_ps^{⚠}  sse4.1 Blends packed single-precision (32-bit) floating-point elements from 
_mm_blendv_epi8^{⚠}  sse4.1 Blends packed 8-bit integers from 
_mm_blendv_pd^{⚠}  sse4.1 Blends packed double-precision (64-bit) floating-point elements from 
_mm_blendv_ps^{⚠}  sse4.1 Blends packed single-precision (32-bit) floating-point elements from 
_mm_broadcast_ss^{⚠}  avx Broadcasts a singleprecision (32bit) floatingpoint element from memory to all elements of the returned vector. 
_mm_broadcastb_epi8^{⚠}  avx2 Broadcasts the low packed 8bit integer from 
_mm_broadcastd_epi32^{⚠}  avx2 Broadcasts the low packed 32bit integer from 
_mm_broadcastq_epi64^{⚠}  avx2 Broadcasts the low packed 64bit integer from 
_mm_broadcastsd_pd^{⚠}  avx2 Broadcasts the low doubleprecision (64bit) floatingpoint element
from 
_mm_broadcastss_ps^{⚠}  avx2 Broadcasts the low singleprecision (32bit) floatingpoint element
from 
_mm_broadcastw_epi16^{⚠}  avx2 Broadcasts the low packed 16bit integer from a to all elements of the 128bit returned value 
_mm_bslli_si128^{⚠}  sse2 Shifts 
_mm_bsrli_si128^{⚠}  sse2 Shifts 
_mm_castpd_ps^{⚠}  sse2 Casts a 128bit floatingpoint vector of 
_mm_castpd_si128^{⚠}  sse2 Casts a 128bit floatingpoint vector of 
_mm_castps_pd^{⚠}  sse2 Casts a 128bit floatingpoint vector of 
_mm_castps_si128^{⚠}  sse2 Casts a 128bit floatingpoint vector of 
_mm_castsi128_pd^{⚠}  sse2 Casts a 128bit integer vector into a 128bit floatingpoint vector
of 
_mm_castsi128_ps^{⚠}  sse2 Casts a 128bit integer vector into a 128bit floatingpoint vector
of 
_mm_ceil_pd^{⚠}  sse4.1 Rounds the packed double-precision (64-bit) floating-point elements in 
_mm_ceil_ps^{⚠}  sse4.1 Rounds the packed single-precision (32-bit) floating-point elements in 
_mm_ceil_sd^{⚠}  sse4.1 Rounds the lower double-precision (64-bit) floating-point element in 
_mm_ceil_ss^{⚠}  sse4.1 Rounds the lower single-precision (32-bit) floating-point element in 
_mm_clflush^{⚠}  sse2 Invalidates and flushes the cache line that contains 
_mm_clmulepi64_si128^{⚠}  pclmulqdq Performs a carryless multiplication of two 64bit polynomials over the finite field GF(2^k). 
_mm_cmp_pd^{⚠}  avx,sse2 Compares packed doubleprecision (64bit) floatingpoint
elements in 
_mm_cmp_ps^{⚠}  avx,sse Compares packed singleprecision (32bit) floatingpoint
elements in 
_mm_cmp_sd^{⚠}  avx,sse2 Compares the lower doubleprecision (64bit) floatingpoint element in

_mm_cmp_ss^{⚠}  avx,sse Compares the lower singleprecision (32bit) floatingpoint element in

_mm_cmpeq_epi8^{⚠}  sse2 Compares packed 8bit integers in 
_mm_cmpeq_epi16^{⚠}  sse2 Compares packed 16bit integers in 
_mm_cmpeq_epi32^{⚠}  sse2 Compares packed 32bit integers in 
_mm_cmpeq_epi64^{⚠}  sse4.1 Compares packed 64bit integers in 
_mm_cmpeq_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmpeq_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmpeq_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmpeq_ss^{⚠}  sse Compares the lowest 
_mm_cmpestra^{⚠}  sse4.2 Compares packed strings in 
_mm_cmpestrc^{⚠}  sse4.2 Compares packed strings in 
_mm_cmpestri^{⚠}  sse4.2 Compares packed strings 
_mm_cmpestrm^{⚠}  sse4.2 Compares packed strings in 
_mm_cmpestro^{⚠}  sse4.2 Compares packed strings in 
_mm_cmpestrs^{⚠}  sse4.2 Compares packed strings in 
_mm_cmpestrz^{⚠}  sse4.2 Compares packed strings in 
_mm_cmpge_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmpge_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmpge_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmpge_ss^{⚠}  sse Compares the lowest 
_mm_cmpgt_epi8^{⚠}  sse2 Compares packed 8bit integers in 
_mm_cmpgt_epi16^{⚠}  sse2 Compares packed 16bit integers in 
_mm_cmpgt_epi32^{⚠}  sse2 Compares packed 32bit integers in 
_mm_cmpgt_epi64^{⚠}  sse4.2 Compares packed 64bit integers in 
_mm_cmpgt_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmpgt_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmpgt_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmpgt_ss^{⚠}  sse Compares the lowest 
_mm_cmpistra^{⚠}  sse4.2 Compares packed strings with implicit lengths in 
_mm_cmpistrc^{⚠}  sse4.2 Compares packed strings with implicit lengths in 
_mm_cmpistri^{⚠}  sse4.2 Compares packed strings with implicit lengths in 
_mm_cmpistrm^{⚠}  sse4.2 Compares packed strings with implicit lengths in 
_mm_cmpistro^{⚠}  sse4.2 Compares packed strings with implicit lengths in 
_mm_cmpistrs^{⚠}  sse4.2 Compares packed strings with implicit lengths in 
_mm_cmpistrz^{⚠}  sse4.2 Compares packed strings with implicit lengths in 
_mm_cmple_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmple_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmple_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmple_ss^{⚠}  sse Compares the lowest 
_mm_cmplt_epi8^{⚠}  sse2 Compares packed 8bit integers in 
_mm_cmplt_epi16^{⚠}  sse2 Compares packed 16bit integers in 
_mm_cmplt_epi32^{⚠}  sse2 Compares packed 32bit integers in 
_mm_cmplt_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmplt_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmplt_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmplt_ss^{⚠}  sse Compares the lowest 
_mm_cmpneq_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmpneq_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmpneq_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmpneq_ss^{⚠}  sse Compares the lowest 
_mm_cmpnge_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmpnge_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmpnge_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmpnge_ss^{⚠}  sse Compares the lowest 
_mm_cmpngt_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmpngt_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmpngt_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmpngt_ss^{⚠}  sse Compares the lowest 
_mm_cmpnle_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmpnle_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmpnle_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmpnle_ss^{⚠}  sse Compares the lowest 
_mm_cmpnlt_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmpnlt_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmpnlt_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmpnlt_ss^{⚠}  sse Compares the lowest 
_mm_cmpord_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmpord_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmpord_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmpord_ss^{⚠}  sse Checks if the lowest 
_mm_cmpunord_pd^{⚠}  sse2 Compares corresponding elements in 
_mm_cmpunord_ps^{⚠}  sse Compares each of the four floats in 
_mm_cmpunord_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_cmpunord_ss^{⚠}  sse Checks if the lowest 
_mm_comieq_sd^{⚠}  sse2 Compares the lower element of 
_mm_comieq_ss^{⚠}  sse Compares two 32bit floats from the loworder bits of 
_mm_comige_sd^{⚠}  sse2 Compares the lower element of 
_mm_comige_ss^{⚠}  sse Compares two 32bit floats from the loworder bits of 
_mm_comigt_sd^{⚠}  sse2 Compares the lower element of 
_mm_comigt_ss^{⚠}  sse Compares two 32bit floats from the loworder bits of 
_mm_comile_sd^{⚠}  sse2 Compares the lower element of 
_mm_comile_ss^{⚠}  sse Compares two 32bit floats from the loworder bits of 
_mm_comilt_sd^{⚠}  sse2 Compares the lower element of 
_mm_comilt_ss^{⚠}  sse Compares two 32bit floats from the loworder bits of 
_mm_comineq_sd^{⚠}  sse2 Compares the lower element of 
_mm_comineq_ss^{⚠}  sse Compares two 32bit floats from the loworder bits of 
_mm_crc32_u8^{⚠}  sse4.2 Starting with the initial value in 
_mm_crc32_u16^{⚠}  sse4.2 Starting with the initial value in 
_mm_crc32_u32^{⚠}  sse4.2 Starting with the initial value in 
_mm_crc32_u64^{⚠}  sse4.2 Starting with the initial value in 
_mm_cvt_si2ss^{⚠}  sse Alias for 
_mm_cvt_ss2si^{⚠}  sse Alias for 
_mm_cvtepi8_epi16^{⚠}  sse4.1 Sign-extends packed 8-bit integers in 
_mm_cvtepi8_epi32^{⚠}  sse4.1 Sign-extends packed 8-bit integers in 
_mm_cvtepi8_epi64^{⚠}  sse4.1 Sign-extends packed 8-bit integers in the low 8 bytes of 
_mm_cvtepi16_epi32^{⚠}  sse4.1 Sign-extends packed 16-bit integers in 
_mm_cvtepi16_epi64^{⚠}  sse4.1 Sign-extends packed 16-bit integers in 
_mm_cvtepi32_epi64^{⚠}  sse4.1 Sign-extends packed 32-bit integers in 
_mm_cvtepi32_pd^{⚠}  sse2 Converts the lower two packed 32-bit integers in 
_mm_cvtepi32_ps^{⚠}  sse2 Converts packed 32-bit integers in 
_mm_cvtepu8_epi16^{⚠}  sse4.1 Zero-extends packed unsigned 8-bit integers in 
_mm_cvtepu8_epi32^{⚠}  sse4.1 Zero-extends packed unsigned 8-bit integers in 
_mm_cvtepu8_epi64^{⚠}  sse4.1 Zero-extends packed unsigned 8-bit integers in 
_mm_cvtepu16_epi32^{⚠}  sse4.1 Zero-extends packed unsigned 16-bit integers in 
_mm_cvtepu16_epi64^{⚠}  sse4.1 Zero-extends packed unsigned 16-bit integers in 
_mm_cvtepu32_epi64^{⚠}  sse4.1 Zero-extends packed unsigned 32-bit integers in 
_mm_cvtpd_epi32^{⚠}  sse2 Converts packed doubleprecision (64bit) floatingpoint elements in 
_mm_cvtpd_ps^{⚠}  sse2 Converts packed doubleprecision (64bit) floatingpoint elements in 
_mm_cvtps_epi32^{⚠}  sse2 Converts packed singleprecision (32bit) floatingpoint elements in 
_mm_cvtps_pd^{⚠}  sse2 Converts packed singleprecision (32bit) floatingpoint elements in 
_mm_cvtsd_f64^{⚠}  sse2 Returns the lower doubleprecision (64bit) floatingpoint element of 
_mm_cvtsd_si32^{⚠}  sse2 Converts the lower doubleprecision (64bit) floatingpoint element in a to a 32bit integer. 
_mm_cvtsd_si64^{⚠}  sse2 Converts the lower doubleprecision (64bit) floatingpoint element in a to a 64bit integer. 
_mm_cvtsd_si64x^{⚠}  sse2 Alias for 
_mm_cvtsd_ss^{⚠}  sse2 Converts the lower doubleprecision (64bit) floatingpoint element in 
_mm_cvtsi32_sd^{⚠}  sse2 Returns 
_mm_cvtsi32_si128^{⚠}  sse2 Returns a vector whose lowest element is 
_mm_cvtsi32_ss^{⚠}  sse Converts a 32 bit integer to a 32 bit float. The result vector is the input
vector 
_mm_cvtsi64_sd^{⚠}  sse2 Returns 
_mm_cvtsi64_si128^{⚠}  sse2 Returns a vector whose lowest element is 
_mm_cvtsi64_ss^{⚠}  sse Converts a 64 bit integer to a 32 bit float. The result vector is the input
vector 
_mm_cvtsi64x_sd^{⚠}  sse2 Returns 
_mm_cvtsi64x_si128^{⚠}  sse2 Returns a vector whose lowest element is 
_mm_cvtsi128_si32^{⚠}  sse2 Returns the lowest element of 
_mm_cvtsi128_si64^{⚠}  sse2 Returns the lowest element of 
_mm_cvtsi128_si64x^{⚠}  sse2 Returns the lowest element of 
_mm_cvtss_f32^{⚠}  sse Extracts the lowest 32 bit float from the input vector. 
_mm_cvtss_sd^{⚠}  sse2 Converts the lower singleprecision (32bit) floatingpoint element in 
_mm_cvtss_si32^{⚠}  sse Converts the lowest 32 bit float in the input vector to a 32 bit integer. 
_mm_cvtss_si64^{⚠}  sse Converts the lowest 32 bit float in the input vector to a 64 bit integer. 
_mm_cvtt_ss2si^{⚠}  sse Alias for 
_mm_cvttpd_epi32^{⚠}  sse2 Converts packed doubleprecision (64bit) floatingpoint elements in 
_mm_cvttps_epi32^{⚠}  sse2 Converts packed singleprecision (32bit) floatingpoint elements in 
_mm_cvttsd_si32^{⚠}  sse2 Converts the lower doubleprecision (64bit) floatingpoint element in 
_mm_cvttsd_si64^{⚠}  sse2 Converts the lower doubleprecision (64bit) floatingpoint element in 
_mm_cvttsd_si64x^{⚠}  sse2 Alias for 
_mm_cvttss_si32^{⚠}  sse Converts the lowest 32 bit float in the input vector to a 32 bit integer with truncation. 
_mm_cvttss_si64^{⚠}  sse Converts the lowest 32 bit float in the input vector to a 64 bit integer with truncation. 
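The difference between the rounding conversions (`_mm_cvtss_si32`, `_mm_cvtss_si64`) and their truncating counterparts (`_mm_cvttss_si32`, `_mm_cvttss_si64`) can be illustrated with a minimal sketch. The wrapper names `convert_rounded` and `convert_truncated` are hypothetical, and the expected results assume the MXCSR rounding mode is still at its default (round to nearest, ties to even).

```rust
use std::arch::x86_64::{_mm_cvtss_si32, _mm_cvttss_si32, _mm_set_ss};

/// Converts `x` to an integer using the current MXCSR rounding mode
/// (assumed here to be the default, round-to-nearest-even).
pub fn convert_rounded(x: f32) -> i32 {
    // SAFETY: SSE is part of the x86_64 baseline, so the intrinsics are available.
    unsafe { _mm_cvtss_si32(_mm_set_ss(x)) }
}

/// Converts `x` to an integer by discarding the fractional part
/// (rounding toward zero), regardless of the MXCSR rounding mode.
pub fn convert_truncated(x: f32) -> i32 {
    // SAFETY: SSE is part of the x86_64 baseline, so the intrinsics are available.
    unsafe { _mm_cvttss_si32(_mm_set_ss(x)) }
}
```

Under those assumptions, `convert_rounded(2.7)` yields 3 while `convert_truncated(2.7)` yields 2.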
_mm_div_pd^{⚠}  sse2 Divides packed double-precision (64-bit) floating-point elements in 
_mm_div_ps^{⚠}  sse Divides __m128 vectors. 
_mm_div_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_div_ss^{⚠}  sse Divides the first component of 
_mm_dp_pd^{⚠}  sse4.1 Returns the dot product of two __m128d vectors. 
_mm_dp_ps^{⚠}  sse4.1 Returns the dot product of two __m128 vectors. 
_mm_extract_epi8^{⚠}  sse4.1 Extracts an 8bit integer from 
_mm_extract_epi16^{⚠}  sse2 Returns the 
_mm_extract_epi32^{⚠}  sse4.1 Extracts a 32-bit integer from 
_mm_extract_epi64^{⚠}  sse4.1 Extracts a 64-bit integer from 
_mm_extract_ps^{⚠}  sse4.1 Extracts a singleprecision (32bit) floatingpoint element from 
_mm_extract_si64^{⚠}  sse4a Extracts the bit range specified by 
_mm_floor_pd^{⚠}  sse4.1 Rounds the packed double-precision (64-bit) floating-point elements in 
_mm_floor_ps^{⚠}  sse4.1 Rounds the packed single-precision (32-bit) floating-point elements in 
_mm_floor_sd^{⚠}  sse4.1 Rounds the lower double-precision (64-bit) floating-point element in 
_mm_floor_ss^{⚠}  sse4.1 Rounds the lower single-precision (32-bit) floating-point element in 
_mm_fmadd_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm_fmadd_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm_fmadd_sd^{⚠}  fma Multiplies the lower doubleprecision (64bit) floatingpoint elements in

_mm_fmadd_ss^{⚠}  fma Multiplies the lower singleprecision (32bit) floatingpoint elements in

_mm_fmaddsub_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm_fmaddsub_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm_fmsub_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm_fmsub_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm_fmsub_sd^{⚠}  fma Multiplies the lower doubleprecision (64bit) floatingpoint elements in

_mm_fmsub_ss^{⚠}  fma Multiplies the lower singleprecision (32bit) floatingpoint elements in

_mm_fmsubadd_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm_fmsubadd_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm_fnmadd_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm_fnmadd_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm_fnmadd_sd^{⚠}  fma Multiplies the lower doubleprecision (64bit) floatingpoint elements in

_mm_fnmadd_ss^{⚠}  fma Multiplies the lower singleprecision (32bit) floatingpoint elements in

_mm_fnmsub_pd^{⚠}  fma Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm_fnmsub_ps^{⚠}  fma Multiplies packed singleprecision (32bit) floatingpoint elements in 
_mm_fnmsub_sd^{⚠}  fma Multiplies the lower doubleprecision (64bit) floatingpoint elements in

_mm_fnmsub_ss^{⚠}  fma Multiplies the lower singleprecision (32bit) floatingpoint elements in

_mm_getcsr^{⚠}  sse Gets the unsigned 32bit value of the MXCSR control and status register. 
_mm_hadd_epi16^{⚠}  ssse3 Horizontally adds the adjacent pairs of values contained in 2 packed
128bit vectors of 
_mm_hadd_epi32^{⚠}  ssse3 Horizontally adds the adjacent pairs of values contained in 2 packed
128bit vectors of 
_mm_hadd_pd^{⚠}  sse3 Horizontally adds adjacent pairs of doubleprecision (64bit)
floatingpoint elements in 
_mm_hadd_ps^{⚠}  sse3 Horizontally adds adjacent pairs of singleprecision (32bit)
floatingpoint elements in 
_mm_hadds_epi16^{⚠}  ssse3 Horizontally adds the adjacent pairs of values contained in 2 packed
128bit vectors of 
_mm_hsub_epi16^{⚠}  ssse3 Horizontally subtracts the adjacent pairs of values contained in 2
packed 128-bit vectors of 
_mm_hsub_epi32^{⚠}  ssse3 Horizontally subtracts the adjacent pairs of values contained in 2
packed 128-bit vectors of 
_mm_hsub_pd^{⚠}  sse3 Horizontally subtracts adjacent pairs of double-precision (64-bit)
floating-point elements in 
_mm_hsub_ps^{⚠}  sse3 Horizontally subtracts adjacent pairs of single-precision (32-bit)
floating-point elements in 
_mm_hsubs_epi16^{⚠}  ssse3 Horizontally subtracts the adjacent pairs of values contained in 2
packed 128-bit vectors of 
_mm_i32gather_epi32^{⚠}  avx2 Returns values from 
_mm_i32gather_epi64^{⚠}  avx2 Returns values from 
_mm_i32gather_pd^{⚠}  avx2 Returns values from 
_mm_i32gather_ps^{⚠}  avx2 Returns values from 
_mm_i64gather_epi32^{⚠}  avx2 Returns values from 
_mm_i64gather_epi64^{⚠}  avx2 Returns values from 
_mm_i64gather_pd^{⚠}  avx2 Returns values from 
_mm_i64gather_ps^{⚠}  avx2 Returns values from 
_mm_insert_epi8^{⚠}  sse4.1 Returns a copy of 
_mm_insert_epi16^{⚠}  sse2 Returns a new vector where the 
_mm_insert_epi32^{⚠}  sse4.1 Returns a copy of 
_mm_insert_epi64^{⚠}  sse4.1 Returns a copy of 
_mm_insert_ps^{⚠}  sse4.1 Selects a single value in 
_mm_insert_si64^{⚠}  sse4a Inserts the 
_mm_lddqu_si128^{⚠}  sse3 Loads 128bits of integer data from unaligned memory.
This intrinsic may perform better than 
_mm_lfence^{⚠}  sse2 Performs a serializing operation on all load-from-memory instructions that were issued prior to this instruction.
_mm_load1_pd^{⚠}  sse2 Loads a doubleprecision (64bit) floatingpoint element from memory into both elements of returned vector. 
_mm_load1_ps^{⚠}  sse Construct a 
_mm_load_pd^{⚠}  sse2 Loads 128bits (composed of 2 packed doubleprecision (64bit)
floatingpoint elements) from memory into the returned vector.

_mm_load_pd1^{⚠}  sse2 Loads a doubleprecision (64bit) floatingpoint element from memory into both elements of returned vector. 
_mm_load_ps^{⚠}  sse Loads four 
_mm_load_ps1^{⚠}  sse Alias for 
_mm_load_sd^{⚠}  sse2 Loads a 64-bit double-precision value to the low element of a 128-bit floating-point vector and clears the upper element.
_mm_load_si128^{⚠}  sse2 Loads 128bits of integer data from memory into a new vector. 
_mm_load_ss^{⚠}  sse Construct a 
_mm_loaddup_pd^{⚠}  sse3 Loads a doubleprecision (64bit) floatingpoint element from memory into both elements of return vector. 
_mm_loadh_pd^{⚠}  sse2 Loads a doubleprecision value into the highorder bits of a 128bit
vector of 
_mm_loadl_epi64^{⚠}  sse2 Loads 64bit integer from memory into first element of returned vector. 
_mm_loadl_pd^{⚠}  sse2 Loads a doubleprecision value into the loworder bits of a 128bit
vector of 
_mm_loadr_pd^{⚠}  sse2 Loads 2 doubleprecision (64bit) floatingpoint elements from memory into
the returned vector in reverse order. 
_mm_loadr_ps^{⚠}  sse Loads four 
_mm_loadu_pd^{⚠}  sse2 Loads 128bits (composed of 2 packed doubleprecision (64bit)
floatingpoint elements) from memory into the returned vector.

_mm_loadu_ps^{⚠}  sse Loads four 
_mm_loadu_si64^{⚠}  sse Loads unaligned 64bits of integer data from memory into new vector. 
_mm_loadu_si128^{⚠}  sse2 Loads 128bits of integer data from memory into a new vector. 
_mm_madd_epi16^{⚠}  sse2 Multiplies and then horizontally adds signed 16-bit integers in 
_mm_maddubs_epi16^{⚠}  ssse3 Multiplies corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, adds pairs of contiguous products with signed saturation, and writes the 16-bit sums to the corresponding bits in the destination.
_mm_mask_i32gather_epi32^{⚠}  avx2 Returns values from 
_mm_mask_i32gather_epi64^{⚠}  avx2 Returns values from 
_mm_mask_i32gather_pd^{⚠}  avx2 Returns values from 
_mm_mask_i32gather_ps^{⚠}  avx2 Returns values from 
_mm_mask_i64gather_epi32^{⚠}  avx2 Returns values from 
_mm_mask_i64gather_epi64^{⚠}  avx2 Returns values from 
_mm_mask_i64gather_pd^{⚠}  avx2 Returns values from 
_mm_mask_i64gather_ps^{⚠}  avx2 Returns values from 
_mm_maskload_epi32^{⚠}  avx2 Loads packed 32bit integers from memory pointed by 
_mm_maskload_epi64^{⚠}  avx2 Loads packed 64bit integers from memory pointed by 
_mm_maskload_pd^{⚠}  avx Loads packed doubleprecision (64bit) floatingpoint elements from memory
into result using 
_mm_maskload_ps^{⚠}  avx Loads packed singleprecision (32bit) floatingpoint elements from memory
into result using 
_mm_maskmoveu_si128^{⚠}  sse2 Conditionally stores 8-bit integer elements from 
_mm_maskstore_epi32^{⚠}  avx2 Stores packed 32bit integers from 
_mm_maskstore_epi64^{⚠}  avx2 Stores packed 64bit integers from 
_mm_maskstore_pd^{⚠}  avx Stores packed doubleprecision (64bit) floatingpoint elements from 
_mm_maskstore_ps^{⚠}  avx Stores packed singleprecision (32bit) floatingpoint elements from 
_mm_max_epi8^{⚠}  sse4.1 Compares packed 8bit integers in 
_mm_max_epi16^{⚠}  sse2 Compares packed 16bit integers in 
_mm_max_epi32^{⚠}  sse4.1 Compares packed 32bit integers in 
_mm_max_epu8^{⚠}  sse2 Compares packed unsigned 8bit integers in 
_mm_max_epu16^{⚠}  sse4.1 Compares packed unsigned 16bit integers in 
_mm_max_epu32^{⚠}  sse4.1 Compares packed unsigned 32bit integers in 
_mm_max_pd^{⚠}  sse2 Returns a new vector with the maximum values from corresponding elements in

_mm_max_ps^{⚠}  sse Compares packed singleprecision (32bit) floatingpoint elements in 
_mm_max_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_max_ss^{⚠}  sse Compares the first singleprecision (32bit) floatingpoint element of 
_mm_mfence^{⚠}  sse2 Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction.
_mm_min_epi8^{⚠}  sse4.1 Compares packed 8bit integers in 
_mm_min_epi16^{⚠}  sse2 Compares packed 16bit integers in 
_mm_min_epi32^{⚠}  sse4.1 Compares packed 32bit integers in 
_mm_min_epu8^{⚠}  sse2 Compares packed unsigned 8bit integers in 
_mm_min_epu16^{⚠}  sse4.1 Compares packed unsigned 16bit integers in 
_mm_min_epu32^{⚠}  sse4.1 Compares packed unsigned 32bit integers in 
_mm_min_pd^{⚠}  sse2 Returns a new vector with the minimum values from corresponding elements in

_mm_min_ps^{⚠}  sse Compares packed singleprecision (32bit) floatingpoint elements in 
_mm_min_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_min_ss^{⚠}  sse Compares the first singleprecision (32bit) floatingpoint element of 
_mm_minpos_epu16^{⚠}  sse4.1 Finds the minimum unsigned 16-bit element in the 128-bit __m128i vector, returning a vector containing its value in its first position, and its index in its second position; all other elements are set to zero.
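The result layout described for `_mm_minpos_epu16` (value in the first 16-bit position, index in the second) can be read back as in the sketch below. The names `min_with_index` and `min_with_index_sse41` are hypothetical. Because SSE4.1 is not part of the x86_64 baseline, the sketch checks for it at run time and otherwise falls back to a scalar search.

```rust
use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_minpos_epu16, _mm_storeu_si128};

/// SSE4.1 path. Callers must first confirm that the CPU supports SSE4.1.
#[target_feature(enable = "sse4.1")]
unsafe fn min_with_index_sse41(values: &[u16; 8]) -> (u16, usize) {
    let mut out = [0u16; 8];
    unsafe {
        // Unaligned load of the eight 16-bit inputs.
        let v = _mm_loadu_si128(values.as_ptr() as *const __m128i);
        // Position 0 of the result holds the minimum, position 1 its index.
        let r = _mm_minpos_epu16(v);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
    }
    (out[0], out[1] as usize)
}

/// Returns the smallest element and the index of its first occurrence.
pub fn min_with_index(values: &[u16; 8]) -> (u16, usize) {
    if is_x86_feature_detected!("sse4.1") {
        // SAFETY: the feature check above guarantees SSE4.1 is available.
        unsafe { min_with_index_sse41(values) }
    } else {
        // Scalar fallback; `min_by_key` also returns the first minimum.
        let (i, &v) = values
            .iter()
            .enumerate()
            .min_by_key(|&(_, v)| *v)
            .expect("array is non-empty");
        (v, i)
    }
}
```

For the input `[9, 4, 7, 3, 8, 3, 6, 5]` both paths return `(3, 3)`, since ties resolve to the lowest index.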
_mm_move_epi64^{⚠}  sse2 Returns a vector where the low element is extracted from 
_mm_move_sd^{⚠}  sse2 Constructs a 128bit floatingpoint vector of 
_mm_move_ss^{⚠}  sse Returns a 
_mm_movedup_pd^{⚠}  sse3 Duplicates the low double-precision (64-bit) floating-point element
from 
_mm_movehdup_ps^{⚠}  sse3 Duplicates odd-indexed single-precision (32-bit) floating-point elements
from 
_mm_movehl_ps^{⚠}  sse Combines higher half of 
_mm_moveldup_ps^{⚠}  sse3 Duplicates even-indexed single-precision (32-bit) floating-point elements
from 
_mm_movelh_ps^{⚠}  sse Combines lower half of 
_mm_movemask_epi8^{⚠}  sse2 Returns a mask of the most significant bit of each element in 
_mm_movemask_pd^{⚠}  sse2 Returns a mask of the most significant bit of each element in 
_mm_movemask_ps^{⚠}  sse Returns a mask of the most significant bit of each element in 
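A common use of the movemask family is turning a vector comparison into a scalar bitmask. The sketch below combines `_mm_cmpeq_epi8` with `_mm_movemask_epi8` to locate a byte in a 16-byte block; the function name `find_byte` is hypothetical, and only SSE2 (part of the x86_64 baseline) is assumed.

```rust
use std::arch::x86_64::{
    __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_set1_epi8,
};

/// Returns the index of the first byte equal to `needle`, if any.
pub fn find_byte(block: &[u8; 16], needle: u8) -> Option<usize> {
    // SAFETY: SSE2 is part of the x86_64 baseline, and the unaligned load
    // reads exactly the 16 bytes owned by `block`.
    let mask = unsafe {
        let data = _mm_loadu_si128(block.as_ptr() as *const __m128i);
        // Each matching byte becomes 0xFF, every other byte 0x00.
        let eq = _mm_cmpeq_epi8(data, _mm_set1_epi8(needle as i8));
        // Collect the most significant bit of each byte into bits 0..16.
        _mm_movemask_epi8(eq) as u32
    };
    if mask == 0 {
        None
    } else {
        // Bit i corresponds to byte i, so the lowest set bit is the first match.
        Some(mask.trailing_zeros() as usize)
    }
}
```

For example, `find_byte(b"hello, world!!!!", b'w')` returns `Some(7)`.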
_mm_mpsadbw_epu8^{⚠}  sse4.1 Subtracts 8bit unsigned integer values and computes the absolute values of the differences to the corresponding bits in the destination. Then sums of the absolute differences are returned according to the bit fields in the immediate operand. 
_mm_mul_epi32^{⚠}  sse4.1 Multiplies the low 32bit integers from each packed 64bit
element in 
_mm_mul_epu32^{⚠}  sse2 Multiplies the low unsigned 32bit integers from each packed 64bit element
in 
_mm_mul_pd^{⚠}  sse2 Multiplies packed doubleprecision (64bit) floatingpoint elements in 
_mm_mul_ps^{⚠}  sse Multiplies __m128 vectors. 
_mm_mul_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_mul_ss^{⚠}  sse Multiplies the first component of 
_mm_mulhi_epi16^{⚠}  sse2 Multiplies the packed 16bit integers in 
_mm_mulhi_epu16^{⚠}  sse2 Multiplies the packed unsigned 16bit integers in 
_mm_mulhrs_epi16^{⚠}  ssse3 Multiplies packed 16-bit signed integer values, truncates the 32-bit
product to the 18 most significant bits by right-shifting, rounds the
truncated value by adding 1, and writes bits 
_mm_mullo_epi16^{⚠}  sse2 Multiplies the packed 16bit integers in 
_mm_mullo_epi32^{⚠}  sse4.1 Multiplies the packed 32bit integers in 
_mm_or_pd^{⚠}  sse2 Computes the bitwise OR of 
_mm_or_ps^{⚠}  sse Bitwise OR of packed singleprecision (32bit) floatingpoint elements. 
_mm_or_si128^{⚠}  sse2 Computes the bitwise OR of 128 bits (representing integer data) in 
_mm_packs_epi16^{⚠}  sse2 Converts packed 16bit integers from 
_mm_packs_epi32^{⚠}  sse2 Converts packed 32bit integers from 
_mm_packus_epi16^{⚠}  sse2 Converts packed 16bit integers from 
_mm_packus_epi32^{⚠}  sse4.1 Converts packed 32bit integers from 
_mm_pause^{⚠}  Provides a hint to the processor that the code sequence is a spin-wait loop.
_mm_permute_pd^{⚠}  avx,sse2 Shuffles doubleprecision (64bit) floatingpoint elements in 
_mm_permute_ps^{⚠}  avx,sse Shuffles singleprecision (32bit) floatingpoint elements in 
_mm_permutevar_pd^{⚠}  avx Shuffles doubleprecision (64bit) floatingpoint elements in 
_mm_permutevar_ps^{⚠}  avx Shuffles singleprecision (32bit) floatingpoint elements in 
_mm_prefetch^{⚠}  sse Fetch the cache line that contains address 
_mm_rcp_ps^{⚠}  sse Returns the approximate reciprocal of packed single-precision (32-bit) floating-point elements in 
_mm_rcp_ss^{⚠}  sse Returns the approximate reciprocal of the first single-precision (32-bit) floating-point element in 
_mm_round_pd^{⚠}  sse4.1 Round the packed double-precision (64-bit) floating-point elements in 
_mm_round_ps^{⚠}  sse4.1 Round the packed single-precision (32-bit) floating-point elements in 
_mm_round_sd^{⚠}  sse4.1 Round the lower double-precision (64-bit) floating-point element in 
_mm_round_ss^{⚠}  sse4.1 Round the lower single-precision (32-bit) floating-point element in 
_mm_rsqrt_ps^{⚠}  sse Returns the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in 
_mm_rsqrt_ss^{⚠}  sse Returns the approximate reciprocal square root of the first single-precision (32-bit) floating-point element in 
_mm_sad_epu8^{⚠}  sse2 Sum the absolute differences of packed unsigned 8-bit integers. 
_mm_set1_epi8^{⚠}  sse2 Broadcasts 8-bit integer 
_mm_set1_epi16^{⚠}  sse2 Broadcasts 16-bit integer 
_mm_set1_epi32^{⚠}  sse2 Broadcasts 32-bit integer 
_mm_set1_epi64x^{⚠}  sse2 Broadcasts 64-bit integer 
_mm_set1_pd^{⚠}  sse2 Broadcasts double-precision (64-bit) floating-point value a to all elements of the return value. 
_mm_set1_ps^{⚠}  sse Construct a 
_mm_set_epi8^{⚠}  sse2 Sets packed 8-bit integers with the supplied values. 
_mm_set_epi16^{⚠}  sse2 Sets packed 16-bit integers with the supplied values. 
_mm_set_epi32^{⚠}  sse2 Sets packed 32-bit integers with the supplied values. 
_mm_set_epi64x^{⚠}  sse2 Sets packed 64-bit integers with the supplied values, from highest to lowest. 
_mm_set_pd^{⚠}  sse2 Sets packed double-precision (64-bit) floating-point elements in the return value with the supplied values. 
_mm_set_pd1^{⚠}  sse2 Broadcasts double-precision (64-bit) floating-point value a to all elements of the return value. 
_mm_set_ps^{⚠}  sse Construct a 
_mm_set_ps1^{⚠}  sse Alias for 
_mm_set_sd^{⚠}  sse2 Copies double-precision (64-bit) floating-point element 
_mm_set_ss^{⚠}  sse Construct a 
_mm_setcsr^{⚠}  sse Sets the MXCSR register with the 32-bit unsigned integer value. 
_mm_setr_epi8^{⚠}  sse2 Sets packed 8-bit integers with the supplied values in reverse order. 
_mm_setr_epi16^{⚠}  sse2 Sets packed 16-bit integers with the supplied values in reverse order. 
_mm_setr_epi32^{⚠}  sse2 Sets packed 32-bit integers with the supplied values in reverse order. 
_mm_setr_pd^{⚠}  sse2 Sets packed double-precision (64-bit) floating-point elements in the return value with the supplied values in reverse order. 
_mm_setr_ps^{⚠}  sse Construct a 
_mm_setzero_pd^{⚠}  sse2 Returns packed double-precision (64-bit) floating-point elements with all zeros. 
_mm_setzero_ps^{⚠}  sse Construct a 
_mm_setzero_si128^{⚠}  sse2 Returns a vector with all elements set to zero. 
_mm_sfence^{⚠}  sse Performs a serializing operation on all store-to-memory instructions that were issued prior to this instruction. 
_mm_sha1msg1_epu32^{⚠}  sha Performs an intermediate calculation for the next four SHA1 message values (unsigned 32-bit integers) using previous message values from 
_mm_sha1msg2_epu32^{⚠}  sha Performs the final calculation for the next four SHA1 message values (unsigned 32-bit integers) using the intermediate result in 
_mm_sha1nexte_epu32^{⚠}  sha Calculate SHA1 state variable E after four rounds of operation from the current SHA1 state variable 
_mm_sha1rnds4_epu32^{⚠}  sha Performs four rounds of SHA1 operation using an initial SHA1 state (A,B,C,D) from 
_mm_sha256msg1_epu32^{⚠}  sha Performs an intermediate calculation for the next four SHA256 message values (unsigned 32-bit integers) using previous message values from 
_mm_sha256msg2_epu32^{⚠}  sha Performs the final calculation for the next four SHA256 message values (unsigned 32-bit integers) using previous message values from 
_mm_sha256rnds2_epu32^{⚠}  sha Performs 2 rounds of SHA256 operation using an initial SHA256 state (C,D,G,H) from 
_mm_shuffle_epi8^{⚠}  ssse3 Shuffles bytes from 
_mm_shuffle_epi32^{⚠}  sse2 Shuffles 32-bit integers in 
_mm_shuffle_pd^{⚠}  sse2 Constructs a 128-bit floating-point vector of 
_mm_shuffle_ps^{⚠}  sse Shuffles packed single-precision (32-bit) floating-point elements in 
_mm_shufflehi_epi16^{⚠}  sse2 Shuffles 16-bit integers in the high 64 bits of 
_mm_shufflelo_epi16^{⚠}  sse2 Shuffles 16-bit integers in the low 64 bits of 
_mm_sign_epi8^{⚠}  ssse3 Negates packed 8-bit integers in 
_mm_sign_epi16^{⚠}  ssse3 Negates packed 16-bit integers in 
_mm_sign_epi32^{⚠}  ssse3 Negates packed 32-bit integers in 
_mm_sll_epi16^{⚠}  sse2 Shifts packed 16-bit integers in 
_mm_sll_epi32^{⚠}  sse2 Shifts packed 32-bit integers in 
_mm_sll_epi64^{⚠}  sse2 Shifts packed 64-bit integers in 
_mm_slli_epi16^{⚠}  sse2 Shifts packed 16-bit integers in 
_mm_slli_epi32^{⚠}  sse2 Shifts packed 32-bit integers in 
_mm_slli_epi64^{⚠}  sse2 Shifts packed 64-bit integers in 
_mm_slli_si128^{⚠}  sse2 Shifts 
_mm_sllv_epi32^{⚠}  avx2 Shifts packed 32-bit integers in 
_mm_sllv_epi64^{⚠}  avx2 Shifts packed 64-bit integers in 
_mm_sqrt_pd^{⚠}  sse2 Returns a new vector with the square root of each of the values in 
_mm_sqrt_ps^{⚠}  sse Returns the square root of packed single-precision (32-bit) floating-point elements in 
_mm_sqrt_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_sqrt_ss^{⚠}  sse Returns the square root of the first single-precision (32-bit) floating-point element in 
_mm_sra_epi16^{⚠}  sse2 Shifts packed 16-bit integers in 
_mm_sra_epi32^{⚠}  sse2 Shifts packed 32-bit integers in 
_mm_srai_epi16^{⚠}  sse2 Shifts packed 16-bit integers in 
_mm_srai_epi32^{⚠}  sse2 Shifts packed 32-bit integers in 
_mm_srav_epi32^{⚠}  avx2 Shifts packed 32-bit integers in 
_mm_srl_epi16^{⚠}  sse2 Shifts packed 16-bit integers in 
_mm_srl_epi32^{⚠}  sse2 Shifts packed 32-bit integers in 
_mm_srl_epi64^{⚠}  sse2 Shifts packed 64-bit integers in 
_mm_srli_epi16^{⚠}  sse2 Shifts packed 16-bit integers in 
_mm_srli_epi32^{⚠}  sse2 Shifts packed 32-bit integers in 
_mm_srli_epi64^{⚠}  sse2 Shifts packed 64-bit integers in 
_mm_srli_si128^{⚠}  sse2 Shifts 
_mm_srlv_epi32^{⚠}  avx2 Shifts packed 32-bit integers in 
_mm_srlv_epi64^{⚠}  avx2 Shifts packed 64-bit integers in 
_mm_store1_pd^{⚠}  sse2 Stores the lower double-precision (64-bit) floating-point element from 
_mm_store1_ps^{⚠}  sse Stores the lowest 32-bit float of 
_mm_store_pd^{⚠}  sse2 Stores 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from 
_mm_store_pd1^{⚠}  sse2 Stores the lower double-precision (64-bit) floating-point element from 
_mm_store_ps^{⚠}  sse Stores four 32-bit floats into aligned memory. 
_mm_store_ps1^{⚠}  sse Alias for 
_mm_store_sd^{⚠}  sse2 Stores the lower 64 bits of a 128-bit vector of 
_mm_store_si128^{⚠}  sse2 Stores 128 bits of integer data from 
_mm_store_ss^{⚠}  sse Stores the lowest 32-bit float of 
_mm_storeh_pd^{⚠}  sse2 Stores the upper 64 bits of a 128-bit vector of 
_mm_storel_epi64^{⚠}  sse2 Stores the lower 64-bit integer 
_mm_storel_pd^{⚠}  sse2 Stores the lower 64 bits of a 128-bit vector of 
_mm_storer_pd^{⚠}  sse2 Stores 2 double-precision (64-bit) floating-point elements from 
_mm_storer_ps^{⚠}  sse Stores four 32-bit floats into aligned memory in reverse order. 
_mm_storeu_pd^{⚠}  sse2 Stores 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from 
_mm_storeu_ps^{⚠}  sse Stores four 32-bit floats into memory. There are no restrictions on memory alignment. For aligned memory 
_mm_storeu_si128^{⚠}  sse2 Stores 128 bits of integer data from 
_mm_stream_pd^{⚠}  sse2 Stores a 128-bit floating-point vector of 
_mm_stream_ps^{⚠}  sse Stores 
_mm_stream_sd^{⚠}  sse4a Non-temporal store of 
_mm_stream_si32^{⚠}  sse2 Stores a 32-bit integer value in the specified memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). 
_mm_stream_si64^{⚠}  sse2 Stores a 64-bit integer value in the specified memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). 
_mm_stream_si128^{⚠}  sse2 Stores a 128-bit integer vector to a 128-bit aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). 
_mm_stream_ss^{⚠}  sse4a Non-temporal store of 
_mm_sub_epi8^{⚠}  sse2 Subtracts packed 8-bit integers in 
_mm_sub_epi16^{⚠}  sse2 Subtracts packed 16-bit integers in 
_mm_sub_epi32^{⚠}  sse2 Subtract packed 32-bit integers in 
_mm_sub_epi64^{⚠}  sse2 Subtract packed 64-bit integers in 
_mm_sub_pd^{⚠}  sse2 Subtract packed double-precision (64-bit) floating-point elements in 
_mm_sub_ps^{⚠}  sse Subtracts __m128 vectors. 
_mm_sub_sd^{⚠}  sse2 Returns a new vector with the low element of 
_mm_sub_ss^{⚠}  sse Subtracts the first component of 
_mm_subs_epi8^{⚠}  sse2 Subtract packed 8-bit integers in 
_mm_subs_epi16^{⚠}  sse2 Subtract packed 16-bit integers in 
_mm_subs_epu8^{⚠}  sse2 Subtract packed unsigned 8-bit integers in 
_mm_subs_epu16^{⚠}  sse2 Subtract packed unsigned 16-bit integers in 
_mm_test_all_ones^{⚠}  sse4.1 Tests whether the specified bits in 
_mm_test_all_zeros^{⚠}  sse4.1 Tests whether the specified bits in a 128-bit integer vector are all zeros. 
_mm_test_mix_ones_zeros^{⚠}  sse4.1 Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones. 
_mm_testc_pd^{⚠}  avx Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in 
_mm_testc_ps^{⚠}  avx Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in 
_mm_testc_si128^{⚠}  sse4.1 Tests whether the specified bits in a 128-bit integer vector are all ones. 
_mm_testnzc_pd^{⚠}  avx Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in 
_mm_testnzc_ps^{⚠}  avx Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in 
_mm_testnzc_si128^{⚠}  sse4.1 Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones. 
_mm_testz_pd^{⚠}  avx Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in 
_mm_testz_ps^{⚠}  avx Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in 
_mm_testz_si128^{⚠}  sse4.1 Tests whether the specified bits in a 128-bit integer vector are all zeros. 
_mm_tzcnt_32^{⚠}  bmi1 Counts the number of trailing least significant zero bits. 
_mm_tzcnt_64^{⚠}  bmi1 Counts the number of trailing least significant zero bits. 
_mm_ucomieq_sd^{⚠}  sse2 Compares the lower element of 
_mm_ucomieq_ss^{⚠}  sse Compares two 32-bit floats from the low-order bits of 
_mm_ucomige_sd^{⚠}  sse2 Compares the lower element of 
_mm_ucomige_ss^{⚠}  sse Compares two 32-bit floats from the low-order bits of 
_mm_ucomigt_sd^{⚠}  sse2 Compares the lower element of 
_mm_ucomigt_ss^{⚠}  sse Compares two 32-bit floats from the low-order bits of 
_mm_ucomile_sd^{⚠}  sse2 Compares the lower element of 
_mm_ucomile_ss^{⚠}  sse Compares two 32-bit floats from the low-order bits of 
_mm_ucomilt_sd^{⚠}  sse2 Compares the lower element of 
_mm_ucomilt_ss^{⚠}  sse Compares two 32-bit floats from the low-order bits of 
_mm_ucomineq_sd^{⚠}  sse2 Compares the lower element of 
_mm_ucomineq_ss^{⚠}  sse Compares two 32-bit floats from the low-order bits of 
_mm_undefined_pd^{⚠}  sse2 Returns vector of type __m128d with undefined elements. 
_mm_undefined_ps^{⚠}  sse Returns vector of type __m128 with undefined elements. 
_mm_undefined_si128^{⚠}  sse2 Returns vector of type __m128i with undefined elements. 
_mm_unpackhi_epi8^{⚠}  sse2 Unpacks and interleaves 8-bit integers from the high half of 
_mm_unpackhi_epi16^{⚠}  sse2 Unpacks and interleaves 16-bit integers from the high half of 
_mm_unpackhi_epi32^{⚠} 