Module core::arch::arm

🔬 This is a nightly-only experimental API. (stdsimd #27731)
This is supported on ARM only.

Platform-specific intrinsics for the arm platform.

See the module documentation for more details.

Structs

APSRExperimentalARM

Application Program Status Register

SYExperimentalARM

Full system is the required shareability domain; reads and writes are the required access types.

float32x2_tExperimentalARM

ARM-specific 64-bit wide vector of two packed f32.

float32x4_tExperimentalARM

ARM-specific 128-bit wide vector of four packed f32.

int16x4_tExperimentalARM

ARM-specific 64-bit wide vector of four packed i16.

int16x8_tExperimentalARM

ARM-specific 128-bit wide vector of eight packed i16.

int32x2_tExperimentalARM

ARM-specific 64-bit wide vector of two packed i32.

int32x4_tExperimentalARM

ARM-specific 128-bit wide vector of four packed i32.

int64x1_tExperimentalARM

ARM-specific 64-bit wide vector of one packed i64.

int64x2_tExperimentalARM

ARM-specific 128-bit wide vector of two packed i64.

int8x8_tExperimentalARM

ARM-specific 64-bit wide vector of eight packed i8.

int8x16_tExperimentalARM

ARM-specific 128-bit wide vector of sixteen packed i8.

int8x8x2_tExperimentalARM

ARM-specific type containing two int8x8_t vectors.

int8x8x3_tExperimentalARM

ARM-specific type containing three int8x8_t vectors.

int8x8x4_tExperimentalARM

ARM-specific type containing four int8x8_t vectors.

poly16x4_tExperimentalARM

ARM-specific 64-bit wide polynomial vector of four packed u16.

poly16x8_tExperimentalARM

ARM-specific 128-bit wide polynomial vector of eight packed u16.

poly8x8_tExperimentalARM

ARM-specific 64-bit wide polynomial vector of eight packed u8.

poly8x16_tExperimentalARM

ARM-specific 128-bit wide polynomial vector of sixteen packed u8.

poly8x8x2_tExperimentalARM

ARM-specific type containing two poly8x8_t vectors.

poly8x8x3_tExperimentalARM

ARM-specific type containing three poly8x8_t vectors.

poly8x8x4_tExperimentalARM

ARM-specific type containing four poly8x8_t vectors.

uint16x4_tExperimentalARM

ARM-specific 64-bit wide vector of four packed u16.

uint16x8_tExperimentalARM

ARM-specific 128-bit wide vector of eight packed u16.

uint32x2_tExperimentalARM

ARM-specific 64-bit wide vector of two packed u32.

uint32x4_tExperimentalARM

ARM-specific 128-bit wide vector of four packed u32.

uint64x1_tExperimentalARM

ARM-specific 64-bit wide vector of one packed u64.

uint64x2_tExperimentalARM

ARM-specific 128-bit wide vector of two packed u64.

uint8x8_tExperimentalARM

ARM-specific 64-bit wide vector of eight packed u8.

uint8x16_tExperimentalARM

ARM-specific 128-bit wide vector of sixteen packed u8.

uint8x8x2_tExperimentalARM

ARM-specific type containing two uint8x8_t vectors.

uint8x8x3_tExperimentalARM

ARM-specific type containing three uint8x8_t vectors.

uint8x8x4_tExperimentalARM

ARM-specific type containing four uint8x8_t vectors.

Functions

__breakpointExperimentalARM

Inserts a breakpoint instruction.

__dmbExperimentalARM

Generates a DMB (data memory barrier) instruction or equivalent CP15 instruction.

__dsbExperimentalARM

Generates a DSB (data synchronization barrier) instruction or equivalent CP15 instruction.

__isbExperimentalARM

Generates an ISB (instruction synchronization barrier) instruction or equivalent CP15 instruction.

__nopExperimentalARM

Generates an unspecified no-op instruction.

__rsrExperimentalARM

Reads a 32-bit system register

__rsrpExperimentalARM

Reads a system register containing an address

__wsrExperimentalARM

Writes a 32-bit system register

__wsrpExperimentalARM

Writes a system register containing an address

_rev_u16ExperimentalARM

Reverse the order of the bytes.

_rev_u32ExperimentalARM

Reverse the order of the bytes.
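To make "reverse the order of the bytes" concrete, here is a small portable sketch built on the standard `swap_bytes` methods. The helper names `rev_u16_model` and `rev_u32_model` are hypothetical and only model the described behaviour; they are not the intrinsics themselves.

```rust
/// Portable model of a 16-bit byte reversal (hypothetical helper, not the intrinsic).
fn rev_u16_model(x: u16) -> u16 {
    // 0xAABB becomes 0xBBAA.
    x.swap_bytes()
}

/// Portable model of a 32-bit byte reversal (hypothetical helper, not the intrinsic).
fn rev_u32_model(x: u32) -> u32 {
    // 0xAABBCCDD becomes 0xDDCCBBAA.
    x.swap_bytes()
}
```

Under this model, `rev_u32_model(0x1234_5678)` evaluates to `0x7856_3412`.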

vadd_f32ExperimentalARM and neon

Vector add.

vadd_s8ExperimentalARM and neon

Vector add.

vadd_s16ExperimentalARM and neon

Vector add.

vadd_s32ExperimentalARM and neon

Vector add.

vadd_u8ExperimentalARM and neon

Vector add.

vadd_u16ExperimentalARM and neon

Vector add.

vadd_u32ExperimentalARM and neon

Vector add.

vaddl_s8ExperimentalARM and neon

Vector long add.

vaddl_s16ExperimentalARM and neon

Vector long add.

vaddl_s32ExperimentalARM and neon

Vector long add.

vaddl_u8ExperimentalARM and neon

Vector long add.

vaddl_u16ExperimentalARM and neon

Vector long add.

vaddl_u32ExperimentalARM and neon

Vector long add.
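A "long" add widens each lane before adding, so the sum cannot overflow the result lane. The following is a scalar sketch of that idea for eight `i8` lanes, using plain arrays in place of the vector types. The name `vaddl_s8_model` is hypothetical and the function is an illustration, not the intrinsic.

```rust
/// Scalar model of a widening (long) add: i8 lanes in, i16 lanes out.
/// Hypothetical helper that uses arrays instead of NEON vector types.
fn vaddl_s8_model(a: [i8; 8], b: [i8; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        // Widen first, then add, so 127 + 127 yields 254 instead of wrapping.
        out[i] = i16::from(a[i]) + i16::from(b[i]);
    }
    out
}
```

By contrast, a same-width wrapping add of `127i8` and `127i8` would yield `-2`.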

vaddq_f32ExperimentalARM and neon

Vector add.

vaddq_s8ExperimentalARM and neon

Vector add.

vaddq_s16ExperimentalARM and neon

Vector add.

vaddq_s32ExperimentalARM and neon

Vector add.

vaddq_s64ExperimentalARM and neon

Vector add.

vaddq_u8ExperimentalARM and neon

Vector add.

vaddq_u16ExperimentalARM and neon

Vector add.

vaddq_u32ExperimentalARM and neon

Vector add.

vaddq_u64ExperimentalARM and neon

Vector add.

vand_s8ExperimentalARM and neon

Vector bitwise and

vand_s16ExperimentalARM and neon

Vector bitwise and

vand_s32ExperimentalARM and neon

Vector bitwise and

vand_s64ExperimentalARM and neon

Vector bitwise and

vand_u8ExperimentalARM and neon

Vector bitwise and

vand_u16ExperimentalARM and neon

Vector bitwise and

vand_u32ExperimentalARM and neon

Vector bitwise and

vand_u64ExperimentalARM and neon

Vector bitwise and

vandq_s8ExperimentalARM and neon

Vector bitwise and

vandq_s16ExperimentalARM and neon

Vector bitwise and

vandq_s32ExperimentalARM and neon

Vector bitwise and

vandq_s64ExperimentalARM and neon

Vector bitwise and

vandq_u8ExperimentalARM and neon

Vector bitwise and

vandq_u16ExperimentalARM and neon

Vector bitwise and

vandq_u32ExperimentalARM and neon

Vector bitwise and

vandq_u64ExperimentalARM and neon

Vector bitwise and

vceq_f32ExperimentalARM and neon

Floating-point compare equal

vceq_s8ExperimentalARM and neon

Compare bitwise Equal (vector)

vceq_s16ExperimentalARM and neon

Compare bitwise Equal (vector)

vceq_s32ExperimentalARM and neon

Compare bitwise Equal (vector)

vceq_u8ExperimentalARM and neon

Compare bitwise Equal (vector)

vceq_u16ExperimentalARM and neon

Compare bitwise Equal (vector)

vceq_u32ExperimentalARM and neon

Compare bitwise Equal (vector)

vceqq_f32ExperimentalARM and neon

Floating-point compare equal

vceqq_s8ExperimentalARM and neon

Compare bitwise Equal (vector)

vceqq_s16ExperimentalARM and neon

Compare bitwise Equal (vector)

vceqq_s32ExperimentalARM and neon

Compare bitwise Equal (vector)

vceqq_u8ExperimentalARM and neon

Compare bitwise Equal (vector)

vceqq_u16ExperimentalARM and neon

Compare bitwise Equal (vector)

vceqq_u32ExperimentalARM and neon

Compare bitwise Equal (vector)

vcge_f32ExperimentalARM and neon

Floating-point compare greater than or equal

vcge_s8ExperimentalARM and neon

Compare signed greater than or equal

vcge_s16ExperimentalARM and neon

Compare signed greater than or equal

vcge_s32ExperimentalARM and neon

Compare signed greater than or equal

vcge_u8ExperimentalARM and neon

Compare unsigned greater than or equal

vcge_u16ExperimentalARM and neon

Compare unsigned greater than or equal

vcge_u32ExperimentalARM and neon

Compare unsigned greater than or equal

vcgeq_f32ExperimentalARM and neon

Floating-point compare greater than or equal

vcgeq_s8ExperimentalARM and neon

Compare signed greater than or equal

vcgeq_s16ExperimentalARM and neon

Compare signed greater than or equal

vcgeq_s32ExperimentalARM and neon

Compare signed greater than or equal

vcgeq_u8ExperimentalARM and neon

Compare unsigned greater than or equal

vcgeq_u16ExperimentalARM and neon

Compare unsigned greater than or equal

vcgeq_u32ExperimentalARM and neon

Compare unsigned greater than or equal

vcgt_f32ExperimentalARM and neon

Floating-point compare greater than

vcgt_s8ExperimentalARM and neon

Compare signed greater than

vcgt_s16ExperimentalARM and neon

Compare signed greater than

vcgt_s32ExperimentalARM and neon

Compare signed greater than

vcgt_u8ExperimentalARM and neon

Compare unsigned higher

vcgt_u16ExperimentalARM and neon

Compare unsigned higher

vcgt_u32ExperimentalARM and neon

Compare unsigned higher

vcgtq_f32ExperimentalARM and neon

Floating-point compare greater than

vcgtq_s8ExperimentalARM and neon

Compare signed greater than

vcgtq_s16ExperimentalARM and neon

Compare signed greater than

vcgtq_s32ExperimentalARM and neon

Compare signed greater than

vcgtq_u8ExperimentalARM and neon

Compare unsigned higher

vcgtq_u16ExperimentalARM and neon

Compare unsigned higher

vcgtq_u32ExperimentalARM and neon

Compare unsigned higher

vcle_f32ExperimentalARM and neon

Floating-point compare less than or equal

vcle_s8ExperimentalARM and neon

Compare signed less than or equal

vcle_s16ExperimentalARM and neon

Compare signed less than or equal

vcle_s32ExperimentalARM and neon

Compare signed less than or equal

vcle_u8ExperimentalARM and neon

Compare unsigned less than or equal

vcle_u16ExperimentalARM and neon

Compare unsigned less than or equal

vcle_u32ExperimentalARM and neon

Compare unsigned less than or equal

vcleq_f32ExperimentalARM and neon

Floating-point compare less than or equal

vcleq_s8ExperimentalARM and neon

Compare signed less than or equal

vcleq_s16ExperimentalARM and neon

Compare signed less than or equal

vcleq_s32ExperimentalARM and neon

Compare signed less than or equal

vcleq_u8ExperimentalARM and neon

Compare unsigned less than or equal

vcleq_u16ExperimentalARM and neon

Compare unsigned less than or equal

vcleq_u32ExperimentalARM and neon

Compare unsigned less than or equal

vclt_f32ExperimentalARM and neon

Floating-point compare less than

vclt_s8ExperimentalARM and neon

Compare signed less than

vclt_s16ExperimentalARM and neon

Compare signed less than

vclt_s32ExperimentalARM and neon

Compare signed less than

vclt_u8ExperimentalARM and neon

Compare unsigned less than

vclt_u16ExperimentalARM and neon

Compare unsigned less than

vclt_u32ExperimentalARM and neon

Compare unsigned less than

vcltq_f32ExperimentalARM and neon

Floating-point compare less than

vcltq_s8ExperimentalARM and neon

Compare signed less than

vcltq_s16ExperimentalARM and neon

Compare signed less than

vcltq_s32ExperimentalARM and neon

Compare signed less than

vcltq_u8ExperimentalARM and neon

Compare unsigned less than

vcltq_u16ExperimentalARM and neon

Compare unsigned less than

vcltq_u32ExperimentalARM and neon

Compare unsigned less than
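The NEON comparison operations listed above produce a per-lane mask rather than a single boolean: a lane is all ones where the comparison holds and all zeros where it does not. Below is a scalar sketch of that convention for an unsigned "greater than" on eight `u8` lanes. The name `vcgt_u8_model` is hypothetical, and arrays stand in for the vector types.

```rust
/// Scalar model of a lane-wise unsigned "greater than" comparison.
/// Hypothetical helper: each output lane is 0xFF if a[i] > b[i], else 0x00.
fn vcgt_u8_model(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    let mut mask = [0u8; 8];
    for i in 0..8 {
        mask[i] = if a[i] > b[i] { 0xFF } else { 0x00 };
    }
    mask
}
```

A mask in this form can be combined with the bitwise and/or/exclusive-or operations to select lanes without branching.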

vdupq_n_s8ExperimentalARM and neon

Duplicate vector element to vector or scalar

vdupq_n_u8ExperimentalARM and neon

Duplicate vector element to vector or scalar

veor_s8ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veor_s16ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veor_s32ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veor_s64ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veor_u8ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veor_u16ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veor_u32ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veor_u64ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veorq_s8ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veorq_s16ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veorq_s32ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veorq_s64ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veorq_u8ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veorq_u16ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veorq_u32ExperimentalARM and neon

Vector bitwise exclusive or (vector)

veorq_u64ExperimentalARM and neon

Vector bitwise exclusive or (vector)

vextq_s8ExperimentalARM and neon

Extract vector from pair of vectors

vextq_u8ExperimentalARM and neon

Extract vector from pair of vectors
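"Extract vector from pair of vectors" can be read as: concatenate the two inputs, then take a 16-byte window starting at a given lane offset of the first. The sketch below models that on arrays. The name `vextq_u8_model` and the runtime `n` parameter are assumptions made for illustration; the real intrinsic takes the offset as a constant.

```rust
/// Scalar model of extracting 16 bytes from the concatenation of `a` and `b`,
/// starting at lane `n` of `a`. Hypothetical helper, not the intrinsic.
fn vextq_u8_model(a: [u8; 16], b: [u8; 16], n: usize) -> [u8; 16] {
    assert!(n < 16, "lane offset must be in 0..16");
    let mut out = [0u8; 16];
    for i in 0..16 {
        let j = i + n;
        // The first 16 - n lanes come from the top of `a`, the rest from the bottom of `b`.
        out[i] = if j < 16 { a[j] } else { b[j - 16] };
    }
    out
}
```

With `a` holding 0 through 15 and `b` holding 16 through 31, an offset of 3 yields the lanes 3 through 18.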

vget_lane_u8ExperimentalARM and neon

Move vector element to general-purpose register

vget_lane_u64ExperimentalARM and neon

Move vector element to general-purpose register

vgetq_lane_u16ExperimentalARM and neon

Move vector element to general-purpose register

vgetq_lane_u32ExperimentalARM and neon

Move vector element to general-purpose register

vgetq_lane_u64ExperimentalARM and neon

Move vector element to general-purpose register

vhadd_s8ExperimentalARM and neon

Halving add

vhadd_s16ExperimentalARM and neon

Halving add

vhadd_s32ExperimentalARM and neon

Halving add

vhadd_u8ExperimentalARM and neon

Halving add

vhadd_u16ExperimentalARM and neon

Halving add

vhadd_u32ExperimentalARM and neon

Halving add

vhaddq_s8ExperimentalARM and neon

Halving add

vhaddq_s16ExperimentalARM and neon

Halving add

vhaddq_s32ExperimentalARM and neon

Halving add

vhaddq_u8ExperimentalARM and neon

Halving add

vhaddq_u16ExperimentalARM and neon

Halving add

vhaddq_u32ExperimentalARM and neon

Halving add
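A halving add computes the lane-wise sum at full precision and then shifts right by one, so the intermediate sum never overflows the lane. Here is a scalar sketch for eight `u8` lanes. The name `vhadd_u8_model` is hypothetical, and arrays stand in for the vector types.

```rust
/// Scalar model of an unsigned halving add: (a + b) >> 1 per lane,
/// with the sum formed in a wider type. Hypothetical helper, not the intrinsic.
fn vhadd_u8_model(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for i in 0..8 {
        let sum = u16::from(a[i]) + u16::from(b[i]);
        // The halved sum always fits back into a u8.
        out[i] = (sum >> 1) as u8;
    }
    out
}
```

In this model the result truncates: lanes holding 1 and 2 average to 1.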

vhsub_s8ExperimentalARM and neon

Signed halving subtract

vhsub_s16ExperimentalARM and neon

Signed halving subtract

vhsub_s32ExperimentalARM and neon

Signed halving subtract

vhsub_u8ExperimentalARM and neon

Unsigned halving subtract

vhsub_u16ExperimentalARM and neon

Unsigned halving subtract

vhsub_u32ExperimentalARM and neon

Unsigned halving subtract

vhsubq_s8ExperimentalARM and neon

Signed halving subtract

vhsubq_s16ExperimentalARM and neon

Signed halving subtract

vhsubq_s32ExperimentalARM and neon

Signed halving subtract

vhsubq_u8ExperimentalARM and neon

Unsigned halving subtract

vhsubq_u16ExperimentalARM and neon

Unsigned halving subtract

vhsubq_u32ExperimentalARM and neon

Unsigned halving subtract
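Halving subtract follows the same pattern as halving add: the difference is formed at full precision and then shifted right by one bit. The following scalar sketch covers the signed 8-bit case only. The name `vhsub_s8_model` is hypothetical, and arrays stand in for the vector types.

```rust
/// Scalar model of a signed halving subtract: (a - b) >> 1 per lane,
/// with the difference formed in i16. Hypothetical helper, not the intrinsic.
fn vhsub_s8_model(a: [i8; 8], b: [i8; 8]) -> [i8; 8] {
    let mut out = [0i8; 8];
    for i in 0..8 {
        let diff = i16::from(a[i]) - i16::from(b[i]);
        // Arithmetic shift rounds toward negative infinity; the result fits in i8.
        out[i] = (diff >> 1) as i8;
    }
    out
}
```

Because the shift is arithmetic, a difference of -1 halves to -1 rather than 0 in this model.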

vld1q_s8ExperimentalARM and neon

Load multiple single-element structures to one, two, three, or four registers

vld1q_u8ExperimentalARM and neon

Load multiple single-element structures to one, two, three, or four registers

vmovl_s8ExperimentalARM and neon

Vector long move.

vmovl_s16ExperimentalARM and neon

Vector long move.

vmovl_s32ExperimentalARM and neon

Vector long move.

vmovl_u8ExperimentalARM and neon

Vector long move.

vmovl_u16ExperimentalARM and neon

Vector long move.

vmovl_u32ExperimentalARM and neon

Vector long move.

vmovn_s16ExperimentalARM and neon

Vector narrow integer.

vmovn_s32ExperimentalARM and neon

Vector narrow integer.

vmovn_s64ExperimentalARM and neon

Vector narrow integer.

vmovn_u16ExperimentalARM and neon

Vector narrow integer.

vmovn_u32ExperimentalARM and neon

Vector narrow integer.

vmovn_u64ExperimentalARM and neon

Vector narrow integer.
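The long move and narrow operations above are inverses in lane width: a long move extends each lane to twice its width, and a narrow keeps only the low half of each lane. Here is a scalar sketch for the unsigned 8-bit and 16-bit cases. The names `vmovl_u8_model` and `vmovn_u16_model` are hypothetical, and arrays stand in for the vector types.

```rust
/// Scalar model of a long move: zero-extend each u8 lane to u16.
/// Hypothetical helper, not the intrinsic.
fn vmovl_u8_model(a: [u8; 8]) -> [u16; 8] {
    a.map(u16::from)
}

/// Scalar model of a narrow: keep the low 8 bits of each u16 lane.
/// Hypothetical helper, not the intrinsic.
fn vmovn_u16_model(a: [u16; 8]) -> [u8; 8] {
    // Truncating cast; high bits are discarded, not saturated.
    a.map(|x| x as u8)
}
```

Narrowing after widening returns the original lanes, while narrowing `0x1234` on its own keeps only `0x34`.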

vmovq_n_u8ExperimentalARM and neon

Duplicate vector element to vector or scalar

vmul_f32ExperimentalARM and neon

Multiply

vmul_s8ExperimentalARM and neon

Multiply

vmul_s16ExperimentalARM and neon

Multiply

vmul_s32ExperimentalARM and neon

Multiply

vmul_u8ExperimentalARM and neon

Multiply

vmul_u16ExperimentalARM and neon

Multiply

vmul_u32ExperimentalARM and neon

Multiply

vmulq_f32ExperimentalARM and neon

Multiply

vmulq_s8ExperimentalARM and neon

Multiply

vmulq_s16ExperimentalARM and neon

Multiply

vmulq_s32ExperimentalARM and neon

Multiply

vmulq_u8ExperimentalARM and neon

Multiply

vmulq_u16ExperimentalARM and neon

Multiply

vmulq_u32ExperimentalARM and neon

Multiply

vmvn_p8ExperimentalARM and neon

Vector bitwise not.

vmvn_s8ExperimentalARM and neon

Vector bitwise not.

vmvn_s16ExperimentalARM and neon

Vector bitwise not.

vmvn_s32ExperimentalARM and neon

Vector bitwise not.

vmvn_u8ExperimentalARM and neon

Vector bitwise not.

vmvn_u16ExperimentalARM and neon

Vector bitwise not.

vmvn_u32ExperimentalARM and neon

Vector bitwise not.

vmvnq_p8ExperimentalARM and neon

Vector bitwise not.

vmvnq_s8ExperimentalARM and neon

Vector bitwise not.

vmvnq_s16ExperimentalARM and neon

Vector bitwise not.

vmvnq_s32ExperimentalARM and neon

Vector bitwise not.

vmvnq_u8ExperimentalARM and neon

Vector bitwise not.

vmvnq_u16ExperimentalARM and neon

Vector bitwise not.

vmvnq_u32ExperimentalARM and neon

Vector bitwise not.

vorr_s8ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorr_s16ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorr_s32ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorr_s64ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorr_u8ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorr_u16ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorr_u32ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorr_u64ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorrq_s8ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorrq_s16ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorrq_s32ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorrq_s64ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorrq_u8ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorrq_u16ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorrq_u32ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vorrq_u64ExperimentalARM and neon

Vector bitwise or (immediate, inclusive)

vpmax_f32ExperimentalARM and neon

Folding maximum of adjacent pairs

vpmax_s8ExperimentalARM and neon

Folding maximum of adjacent pairs

vpmax_s16ExperimentalARM and neon

Folding maximum of adjacent pairs

vpmax_s32ExperimentalARM and neon

Folding maximum of adjacent pairs

vpmax_u8ExperimentalARM and neon

Folding maximum of adjacent pairs

vpmax_u16ExperimentalARM and neon

Folding maximum of adjacent pairs

vpmax_u32ExperimentalARM and neon

Folding maximum of adjacent pairs

vpmin_f32ExperimentalARM and neon

Folding minimum of adjacent pairs

vpmin_s8ExperimentalARM and neon

Folding minimum of adjacent pairs

vpmin_s16ExperimentalARM and neon

Folding minimum of adjacent pairs

vpmin_s32ExperimentalARM and neon

Folding minimum of adjacent pairs

vpmin_u8ExperimentalARM and neon

Folding minimum of adjacent pairs

vpmin_u16ExperimentalARM and neon

Folding minimum of adjacent pairs

vpmin_u32ExperimentalARM and neon

Folding minimum of adjacent pairs
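The "folding" (pairwise) operations reduce adjacent lanes rather than matching lanes of the two inputs: the low half of the result comes from pairs in the first operand and the high half from pairs in the second. Here is a scalar sketch of the pairwise maximum on eight `u8` lanes. The name `vpmax_u8_model` is hypothetical, and arrays stand in for the vector types.

```rust
/// Scalar model of a pairwise (folding) maximum on 8 x u8 lanes.
/// Hypothetical helper, not the intrinsic.
fn vpmax_u8_model(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for i in 0..4 {
        // Lanes 0..4 fold adjacent pairs of `a`; lanes 4..8 fold adjacent pairs of `b`.
        out[i] = a[2 * i].max(a[2 * i + 1]);
        out[i + 4] = b[2 * i].max(b[2 * i + 1]);
    }
    out
}
```

The pairwise minimum follows the same layout with `min` in place of `max`.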

vqadd_s8ExperimentalARM and neon

Saturating add

vqadd_s16ExperimentalARM and neon

Saturating add

vqadd_s32ExperimentalARM and neon

Saturating add

vqadd_u8ExperimentalARM and neon

Saturating add

vqadd_u16ExperimentalARM and neon

Saturating add

vqadd_u32ExperimentalARM and neon

Saturating add

vqaddq_s8ExperimentalARM and neon

Saturating add

vqaddq_s16ExperimentalARM and neon

Saturating add

vqaddq_s32ExperimentalARM and neon

Saturating add

vqaddq_u8ExperimentalARM and neon

Saturating add

vqaddq_u16ExperimentalARM and neon

Saturating add

vqaddq_u32ExperimentalARM and neon

Saturating add
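A saturating add clamps each lane to the representable range instead of wrapping on overflow. The standard `saturating_add` method has the same per-lane behaviour, so a scalar sketch for eight `i8` lanes is short. The name `vqadd_s8_model` is hypothetical, and arrays stand in for the vector types.

```rust
/// Scalar model of a signed saturating add on 8 x i8 lanes.
/// Hypothetical helper, not the intrinsic.
fn vqadd_s8_model(a: [i8; 8], b: [i8; 8]) -> [i8; 8] {
    let mut out = [0i8; 8];
    for i in 0..8 {
        // Clamps to i8::MAX or i8::MIN instead of wrapping around.
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}
```

In this model, 100 + 100 yields 127 and -100 + -100 yields -128.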

vqmovn_u64ExperimentalARM and neon

Unsigned saturating extract narrow.
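Unlike the plain narrow, which discards high bits, a saturating narrow clamps values that do not fit in the smaller lane. Here is a scalar sketch for two `u64` lanes narrowed to `u32`. The name `vqmovn_u64_model` is hypothetical, and arrays stand in for the vector types.

```rust
/// Scalar model of an unsigned saturating narrow from u64 lanes to u32 lanes.
/// Hypothetical helper, not the intrinsic.
fn vqmovn_u64_model(a: [u64; 2]) -> [u32; 2] {
    // Clamp to u32::MAX first so the cast below never discards set bits.
    a.map(|x| x.min(u64::from(u32::MAX)) as u32)
}
```

A lane holding `1 << 40` becomes `u32::MAX` in this model, where a truncating narrow would give 0.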

vqsub_s8ExperimentalARM and neon

Saturating subtract

vqsub_s16ExperimentalARM and neon

Saturating subtract

vqsub_s32ExperimentalARM and neon

Saturating subtract

vqsub_u8ExperimentalARM and neon

Saturating subtract

vqsub_u16ExperimentalARM and neon

Saturating subtract

vqsub_u32ExperimentalARM and neon

Saturating subtract

vqsubq_s8ExperimentalARM and neon

Saturating subtract

vqsubq_s16ExperimentalARM and neon

Saturating subtract

vqsubq_s32ExperimentalARM and neon

Saturating subtract

vqsubq_u8ExperimentalARM and neon

Saturating subtract

vqsubq_u16ExperimentalARM and neon

Saturating subtract

vqsubq_u32ExperimentalARM and neon

Saturating subtract

vreinterpret_u64_u32ExperimentalARM and neon

Vector reinterpret cast operation

vreinterpretq_s8_u8ExperimentalARM and neon

Vector reinterpret cast operation

vreinterpretq_u16_u8ExperimentalARM and neon

Vector reinterpret cast operation

vreinterpretq_u32_u8ExperimentalARM and neon

Vector reinterpret cast operation

vreinterpretq_u64_u8ExperimentalARM and neon

Vector reinterpret cast operation

vreinterpretq_u8_s8ExperimentalARM and neon

Vector reinterpret cast operation
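A reinterpret cast leaves the bits of the vector untouched and only changes how the lanes are typed. For the same-width `u8` to `i8` case this can be modelled with a per-lane `as` cast, as in the scalar sketch below. The name `vreinterpretq_s8_u8_model` is hypothetical, and arrays stand in for the vector types. Casts that change lane width additionally depend on byte order and are not modelled here.

```rust
/// Scalar model of reinterpreting 16 x u8 lanes as 16 x i8 lanes.
/// Hypothetical helper, not the intrinsic; no bits change, only the lane type.
fn vreinterpretq_s8_u8_model(a: [u8; 16]) -> [i8; 16] {
    // `as i8` keeps the bit pattern: 0xFF becomes -1, 0x80 becomes -128.
    a.map(|x| x as i8)
}
```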

vrhadd_s8ExperimentalARM and neon

Rounding halving add

vrhadd_s16ExperimentalARM and neon

Rounding halving add

vrhadd_s32ExperimentalARM and neon

Rounding halving add

vrhadd_u8ExperimentalARM and neon

Rounding halving add

vrhadd_u16ExperimentalARM and neon

Rounding halving add

vrhadd_u32ExperimentalARM and neon

Rounding halving add

vrhaddq_s8ExperimentalARM and neon

Rounding halving add

vrhaddq_s16ExperimentalARM and neon

Rounding halving add

vrhaddq_s32ExperimentalARM and neon

Rounding halving add

vrhaddq_u8ExperimentalARM and neon

Rounding halving add

vrhaddq_u16ExperimentalARM and neon

Rounding halving add

vrhaddq_u32ExperimentalARM and neon

Rounding halving add
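The rounding variant of halving add adds one to the full-precision sum before shifting, so halves round up instead of being truncated. Here is a scalar sketch for eight `u8` lanes. The name `vrhadd_u8_model` is hypothetical, and arrays stand in for the vector types.

```rust
/// Scalar model of an unsigned rounding halving add: (a + b + 1) >> 1 per lane.
/// Hypothetical helper, not the intrinsic.
fn vrhadd_u8_model(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for i in 0..8 {
        // Form the sum in u16 so that 255 + 255 + 1 does not overflow.
        let sum = u16::from(a[i]) + u16::from(b[i]) + 1;
        out[i] = (sum >> 1) as u8;
    }
    out
}
```

In this model, lanes holding 1 and 2 average to 2, whereas the non-rounding halving add would give 1.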

vrsqrte_f32ExperimentalARM and neon

Reciprocal square-root estimate.

vshlq_n_u8ExperimentalARM and neon

Shift left

vshrq_n_u8ExperimentalARM and neon

Unsigned shift right

vsub_f32ExperimentalARM and neon

Subtract

vsub_s8ExperimentalARM and neon

Subtract

vsub_s16ExperimentalARM and neon

Subtract

vsub_s32ExperimentalARM and neon

Subtract

vsub_s64ExperimentalARM and neon

Subtract

vsub_u8ExperimentalARM and neon

Subtract

vsub_u16ExperimentalARM and neon

Subtract

vsub_u32ExperimentalARM and neon

Subtract

vsub_u64ExperimentalARM and neon

Subtract

vsubq_f32ExperimentalARM and neon

Subtract

vsubq_s8ExperimentalARM and neon

Subtract

vsubq_s16ExperimentalARM and neon

Subtract

vsubq_s32ExperimentalARM and neon

Subtract

vsubq_s64ExperimentalARM and neon

Subtract

vsubq_u8ExperimentalARM and neon

Subtract

vsubq_u16ExperimentalARM and neon

Subtract

vsubq_u32ExperimentalARM and neon

Subtract

vsubq_u64ExperimentalARM and neon

Subtract