by ashtonsix
249 days ago
I'm assuming you're referring to BFM/EXTR? NEON absolutely improves on those here. The core I developed on (Neoverse V2) has 4 SIMD ports and 6 scalar integer ports, but only 2 of those scalar ports support multicycle integer operations such as the insert variant of BFM (BFI), which is essential for scalar packing. More importantly, NEON processes 16 elements per instruction (8-bit lanes in a 128-bit register) instead of 1.