by fwsgonzo
1597 days ago
I compared against native:

    #include <emmintrin.h>  /* SSE2: __m128i, _mm_set1_epi8, _mm_stream_si128 */
    #include <malloc.h>     /* memalign */
    #include <stddef.h>     /* size_t */

    #define ITERATIONS 1000

    int main()
    {
        const size_t BUFFER_SIZE = 64ul * 1024 * 1024;
        __m128i* data_buffer = (__m128i *)memalign(64, BUFFER_SIZE);
        const __m128i all_ones = _mm_set1_epi8(0xFF);
        for (size_t i = 0; i < ITERATIONS; i++)
        {
            __m128i* data = data_buffer;
            /* Four 16-byte non-temporal stores per step, i.e. 64 bytes at a time */
            for (size_t b = 0; b < BUFFER_SIZE;) {
                _mm_stream_si128(&data[0], all_ones);
                _mm_stream_si128(&data[1], all_ones);
                _mm_stream_si128(&data[2], all_ones);
                _mm_stream_si128(&data[3], all_ones);
                data += 4;
                b += 16 * 4;
            }
        }
    }

    $ time ./fill_buffer.elf
    real 0m1,832s
    $ time ./wasmer fill_buffer.wasm -i fillBufferWithSIMD 1000
    real 0m4,237s

I had to fix up the WAT because get_local and set_local don't exist anymore. They are called local.get and local.set now.

At higher iteration counts the C version converges on about 1.7 seconds per 1000 iterations, while the WASM version stays at about 4.2 seconds per 1000. That leaves native about 2.5x faster for this particular operation, on my machine.
Have you tried with the LLVM backend? I believe the results might be even better there!