
by fwsgonzo 1597 days ago

I compared against native:

  #include <emmintrin.h>  // SSE2: __m128i, _mm_set1_epi8, _mm_stream_si128
  #include <malloc.h>     // memalign
  #include <stddef.h>     // size_t

  #define ITERATIONS 1000

  int main()
  {
    const size_t BUFFER_SIZE = 64ul * 1024 * 1024;
    __m128i* data_buffer = (__m128i *)memalign(64, BUFFER_SIZE);

    const __m128i all_ones = _mm_set1_epi8(0xFF);

    for (size_t i = 0; i < ITERATIONS; i++)
    {
      __m128i* data = data_buffer;
      // Each pass issues four 16-byte non-temporal stores (64 bytes total).
      for (size_t b = 0; b < BUFFER_SIZE;) {
        _mm_stream_si128(&data[0], all_ones);
        _mm_stream_si128(&data[1], all_ones);
        _mm_stream_si128(&data[2], all_ones);
        _mm_stream_si128(&data[3], all_ones);
        data += 4;
        b += 16 * 4;
      }
    }
  }


  $ time ./fill_buffer.elf 
  real 0m1,832s


  $ time ./wasmer fill_buffer.wasm -i fillBufferWithSIMD 1000
  real 0m4,237s

I had to fix up the WAT because set_local and get_local don't exist anymore. They are called local.set and local.get now.

At higher iteration counts the C version converges on about 1.7 seconds per 1000 iterations, while the WASM version stays at about 4.2 seconds per 1000. That leaves native roughly 2.5x faster for this particular operation, on my machine.
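To put the two timings on a common scale, they can be converted to effective store bandwidth. What follows is a minimal sketch in C, assuming the whole `real` time is spent filling the buffer. In practice that time also includes process startup and, for Wasmer, compilation, so the results are lower bounds. The function names are illustrative and not part of the original benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes written by the benchmark: buffer size times number of passes. */
double total_bytes(size_t buffer_size, size_t iterations)
{
    return (double)buffer_size * (double)iterations;
}

/* Effective bandwidth in GiB/s, ASSUMING all of `seconds` is fill time. */
double gib_per_second(size_t buffer_size, size_t iterations, double seconds)
{
    assert(seconds > 0.0);
    const double gib = 1024.0 * 1024.0 * 1024.0;
    return total_bytes(buffer_size, iterations) / seconds / gib;
}

/* How many times slower `slow_seconds` is than `fast_seconds`. */
double slowdown(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}
```

With the 64 MiB buffer and 1000 iterations, each run writes 62.5 GiB. `gib_per_second` then gives about 34.1 GiB/s for the 1.832 s native run and about 14.8 GiB/s for the 4.237 s Wasmer run, and `slowdown(4.2, 1.7)` gives about 2.47, which matches the "2.5x" figure.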

3 comments

syrusakbary 1597 days ago

Hi, I'm Syrus from Wasmer.

Have you tried with the llvm backend? I believe the results might be even better there!

  $ time ./wasmer run --llvm fill_buffer.wasm -i fillBufferWithSIMD 1000


fwsgonzo 1596 days ago

  $ time ./wasmer fill_buffer.wasm --llvm -i fillBufferWithSIMD 1000
  real 0m5,233s

It seems to run about 25% longer with the LLVM backend. I did not know about that option! Very interesting.


syrusakbary 1596 days ago

It's strange that LLVM runs slower (perhaps some of the 5s is spent compiling).

Could you share the fill_buffer.wasm (or the WAT file) so I can do some tests? Thanks!
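One way to check the compile-time hypothesis is to time the same module at two different iteration counts and fit a straight line through the two results. A minimal sketch in C, assuming total time is roughly `fixed + per_iter * n`. The struct and function names are hypothetical, and the inputs would be your own measurements, not figures from this thread.

```c
#include <assert.h>

/* Result of fitting T(n) = fixed + per_iter * n through two measurements. */
struct cost_model {
    double fixed_seconds;    /* one-time cost, e.g. startup and compilation */
    double per_iter_seconds; /* steady-state cost of one iteration */
};

/* n1, n2: iteration counts of the two runs; t1, t2: their wall-clock times. */
struct cost_model fit_two_runs(double n1, double t1, double n2, double t2)
{
    assert(n1 != n2); /* two distinct points are needed to fit a line */
    struct cost_model m;
    m.per_iter_seconds = (t2 - t1) / (n2 - n1);
    m.fixed_seconds = t1 - m.per_iter_seconds * n1;
    return m;
}
```

If `fixed_seconds` comes out near zero for the LLVM run, compilation is unlikely to explain the gap. If it is large, comparing `per_iter_seconds` between backends is the fairer comparison.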


fwsgonzo 1595 days ago

Here you go: https://gist.github.com/fwsGonzo/2968cf0bc3364eb1ff0e0500569...

Hopefully I did not mess anything up!


jcelerier 1597 days ago

very depressing when for so many use cases even native performance is very very very much not fast enough


kevingadd 1597 days ago

In many of these use cases native is only "not fast enough" because the code you're running makes very poor use of the cache, pipelining, SIMD instruction sets, and memory bandwidth.


Dylan16807 1597 days ago

> when

If native performance is "very very very" not fast enough then that's supercomputer work and it doesn't really matter if WASM is 3x native or 0.3x native. So that context should be where you're the least depressed.


jcelerier 1596 days ago

> then that's supercomputer work

today's laptop work is late 90's supercomputer's work (and it was even more depressing back then).


Dylan16807 1596 days ago

And today's supercomputer work was impossible in the late 90's.

That doesn't really change my argument. When you're looking at languages that are used for small tasks today, their speed doesn't have much relevance to how vastly bigger tasks are accomplished. And by the time those tasks can be run on a laptop, WASM implementations are going to be much better and we still might not be using it at all for those larger tasks.


hackthesystem 1597 days ago

Thanks for mentioning it, I just updated my website and GitHub to use the new WAT instruction names.
