by edoo 2703 days ago
Impressive effect.

I didn't realize OpenMP was so easy to use. It isn't realtime, but you could bake some cool effects with this.

AMD FX8320 3.5GHz

    $ time ./tinykaboom
    real 0m4.176s
    user 0m28.631s
    sys  0m0.012s
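
For anyone who hasn't used OpenMP, the "easy to use" part is that a single pragma on a loop with independent iterations is enough. Here is a minimal sketch of that pattern, not the tinykaboom source: `shade` and `render` are hypothetical names, and `shade` is just a placeholder for the real per-pixel work.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-pixel work; not the tinykaboom shader.
float shade(std::size_t x, std::size_t y) {
    return std::sin(0.01f * static_cast<float>(x)) *
           std::cos(0.01f * static_cast<float>(y));
}

// Fills a width*height framebuffer. Every pixel is independent of the others,
// so the loop iterations can be split across threads safely.
std::vector<float> render(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    std::vector<float> framebuffer(width * height);
    const long total = static_cast<long>(width * height);

    // With OpenMP enabled this distributes iterations across cores;
    // without it the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < total; ++i) {
        const std::size_t x = static_cast<std::size_t>(i) % width;
        const std::size_t y = static_cast<std::size_t>(i) / width;
        framebuffer[static_cast<std::size_t>(i)] = shade(x, y);
    }
    return framebuffer;
}
```

With GCC the pragma only takes effect when building with `-fopenmp`; otherwise the same code compiles and runs on one thread.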


Running it on the GPU would probably get to realtime speeds.

This technique is used a lot in demoscene demos, which certainly do run in realtime.

I saw the OpenMP pragma and thought to myself, "neat! should be fun to watch the cores work hard at this." So I compiled and ran it, and smiled at the 400% CPU usage in top.

    $ time ./tinykaboom 
    ./tinykaboom  78.08s user 0.02s system 369% cpu 21.159 total
Then I wondered how it would fare in Go, so I hastily ported it. I expected it to run a bit slower than the C++ version, but surprisingly it ran more than twice as fast:

    $ go build ./tinykaboom.go
    $ time ./tinykaboom 
    ./tinykaboom  34.32s user 0.03s system 368% cpu 9.315 total
https://github.com/holygeek/tinykaboom/blob/master/tinykaboo...

Here's the corresponding perf report:

Go:

    Samples: 103K of event 'cycles:pp', Event count (approx.): 37252033995665
    Overhead  Command     Shared Object      Symbol
      32.17%  tinykaboom  tinykaboom         [.] math.sin
      28.80%  tinykaboom  tinykaboom         [.] main.hash
      11.81%  tinykaboom  tinykaboom         [.] main.rotate
       7.76%  tinykaboom  tinykaboom         [.] math.Min
       5.18%  tinykaboom  tinykaboom         [.] main.lerpFloat64
       4.25%  tinykaboom  tinykaboom         [.] main.noise
       2.59%  tinykaboom  tinykaboom         [.] runtime.mallocgc
       2.59%  tinykaboom  tinykaboom         [.] main.fractal_brownian_motion
       2.58%  tinykaboom  tinykaboom         [.] main.signed_distance
C++:

    Samples: 234K of event 'cycles:pp', Event count (approx.): 86721459552303
    Overhead  Command     Shared Object        Symbol
      67.93%  tinykaboom  libm-2.23.so         [.] __sin_avx
      30.80%  tinykaboom  tinykaboom           [.] _Z5noiseRK3vecILm3EfE
       1.27%  tinykaboom  libm-2.23.so         [.] __floorf_sse41
       0.00%  tinykaboom  tinykaboom           [.] _Z23fractal_brownian_motionRK3vecILm3EfE
       0.00%  tinykaboom  tinykaboom           [.] floorf@plt
If anyone has suggestions on how to make tinykaboom.cpp faster, that would be neat!
There are a few potential improvements here:
1) Use a lookup table for 'sin' rather than using 'std::sin'.
2) Tell the compiler what instruction sets to use; for example, tell GCC to use 'skylake' instructions (https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/x86-Options.htm...).
3) Many of the functions could be 'inline constexpr'.
4) Although 'ofs <<' is buffered, it can still be very slow. Create the output in memory and use a lower level function like 'fwrite' to write it to file.
5) Use 'std::thread' or 'std::async'. It makes the multi-threading more portable and clear.
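
Suggestion 1 (a lookup table for 'sin') could look roughly like the sketch below. `SinTable`, the 4096-entry table size, and the linear interpolation are all assumptions made for illustration; none of this comes from tinykaboom.cpp. It trades accuracy for fewer libm calls.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical table-based sine: one period sampled at kSize intervals,
// with linear interpolation between neighbouring samples.
class SinTable {
public:
    SinTable() {
        for (std::size_t i = 0; i <= kSize; ++i) {
            table_[i] = static_cast<float>(
                std::sin(kTwoPi * static_cast<double>(i) / kSize));
        }
    }

    // Approximates sin(x) for any finite x.
    float operator()(float x) const {
        assert(std::isfinite(x));
        // Reduce x to a phase in [0, 1], i.e. a position within one period.
        double phase = static_cast<double>(x) / kTwoPi;
        phase -= std::floor(phase);
        const double pos = phase * kSize;
        const std::size_t idx = static_cast<std::size_t>(pos);
        // Rounding can push the phase to exactly 1.0; return the last sample.
        if (idx >= kSize) return table_[kSize];
        const double frac = pos - static_cast<double>(idx);
        return static_cast<float>(
            table_[idx] + frac * (table_[idx + 1] - table_[idx]));
    }

private:
    static constexpr std::size_t kSize = 4096;
    static constexpr double kTwoPi = 6.283185307179586;
    std::array<float, kSize + 1> table_{};
};
```

Whether this actually beats the `__sin_avx` routine that dominates the perf report would have to be measured. The table is about 16 KB, so it competes with everything else for L1 cache, and the range reduction has a cost of its own.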
What were your compilation flags?
I used the default one in CMakeLists.txt (-O3).

I ran the comparison again on another machine, and this time the performance is about the same:

C++:

    $ time ./tinykaboom
    ./tinykaboom  46.72s user 0.01s system 364% cpu 12.804 total
Go:

    $ time ./tinykaboom     
    ./tinykaboom  42.50s user 0.07s system 350% cpu 12.161 total
i7-3770: real 0m6.800s user 0m6.695s sys 0m0.034s

2x e5-2667 v2: real 0m2.217s user 0m56.088s sys 0m0.016s

Seems like it's pretty inefficient with the dual CPU setup.

Weird result. I guess it makes sense that it could use twice as much CPU to finish in half the time, but looking at the numbers, it doesn't feel intuitive.
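
One way to make the numbers less opaque is to divide `user` by `real`, which gives the average number of cores kept busy, and to compare total CPU time between the two runs. Here is a small sketch of that arithmetic using the figures quoted above; the struct and function names are made up for illustration.

```cpp
#include <cassert>

// Wall-clock ("real") and CPU ("user") seconds as reported by `time`.
struct Timing {
    double real_s;
    double user_s;
};

// Average number of cores kept busy: CPU seconds spent per wall-clock second.
double busy_cores(const Timing& t) {
    assert(t.real_s > 0.0);
    return t.user_s / t.real_s;
}

// How many times more total CPU time run `a` consumed than run `b`.
double cpu_time_ratio(const Timing& a, const Timing& b) {
    assert(b.user_s > 0.0);
    return a.user_s / b.user_s;
}

// How many times faster run `a` finished than run `b` in wall-clock terms.
double wall_speedup(const Timing& a, const Timing& b) {
    assert(a.real_s > 0.0);
    return b.real_s / a.real_s;
}

// Figures quoted in the comment above.
const Timing kI7_3770{6.800, 6.695};
const Timing kDualE5_2667v2{2.217, 56.088};
```

With these figures, `busy_cores` is about 0.98 for the i7-3770 and about 25.3 for the dual E5-2667 v2. The dual-CPU run finished roughly 3.1x sooner while spending roughly 8.4x the CPU time. A ratio near 1 on the i7 would be consistent with that run using a single thread (for example, a build without OpenMP enabled), though the thread doesn't say how it was built.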

I wonder how many shaders this would keep busy. There is probably a class of GPUs and above that this could work on rather well alongside an already large workload.