HN Mirror


by edoo 2703 days ago

Impressive effect.

I didn't realize OpenMP was so easy to use. It isn't realtime, but you could bake up some cool effects with this.

AMD FX-8320, 3.5 GHz

    $ time ./tinykaboom
    real 0m4.176s
    user 0m28.631s
    sys  0m0.012s


userbinator 2703 days ago

Running it on the GPU would probably get to realtime speeds.

This technique is used a lot in demoscene demos, which certainly do run in realtime.


nazri1 2703 days ago

I saw the OpenMP pragma and thought, "Neat! It should be fun to watch the cores work hard at this," so I compiled and ran it and smiled at the 400% CPU usage in top.

    $ time ./tinykaboom 
    ./tinykaboom  78.08s user 0.02s system 369% cpu 21.159 total

Then I wondered how it would fare in Go, so I hastily ported it. I expected it to run a bit slower than the C++ version, but surprisingly it ran more than twice as fast:

    $ go build ./tinykaboom.go
    $ time ./tinykaboom 
    ./tinykaboom  34.32s user 0.03s system 368% cpu 9.315 total

https://github.com/holygeek/tinykaboom/blob/master/tinykaboo...

Here's the corresponding perf report:

Go:

    Samples: 103K of event 'cycles:pp', Event count (approx.): 37252033995665
    Overhead  Command     Shared Object      Symbol
      32.17%  tinykaboom  tinykaboom         [.] math.sin
      28.80%  tinykaboom  tinykaboom         [.] main.hash
      11.81%  tinykaboom  tinykaboom         [.] main.rotate
       7.76%  tinykaboom  tinykaboom         [.] math.Min
       5.18%  tinykaboom  tinykaboom         [.] main.lerpFloat64
       4.25%  tinykaboom  tinykaboom         [.] main.noise
       2.59%  tinykaboom  tinykaboom         [.] runtime.mallocgc
       2.59%  tinykaboom  tinykaboom         [.] main.fractal_brownian_motion
       2.58%  tinykaboom  tinykaboom         [.] main.signed_distance

C++:

    Samples: 234K of event 'cycles:pp', Event count (approx.): 86721459552303
    Overhead  Command     Shared Object        Symbol
      67.93%  tinykaboom  libm-2.23.so         [.] __sin_avx
      30.80%  tinykaboom  tinykaboom           [.] _Z5noiseRK3vecILm3EfE
       1.27%  tinykaboom  libm-2.23.so         [.] __floorf_sse41
       0.00%  tinykaboom  tinykaboom           [.] _Z23fractal_brownian_motionRK3vecILm3EfE
       0.00%  tinykaboom  tinykaboom           [.] floorf@plt

If anyone has suggestions on how to make tinykaboom.cpp faster, that would be neat!


namirez 2702 days ago

There are a few potential improvements here:

1) Use a lookup table for 'sin' rather than calling 'std::sin'.
2) Tell the compiler which instruction sets to target; for example, tell GCC to use 'skylake' instructions (https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/x86-Options.htm...).
3) Many of the functions could be 'inline constexpr'.
4) Although 'ofs <<' is buffered, it can still be very slow. Build the output in memory and write it to the file with a lower-level function such as 'fwrite'.
5) Use 'std::thread' or 'std::async'. That makes the multi-threading more portable and clearer.


haldean 2703 days ago

What were your compilation flags?


nazri1 2703 days ago

I used the default flags from CMakeLists.txt (-O3).

I ran the comparison again on another machine I have, and this time the two perform about the same:

C++:

    $ time ./tinykaboom
    ./tinykaboom  46.72s user 0.01s system 364% cpu 12.804 total

Go:

    $ time ./tinykaboom     
    ./tinykaboom  42.50s user 0.07s system 350% cpu 12.161 total


maksimum 2703 days ago

i7-3770:

    real 0m6.800s
    user 0m6.695s
    sys  0m0.034s

2x E5-2667 v2:

    real 0m2.217s
    user 0m56.088s
    sys  0m0.016s

Seems like it's pretty inefficient with the dual CPU setup.


edoo 2703 days ago

Weird result. I guess it makes sense that it could use twice as much CPU to finish in half the time, but looking at the numbers, it doesn't feel intuitive.

I wonder how many shaders this would keep busy. There is probably some class of GPU, and anything above it, on which this would run rather well even alongside an already large workload.
