Hacker News new | ask | show | jobs
by Thaxll 1324 days ago
It's probably "slow" because of sdl and cgo not native Go code. A gb emulator doesn't do much actually, it's about fixed array, bit shifting, switch cases etc ...

I ran a quick pprof and indeed it's spending a lot of time in cgo:

  Showing nodes accounting for 28720ms, 67.67% of 42440ms total
  Dropped 145 nodes (cum <= 212.20ms)
  Showing top 10 nodes out of 53
      flat  flat%   sum%        cum   cum%
   13080ms 30.82% 30.82%    16600ms 39.11%  runtime.cgocall
    4720ms 11.12% 41.94%     4750ms 11.19%  main.(*RAM).get
    2840ms  6.69% 48.63%    33070ms 77.92%  main.(*GPU).tick
    1970ms  4.64% 53.28%     3720ms  8.77%  runtime.mallocgc
    1450ms  3.42% 56.69%     1470ms  3.46%  main.(*RAM).set
    1160ms  2.73% 59.43%    41350ms 97.43%  main.(*GameBoy).tick
    1000ms  2.36% 61.78%     3160ms  7.45%  runtime.exitsyscall
     890ms  2.10% 63.88%     1610ms  3.79%  main.(*CPU).tick_interrupts
     820ms  1.93% 65.81%      850ms  2.00%  runtime.casgstatus
     790ms  1.86% 67.67%     5530ms 13.03%  main.(*CPU).tick
3 comments

> It's probably "slow" because of sdl and cgo not native Go code

The other languages are also using SDL via their respective interacting-with-C interfaces - what makes Go special here?

The author stated that the benchmarks run in headless mode, so I am not sure that it's SDL that's slowing it down here.

Even if it is SDL slowing it down, Go FFI being slow is still a real disadvantage compared to the other languages, and you can't just pretend like it doesn't exist in this case.

Yup.. a 240x performance difference (zig-py) has very little to do with the language, vm, or whatever. As soon as I saw that, I dismissed the benchmark.

By these standards a 10 year old cpu with a beefy GPU will beat any new cpu as well.

>Yup.. a 240x performance difference (zig-py) has very little to do with the language, vm, or whatever.

It absolutely does. A simple for loop in the standard Python interpreter will literally take 100x longer than the same thing in a language like C/C++, try for yourself if you don't believe me. CPython is unbelievably slow.

Very interesting... Simple loop (0 -> 320000000) and add 1 to a variable.

I couldn't reproduce 100x (no optimization flags, otherwise it won't do anything)

  Apple clang version 14.0.0 (clang-1400.0.29.102) -> 0m0.347s
  ruby 3.1.2p20 -> 0m11.314s
  Python 3.8.12 -> 0m19.662s
So ruby is almost 60% faster, but the C version is "only" 32x faster than ruby. 55x python.

I thought the differences would be smaller these days

What does a GPU have to do with any of this? I think you may be a little confused about something but I am not sure what.