by benhoyt 1769 days ago

Wow, this update is awesome: my GoAWK interpreter (https://github.com/benhoyt/goawk) runs a simple CPU-bound AWK program in 38% less time when compiled with Go 1.17 (compared to 1.16).

  $ time goawk_go1.16 'BEGIN { for (i=0; i<100000000; i++) s += i; print(s) }'
  4999999950000000
  real    0m10.158s ...
  $ time goawk_go1.17 'BEGIN { for (i=0; i<100000000; i++) s += i; print(s) }'
  4999999950000000
  real    0m6.268s ...

I wonder why it's so much better than their advertised 5% perf improvement? Here's a quick CPU profile: https://i.imgur.com/csJyOYq.png ... I don't get too much out of it at a glance, just seems like everything's a bunch faster.

6 comments

drchase 1769 days ago

Hi, I'm one of the people who worked on it, and the guy who did the initial estimate back in early 2017. 5% is the geomean of a lot of benchmarks; a whole lot fall in the 4-8% range, and a few do worse because the new ABI creates new patterns of register use that don't fit well with the current register allocator, and the fix was larger than we wanted to risk. (See https://github.com/golang/go/issues/46216 )

benhoyt 1769 days ago

Overall for GoAWK I get an 18% speed increase on my micro-benchmarks between Go 1.16 and 1.17 (see https://github.com/benhoyt/goawk/commit/1f314f421273b3dc164f...) and I measured an 8% speed increase on my "slightly more real-world" benchmarks (these ones: https://github.com/benhoyt/goawk/blob/master/benchmark_awks....).

prattmic 1769 days ago

The benefits come primarily from avoiding extra work spilling arguments to/from the stack on function calls. If you are making lots and lots of function calls, particularly to small functions that can't be inlined, there could certainly be much bigger improvements.

barsonme 1769 days ago

Just an FYI: you can use pprof's -diff_base flag to diff the two profiles instead of opening them side by side.

benhoyt 1769 days ago

Oh, good to know, thanks!

_ph_ 1769 days ago

The speedup depends a lot on the structure of the code being benchmarked. Natively written Go code does more of its computation in local loops without many function calls, so the optimization has less effect there. An interpreter often calls a function for every single instruction it executes. This means you have a lot of function calls inside loops, sometimes one for every operation executed. That kind of code of course profits massively from this optimization.

Scaevolus 1769 days ago

Look at the disassembly and observe how your function calls have far fewer push/pop operations going on, and how the function prologues/epilogues are smaller.