Hacker News new | ask | show | jobs
by glymor 2897 days ago
A reply in the thread from David Major about performance:

"At the moment, performance is a mixed bag. Some tests are up and some are down. In particular I believe Speedometer is down a few percent."

"Note however that clang-cl is punching above its weight. These builds currently have neither LTO nor PGO, while our MSVC builds use both of those. Any regressions that we're seeing ought to be short-lived. Once we enable LTO[1] and PGO[2], I expect clang to be a clear performance win."

[1] "LTO (Link Time Optimization) is a method for achieving better runtime performance through whole-program analysis and cross-module optimization." http://blog.llvm.org/2016/06/thinlto-scalable-and-incrementa...

[2] PGO = Profile Guided Optimization

2 comments

LTO can be particularly handy when doing devirtualization. Our experience ([1]) with Chromium demonstrated that there's a fair amount of virtual methods which have exactly two implementation in the code: one for production and another one for tests. While linking production code, it's trivial for this optimization to spot that, replace a virtual call with regular invocation and then often inline and allow for other in-place optimizations. In many cases of the renderer (Blink), that gives 3%-7% speedup out of nowhere.

1. https://bugs.chromium.org/p/chromium/issues/detail?id=580389...

Out of curiosity, why use virtual calls in those cases? Why not just have two different implementation files -- one for tests, and one for production -- and choose one based on which one you're doing? It seems like a win all around.
There's always performance hacks to be found in big apps, in exchange for implementation complexity and man-hours. And there's always a next bottleneck to be found. Performnace bugs are subject to an extended form of survivorship bias where they become bigger when the preceding perf bottlenecks get fixed.
I've refreshed and seen three major revisions of your comment after writing a full reply to the first one, and I can't say I've been able to clearly follow where you're coming from and where you're going with any of them. I would suggest that if you feel the need for such drastic revisions, it may be worth reconsidering whether whatever you're trying to argue is really compelling.
His advice is sage and you should look past whatever deficiencies you find in his form of communication. Simply put there's always bigger fish to fry, and virtual methods are a quick and easy way to implement test stubs. Especially when the compiler devirtualizes.
Well I had a response to that originally but his edits kind of changed how much sense my reply made in response. Here's what I had:

Performance is only one aspect of it. It also reduces code bloat, reducing the program's size footprint. Most tests (yes, I know, not all, but most) should not make it into the final binary users are running. I also don't see what's "hacky" about making a foo.test.cc file when I want an alternate implementation for foo.cc. It seems to be quite a positive and clear way to document the fact that an implementation is only needed for testing, and vice-versa. And not only that, but it reduces compile (& link) times, since you only need to compile one of the two implementations for each use case.

Sorry! That's a risk you have when following comments tightly :)

I'll reconstruct my first take from memory if you have comments on it:

Current CPUs are pretty good at predicting indirect branches, and it's hard to tell beforehand which virtual calls still turn out to be perf problems - and it's wasted effort and a fallible process to attempt their compllete elimination beforehand.

Right, and I posted my reply to that here: https://news.ycombinator.com/item?id=17504680
It seems like a win all around.

Depends. Virtual calls can be done purely in code, with different implementation files you need to go round via the build system. The former just always works independent of platform whereas often for the latter it's more configuration work. Also, ignoring YAGNI, you could say the virtual one makes it easy to quickly test things out etc, is easily mockable, ...

From my point of view a benefit for virtual calls is toolability. If you have an abstract interface, it’s pretty easy to either directly create mocks for it in the test file, or use sth like gmock. If you go through the link completely different implementation route, you might need to create and link a different library which contains the faked implementation for several test cases. It also works well with pure header code.
If I understand you correctly aren't you violating the initial assumption that there are only 2 implementations, one for production and one for testing? At which point, you're answering a different question...
>Out of curiosity, why use virtual calls in those cases? Why not just have two different implementation files -- one for tests, and one for production -- and choose one based on which one you're doing? It seems like a win all around.

I don't fully understand the proposal here. Do you have an example of a code in the wild that uses the proposed scheme?

I'm just saying do conditional compilation instead of virtual dispatch, since you generally shouldn't really need both the test and production implementations to run inside the same program. So if you have reason to require

  // widget.h
  class Widget { virtual void throb(); };
  // widget.cc
  class WidgetImpl : public Widget { void throb() { ... } };
  class WidgetTest : public Widget { void throb() { ... } };
then, instead of that, just do

  // widget.h
  class Widget { void throb(); };
  // widget.cc
  void Widget::throb() { ... }
  // widget.test.cc
  void Widget::throb() { ... }
where you only compile widget.cc for the production build, and only compile widget.test.cc for the test build.

Feel free to post a more specific example and I can take a stab at seeing if I can make this transformation to it (or why it might not be possible, if you think it's not).

Thank you for elaborating your proposal. It might work in a theoretical scenario, but unlikely to be practical for existing projects with millions lines of codes.

Essentially, it was easier to write a compiler / linker optimization, than change the source code. On the bright side, everyone wins!

> It might work in a theoretical scenario, but unlikely to be practical for existing projects with millions lines of codes.

I mean there's no reason you have to change everything by tomorrow. You could introduce it as a policy moving forward, the same way I'm sure you make any other changes to other patterns that need to be followed. I'm sure you have a process for this?

> Essentially, it was easier to write a compiler / linker optimization, than change the source code.

I agree, but see above and below.

> On the bright side, everyone wins!

Well, you waste space, time, and energy making the compiler and linker do work that doesn't inherently need to be done. You also lose parallelizability of the compilation of the two implementations if they currently reside in the same file. Finally the compiler (er, linker) might not be able to actually apply the devirtualizations you expect, whereas in this case it's just a matter of inlining since the target method is already known.

In the performance-insensitive code the approach you defend could be "good enough" but for the code on the "hot path" the changes like the proposed by dataflow could result in the consistent improvement and are surely better long-term "win."

"We do it that way here" is not something that has to be always applied.

I have done things like that, but the one definition rule gets in the way.

    void DoSomething(widget* param, OtherClassWithvirtual* param2) {...}
My DoSomething function needs to link to widget, but now I have two different libraries that widget could be in. The linker encodes in my binary which library to load to get widget. All attempts to use the other library instead are undefined behavior.

The only way I know of to work around that is build DoSomething twice, linking the real widget once and the test widget the other. Note that DoSomething takes two parameters, and those are classes that may themselves also take other classes with virtual functions. All the possible cases quickly gets into a combinatoric explosion and my build times goes from an hour (already way too long) to days.

If I'm wrong please tell me how, I'd love to not have so many interfaces just because someone doesn't want their unit test to become a large scale integration test.

What if I want both? What if in ClassATest, I need ClassBImpl? Do I go around creating more compile cases?

Because that’s what we need, more magic in the build system.

Not sure I see what you're thinking of... could you give me a code example?
The implementation of TestWidget only defines some stuff of its own but otherwise shares a bunch of code with Widget is a possible scenario here.
You can still derive classes when you need to, right?
One reason is that other test might need the production version. Building twice for that case is more of a pain and not really worth it.
I mean, then you can avoid it in that situation. Or actually use the production version like I explained here: https://news.ycombinator.com/item?id=17505358 Or find another way to use it. It's not all-or-nothing.
Yes that works, but relies on devirtualization in the production code, and isn't at all what you were suggesting before about having separate implementation files.
Have you seen this in use in a real project?
Or just use a macro?
Yeah that would work but also make your code base a mess. Things that you can catch during build step and move in, or out, is much better handled at that level rather than leaking it into your code base.
That's really cool. I've primarily seen LTO benefits on the more embedded side of things where it ends up enabling far more aggressive dead code elimination between code and libraries which makes it much easier to fit into small chips. I hadn't thought much about virtual methods with C++ code (uncommon to see those in embedded code to begin with).
Interesting. Do they collect profiles from Nightly users or do they have some internal characteristic workload?
It's based on the perf tests, which I think are all public.