Hacker News new | ask | show | jobs
by mehrdadn 2897 days ago
Out of curiosity, why use virtual calls in those cases? Why not just have two different implementation files -- one for tests, and one for production -- and choose one based on which one you're doing? It seems like a win all around.
7 comments

There's always performance hacks to be found in big apps, in exchange for implementation complexity and man-hours. And there's always a next bottleneck to be found. Performnace bugs are subject to an extended form of survivorship bias where they become bigger when the preceding perf bottlenecks get fixed.
I've refreshed and seen three major revisions of your comment after writing a full reply to the first one, and I can't say I've been able to clearly follow where you're coming from and where you're going with any of them. I would suggest that if you feel the need for such drastic revisions, it may be worth reconsidering whether whatever you're trying to argue is really compelling.
His advice is sage and you should look past whatever deficiencies you find in his form of communication. Simply put there's always bigger fish to fry, and virtual methods are a quick and easy way to implement test stubs. Especially when the compiler devirtualizes.
Well I had a response to that originally but his edits kind of changed how much sense my reply made in response. Here's what I had:

Performance is only one aspect of it. It also reduces code bloat, reducing the program's size footprint. Most tests (yes, I know, not all, but most) should not make it into the final binary users are running. I also don't see what's "hacky" about making a foo.test.cc file when I want an alternate implementation for foo.cc. It seems to be quite a positive and clear way to document the fact that an implementation is only needed for testing, and vice-versa. And not only that, but it reduces compile (& link) times, since you only need to compile one of the two implementations for each use case.

Yea all things being equal I'd choose your approach. But in an existing project that did things this way - it would be the last thing I'd touch. This is of course barring extreme circumstances - which profiling and other tools would identify.

Unfortunately a lot of C++ developers don't really understand how the linker works, and because of that the build system is black magic.

Yeah, I wasn't suggesting they immediately dispatch a team for transforming everything into this. But at least from now on they could try to get people to use this pattern, and make the move gradually.
Sorry! That's a risk you have when following comments tightly :)

I'll reconstruct my first take from memory if you have comments on it:

Current CPUs are pretty good at predicting indirect branches, and it's hard to tell beforehand which virtual calls still turn out to be perf problems - and it's wasted effort and a fallible process to attempt their compllete elimination beforehand.

Right, and I posted my reply to that here: https://news.ycombinator.com/item?id=17504680
It seems like a win all around.

Depends. Virtual calls can be done purely in code, with different implementation files you need to go round via the build system. The former just always works independent of platform whereas often for the latter it's more configuration work. Also, ignoring YAGNI, you could say the virtual one makes it easy to quickly test things out etc, is easily mockable, ...

From my point of view a benefit for virtual calls is toolability. If you have an abstract interface, it’s pretty easy to either directly create mocks for it in the test file, or use sth like gmock. If you go through the link completely different implementation route, you might need to create and link a different library which contains the faked implementation for several test cases. It also works well with pure header code.
If I understand you correctly aren't you violating the initial assumption that there are only 2 implementations, one for production and one for testing? At which point, you're answering a different question...
>Out of curiosity, why use virtual calls in those cases? Why not just have two different implementation files -- one for tests, and one for production -- and choose one based on which one you're doing? It seems like a win all around.

I don't fully understand the proposal here. Do you have an example of a code in the wild that uses the proposed scheme?

I'm just saying do conditional compilation instead of virtual dispatch, since you generally shouldn't really need both the test and production implementations to run inside the same program. So if you have reason to require

  // widget.h
  class Widget { virtual void throb(); };
  // widget.cc
  class WidgetImpl : public Widget { void throb() { ... } };
  class WidgetTest : public Widget { void throb() { ... } };
then, instead of that, just do

  // widget.h
  class Widget { void throb(); };
  // widget.cc
  void Widget::throb() { ... }
  // widget.test.cc
  void Widget::throb() { ... }
where you only compile widget.cc for the production build, and only compile widget.test.cc for the test build.

Feel free to post a more specific example and I can take a stab at seeing if I can make this transformation to it (or why it might not be possible, if you think it's not).

Thank you for elaborating your proposal. It might work in a theoretical scenario, but unlikely to be practical for existing projects with millions lines of codes.

Essentially, it was easier to write a compiler / linker optimization, than change the source code. On the bright side, everyone wins!

> It might work in a theoretical scenario, but unlikely to be practical for existing projects with millions lines of codes.

I mean there's no reason you have to change everything by tomorrow. You could introduce it as a policy moving forward, the same way I'm sure you make any other changes to other patterns that need to be followed. I'm sure you have a process for this?

> Essentially, it was easier to write a compiler / linker optimization, than change the source code.

I agree, but see above and below.

> On the bright side, everyone wins!

Well, you waste space, time, and energy making the compiler and linker do work that doesn't inherently need to be done. You also lose parallelizability of the compilation of the two implementations if they currently reside in the same file. Finally the compiler (er, linker) might not be able to actually apply the devirtualizations you expect, whereas in this case it's just a matter of inlining since the target method is already known.

> Well, you waste space, time, and energy making the compiler and linker do work that doesn't inherently need to be done.

It's repetitive, error-prone work, better to have the compiler and linker do it than rely on the programmer getting their use of the preprocessor right.

And even if the programmer gets it right, doing it via macros means every tool that you want to apply to your codebase - IDEs, profilers, coverage, instrumentation - needs to understand your macro. Are you sure they'll all get it right?

Better to write plain old standard code that every tool will work correctly with, and the worst thing that can ever happen is a slight performance penalty.

Are you replying to the correct comment? Not once did I even mention macros or the preprocessor.
> Well, you waste space, time, and energy making the compiler and linker do work that doesn't inherently need to be done.

You're suggesting that instead of a compiler wasting time space and energy, that we get actual people to waste their time space and energy by manually applying the transformation that the compiler is able to do anyway?

Changing things gradually can lead to old petrified lava flows of old design patterns that nobody understands or removes anymore: https://en.m.wikipedia.org/wiki/Lava_flow_(programming)
> lava flow is a problem in which computer code written under sub-optimal conditions is put into production and added to while still in a developmental state.

I don't know what you're referring to as the "suboptimal code" here. If it's the old pattern using virtual everywhere, then the ship's already sailed... that code is already in production. If you mean the two-file approach I proposed then that makes no sense; if it's worse than the last approach then why would you even use it.

In the performance-insensitive code the approach you defend could be "good enough" but for the code on the "hot path" the changes like the proposed by dataflow could result in the consistent improvement and are surely better long-term "win."

"We do it that way here" is not something that has to be always applied.

I have done things like that, but the one definition rule gets in the way.

    void DoSomething(widget* param, OtherClassWithvirtual* param2) {...}
My DoSomething function needs to link to widget, but now I have two different libraries that widget could be in. The linker encodes in my binary which library to load to get widget. All attempts to use the other library instead are undefined behavior.

The only way I know of to work around that is build DoSomething twice, linking the real widget once and the test widget the other. Note that DoSomething takes two parameters, and those are classes that may themselves also take other classes with virtual functions. All the possible cases quickly gets into a combinatoric explosion and my build times goes from an hour (already way too long) to days.

If I'm wrong please tell me how, I'd love to not have so many interfaces just because someone doesn't want their unit test to become a large scale integration test.

What if I want both? What if in ClassATest, I need ClassBImpl? Do I go around creating more compile cases?

Because that’s what we need, more magic in the build system.

Not sure I see what you're thinking of... could you give me a code example?
The implementation of TestWidget only defines some stuff of its own but otherwise shares a bunch of code with Widget is a possible scenario here.
You can still derive classes when you need to, right?
What class are you deriving from though? Can’t be Widget, because in the scenario you described Widget and TestWidget are mutually exclusive (TestWidget just replaces Widget altogether). The alternative is extracting the shared behaviour to a BaseWidget that both classes extend, but that’s just ugly as sin.
idk, would something like this work maybe?

  // widget.h
  class Widget { void throb1(); virtual void throb2(); };
  // widget.cc
  void Widget::throb1() { ... }
  void Widget::throb2() { ... }
  // widget.test.cc
  void Widget::throb1() { ... }
  void Widget::throb2() { ... }
  class TestWidget : public Widget { int x; void throb2(); };
  void TestWidget::throb2() { ... Widget::throb2(); ... }
If not, could you present code to illustrate the problem? I won't try to address more issues without actual code.
One reason is that other test might need the production version. Building twice for that case is more of a pain and not really worth it.
I mean, then you can avoid it in that situation. Or actually use the production version like I explained here: https://news.ycombinator.com/item?id=17505358 Or find another way to use it. It's not all-or-nothing.
Yes that works, but relies on devirtualization in the production code, and isn't at all what you were suggesting before about having separate implementation files.
Have you seen this in use in a real project?
Or just use a macro?
Yeah that would work but also make your code base a mess. Things that you can catch during build step and move in, or out, is much better handled at that level rather than leaking it into your code base.