by jcelerier 1923 days ago
my personal rule of thumb is that my software must be useable at -O0 with address sanitizers on my desktop - so far that has meant that at -O3 it stays useable on Raspberry Pi 3-level hardware.

A few months ago I tried to make a build which targeted Ivy Bridge-level CPUs. It took no more than a day for a few users to report that it didn't work on their machines. It turns out a lot of people still rock old AMD Phenom or Q6600-era machines.
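For readers wondering how one binary can serve both old and new CPUs: a minimal sketch of run-time dispatch, assuming GCC or Clang on x86. The function names are hypothetical, SSE4.2 plus AVX is only a stand-in for "Ivy Bridge-level", and nothing here is taken from the application discussed in this thread. `__builtin_cpu_supports` is a GCC/Clang builtin; on other compilers or architectures the sketch falls back to the generic path.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when the running CPU reports SSE4.2 and AVX, used here as a
// stand-in for an "Ivy Bridge-level" baseline (an assumption of this sketch).
// Returns false where the GCC/Clang x86 builtins are unavailable.
inline bool cpu_meets_avx_baseline() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") && __builtin_cpu_supports("avx");
#else
    return false;
#endif
}

// Portable fallback that runs on any CPU.
inline long sum_generic(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Placeholder for a version built with newer instructions enabled (for
// example in a separate file compiled with -mavx). Here it simply reuses
// the generic loop so the sketch stays portable.
inline long sum_newer_cpu(const int* data, std::size_t n) {
    assert(cpu_meets_avx_baseline());  // must only be reached after the check
    return sum_generic(data, n);
}

// Picks a code path at run time instead of fixing it at build time.
inline long sum(const int* data, std::size_t n) {
    return cpu_meets_avx_baseline() ? sum_newer_cpu(data, n)
                                    : sum_generic(data, n);
}
```

Callers only ever use `sum`; whether the newer path is taken depends on the machine the program runs on, so a Phenom or Q6600 would get the generic loop rather than an illegal-instruction crash.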


> my personal rule of thumb is that my software must be useable at -O0 with address sanitizers on my desktop

The trouble with this criterion is that it fundamentally alters the language from the ground up: it forces you to optimize the source code structure for this too, not just run-time performance. Specifically, one of the core strengths of C++ is that no matter how many (practical) levels of wrapping and forwarding you do, as long as they're simple, they can generally all get flattened and go away with optimizations like inlining. But if you don't enable optimizations, every indirection in your source code now costs you, even absolutely trivial things like std::move() or std::forward() that should be 100% free. This obviously hampers your ability to design good C++ abstractions and basically turns C++ into a different language (like JavaScript or Python). It seems rather suboptimal. (Do you not encounter these issues in your particular application?)
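To make the wrapping-and-forwarding point concrete, here is a small sketch with hypothetical names (`append`, `append_checked`, `append_logged`, `build_names`); only `std::forward`, `std::move` and `std::vector::emplace_back` are standard. With inlining enabled the three layers would normally collapse into the single `emplace_back`; without it each layer is typically a real call. Some newer compilers special-case `std::move` and `std::forward`, so the exact cost depends on the toolchain.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Layer 3: the only function that does real work.
template <typename T, typename... Args>
T& append(std::vector<T>& out, Args&&... args) {
    // C++17: emplace_back returns a reference to the new element.
    return out.emplace_back(std::forward<Args>(args)...);
}

// Layer 2: a thin wrapper that only forwards its arguments.
template <typename T, typename... Args>
T& append_checked(std::vector<T>& out, Args&&... args) {
    return append(out, std::forward<Args>(args)...);
}

// Layer 1: another forwarding wrapper, as often found in layered APIs.
template <typename T, typename... Args>
T& append_logged(std::vector<T>& out, Args&&... args) {
    return append_checked(out, std::forward<Args>(args)...);
}

// Builds a small vector by going through all three layers each time.
inline std::vector<std::string> build_names() {
    std::vector<std::string> names;
    std::string tmp = "gamma";
    append_logged(names, "alpha");         // constructed in place from a literal
    append_logged(names, 3, 'b');          // std::string(3, 'b'), i.e. "bbb"
    append_logged(names, std::move(tmp));  // moved from tmp
    assert(names.size() == 3);
    return names;
}
```

Stepping through `build_names` in a debugger at -O0 and again at -O2 is a quick way to see how many of these frames survive on a given compiler.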

What I would probably prefer in your situation is to change the criteria somewhat, by doing things like keeping ASAN, enabling some debug-mode facilities (like _ITERATOR_DEBUG_LEVEL=1 for MSVC), but also enabling some optimizations for inlining and such so that you don't fundamentally alter the language like this. And/or you can just slow down your CPU when testing (in Windows you can just set the max CPU speed in Advanced Power Options).
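A rough sketch of what such a middle-ground build could look like with GCC or Clang, plus a way for the program to report how it was built. The command lines in the comments are an assumed setup, `app.cpp` is a placeholder file name, and the helper names are hypothetical. `__OPTIMIZE__` is defined by GCC and Clang when optimizing (not by MSVC); `__SANITIZE_ADDRESS__` is defined by GCC and MSVC under ASan, while Clang exposes `__has_feature(address_sanitizer)`.

```cpp
#include <cassert>
#include <string>

// Example command lines (assumptions about a typical GCC/Clang setup):
//   g++ -O0 -g -fsanitize=address -fno-omit-frame-pointer app.cpp  // original rule
//   g++ -O1 -g -fsanitize=address -fno-omit-frame-pointer app.cpp  // ASan plus light optimization

// True when AddressSanitizer instrumentation is compiled in.
constexpr bool asan_enabled() {
#if defined(__SANITIZE_ADDRESS__)        // GCC, MSVC
    return true;
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)   // Clang
    return true;
#  else
    return false;
#  endif
#else
    return false;
#endif
}

// True when GCC or Clang runs with an optimization level above -O0.
constexpr bool optimizer_enabled() {
#if defined(__OPTIMIZE__)
    return true;
#else
    return false;
#endif
}

// Human-readable summary, e.g. for a log line at startup.
inline std::string describe_build() {
    std::string s = optimizer_enabled() ? "optimized" : "unoptimized";
    s += asan_enabled() ? ", ASan on" : ", ASan off";
    return s;
}

// Intended for the start of a test run: stop early if ASan is missing.
inline void require_asan_build() {
    assert(asan_enabled() && "rebuild with -fsanitize=address");
}
```

How much inlining each optimization level actually performs differs between compilers and versions, so it is worth checking the resulting binary rather than assuming.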

Presumably they still optimize and write for -O3; they just run a far slower version while developing.

That is, without any manual optimization targeting -O0.

(The main downside is that a performance degradation that appears at -O3 but not at -O0 may be harder to notice.)

> it forces you to optimize the source code structure for this too,

I thought that it would, but on my dev machine (a Broadwell 6900K, still pretty good but definitely not top of the line) I actually have to push it a fair bit for this to be an issue (which is why it is important to do it! Low-power computers are really low-power compared to that). So this question definitely does not come up during the design (which in my case is generally very template-y and subject to the issues you mention). For reference, the app in question is https://ossia.io

The cases where doing this led to changes in code were more along the lines of "welp, looks like this algorithm I implemented for rendering waveforms is damn inefficient", "gonna have to think if I can redraw this widget less", "I should really cache the results of this computation", etc.
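As an illustration of the "cache the results of this computation" kind of fix, here is a hypothetical sketch (not code from the application above): a class that reduces an audio buffer to one (min, max) pair per pixel column and only recomputes when the requested width changes. The class and member names are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Caches per-column (min, max) peaks so that redrawing at an unchanged
// width reuses the previous result instead of rescanning every sample.
class WaveformPeaks {
public:
    explicit WaveformPeaks(std::vector<float> samples)
        : samples_(std::move(samples)) {}

    // Returns one (min, max) pair per column; recomputes only when the
    // column count differs from the cached one.
    const std::vector<std::pair<float, float>>& peaks(std::size_t columns) {
        if (!has_cache_ || columns != cached_columns_) {
            recompute(columns);
            cached_columns_ = columns;
            has_cache_ = true;
            ++recompute_count_;
        }
        return cache_;
    }

    // Number of full recomputations so far (useful for checking the cache).
    std::size_t recompute_count() const { return recompute_count_; }

private:
    void recompute(std::size_t columns) {
        cache_.clear();
        if (columns == 0 || samples_.empty()) return;
        cache_.reserve(columns);
        const std::size_t n = samples_.size();
        for (std::size_t c = 0; c < columns; ++c) {
            // Half-open sample range [begin, end) covered by column c;
            // every column gets at least one sample.
            const std::size_t begin = c * n / columns;
            const std::size_t end = std::max(begin + 1, (c + 1) * n / columns);
            assert(begin < end && end <= n);
            const float* first = samples_.data() + begin;
            const float* last = samples_.data() + end;
            auto mm = std::minmax_element(first, last);
            cache_.emplace_back(*mm.first, *mm.second);
        }
    }

    std::vector<float> samples_;
    std::vector<std::pair<float, float>> cache_;
    std::size_t cached_columns_ = 0;
    std::size_t recompute_count_ = 0;
    bool has_cache_ = false;
};
```

A widget's paint handler would call `peaks(width)` on every redraw; only a resize triggers the scan over all samples. A real implementation would also need to invalidate the cache when the samples change, which this sketch leaves out.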

Interesting, I guess it depends on your application. :-) You made me go back and double-check this on an actual program I had; here's what it is as a comparison point:

So I have an application in front of me right now that I've already optimized the heck out of (and it's as close to single-pass as can be), and turning off optimizations in release mode makes a basic 0.27-second task take 2.4 seconds... almost an order of magnitude difference.

And when I try to break into the code to see where it stops, it's almost always within traditionally-very-cheap operations like std::vector::emplace_back

  1 std::vector::emplace_back
  2 std::vector::_Emplace_back_with_unused_capacity
  3 std::_Default_allocator_traits::construct
  4 T::T
  5 U::V::w

and std::lower_bound

  1 std::lower_bound
  2 std::lower_bound
  3 std::_Seek_wrapped
  4 std::_Vector_const_iterator::_Seek_to

which have suddenly become incredibly expensive due to lack of optimizations like inlining. And notice this is all in the standard library, not within my own (template-light) code.

Going from 0.27 seconds (near-instantaneous for the user) to 2.4 seconds (a huge lag) is enough to make the program incredibly frustrating. Whether it's still "usable" at that point I guess is a matter of debate (some devs just put up with any amount of lag you throw at them!), but I feel pretty safe in saying the task I'm trying to accomplish simply would not be possible without optimizations.
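For anyone who wants to try this kind of comparison themselves, a minimal sketch of a timing harness around the two operations named above. The workload and function names are hypothetical and much simpler than the real program described here; the idea is only to build the same file with and without optimizations and compare the reported time.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical workload: append n values, then look each one up again.
// Returns the number of successful lookups so the work has a visible result.
inline std::size_t fill_and_search(std::size_t n) {
    std::vector<std::size_t> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        v.emplace_back(i * 2);  // values are appended in sorted order
    assert(std::is_sorted(v.begin(), v.end()));  // precondition of lower_bound

    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i) {
        auto it = std::lower_bound(v.begin(), v.end(), i * 2);
        if (it != v.end() && *it == i * 2) ++hits;
    }
    return hits;
}

// Runs the workload once and returns the elapsed wall-clock time in
// milliseconds; the hit count is passed back through hits_out.
inline double time_fill_and_search_ms(std::size_t n, std::size_t& hits_out) {
    const auto start = std::chrono::steady_clock::now();
    hits_out = fill_and_search(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The absolute numbers and the ratio between builds will vary with compiler, standard library and machine, so they should not be expected to match the figures quoted in this comment.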

So I'm guessing your performance targets & constraints are quite different, and that's probably why this isn't such a big deal in your case.

I've still got some Sandy Bridge-era computers running.

My PC is a dual-core Intel thing with 8 gigabytes of RAM. It's 12 years old. It had 2 gigabytes of RAM when I bought it; I added an SSD some years ago and upgraded the graphics card. It is still perfectly usable for my job (writing code, word processing, web dev). When I have a bigger task, I design it on this machine and move it to online CPUs/GPUs if needed.

So it's quite a durable product and I'm proud of it.

Using Linux helps, as it doesn't need one more gigabyte of RAM each time I upgrade it. And my Emacs consumes the same amount of RAM as it did years ago. Very predictable.

Likewise. A dual C2Q Mac Pro, Nehalem and Westmere Xeons, and a Sandy Bridge NAS. Newest non-embedded x86 in the house is probably my 2017 MacBook Air. I did buy an M1 Mac, but why would I replace our perfectly performant desktops that we only need occasionally for e.g. CAD or video editing or whatever when they still work absolutely fine? It's not a lack of money, it's a question of priorities. I have yet to find the killer app that's going to force my hand. It seems likely that hardware failure will get them first.

You just reminded me that I've also got a Core 2 Duo Mac running. That thing can run games better than my Mac that came out a decade later. Might have something to do with the enormous caches on the Core 2 series versus later Intel Core releases.

I also agree with your reasoning. These computers have been serving their purposes for a while, and I see no reason to take the time to replace them.

Yeah, SFF PCs of that era can sometimes be had for RPi-level prices. My grandma has one and it's still more powerful than most low-end laptops people use. I've also got one as a home server, and it's plenty powerful for that too. I'd recommend them to anyone who "just wants a PC".

> -O0 with address sanitizers on my desktop

> that at -O3

What does this notation mean?

Optimization levels for C compilers like GCC and Clang.
Specifically the command line flags you would pass to the compiler.