| HN Mirror



	by jcelerier 1923 days ago
	My personal rule of thumb is that my software must be usable at -O0 with address sanitizers on my desktop. So far that has meant that at -O3 it stays usable on Raspberry Pi 3-level hardware. A few months ago I tried to make a build which targeted Ivy Bridge-level CPUs; it took no more than one day for a few users to report that it didn't work on their machines. Turns out a lot of people still rock some old AMD Phenom or Q6600-era machines.

3 comments

dataflow 1923 days ago

> my personal rule of thumb is that my software must be useable at -O0 with address sanitizers on my desktop

The trouble with this criterion is that it fundamentally alters the language from the ground up: it forces you to optimize the source code structure for this too, not just run-time performance. Specifically, one of the core strengths of C++ is that no matter how many (practical) levels of wrapping and forwarding you do, as long as they're simple, they can generally all get flattened and go away with optimizations like inlining. But if you don't enable optimizations, now every indirection in your source code will cost you, even for absolutely trivial things like std::move() or std::forward() that should be 100% free. This obviously hampers your ability to design good C++ abstractions and basically turns C++ into a different language (like JavaScript or Python). It seems rather suboptimal. (Do you not encounter these issues in your particular application?)

What I would probably prefer in your situation is to change the criteria somewhat, by doing things like keeping ASAN, enabling some debug-mode facilities (like _ITERATOR_DEBUG_LEVEL=1 for MSVC), but also enabling some optimizations for inlining and such so that you don't fundamentally alter the language like this. And/or you can just slow down your CPU when testing (in Windows you can just set the max CPU speed in Advanced Power Options).
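To make the wrapping-and-forwarding point concrete, here is a minimal sketch. All names (`Sink`, `layer1` to `layer3`, `fill`) are hypothetical and not taken from any real codebase. With inlining enabled, a compiler can usually collapse the three layers into a single `push_back`; an unoptimized build typically keeps each layer as a real function call, although the exact behavior depends on the compiler and its version.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration: a sink plus three thin layers that only forward.
struct Sink {
    std::vector<std::string> items;
    void push(std::string s) { items.push_back(std::move(s)); }
};

template <class T>
void layer3(Sink& sink, T&& value) { sink.push(std::forward<T>(value)); }

template <class T>
void layer2(Sink& sink, T&& value) { layer3(sink, std::forward<T>(value)); }

template <class T>
void layer1(Sink& sink, T&& value) { layer2(sink, std::forward<T>(value)); }

// Pushes n strings through all three layers and returns the new item count.
std::size_t fill(Sink& sink, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        layer1(sink, std::to_string(i));  // temporary is forwarded as an rvalue
    }
    assert(sink.items.size() >= n);  // sanity check: every push landed
    return sink.items.size();
}
```

Compiling this once at -O0 and once at -O2 and comparing the generated assembly for `fill` is one way to see how many of the layers survive as calls.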


matkoniecz 1923 days ago

Presumably they still optimize and write for -O3; it's just that they run a far slower version.

Without any manual optimization targeting -O0.

(The main downside is that a performance degradation that appears at -O3 but not at -O0 may be harder to notice.)


jcelerier 1923 days ago

> it forces you to optimize the source code structure for this too,

I thought that it would, but on my dev machine (a Broadwell 6900K, still pretty good but definitely not top of the line) I actually have to push it a fair bit for this to be an issue (which is why it is important to do it, because low-power computers are really low-power compared to that), so this question definitely does not come up during design (which in my case is generally very template-y and subject to the issues you mention). For reference, the app in question is https://ossia.io

The cases where doing this led to changes in code were more along the lines of "welp, looks like this algorithm I implemented for rendering waveforms is damn inefficient", "gonna have to think if I can redraw this widget less", "I should really cache the results of this computation", etc.
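As an illustration of the "cache the results of this computation" kind of fix, here is a rough sketch. It is not taken from ossia score: the class `PeakCache`, its members, and the choice of per-column min/max peaks are all assumptions made for the example. The idea is that a redraw at an unchanged width reuses the stored peaks instead of rescanning every sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cache: stores one (min, max) pair per pixel column of a
// waveform so that repeated redraws at the same width reuse them.
class PeakCache {
public:
    explicit PeakCache(std::vector<float> samples) : samples_(std::move(samples)) {}

    // Returns the peaks for the requested column count, recomputing only
    // when that count differs from the one currently cached.
    const std::vector<std::pair<float, float>>& peaks(std::size_t columns) {
        if (columns != cached_columns_) {
            recompute(columns);
            cached_columns_ = columns;
            ++recompute_count_;
        }
        return peaks_;
    }

    std::size_t recompute_count() const { return recompute_count_; }

private:
    void recompute(std::size_t columns) {
        peaks_.clear();
        if (columns == 0 || samples_.empty()) return;
        const std::size_t n = samples_.size();
        for (std::size_t c = 0; c < columns; ++c) {
            // Half-open sample range [first, last) covered by column c,
            // forced to contain at least one sample.
            const std::size_t first = c * n / columns;
            const std::size_t last = std::min(n, std::max(first + 1, (c + 1) * n / columns));
            const auto range = std::minmax_element(samples_.begin() + first,
                                                   samples_.begin() + last);
            peaks_.emplace_back(*range.first, *range.second);
        }
    }

    std::vector<float> samples_;
    std::vector<std::pair<float, float>> peaks_;
    std::size_t cached_columns_ = 0;
    std::size_t recompute_count_ = 0;
};
```

A widget's paint routine would call `peaks(width)` on every redraw; the expensive scan then runs only when the width actually changes. A real implementation would also need to invalidate the cache when the samples change, which this sketch leaves out.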


dataflow 1923 days ago

Interesting, I guess it depends on your application. :-) You made me go back and double-check this on an actual program I had; here's what it is as a comparison point:

So I have an application in front of me right now that I've already optimized the heck out of (and it's as close to single-pass as can be), and turning off optimizations in release mode makes a basic 0.27-second task take 2.4 seconds... almost an order of magnitude difference.

And when I try to break into the code to see where it stops, it's almost always within traditionally-very-cheap operations like std::vector::emplace_back:

  1 std::vector::emplace_back
  2 std::vector::_Emplace_back_with_unused_capacity
  3 std::_Default_allocator_traits::construct
  4 T::T
  5 U::V::w

and std::lower_bound

  1 std::lower_bound
  2 std::lower_bound
  3 std::_Seek_wrapped
  4 std::_Vector_const_iterator::_Seek_to

which have suddenly become incredibly expensive due to lack of optimizations like inlining. And notice this is all in the standard library, not within my own (template-light) code.

Going from 0.27 seconds (near-instantaneous for the user) to 2.4 seconds (a huge lag) is enough to make the program incredibly frustrating. Whether it's still "usable" at that point I guess is a matter of debate (some devs just put up with any amount of lag you throw at them!), but I feel pretty safe in saying the task I'm trying to accomplish simply would not be possible without optimizations.

So I'm guessing your performance targets & constraints are quite different, and that's probably why this isn't such a big deal in your case.
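For anyone who wants to try a comparison like this on their own machine, here is a small sketch of a timing harness around the two operations named above. The workload, the function names, and the sizes are all hypothetical; this is not the application described in this comment, and the ratio it produces will depend on the compiler, standard library, and hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical workload: fill a sorted vector with emplace_back, then look up
// every element with lower_bound. Returns the number of successful lookups so
// the optimizer cannot discard the work.
std::size_t workload(std::size_t n) {
    std::vector<int> values;
    values.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        values.emplace_back(static_cast<int>(i) * 2);  // ascending, so already sorted
    }
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const int key = static_cast<int>(i) * 2;
        const auto it = std::lower_bound(values.begin(), values.end(), key);
        if (it != values.end() && *it == key) ++hits;
    }
    assert(hits == n);  // every key was inserted, so every lookup should hit
    return hits;
}

// Runs the workload once and returns the elapsed wall-clock time in milliseconds.
double time_workload_ms(std::size_t n, std::size_t& hits_out) {
    const auto start = std::chrono::steady_clock::now();
    hits_out = workload(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Building the same file once without optimizations and once with them, then printing the result of `time_workload_ms` for a large `n`, gives a rough feel for how much of the cost comes from un-inlined standard library calls.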


heavyset_go 1923 days ago

I've still got some Sandy Bridge-era computers running.


wiz21c 1923 days ago

My PC is a dual-core Intel thing with 8 gigabytes of RAM. It's 12 years old. It had 2 gigabytes of RAM when I bought it; I added an SSD some years ago and upgraded the graphics card. It is still perfectly usable for my job (writing code, word processing, web dev). When I have bigger tasks, I design them on it and move them to online CPU/GPU if needed.

So it's quite a durable product and I'm proud of it.

Using Linux helps, as it doesn't need 1 more gigabyte of RAM each time I upgrade it. And my emacs just consumes the same amount of RAM as years ago. Very predictable.


theodric 1923 days ago

Likewise. A dual C2Q Mac Pro, Nehalem and Westmere Xeons, and a Sandy Bridge NAS. Newest non-embedded x86 in the house is probably my 2017 MacBook Air. I did buy an M1 Mac, but why would I replace our perfectly performant desktops that we only need occasionally for e.g. CAD or video editing or whatever when they still work absolutely fine? It's not a lack of money, it's a question of priorities. I have yet to find the killer app that's going to force my hand. It seems likely that hardware failure will get them first.


heavyset_go 1922 days ago

You just reminded me that I've also got a Core 2 Duo Mac running, as well. That thing can run games better than my Mac that came out a decade later. Might have something to do with the enormous caches on the Core 2 series versus later Intel Core releases.

I also agree with your reasoning. These computers have been serving their purposes for a while, and I see no reason to take the time to replace them.


dvdkon 1923 days ago

Yeah, SFF PCs of that era can sometimes be had for RPi-level prices. My grandma has one and it's still more powerful than most low-end laptops people use. I've also got one as a home server; it's plenty powerful for that too. I'd recommend them to anyone who "just wants a PC".


mssundaram 1923 days ago

> -O0 with address sanitizers on my desktop

> that at -O3

What does this notation mean?


nayuki 1923 days ago

Optimization levels for C compilers like GCC and Clang.


nitrogen 1923 days ago

Specifically the command line flags you would pass to the compiler.
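For a concrete picture, the sketch below shows the two kinds of invocation in its comments, plus a tiny helper that reports which kind of build it ended up in. The file name `demo.cpp` and the function names are hypothetical. The helper relies on the `__OPTIMIZE__` macro, which GCC and Clang define at -O1 and above; other compilers such as MSVC do not define it, so treat this as GCC/Clang-specific.

```cpp
#include <cassert>
#include <string>

// Example invocations (GCC shown; Clang accepts the same flags):
//   g++ -O0 -g -fsanitize=address demo.cpp -o demo_debug    (no optimization, AddressSanitizer on)
//   g++ -O3 demo.cpp -o demo_release                        (aggressive optimization)

// GCC and Clang define __OPTIMIZE__ at -O1 and above and leave it
// undefined at -O0, so this reports which kind of build is running.
bool built_with_optimizations() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Human-readable label for the current build (hypothetical helper).
std::string build_description() {
    return built_with_optimizations() ? "optimized build" : "unoptimized build";
}
```

Printing `build_description()` at startup is a cheap way to confirm which flags a given binary was actually built with.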
