Hacker News
by mrich 2498 days ago
My take, after 12 years of industry C++ experience, working on code that needs to be fast: Too much emphasis is placed on gaining another 0.5% performance improvement, instead of slightly slower code that does what was intended. At least offer some safe defaults and make the bleeding edge optional.

While we are debugging things like this, people are writing servers in JavaScript and web services in Python :O No need to optimize so heavily, they will waste it anyway :)

3 comments

I've mostly written C for embedded, which sometimes needs to be fast. I agree with you.

It definitely feels like, as compiler writers continue to gleefully add more footguns to C/C++, application programmers vote with their feet by using slower or much slower interpreted languages like Java, C#, JavaScript, Python, Ruby, and PHP. Meanwhile, systems programmers are eyeing Go and Rust for infrastructure.

That 0.5% improvement can help a lot when it's inside the JavaScript engine or the Python interpreter.
It's usually a 0.5% improvement on a micro benchmark.

One of my suspicions is that at the low end, where I operate, the marginal cost of higher speed is essentially zero. My firmware spends more than 99.9% of the time sleeping, so micro-optimizations of a few percent are meaningless. At the other end, superscalar processors are a moving target for micro-optimizations. Further, a lot of tasks look like init -> process data -> clean up. Over time the process-data part has grown very large, making the init and clean-up parts a smaller and smaller percentage of the execution time, so micro-optimizations in those parts of the code provide no value. Finally, there is the constant push to move the data processing into either specialized CPU instructions or GPUs.
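
To put rough numbers on the init -> process data -> clean up point, here is a minimal sketch using Amdahl's law. The 2%/98% split and the 5% local speedup are made-up figures for illustration, not measurements from any real firmware, and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when `fraction` of the run time
    becomes `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical split: init + clean up take 2% of the run, processing 98%.
init_cleanup = overall_speedup(fraction=0.02, local_speedup=1.05)
processing = overall_speedup(fraction=0.98, local_speedup=1.05)

print(f"5% faster init/clean up: {100 * (init_cleanup - 1):.3f}% overall")
print(f"5% faster processing:    {100 * (processing - 1):.3f}% overall")
```

Under those assumed numbers, the same 5% local win is worth roughly fifty times more in the processing phase than in init and clean up, which is the whole argument in one line of arithmetic.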

A single optimization pass might only improve a microbenchmark a bit, but all passes taken together significantly speed up most programs. In the embedded software that I have experience with, we eventually had to turn on optimizations, because otherwise we would have had to switch to a new hardware platform to run increasingly demanding workloads.
I've spent 3 years building a Java code base in which one class of requests had an average response time of < 1 ms, and the entire application had a 99th-percentile response time of < 10 ms, including GC and everything.
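
As a back-of-the-envelope illustration of how small per-pass wins add up, here is a sketch that assumes each pass gives an independent 0.5% gain and that gains compose multiplicatively. Both are simplifications: real passes interact, and the pass count of 40 is a made-up number, as is the helper name `combined_speedup`.

```python
import math


def combined_speedup(per_pass_gains):
    """Overall speedup factor if each pass independently makes the program
    (1 + gain) times faster. Assumes gains multiply, which real
    optimization passes only approximate."""
    return math.prod(1.0 + gain for gain in per_pass_gains)


# Hypothetical: 40 passes, each worth 0.5% on its own.
hypothetical_gains = [0.005] * 40
total = combined_speedup(hypothetical_gains)

print(f"{len(hypothetical_gains)} passes at 0.5% each: "
      f"{100 * (total - 1):.1f}% faster overall")
```

With those assumed inputs the total comes out at about 22%, which is the kind of margin that can decide whether a workload still fits on existing hardware.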

Quite honestly, after more years of experience: cache smart and batch your queries. Network latency, a.k.a. the speed of light in fiber or copper, is our enemy, not a JVM GC'ing in a controllable way. If the CPU cache is your issue, you can either correct me, or you're abusing the network without realizing it.
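
For anyone who wants "cache smart and batch-query" spelled out, here is a minimal sketch. `FakeRemoteStore` and `CachingClient` are hypothetical names, the store is an in-memory stand-in for a database or service, and the 0.5 ms round trip is an assumed figure. It counts round trips instead of sleeping, so it only models network cost.

```python
class FakeRemoteStore:
    """In-memory stand-in for a remote store. Every call counts as one
    network round trip, whether it fetches one key or many."""

    def __init__(self, rows, round_trip_ms=0.5):  # 0.5 ms is an assumption
        self._rows = dict(rows)
        self.round_trip_ms = round_trip_ms
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._rows.get(key)

    def get_many(self, keys):
        self.round_trips += 1
        return {k: self._rows[k] for k in keys if k in self._rows}

    def network_ms(self):
        return self.round_trips * self.round_trip_ms


class CachingClient:
    """Fetches all uncached keys in one batched call and remembers results.
    Keys that do not exist are not cached, so they are re-requested."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def fetch(self, keys):
        keys = list(keys)
        # dict.fromkeys de-duplicates while preserving order.
        missing = [k for k in dict.fromkeys(keys) if k not in self._cache]
        if missing:
            self._cache.update(self._store.get_many(missing))
        return {k: self._cache[k] for k in keys if k in self._cache}


rows = {i: f"row-{i}" for i in range(100)}

# One request per key: 100 round trips.
naive = FakeRemoteStore(rows)
for key in range(100):
    naive.get(key)

# One batched request, then a second lookup served from the local cache.
batched = FakeRemoteStore(rows)
client = CachingClient(batched)
client.fetch(range(100))
client.fetch(range(50))

print("naive:  ", naive.round_trips, "round trips,", naive.network_ms(), "ms")
print("batched:", batched.round_trips, "round trips,", batched.network_ms(), "ms")
```

The per-key loop pays the round trip 100 times and the batched client pays it once. No amount of GC tuning buys back that difference, which is the point being made above.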

Some systems will have stricter latency requirements than that -- microseconds, always, no exceptions (e.g. studio audio, network packet processing, industrial controls). Others will have maximizing throughput as a goal (e.g. x264). In both cases CPU cache could be a bottleneck and GC would be the enemy.