Hacker News new | ask | show | jobs
by jesse__ 583 days ago
I do low level systems programming, so pretty different from JS-land, but I feel the techniques you should apply when doing optimization generally apply at any level/language.

0) algorithmic improvement. Obvious shit like do a quick sort instead of bubble sort (assuming N > 64, or whatever), not doing unnecessary work in a hot loop, etc

1) reduce memory footprint. The slowest part of your program is almost always just waiting for memory, unless you're doing something that's heavily CPU bound. Web applications are probably always memory bound. Reducing the amount of memory the function you're optimizing operates on reduces DCache misses, which are expensive.

2) Do batch operations. Once I've got something to a point where it's not completely braindead (which, honestly, is where I stop most of the time), I look to start batching things. Usually look to do 8 or 16 at a time in the hopes the compiler/runtime can make some use of SIMD. Use STATIC LOOPS ie (for 0..8) so the compiler can unroll the loop. That's extremely important.

3) probably unavailable (unless you want to/can drop into WASM), but the next step is usually SIMD. This is a rabbit hole, but if you want/need another ~8x perf improvement, this is how to get it

4) once all that's done, it's probably close to optimal in terms of cycles per element (unless I did something boneheaded, which is common). Last step is to multithread it if it needs even more juice. This can range from trivial to completely impossible depending on the algorithm. In JS land, you need to make sure you operate on SharedAreayBufferrs when doing multithreading for performance, because web workers copy the input/output values by default.

Anywhoo.. maybe that helps.

When I try to optimize something lightly, it's not uncommon for me to get 10x improvement fairly easily. When I optimize something to within an inch of it's life, I can sometimes get three or even four orders of magnitude faster.

2 comments

EDIT: I forgot to mention that for tight performance, avoid branches. This means ifs, switchs, loops, goto, etc. Sometimes you need branches, but mispredicted branches can be extremely costly, causing pipeline stalls and flushes. This is why using a static loop is important; so the compiler can unroll it and not use a branch.

I also should mention that I hate flamegraphs. They only give you a bare minimum amount of information for doing performance work. I'm not sure of a good JS profiler, but what you want to be able to do is mark up the sections of code you want profiled, instead of the profiler taking random samples and squashing them all together. Look at the tracy profiler for an example

> Web applications are probably always memory bound

IO bound and particularly network bound code is common too. The first fix I'd try with network bound code is to either eliminate the network call (local cache? turn a microservice into a library?) or to batch operations.

> Last step is to multithread it if it needs even more juice

In web app land, this is fraught with peril if you're doing it on the server, because it means your code is now competing for n times the resources. Often it's better for one request to take a long time if it means it's using a more predictable amount of memory, not causing other requests to time out, not exhausting your DB connection pool, etc.

I imagine that systems programming is similar in some ways and that's why multithreading is the last resort, just mentioning it because it's easy to shoot oneself in the foot with parallelism.

Some good points, thanks for the insight :)

Yeah, when I multithread something I pretty much assume that I can hog the whole machine for the time slice the job is going to be running. Said another way, I assume there will be a small number of large jobs running on the machine at any given time, which attempt to saturate the machine. Typically dispatched to a core-locked threadpool of some sort.

Definitely agree with the sentiment that multithreading is hard. Especially when trying to get every last drop..