
• Use defaults.
• Use funky shortcut command-line switches like -a, -n, -p, -s, and -i.
• Use for to mean foreach.
• Run system commands with backticks.
...
• Use whatever you think of first.
• Get someone else to do the work for you by programming half an implementation and putting it on Github.

I've been dealing with a batch processing task that's written in Node.js (partly because it was the tool at hand, partly because it runs offline a process that can also be done online, so it's reusing code), and global interpreter locks (in Node's case, the single-threaded event loop, which behaves much like one) are definitely introducing some new nuances to my already fairly broad knowledge of performance and concurrency. Broad not in the sense that I am a machine whisperer, but in that I include human factors, which explodes the surface area of the problem but also explains quite a lot of failure modes.

In threaded code it's not uncommon to analyze a piece of data and fire off background tasks the moment you discover them. But if your workload is a DAG instead of a tree, you don't know whether the task you fired is needed once, twice, or for every single node. So now you introduce a cache (and if you're a special idiot, you call it Dynamic Programming which it is fucking not) and deal with all of the complexities of that fun problem.
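
For illustration, a minimal TypeScript sketch of that fire-on-sight pattern with a cache of in-flight work. The names `makeEagerWalker` and `loadChildren` are hypothetical, not from the actual codebase, and the sketch assumes the graph is acyclic and that each node's work returns the ids of the nodes it depends on.

```typescript
// Hypothetical stand-in for the real per-node work: it processes one node
// and resolves with the ids of the nodes it discovered.
type LoadChildren = (id: string) => Promise<string[]>;

// Eager traversal: every node is started the moment it is discovered.
// The map caches in-flight promises so a node shared by several parents
// (the DAG case) is only started once.
// Assumption: no cycles; a cycle would leave a promise waiting on itself.
function makeEagerWalker(loadChildren: LoadChildren): (id: string) => Promise<void> {
  const inFlight = new Map<string, Promise<void>>();

  function visit(id: string): Promise<void> {
    const cached = inFlight.get(id);
    if (cached !== undefined) {
      return cached; // already started by another parent
    }
    const work = loadChildren(id)
      .then((children) => Promise.all(children.map(visit)))
      .then(() => undefined);
    inFlight.set(id, work);
    return work;
  }

  return visit;
}
```

The cache is now load-bearing: remove it and shared nodes run once per parent, keep it and you own its invalidation, error handling, and memory growth.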

But it turns out in a GIL environment, you're making a lot less forward progress on the overall problem than you think you are because now you're context switching back and forth between two, three, five tasks with separate code and data hotspots, on the same CPU rather than running each on separate cores. It's like the worst implementation of coroutines.

If instead you scan the data and accumulate all the work to be done, then run those tasks, then scan the new data and accumulate the next bit of work to be done, you don't lose that much CPU or wall clock time in single-threaded async code. What you get in the bargain though is a decomposition of the overall problem that makes it easy to spot improvements such as deduping tasks, dealing with backpressure, adding a cache that's more orthogonal, and perhaps most importantly of all, debugging this giant pile of code.
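
The scan, accumulate, run loop above can be sketched roughly like this in TypeScript. This is only a sketch under assumptions: `runInWaves`, `expand`, and `batchSize` are hypothetical names, each task is assumed to resolve with the ids of the work it uncovered, and fixed-size batches stand in for whatever backpressure scheme the real system needs.

```typescript
// Hypothetical stand-in for the real per-node work: it processes one node
// and resolves with the ids of the new work it uncovered.
type Expand = (id: string) => Promise<string[]>;

// Process the workload one wave at a time:
//   1. run every task in the current wave (in bounded batches),
//   2. scan the results and accumulate the next wave, dropping duplicates.
// Returns the waves in the order they ran, which is handy when debugging.
async function runInWaves(
  roots: string[],
  expand: Expand,
  batchSize: number = 4,
): Promise<string[][]> {
  const seen = new Set<string>();
  const waves: string[][] = [];

  // Dedupe the starting set too.
  let frontier: string[] = [];
  for (const id of roots) {
    if (!seen.has(id)) {
      seen.add(id);
      frontier.push(id);
    }
  }

  while (frontier.length > 0) {
    waves.push(frontier);

    // Crude backpressure: at most batchSize tasks in flight at once.
    const discovered: string[][] = [];
    for (let i = 0; i < frontier.length; i += batchSize) {
      const batch = frontier.slice(i, i + batchSize);
      discovered.push(...(await Promise.all(batch.map(expand))));
    }

    // Accumulate the next wave; anything already scheduled is skipped.
    const next: string[] = [];
    for (const children of discovered) {
      for (const child of children) {
        if (!seen.has(child)) {
          seen.add(child);
          next.push(child);
        }
      }
    }
    frontier = next;
  }

  return waves;
}
```

On a diamond-shaped DAG (a needs b and c, both of which need d), this runs d once, in its own wave, and the dedupe lives in one obvious place instead of being tangled into the tasks themselves.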

So I've been going around making code faster by making it slower, removing most of the 'clever' and sprinkling a little crypto-cleverness (when the clever thing elicits an 'of course' response) / wisdom on top.