Hacker News
by shagie 1207 days ago
That's part of the section in Programming Perl that sticks in my memory.

From my copy...

> Efficiency

> ...

> Note that optimizing for time may sometimes cost you in space or programmer efficiency (indicated by conflicting hints below). Them’s the breaks. If programming was easy, they wouldn’t need something as complicated as a human being to do it, now would they?

> ...

> Programmer Efficiency

> The half-perfect program that you can run today is better than the fully perfect and pure program that you can run next month. Deal with some temporary ugliness. Some of these are the antithesis of our advice so far.

    • Use defaults.
    • Use funky shortcut command-line switches like -a, -n, -p, -s, and -i.
    • Use for to mean foreach.
    • Run system commands with backticks.
    ...
    • Use whatever you think of first.
    • Get someone else to do the work for you by programming half an implementation and putting it on Github.

> Maintainer Efficiency

> Code that you (or your friends) are going to use and work on for a long time into the future deserves more attention. Substitute some short-term gains for much better long-term benefits.

    • Don’t use defaults.
    • Use foreach to mean foreach.
    ...

I've been dealing with a batch-processing task written in NodeJS. Partly that's because it was the tool at hand, and partly it's because the job does offline a process that can also be done online, so it reuses that code. Global interpreter locks are adding some new nuances to my already fairly broad knowledge of performance and concurrency. By broad I don't mean that I'm a machine whisperer. I mean that I include human factors, which explodes the surface area of the problem but also explains quite a lot of failure modes.

In threaded code it's not uncommon to analyze a piece of data and fire off background tasks the moment you discover them. But if your workload is a DAG instead of a tree, you don't know whether a task you fired is needed once, twice, or by every single node. So now you introduce a cache (and if you're a special idiot, you call it Dynamic Programming, which it fucking is not) and deal with all the complexities of that fun problem.
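Here is a minimal sketch of that eager, fire-on-discovery pattern. It assumes Python's asyncio as a stand-in for Node's single-threaded event loop; the original code isn't shown. `GRAPH`, `EagerRunner`, and `run_eager` are hypothetical names. The cache holds in-flight tasks keyed by node, so a node reachable from two parents runs once and both parents await the same task.

```python
import asyncio

# Hypothetical DAG: "d" is reachable from both "b" and "c".
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


class EagerRunner:
    """Fires a task for each child as soon as it is discovered."""

    def __init__(self, graph):
        self.graph = graph
        self.in_flight = {}   # node -> asyncio.Task, shared by every parent
        self.work_count = {}  # how many times each node's work actually ran

    def visit(self, node):
        # Without this check, "d" would be processed once per parent.
        if node not in self.in_flight:
            self.in_flight[node] = asyncio.create_task(self._process(node))
        return self.in_flight[node]

    async def _process(self, node):
        self.work_count[node] = self.work_count.get(node, 0) + 1
        await asyncio.sleep(0)  # stand-in for real I/O or CPU work
        # Fire off every child immediately, then wait for all of them.
        await asyncio.gather(*(self.visit(child) for child in self.graph[node]))


async def run_eager(graph, root):
    runner = EagerRunner(graph)
    await runner.visit(root)
    return runner.work_count
```

Even in this tiny version, the correctness of "run each node once" depends entirely on the cache. That is where the complexity the comment complains about comes from.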

But it turns out that in a GIL environment you're making a lot less forward progress on the overall problem than you think. You're context switching back and forth between two, three, or five tasks with separate code and data hotspots, all on the same CPU, rather than running each on a separate core. It's like the worst implementation of coroutines.
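Here is a tiny demo of that interleaving. It assumes Python's asyncio as a stand-in for Node, and `worker` and `interleaving_demo` are hypothetical names. Two coroutines that yield after each step take turns on one thread instead of running side by side. The demo only shows the scheduling order, not the cache and hotspot costs described above.

```python
import asyncio
import threading


async def worker(name, steps, log):
    for i in range(steps):
        # Record who ran, which step, and on which OS thread.
        log.append((name, i, threading.get_ident()))
        await asyncio.sleep(0)  # yield control back to the event loop


async def interleaving_demo():
    log = []
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log


for name, step, thread_id in asyncio.run(interleaving_demo()):
    print(name, step, thread_id)
```

The output alternates A 0, B 0, A 1, B 1, A 2, B 2, with the same thread id on every line.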

Suppose instead you scan the data and accumulate all the work to be done, run those tasks, then scan the new data and accumulate the next batch of work. In single-threaded async code you don't lose much CPU or wall-clock time doing that. What you get in the bargain, though, is a decomposition of the overall problem that makes it easy to spot improvements, such as:

- deduplicating tasks,
- handling backpressure,
- adding caching that's more orthogonal,
- and, perhaps most importantly, debugging this giant pile of code.
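Here is a minimal sketch of that scan, accumulate, run loop. It assumes Python's asyncio as a stand-in for Node, and `GRAPH`, `do_work`, `run_in_phases`, and the `max_concurrency` parameter are hypothetical. Each phase dedupes the accumulated work before running it, and a semaphore caps how many tasks are in flight as a crude form of backpressure.

```python
import asyncio

# Hypothetical DAG: "d" is reachable from both "b" and "c".
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


async def do_work(node, graph):
    """Stand-in for the real per-node task; returns the nodes it discovers."""
    await asyncio.sleep(0)  # placeholder for I/O
    return graph[node]


async def run_in_phases(graph, root, max_concurrency=2):
    done = set()       # nodes whose work has already run
    frontier = [root]  # work accumulated by the previous scan
    phases = []        # each batch as it was run, handy for debugging
    # Crude backpressure: at most max_concurrency tasks in flight at once.
    limit = asyncio.Semaphore(max_concurrency)

    async def bounded(node):
        async with limit:
            return await do_work(node, graph)

    while frontier:
        # 1. Dedupe: keep the first occurrence, skip anything already done.
        batch = [n for n in dict.fromkeys(frontier) if n not in done]
        if not batch:
            break
        done.update(batch)
        phases.append(batch)
        # 2. Run the whole batch.
        results = await asyncio.gather(*(bounded(n) for n in batch))
        # 3. Scan the results and accumulate the next round of work.
        frontier = [child for children in results for child in children]
    return phases


print(asyncio.run(run_in_phases(GRAPH, "a")))  # [['a'], ['b', 'c'], ['d']]
```

Each phase is a plain list, so you can log, inspect, or reorder it before anything runs. The shared node "d" shows up once without needing a separate in-flight cache.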

So I've been going around making code faster by making it slower. I remove most of the "clever" and sprinkle a little wisdom on top, or a little crypto-cleverness: the kind of clever that gets an "of course" response.

> Programming Perl

That book is one of the most underrated and overlooked works on the philosophy of programming I've ever read. It's ostensibly about best practices in programming Perl (which some people consider a complex language), but in reality this is a very deep book about the best practices for programming in any language.

Note that the excerpt above applies to pretty much any language. Much of the book is written at that level.

https://www.oreilly.com/library/view/programming-perl-4th/97...

I could say a similar thing about Practical Parallel Rendering. Officially it's a book about raytracing CGI in a cluster, but the first half of the book explains queuing theory and concurrency concerns in tremendous detail. It's a thin book to begin with, and you've more than gotten your money's worth if you read the first half and give up when they start talking about trigonometry.