Hacker News new | ask | show | jobs
by akira2501 547 days ago
> "[CPU, Memory, Disk] is cheap, engineering time isn't." Fuck that.

It is. It's absurdly cheap. I ensure I check the amount of time it would take for me to make a performance improvement against the runtime costs of my functions. It's rarely worth the extra effort.

Seriously, until you get into the millions of records per second level, you're almost never benefited. You may make your function 2x faster, at a cost of additional complexity, but you never run it enough in a year for it to pay itself back.

> Bad engineering time is _incredibly_ expensive.

Engineering time is expensive. Period. It speaks to the need to minimize it.

> This is an excuse to not spend time learning the ins and outs of your language, and your hardware.

All of which will change in a few years, which is fine, if you're also committing to keeping _all that code_ up to date right along with it. Otherwise you end up with an obscure mess that you have to unwind 5 years of context to understand and fix again.

Complexity and available mental contexts are forgotten costs. If your language even has that many "ins and outs" to begin with you may want to reconsider that.

3 comments

> You may make your function 2x faster, at a cost of additional complexity, but you never run it enough in a year for it to pay itself back.

"You" is both singular and plural, which is often the problem with this thinking.

Is it worth spending a month of engineering time to make a page load in 50ms instead of 2s? Seems like a lot of engineering time for a noticeable but somewhat minor improvement.

But now, what if you have a million users who do this operation 100x/day? Absolutely worth it!

For example, I sure wish atlassian would spend a tiny bit of effort into making jira faster. Even if it is 1 second per ticket, since I'm viewing 100+ tickets per day that adds up. And there's many hundreds of us at the company doing the same thing, it really adds up.

Nit: 50ms vs 2000ms is 40x speed increase, i.e. ~1.5 order of magnitude.

I still keep words of my database optimization lecturer who said that by his experience optimization below 1 OOM aren’t worth it and most „good ones” are 3+

> Absolutely worth it!

Long reaching assumption. Even the biggest companies have limited resources (even if vast). Would you rather improve load times by 2x (from 500ms to 250ms) or improve checkout reliability from 99% to 99.5%? And there is much more to consider on some levels (e.g. planning for thermal efficiency is fun).

Software development is always a game of choice.

> I still keep words of my database optimization lecturer who said that by his experience optimization below 1 OOM aren’t worth it and most „good ones” are 3+

Like everything, it depends. Is the query gating a lot of other things, especially things that can be done in parallel? Shaving 10 ms off might very well be meaningful. Is it a large OLAP query, and the owning team has SLAs that depend on it? Going from 60 --> 55 minutes might actually matter.

The two biggest performance-related issues with RDBMS that I deal with, aside from indexing choices, are over-selection (why on earth do ORMs default to SELECT * ?), and poor JOIN strategies. Admittedly the latter is often a result of poor schema design, but for example, the existence of semi/anti-joins seems to be uncommon knowledge.

If SLAs are involved then I’d argue it’s not about optimization but about business goals, which unsurprisingly take precedence.

But there is another case that is very similar: threshold passing (or how I like to call it - waterfalling). Small inefficiencies add up and at some point a small slowdowns reach critical mass and some significant system breaks everything else.

When system was designed by competent engineers huge optimizations aren’t easy, so it’s shaving couple millis here and couple millis there. But, as in the first case, I’d categorize it as a performance failure.

Most of the time they just move the expensive processing to the user's browser so they don't have to pay for it :)
Jira is incredibly slow, almost unusable. So awful.
More features = more customers = more revenue

More data collection = more revenue

More crap layered on = everything is slower

Everything sucks = more support demand = supply price leverage = more revenue

Enterprise software is necessarily slow and complicated, and not for your benefit.

I’m not quite following your point. It sounds like you’re agreeing that Jira sucks?
Yup
Yeah, so the pathological incentives to build bad products are just insane. It’s more evidence against the whole market-Darwinian hypothesis being good for the consumer. “Fitness” doesn’t equal (let alone define) product quality.
> 50ms instead of 2s

In the past I believe Google was very adament that page load time perception was very important to other metrics.

You're probably not going to achieve that with the kind of optimization described in this article though.
As always, https://xkcd.com/1205/ is a great reference to keep in mind.

That said, most developers (I assume, by most I mean myself) never work with code where the work they do has any serious impact on performance; it's layers on top of code that should be fast, but the work I do is implementing business logic and screens, in which case readability vastly trumps performance.

I mean right now I'm writing a line of code that can be represented as a nested ternary or a function call with some more written-out if/elses. The ternary outperforms the function call 2:1 or more, but either one can be executed hundreds of times a second BUT will only be executed once or twice per page view. It's not worth the tradeoff, even if this line of code will be executed hundreds of millions of times overall.

> You may make your function 2x faster, at a cost of additional complexity, but you never run it enough in a year for it to pay itself back.

I'm not talking about increased complexity, I'm talking about extremely basic things that take zero extra time, like using the correct data structure. For example, in Python:

    In [8]: a = array("i", (x for x in range(1_000_000)))
       ...: l = [x for x in range(1_000_000)]
       ...: d = deque(l)
       ...: for x in (a, l, d):
       ...:     print(f"{sys.getsizeof(x) / 2**20} MiB")
       ...:
    3.902385711669922 MiB
    8.057334899902344 MiB
    7.868537902832031 MiB

Very similar structures, with very different memory requirements and access speeds. I can count on one hand with no fingers the number of times I've seen an array used.

Or knowing that `random.randint` is remarkably slow compared to `random.random()`, which can matter in a hot loop:

    In [10]: %timeit math.floor(random.random() * 1_000_000)
    31.9 ns ± 0.138 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

    In [11]: %timeit random.randint(0, 1_000_000)
    163 ns ± 0.653 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
> All of which will change in a few years, which is fine, if you're also committing to keeping _all that code_ up to date right along with it.

With the exception of list comprehension over large ranges slowing down from 3.11 --> now, I don't think there's been much in Python that's become dramatically worse such that you would need to refactor it later (I gather the Javascript community does this ritual every quarter or so). Anything being deprecated has years of warning.

> which can matter in a hot loop:

163ns - 31.9ns == 131.1ns

This will need to happen 7.6 million times to save me 1 CPU second. On AWS lambda with 1GB of memory this will cost you a whopping: $0.0000166667.

The point is, you're not even wrong, but there are vanishingly few cases where it would actually matter to the bottom line in practice. You're taking an absolutist point of view to a discipline which thoroughly rejects it.

This is what I love about the cloud. It forces you to confront what your efforts are actually worth by placing a specific value on all of these commodities. In my experience they're often worth very little given that none of us have the scale of problems where this would show actual returns.

Reaching the scale where it shows actual returns isn't all that difficult. You need it to happen 7.6 million times to save 1 CPU second, but each CPU core can execute it nearly that many times every second.

Probably you don't leave it generating only random numbers all day, but suppose you do generate a good few, so that it's 1% of your total compute budget, and you have only a modest load, using on average four CPU cores at any given time. Then saving that amount of computation will have saved you something like $15/year in compute, recurring. Which isn't actually that bad a return for ten seconds worth of choosing the right function.

There are also a lot of even fairly small entities for which four cores is peanuts and they're running a hundred or a thousand at once, which quickly turns up the price.

And even the things with internet scale aren't all that rare. Suppose you're making a contribution to the mainline Linux kernel. It will run on billions of devices, possibly for decades. Even if it doesn't run very often, that's still a lot of cycles, and some of the kernel code does run very often. Likewise code in popular web browsers, javascript on popular websites or in popular libraries, etc.

You don't have to work for Google to make a contribution to zlib and that kind of stuff has the weight of the world on it.

Sure, but the cumulative effects of pervasive mediocre to bad decisions do add up. And it isn't just about cloud compute cost. Your own time is stolen by the slow ci jobs that you inevitably get stuck waiting for. For me, I prioritize my own personal happiness in my work and this mindset taken too far makes me unhappy.
> Very similar structures, with very different memory requirements and access speeds. I can count on one hand with no fingers the number of times I've seen an array used.

That is obvious when you actually check the access speed of arrays and find out it is about half that of lists on small integers (under 256), and worse on non-small integers. That is literally the opposite trade off of what you want in 99.99% of cases.

Deques are even less of a consideration, they’re unrolled linked lists so random access is impossible and iteration is slower, you use a deque when you need a deque (or at least a fifo), aka when you need to routinely manipulate the head of the collection.

It depends on your constraints. If you’re limited by RAM, arrays make a lot of sense for certain applications. If you need Python’s buffer protocol, again, they make a lot of sense.

As to deques, yes, they have specific uses, and being slightly smaller isn’t usually a selling point for them. My point was that I have seen many cases where an incorrect data structure was used, because a list or dict was “good enough.” And sure, they generally are, but if the language ships with other options, why wouldn’t you explore those?

Yes but if you want to do things in a less obvious way you should be aware of the downsides, such as bias in your random numbers. Also making sure you watch out for off by one errors.

Stolen the number to show this off well from a bug report somewhere:

    random_counter = Counter()
    
    for i in range(10_000_000):
        result = floor(random() * 6755399441055744) % 3
        random_counter[result] += 1

    print("floor method", random_counter.most_common(3))

    randint_counter = Counter()
    
    for i in range(10_000_000):
        result = randint(0, 6755399441055743) % 3
        randint_counter[result] += 1

    print("randint method", randint_counter.most_common(3))

Result

    floor method [(1, 3751972), (0, 3333444), (2, 2914584)]
    randint method [(1, 3334223), (2, 3333273), (0, 3332504)]
https://bugs.python.org/issue9025
Have you ran this in any modern version of Python? It’s been fixed for a long time.
3.10 so I redid it on 3.13.1, same results.
Ugh, I was checking `randrange` (as the bug mentions), not `random`. I stand corrected.
Ah yeah sorry I should have mentioned it wasn't the same, I used it as it has a nice number that shows the bias to a pretty extreme degree.
Python is 100% the wrong language to worry about this in. If your hot loops are in python and you care about performance, you should be rewriting them in another language.
Agreed; I used it partially because TFA used it to demonstrate ideas, and partially because I’m very familiar with it.

But you’re correct, of course. When I need something to go faster in Python, I write it in C. If it’s more than a small section, then a full rewrite is reasonable.

Even if so, their point still stands. It's a tiny change that grants huge speedups.
I was curious why randint was slower than random and found a good SO answer [0] (it's like chatgpt by humans for you youngsters out there), the gist of it is that `random()` calls a C function directly (which I presume goes straight to a syscall), whereas `randint` is implemented in Python and has a load of preconditions / defensive programming (which I presume is executed at every invocation, so seven potential branches etc it has to check every time).

Of course, if it's not in a tight loop and you need a whole integer between a range then it's the most developer-ergonomic way to get it. If you do need more performance but want to keep the ergonomics, writing your own random-int-between-two-numbers is fairly straightforward but it'll take some time.

[0] https://stackoverflow.com/a/58126026/204840

Ok, but neither of these matter until you know they matter. Seriously. Like, yes, it's nice they exist and that they are available for when you want them, but I would generally advise people to use a list or random.randint, if only because I value their confidence with them over the 2x performance win, because most workloads are not simply just a single array or random number generator loop. And, to be clear, I work on performance professionally: most of my job is not making things as fast as possible, but considering the tradeoffs that go into writing that code. I understand your example as showing off an interesting performance story but in the real world most workloads are more complex than what can be solved with using a rare but drop-in API.
You are saying that the potential gains are less than an order of magnitude. That mkes them a pretty hard sell in most instances.
> until you get into the millions of records per second level, you're almost never benefited

Yeah, but the software landscape is very diverse.

On my job (CAM/CAE) I often handle data structures with gigabytes of data. Worse, unlike e.g. multimedia frameworks many algorithms operating on these numbers are global i.e. can’t be represented as a pipeline which splits data into independent chunks and processes them sequentially.

Making performance critical functions twice as fast might saves hours of time for a single end user in a single day.