Hacker News new | ask | show | jobs
by sgarland 547 days ago
> You may make your function 2x faster, at a cost of additional complexity, but you never run it enough in a year for it to pay itself back.

I'm not talking about increased complexity, I'm talking about extremely basic things that take zero extra time, like using the correct data structure. For example, in Python:

    In [8]: a = array("i", (x for x in range(1_000_000)))
       ...: l = [x for x in range(1_000_000)]
       ...: d = deque(l)
       ...: for x in (a, l, d):
       ...:     print(f"{sys.getsizeof(x) / 2**20} MiB")
       ...:
    3.902385711669922 MiB
    8.057334899902344 MiB
    7.868537902832031 MiB

Very similar structures, with very different memory requirements and access speeds. I can count on one hand with no fingers the number of times I've seen an array used.

Or knowing that `random.randint` is remarkably slow compared to `random.random()`, which can matter in a hot loop:

    In [10]: %timeit math.floor(random.random() * 1_000_000)
    31.9 ns ± 0.138 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

    In [11]: %timeit random.randint(0, 1_000_000)
    163 ns ± 0.653 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
> All of which will change in a few years, which is fine, if you're also committing to keeping _all that code_ up to date right along with it.

With the exception of list comprehension over large ranges slowing down from 3.11 --> now, I don't think there's been much in Python that's become dramatically worse such that you would need to refactor it later (I gather the Javascript community does this ritual every quarter or so). Anything being deprecated has years of warning.

7 comments

> which can matter in a hot loop:

163ns - 31.9ns == 131.1ns

This will need to happen 7.6 million times to save me 1 CPU second. On AWS lambda with 1GB of memory this will cost you a whopping: $0.0000166667.

The point is, you're not even wrong, but there are vanishingly few cases where it would actually matter to the bottom line in practice. You're taking an absolutist point of view to a discipline which thoroughly rejects it.

This is what I love about the cloud. It forces you to confront what your efforts are actually worth by placing a specific value on all of these commodities. In my experience they're often worth very little given that none of us have the scale of problems where this would show actual returns.

Reaching the scale where it shows actual returns isn't all that difficult. You need it to happen 7.6 million times to save 1 CPU second, but each CPU core can execute it nearly that many times every second.

Probably you don't leave it generating only random numbers all day, but suppose you do generate a good few, so that it's 1% of your total compute budget, and you have only a modest load, using on average four CPU cores at any given time. Then saving that amount of computation will have saved you something like $15/year in compute, recurring. Which isn't actually that bad a return for ten seconds worth of choosing the right function.

There are also a lot of even fairly small entities for which four cores is peanuts and they're running a hundred or a thousand at once, which quickly turns up the price.

And even the things with internet scale aren't all that rare. Suppose you're making a contribution to the mainline Linux kernel. It will run on billions of devices, possibly for decades. Even if it doesn't run very often, that's still a lot of cycles, and some of the kernel code does run very often. Likewise code in popular web browsers, javascript on popular websites or in popular libraries, etc.

You don't have to work for Google to make a contribution to zlib and that kind of stuff has the weight of the world on it.

Sure, but the cumulative effects of pervasive mediocre to bad decisions do add up. And it isn't just about cloud compute cost. Your own time is stolen by the slow ci jobs that you inevitably get stuck waiting for. For me, I prioritize my own personal happiness in my work and this mindset taken too far makes me unhappy.
> Very similar structures, with very different memory requirements and access speeds. I can count on one hand with no fingers the number of times I've seen an array used.

That is obvious when you actually check the access speed of arrays and find out it is about half that of lists on small integers (under 256), and worse on non-small integers. That is literally the opposite trade off of what you want in 99.99% of cases.

Deques are even less of a consideration, they’re unrolled linked lists so random access is impossible and iteration is slower, you use a deque when you need a deque (or at least a fifo), aka when you need to routinely manipulate the head of the collection.

It depends on your constraints. If you’re limited by RAM, arrays make a lot of sense for certain applications. If you need Python’s buffer protocol, again, they make a lot of sense.

As to deques, yes, they have specific uses, and being slightly smaller isn’t usually a selling point for them. My point was that I have seen many cases where an incorrect data structure was used, because a list or dict was “good enough.” And sure, they generally are, but if the language ships with other options, why wouldn’t you explore those?

Yes but if you want to do things in a less obvious way you should be aware of the downsides, such as bias in your random numbers. Also making sure you watch out for off by one errors.

Stolen the number to show this off well from a bug report somewhere:

    random_counter = Counter()
    
    for i in range(10_000_000):
        result = floor(random() * 6755399441055744) % 3
        random_counter[result] += 1

    print("floor method", random_counter.most_common(3))

    randint_counter = Counter()
    
    for i in range(10_000_000):
        result = randint(0, 6755399441055743) % 3
        randint_counter[result] += 1

    print("randint method", randint_counter.most_common(3))

Result

    floor method [(1, 3751972), (0, 3333444), (2, 2914584)]
    randint method [(1, 3334223), (2, 3333273), (0, 3332504)]
https://bugs.python.org/issue9025
Have you ran this in any modern version of Python? It’s been fixed for a long time.
3.10 so I redid it on 3.13.1, same results.
Ugh, I was checking `randrange` (as the bug mentions), not `random`. I stand corrected.
Ah yeah sorry I should have mentioned it wasn't the same, I used it as it has a nice number that shows the bias to a pretty extreme degree.
Python is 100% the wrong language to worry about this in. If your hot loops are in python and you care about performance, you should be rewriting them in another language.
Agreed; I used it partially because TFA used it to demonstrate ideas, and partially because I’m very familiar with it.

But you’re correct, of course. When I need something to go faster in Python, I write it in C. If it’s more than a small section, then a full rewrite is reasonable.

Even if so, their point still stands. It's a tiny change that grants huge speedups.
I was curious why randint was slower than random and found a good SO answer [0] (it's like chatgpt by humans for you youngsters out there), the gist of it is that `random()` calls a C function directly (which I presume goes straight to a syscall), whereas `randint` is implemented in Python and has a load of preconditions / defensive programming (which I presume is executed at every invocation, so seven potential branches etc it has to check every time).

Of course, if it's not in a tight loop and you need a whole integer between a range then it's the most developer-ergonomic way to get it. If you do need more performance but want to keep the ergonomics, writing your own random-int-between-two-numbers is fairly straightforward but it'll take some time.

[0] https://stackoverflow.com/a/58126026/204840

Ok, but neither of these matter until you know they matter. Seriously. Like, yes, it's nice they exist and that they are available for when you want them, but I would generally advise people to use a list or random.randint, if only because I value their confidence with them over the 2x performance win, because most workloads are not simply just a single array or random number generator loop. And, to be clear, I work on performance professionally: most of my job is not making things as fast as possible, but considering the tradeoffs that go into writing that code. I understand your example as showing off an interesting performance story but in the real world most workloads are more complex than what can be solved with using a rare but drop-in API.
You are saying that the potential gains are less than an order of magnitude. That mkes them a pretty hard sell in most instances.