Hacker News new | ask | show | jobs
by p4wnc6 3582 days ago
I would say it differently: you absolutely should trust big O numbers for data structures and use that as a major guiding principle in terms of how to structure a program ... and you should require extraordinary evidence, like a surprising result in a performance test, before navigating away from trusting the big O analysis.

What this all says to me is that for an extreme claim (e.g. using an array and doing a linear search will be faster than a hashmap lookup) it should require equally extreme evidence to substantiate it (e.g. the results of performance tests that stand up to heavy scrutiny).

I feel there is a trap in which someone might look at this sort of thing and say, a-ha I don't actually need to care about big O reasoning or standard understanding of data structures at all! That would be a tragic misreading of this sort of example.

2 comments

> before navigating away from trusting the big O analysis.

I don't think that's the spirit of the article. The spirit is to not assume O(1) is O(1) if you work with a physical computer and care about anything beyond how many times your program counter increments. Make sure to use big O notation that includes the complexities of carrying out those single instructions.

If you work with more than 32k of data, you either need to modify your big O to take latency into account, or you're assuming some worst case cache level, which is fine, and what you're doing with an idealistic big O...but that's a terrible assumption if you're actually trying to achieve speed or realistic metrics.

I agree. That's why the article is wrong-headed in its presentation.

Generally when you begin working on problems where this sort of thing is relevant, you have to make choices to get something going. You want to avoid premature optimization and you need something on the ground. In short, you have to assume O(1) is O(1) ... and it basically always is, except when there's evidence that it's not.

The value of the article is to point out cases when these abstractions break down, and the value of performance testing. But carrying it to an extreme such as, "Never make any assumptions about how any data structures work until you've performance tested every single thing in your application" is of course wildly unproductive.

> using an array and doing a linear search will be faster than a hashmap lookup

For small enough sets of data, this will likely be true, simply because you don't have to compute a hash.

Of course, it likely won't scale very well.