Hacker News new | ask | show | jobs
by winterlight 3201 days ago
Ironically, in that situation, using a list might end up being more efficient due to cache locality. Or not. That's why measuring is so important, since performance can be a very counterintuitive subject. Hard-data should always prevail over theory and guesswork.
6 comments

Bad practitioners in any field crudely invoke and naively apply a trite toolbox of mantras. Detached from reality and driven by insecure ego, they become the problem by using magical thinking from authoritarian logic and delude themselves of the reality of what they are actually doing.

In any skill, people can fall prey to cults of myth and mysticism as they merit based on adherence to orthodoxy rather than suitability. Programming is no different and sometimes it's hard to hear anybody think over the mooing of all the sacred cows.

For any realistic workload containing a mix of failed-lookups, and lookup-hits midway through the list, we're talking about many thousands of comparisons for a single lookup on average. Regardless, replace list with linked-list in the above example, for illustrative purposes.

I agree that measurements & hard-data are preferable to guesswork, but it takes time and energy to gather these measurements and hard-data as well. For minor decisions where the alternative proposal is very slightly more complex, but there's a very compelling reason to assume order-of-magnitude performance improvements, I would argue that gathering data is a waste of time.

"Lists", presumably referencing a linked list, have horrible cache locality. You were thinking of an array?
The rise of Python and similar languages have muddied the distinction between "lists" and "arrays".
If it's that bad, it should be easy to discover what is fastest.

But the way too common situation is that you look at a program under development and tell people "no way, that list will be too large to keep searching, you should design it around a hash set", and people reply something about premature optimization and keep going, then it gets released and is too slow to use so you get to rewrite all their code under pressure.

Vector or array? Sure. List is worst of both worlds
Isn't a degenerate hash set a linked list?
If it uses open hashing, yes. A degenerate closed-hashing hash set becomes an array.