Hacker News new | ask | show | jobs
by sfink 760 days ago
Linked lists get a bum rap.

Yes, if you have a simple choice between a vector and a linked list, then the vector is vastly superior due to locality and (for non-intrusive linked lists) allocations. So much so that vectors often win even when you're doing lots of O(n) deletions that would be O(1) with a linked list.

But that doesn't mean that linked lists are useless! A vector gives you a single ordering. What if you need multiple? What if you need a free list, which you're never going to be iterating over but will just grab off an item at a time? I find it quite common to have one "natural" order, for which I will use a vector (or equivalently, a bump allocator of fixed-size items), and then one or several auxiliary orders like the entries belonging to a particular owner or the ones that will need to be visited for some sort of cleanup or an undo list or some sort of stack or queue of pending items to process. Your common iteration will be in memory order, but that doesn't mean you won't ever want to do different iterations.

It annoys me that this is always omitted, with the attitude that linked lists are obsolete and useless because vectors be moar better faster gooder, to the point that it's a waste of time to learn how to manipulate linked lists anymore.

I guess a lot of this is probably due to the popularity of high level languages, where you're just dealing with references to everything in the first place. But in those, the arguments in favor of vectors are often not valid because that blessed golden vector of Objects is giving you no more locality than giving your Object a `next` field: the vector of Objects is represented internally as a vector of pointers to the actual Object data, so your oh so nicely cached lookups are doing memory reads that are 100% pure overhead compared to following a `next` link from an Object that fits into a cache line. In both cases, your performance is going to be limited by the locality of your Object data, which is the same whether you have a vector of pointers or Object data with an intrusive pointer.

Also, if you have an existing setup and need to introduce a new list, it is sometimes far easier to drop in a `next` (and maybe `prev`) field than to refactor everything to accommodate a new vector. Especially since the vector will move all of your data when resizing the vector, which invalidates any pointers you might be using for other purposes. If you'll be iterating that list frequently, then the vector may very well be a good idea. If it's just for error cases or slow paths, then linked lists really aren't bad at all.

I'm not trying to argue for linked lists here so much as arguing against the blanket arguments against them.

</rant>

1 comments

> What if you need multiple?

You can very much have a single vector owning the memory, and do all other ordering over auxiliary vectors of indices. Should be cheaper and faster than holding more linked lists.

If you want to remove elements, you can very much use tombstones to flag deleted elements and then clean up after some threshold.

> You can very much have a single vector owning the memory, and do all other ordering over auxiliary vectors of indices. Should be cheaper and faster than holding more linked lists.

Cheaper in space depends on the percentage of elements in the auxiliary list. An intrusive list has space for every element to be in the list, which is wasteful if few are. A vector that grows by doubling could waste nearly half of its elements. Indexes can often be smaller than pointers, though, which favors the vector approach.

Faster is debatable. Iterating the vector of indexes is quite fast, but indirecting into the data in a random order is still slow. An intrusive linked list doesn't need to do the former, only the latter. (Then again, it also bloats up the data slightly, which matters for small items since fewer fit in cache.)

The main reason why linked lists could still be at an advantage is if you don't want allocations in your auxiliary list manipulation path. Maybe you don't want the unpredictable timing, or maybe you can't handle failure.

I agree on tombstoning, but note that you're giving up some of the vector advantages by doing so. Your processing loop now has a much less predictable branch in it. (Though really, the vector probably still wins here, since the linked list is built out of data dependent accesses!)

Sometimes these auxiliary lists need to contain things that are no longer in the main list, too, as in the case of a free list. (And if you swap such things to the end, then you've just invalidated all of your indexes.) And non-intrusive linked lists can point to variable-size elements (though they lose most of the other benefits of an intrusive linked list.)

Anyway, it's the usual "use the right tool for the right job", I just claim that linked lists are sometimes the right tool.

(Random side note: I use "indexes" instead of "indices" these days, for a silly reason—"indices" is a fine word, I have no issue with it, but it seems to encourage some people to use the non-word "indice" which is an abomination.)