Hacker News new | ask | show | jobs
by bmon 3348 days ago
Insertion and deletion are prime examples, often requiring you to move large chunks of memory. The same operation just requires pointer changes (after finding where you want to insert or delete). I think in this example it would be common to be removing structs from the center of the list.
2 comments

Insertion and deletion _were_ prime examples. It's not necessarily the case with today's hardware, especially until you reach some threshold of the number of elements (usually hundreds of thousands).

You say "the same operation just requires pointer changes (after finding where you want to insert or delete)" like it's some trivial thing that doesn't enter the equation. But finding where you want to insert or delete in a non-contiguous linked list is not a consideration that should be an afterthought in brackets.

Furthermore, when you have found the location, it's still not _just_ pointer changes. You actually have to allocate the memory for the node, too.

Traversing pointers and allocating memory is a lot slower than working with a contiguous block of memory (assuming you don't have to re-allocate the contiguous block of memory, of course).

Either way, don't take my word for it. If you're a C++ programmer, just benchmark std::vector against std::list, or watch this video - https://youtu.be/YQs6IC-vgmo

You are right; all your arguments are valid in user space programming with normal (certain kinds) of data. But to be honest, you sound like someone who hasn't tried kernel or embedded development. You must have some experience with it to understand why these things don't always hold true.

> Furthermore, when you have found the location, it's still not _just_ pointer changes. You actually have to allocate the memory for the node, too.

This is not normally true. In embedded development and game engine memory managers, data nodes already include list pointers, and there are several ways the data nodes can come to exist that don't involve a separate allocation per node. Sometimes it's allocations of blocks of nodes. Sometimes it's slab allocations (to use the terminology from the article) - a memory manager that pre-allocates constant sized memory blocks.

Kernel lists don't normally involve hundreds of thousands of elements, and using dynamic arrays for small lists, the separate alloc & realloc time may not be even close to amortized away.

> just benchmark std::vector against std::list

This is unconvincing! Kernel & embedded code do not normally use std::list. There are very good reasons people warn against kernel development in C++! And performance benchmarks are the lowest item on the priority & constraint list. The least bad thing that can happen is going slower. Fragmentation is a bigger priority, and dynamic arrays have much worse fragmentation patterns because the allocation size changes over time.

Don't forget that there are valid uses of linked lists even in 2017. If there are more insertions / removals than scans, an array makes no sense.

Reallocating a big dynamic array is also a no-go in many time-constrained situations.

> You actually have to allocate the memory for the node, too.

The way this is done is by intrusive linking (the data must include list pointers). The data usually knows in which lists it is linked, so there is no point in having a version of the data without a list head.

Almost any insertion requires a scan.
Well, given that the pointers are contained in the nodes themselves, if a piece of code is already looking at a node when deciding if an insertion is needed, then no additional scan is required.
Many do, but there are also insertions in unsorted lists or insertions after a given element...
Cache means inserts into arrays are often faster than linked lists. Remember RAM is really really big and slow.
Not if the list is allocated within a slab, as described in the article. Imagine list elements allocated continuously (as in an array) but the ordering is decided by the pointers within each node, instead of assuming a fixed order. All the elements still are in the same cache line.
Linked lists take more memory than arrays, so even in your example it's not as obvious as your making it out to be. Further, your ordering assumes no deletes.
>... inserts into arrays are often faster than linked lists.

don't forget, deletes, as well.

Yea, it's really seeks being horribly slow on out of cache linked lists being horribly slow, and then inserts and deletes being fast once you find the right node.
I agree that in userspace, linked lists rarely are the way to go. Chasing pointers is too slow.

We're talking about kernelspace, though, and different rules apply there. Memory allocation works very differently, and you have to handle every possible error path.

I was trying to highlight that the need to find the target for insertion or deletion was still a notable operation, not as an afterthought. Sorry it came out that way! That said, you still might need to perform a similar operation to find the target for deletion when using an array. The same is true for allocations (as you pointed out).

I'm don't know how the slab cache allocation would benchmark against allocating larger chunks, I was just trying to give a classical answer to the question.

That's all well-and-good for benchmarks that don't need to delete elements, but suboptimal for kernel data-structures that don't require O(1) find but are mostly round-robinned like multiple sockets. Using reallocated arrays instead linked lists also increases memory bandwidth and puts kernel data-structures at increased risk of random bit flips (most machines lack Chipkill ECC), in addition to cache evictions.
> Insertion and deletion are prime examples, often requiring you to move large chunks of memory.

on modern cpu's with huge caches, i seriously doubt this claim.

fwiw, i have played around with synthetic problems where: i insert sequence of random integers into a sorted sequence, then remove those elements one by one as determined by a random sequence of positions. and almost always, vectors outperform lists by at least couple of orders of magnitude.

or to put it another way, it is almost never about either lists / vectors or something else, and boils down to couple of 'rules' e.g. access data predictably (avoid trashing the cache), keep data compact (more cache utilization) etc. etc.

> i insert sequence of random integers into a sorted sequence, then remove those elements one by one as determined by a random sequence of positions.

Which means that you scan the list twice for each node. First for insertion and then for deletion. Starting at maybe a hundred that would be much more efficient with a balanced search tree. They have a good Red-black tree in Linux.

But if it's only arrays vs lists - with linked lists, pointers to items are never invalidated. With dynamic arrays, they are.

Dynamic arrays are fine for plain old data arrays. But it's harder for data that has "identity" and is frequently mutated. And if a node is in multiple lists, I guess arrays are out. You'd need to store the data redundantly. Extra bad if you need atomic mutations.

Well on a 64-bit machine each node of a singly-linked list of integers will be 16 bytes (8 byte pointer + 4 byte integer + 4 byte for padding). That's 300% overhead. Becomes 500% for doubly-linked lists.

So it's not surprising that linked lists perform very poorly in such circumstances.