Hacker News new | ask | show | jobs
by signa11 3348 days ago
> Insertion and deletion are prime examples, often requiring you to move large chunks of memory.

on modern cpu's with huge caches, i seriously doubt this claim.

fwiw, i have played around with synthetic problems where: i insert sequence of random integers into a sorted sequence, then remove those elements one by one as determined by a random sequence of positions. and almost always, vectors outperform lists by at least couple of orders of magnitude.

or to put it another way, it is almost never about either lists / vectors or something else, and boils down to couple of 'rules' e.g. access data predictably (avoid trashing the cache), keep data compact (more cache utilization) etc. etc.

2 comments

> i insert sequence of random integers into a sorted sequence, then remove those elements one by one as determined by a random sequence of positions.

Which means that you scan the list twice for each node. First for insertion and then for deletion. Starting at maybe a hundred that would be much more efficient with a balanced search tree. They have a good Red-black tree in Linux.

But if it's only arrays vs lists - with linked lists, pointers to items are never invalidated. With dynamic arrays, they are.

Dynamic arrays are fine for plain old data arrays. But it's harder for data that has "identity" and is frequently mutated. And if a node is in multiple lists, I guess arrays are out. You'd need to store the data redundantly. Extra bad if you need atomic mutations.

Well on a 64-bit machine each node of a singly-linked list of integers will be 16 bytes (8 byte pointer + 4 byte integer + 4 byte for padding). That's 300% overhead. Becomes 500% for doubly-linked lists.

So it's not surprising that linked lists perform very poorly in such circumstances.