Hacker News new | ask | show | jobs
by avodonosov 3132 days ago
He says because of cache misses due to each element being allocated individually.

But then working with references is bad in general? If your vector stores references to objects it's bad? Your object fields point to strings - bad too.

Should we ditch Lisp and Java?

Note, data should not be continuous to stay in cache. Cache is more like paged virtual memory, it consists of cache lines.

Before subscribing to the "never use linked lists" view I would like to see a nontrivial program rewritten from lists and references to the "allocate everything at once" approach and measure the performance.

If you want to avoid references and allocate everything in place, it may force your program to do a lot of copying.

Also, not only CPU performance matters - programmer performance is important too. So-called "high-level languages" try to optimise this.

6 comments

>But then working with references is bad in general? If your vector stores references to objects it's bad? Your object fields point to strings - bad too.

It's not strictly bad, but it's useful to minimize the number of pointer derefrences you have wherever you can. A non-intrusive linked list will have 2 pointer dereferences to access any bit of data. You'll also have n+1 pointer dereferences to access element n. If you have fixed size small objects, then a vector of values is almost always better than a vector of pointers to the small objects. An intrusive linked list will save you a pointer dereference, but you still have the n dereferences to access element n.

>Should we ditch Lisp

Lisp's lists model a linked list with chains of cons cells, but there's no hard requirement for them to actually be implemented as linked lists. A typical approach is to implement lists in terms of Bagwell's VLists, which are a kind of middle ground between linked lists and vectors. You have reduced number of pointer dereferences, plus increased cache locality, whilst still being able to log n index, insert, delete, and not require large up-front allocations.

>and Java

If you subscribe to the "everything is an object" model religiously, then yes, you're probably doing harm. As always, there's no hard rules here and it always depends on your problem and data access requirements. You can usually get performance gains by using memory pools, arrays of structures/primitives, and entity systems instead of inheritance hierarchies.

You're misunderstanding the advice about contiguous. It's not that it's more likely to stay in cache, but if you're accessing data sequentially it's more likely the data you're going to access next is already in cache.

Most (all I've read/looked at) benchmarks in Java have data structures backed by linked lists utterly smashed by things implemented by an array.

There was in the last year or two a good c++ talk where a game or game engine changed their storage to be more array like and got roughly a 30% boost in performance. Memory locality is usually king, which is why linked lists are rarely used.

If you already know your data (which must be the case with arrays anyway), you can simply pre-allocate a pool from where the elements of the linked list will be taken. This guarantees that elements will be allocated from the same region in memory. It is a simple technique that I use whenever I believe that a linked list will be needed in a high-performance context.
Vectors are also usually more memory-efficient, period. A singly-linked list uses a pointer for each element an a doubly-linked uses two, while a vector has constant [a pointer and two integers (size and capacity)] overhead. (Unless the vector was dynamically resized and isn't near capacity.)
> if you're accessing data sequentially it's more likely the data you're going to access next is already in cache

Why?

Cache consists of cache lines (usually 64 bytes). If an element in the list is smaller than the cache line a whole line is put in the cache anyway. This means that data is cached that you have not read yet. When using a contiguous array this unread data contains the next elements. So they are cached without ever having been read.
Cache prefetching. (https://en.wikipedia.org/wiki/Cache_prefetching)

Essentially, many workloads access data sequentially and therefore modern cache architectures have special optimizations to make these memory accesses as fast as possible, by prefetching the next item in the sequence before it is actually needed.

Ah, yes.

Well, anyways I value code simplicity more than anything.

Maybe CPUs will learn to do cache prefetch for link oriented languages: if a "load and dereference" pattern is detected, CPU could prefetch the data referenced by pointers it has in registers or recently fetched cache line.

BTW, linked list elements are not necessary located far away from each other, if we allocated them one after another chance are they are near each other in memory.

the time complexity of insert etc is superior and a good reason for abstraction on top of an array, I would think.
Dependent on various internal arbitrary or intentional decisions, copying garbage collectors can in their normal workings place linked list cells in order in memory. While this does still take more memory than arrays, it can take advantage of the caching & sequential auto prefetch as well.
Common Lisp has value types, it is not all about lists.

Likewise Java has primitive types, arrays and eventually will get value types, because they already feel the pressure in FinTech of not having them.

> data [needn't] be continuous to stay in cache. Cache is more like paged virtual memory, it consists of cache lines.

It sounds like you're saying everything will be OK so long as each chunk of 64 bytes (cache line) is used together. But one page is typically 4 KB, and if you use for example all 64 bytes of one cache line, but only one cache line per page, you will suffer from TLB misses.

If you throw your computer out of window it may break. Why whould I access only 64 bytes per page? I suggest you to delete this comment.

We are not discussing VM here, I brought it as an analogy just to say cache lines are independent and need not be contineous.

Yes, organizing your data in an array of structures is also bad for performance.
Doesn’t this entirely depend on access patterns?

If you read one field of all elements all at once, sure, a structure of arrays is probably right. But what if you instead read all fields of one element at once? Isn’t it advantageous to have those fields adjacent in memory with an array of structures?