Hacker News new | ask | show | jobs
by johnwalker 4536 days ago
Some time ago, hypirion wrote about how persistent vectors are implemented in Clojure.

http://hypirion.com/musings/understanding-persistent-vector-...

There are actually several different implementations of the core data structures:

https://github.com/clojure/data.finger-tree

https://github.com/clojure/core.rrb-vector

Of course, the short answer is that Clojure isn't using classic linked lists, and that regular Java collections can be used and created pretty easily. [0] [1]

[0] https://stackoverflow.com/questions/4313505/converting-cloju...

[1] http://blog.raek.se/2011/01/24/executors-in-clojure/

1 comments

Thanks for the great resources. I still wonder to what extent is it at all possible to do cache-friendly data structures on the JVM. Besides the somewhat abstract data structure considerations of the kind posted, there is still the issue of whether in the end references are stored in the collections memory or the values themselves. On the surface level it looks like Java collections generally use references for everything, except perhaps for arrays and (maybe) ArrayLists of primitive types. See for example:

http://java.dzone.com/articles/slab-guaranteed-heap-alignmen...

Java currently only stores primitives inline (in arrays or in objects). However, the memory allocation scheme (at least in HotSpot) places objects that are created one after another by the same thread consecutively in memory.

Nevertheless, cache-friendly data structures are still very much possible on the JVM. Consider something like a B-Tree node, with an array of references to the children and an array of keys. The node object and the arrays will be stored together in memory, so the only cache-missing pointer following you'd do is when you jump from one node to another, just as would happen in C/C++. So the memory placement is certainly not as flexible as in C/C++, but you can certainly be pretty cache friendly. In fact, this is a very active topic, as a lot of people are starting to use Java in low latency applications. For example, look at the discussions here: https://groups.google.com/forum/#!forum/mechanical-sympathy

There is work being done to support value types in Java, that are immutable, cannot be referenced (hence only compared by value), and stored inline in arrays (well, the latter is an implementation issue and up to the particular JVM to decide). This may be included in Java 9.

I'll have to defer to this paper. It mentions that the core persistent vectors are cache-friendly, but I would suspect that the llvm will always be an upper-bound for the jvm.

infoscience.epfl.ch/record/169879/files/RMTrees.pdf‎

I'm very interested in better answers to this question, though.