Hacker News new | ask | show | jobs
by lenticular 2584 days ago
Check out Funkia List [0], an persistent O(log n) random access list implementation. I use it pretty extensively when I'm writing typescript.

It actually can beat native arrays on concat and push, often by very wide margins, while also being immutable. These kinds of operations are actually much easier to optimize if you can assume immutability.

[0] https://funkia.github.io/list/benchmarks/

2 comments

It’s obvious how to make concat fast if one is willing to give up O(1) random access operations. Push is already O(1).
Modern persistent tree-based data structures are usually O(log32n) random access, which is essentially O(1) for all practical purposes. With a mutable array of references, random access follows one pointer, while these structures follow usually less than 5. Sequential access is usually O(1).

These kind of persistent data structures are already the standard data structures in Scala and Clojure, and they are fast enough for the vast majority of non-numerical purposes. In typical access patterns, they are faster than mutable arrays.

As N approaches infinity O(log32 n) approaches O(log n). And really, we don’t care much about N until it gets very large.

> In typical access patterns, they are faster than mutable arrays.

Only if typical is extremely random on large lists that consume many pages/cache lines.

Sequential access is O(1) just like mutable arrays. These structures really are fast and practical. The benefits of immutability are enormous. That's why they are so popular.

Edit: That's true about log32 and log2 become close in the limit, but that's irrelevant for practical data sizes. For example,

log2(10^6) ~ 20,

log32(10^6) ~ 4

That's a 5x difference.

Sequential access in an array has caching advantages because the memory is contiguous. For a linked structure, that is often not true. Linked structure mostly suck at sequential accesses. Extreme random access can actually work against arrays for the same reason.

There is a good reason why the subscript of log is often not even mentioned. Logarithmic is logarithmic, no longer what subscript you bother in investing extra space for.

log32n/log2n is always 5, highlighting why this is such a meaningless comparison. O(logn) (basically) refers to a constant multiple clogn, by which logic O(log32n) = O(5log2n) = O(log2n) = O(logn). It's all the same.

Case in point, if your algorithm takes log32n operations but each OP takes 5x longer its exactly the same as log2n. This is true for any value n, not just large values.

Ah! My explanation was wrong in that sense, thanks for the correction.
Whoops, brain fart.
The problem is making concat fast given the original array still exists and is mutable.

In the major JS engines string concat is essentially O(1) because strings are immutable (and they flatten them out as appropriate according to whatever heuristics make sense).

But for array concatenation in js i can do

a.concat(b)

Followed by

a[0]=something

Or

Delete a[1]

Or

A.push(something)

Or a.length++

Etc

To make this particular array concat fast would require significant perf impact to all other arrays, even those that aren’t involved in concat

Why are concat and push easier to optimise if you can assume immutability?
These persistent structures use what's called structural sharing. The simplest persistent structure is just a regular linked list. While random access is O(n), inserting, deleting, or concating is O(1). Since the list is not mutated, we can insert an element to the head of the list just by taking the new element and making a pointer between it and the original list. The original list is not modified, nor is it copied.

More sophisticated persistent structures like Funkia List have O(log32n) access, which is basically constant time. This makes them better general-purpose data structures than mutable arrays.