Hacker News new | ask | show | jobs
by lenticular 2586 days ago
Modern persistent tree-based data structures are usually O(log32n) random access, which is essentially O(1) for all practical purposes. With a mutable array of references, random access follows one pointer, while these structures follow usually less than 5. Sequential access is usually O(1).

These kind of persistent data structures are already the standard data structures in Scala and Clojure, and they are fast enough for the vast majority of non-numerical purposes. In typical access patterns, they are faster than mutable arrays.

1 comments

As N approaches infinity O(log32 n) approaches O(log n). And really, we don’t care much about N until it gets very large.

> In typical access patterns, they are faster than mutable arrays.

Only if typical is extremely random on large lists that consume many pages/cache lines.

Sequential access is O(1) just like mutable arrays. These structures really are fast and practical. The benefits of immutability are enormous. That's why they are so popular.

Edit: That's true about log32 and log2 become close in the limit, but that's irrelevant for practical data sizes. For example,

log2(10^6) ~ 20,

log32(10^6) ~ 4

That's a 5x difference.

Sequential access in an array has caching advantages because the memory is contiguous. For a linked structure, that is often not true. Linked structure mostly suck at sequential accesses. Extreme random access can actually work against arrays for the same reason.

There is a good reason why the subscript of log is often not even mentioned. Logarithmic is logarithmic, no longer what subscript you bother in investing extra space for.

log32n/log2n is always 5, highlighting why this is such a meaningless comparison. O(logn) (basically) refers to a constant multiple clogn, by which logic O(log32n) = O(5log2n) = O(log2n) = O(logn). It's all the same.

Case in point, if your algorithm takes log32n operations but each OP takes 5x longer its exactly the same as log2n. This is true for any value n, not just large values.

Ah! My explanation was wrong in that sense, thanks for the correction.
Whoops, brain fart.