by beagle3 2204 days ago
In the big-Oh algorithmic complexity sense, loosely speaking: for any pair of implementations (radix sort, kr sort) there exist a word size w and a list size n beyond which, if either w or n increases, the time for radix sort increases more quickly than for kr sort. Thus kr sort would eventually be faster, and its advantage would keep growing. (This assumes that the hash can indeed yield average O(1) access, which is probabilistically but not deterministically true.)

That said, word size w is, in almost all integer sorting problems, bounded by 128 (by 64 or even 32 with high probability), which makes it acceptable to regard it as "constant". In that case both sorts are essentially O(n) and it all depends on the specific implementations (with radix sort likely significantly faster in practice).
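
To make the dependence on w concrete, here is a minimal sketch of an LSD radix sort for non-negative integers. The names `radix_sort`, `word_bits` and `digit_bits` are my own, and this is not any particular implementation discussed here (kr sort is not sketched at all). The point is only that the number of passes is ceil(w / digit_bits), so the running time grows with the word size w even for a fixed n.

```python
def radix_sort(keys, word_bits=32, digit_bits=8):
    """LSD radix sort for non-negative integers below 2**word_bits.

    Performs ceil(word_bits / digit_bits) stable distribution passes,
    each costing O(n + 2**digit_bits).
    """
    mask = (1 << digit_bits) - 1
    passes = -(-word_bits // digit_bits)  # ceiling division
    for p in range(passes):
        shift = p * digit_bits
        buckets = [[] for _ in range(1 << digit_bits)]
        for k in keys:
            # Appending preserves the order from earlier passes (stability).
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return keys
```

With `word_bits=32` and `digit_bits=8` that is 4 passes; doubling the word size to 64 doubles the passes, which is the sense in which w is "constant" only if you bound it.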


Big-O as commonly used in the CS literature sometimes doesn't translate to Big-O on actual computers. For example virtual memory translation can add a log term where you wouldn't expect it: https://pdfs.semanticscholar.org/1e90/c55362cf7793dc0b2521f6...
That's a common misunderstanding of what "Big-O" is, but rephrased as "the models used for algorithm analysis assume very simple machines that don't represent actual computers all that well" your point is very valid. There's a whole lot of things that our computers do that aren't accounted for in the RAM model (which is most commonly used when analysing sequential algorithms). Memory hierarchies are a big one (the external memory model accounts for a two-level hierarchy, and can be used to reason about cache usage or external memory ("out-of-core")), but other things such as branch (mis-)predictions or virtual memory translation (TLB) are rarely accounted for. That's what the field of Algorithm Engineering is about: designing algorithms that have good worst case guarantees ("Big-O") but that are also really fast when implemented on real-world hardware. (Given the publication list on your website, you probably know all this, but I wanted to expand on it for others)
There is the idea that you should treat memory access as an O(N^.5) operation:

https://github.com/emilk/ram_bench

I am not sure if any serious academic work has been built on this model, but it's a nice shorthand.

This is somewhat universal. (Some physical insights)

Naively, to achieve optimal access time, you can pack your memory within a sphere of radius R, and R=O(N^(1/3)).

But for large R you start having cooling problems. If each memory element needs some power P to operate, then the total power consumption is P×N = O(R^3). But your surface area is only 4pi R^2, so the power flow per unit area is O(R)=O(N^(1/3)). So if the memory has a large radius and limited thermal conductivity, it will melt (since temperature ~ power flow^(1/4) for radiative cooling, by the Stefan-Boltzmann law, which follows from Planck's law).

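To keep the power flow per unit area, and hence the temperature, bounded at any radius, N can grow no faster than the surface area, N=O(R^2). At that threshold, memory access scales as R=O(N^(1/2)).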
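
Spelled out as a short derivation, under the assumption that access time is proportional to the radius R. P is the power per memory element as above; q_max, the largest heat flux the surface can sustain, is my own symbol.

```latex
\begin{align*}
\text{dense sphere:}\quad
  & N \propto R^{3}
  \;\Rightarrow\;
  \frac{P N}{4\pi R^{2}} \propto R \propto N^{1/3}
  \quad\text{(flux grows without bound)} \\
\text{bounded flux:}\quad
  & \frac{P N}{4\pi R^{2}} \le q_{\max}
  \;\Rightarrow\;
  N \le \frac{4\pi q_{\max}}{P}\, R^{2}
  \;\Rightarrow\;
  R \ge \sqrt{\frac{P N}{4\pi q_{\max}}} \propto N^{1/2}
\end{align*}
```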

This analysis is valid for general computing and circuits, but since computers are usually modeled as memory machines I think that's sufficient (?).

Obs: Why, or how, is the human brain roughly spherical then? Because we have a very effective (water-based) cooling system. Still, if it got large enough, and you admit limited flow rates of water and such, cooling would eventually be limiting. If you immediately thought of elephants, so did I, and this may be linked to their fantastically large ears:

https://asknature.org/strategy/large-ears-aid-cooling/

I love how everything is connected.

Obs2: Yes, this is related to the Bekenstein bound, but much more relevant of course (because existing RAM is almost thermally limited, and you need black hole densities to reach the Bekenstein bound). The memories we use are organized in (mostly) flat packages.

The true spherical cow model of circuits.

I do wonder though if this is really the mechanism behind the observed N^.5 law. As you allude to with Bekenstein, just because there is an eventual physical limit doesn't mean the structure of real hardware mirrors it.

Also, we are not limited to dissipation to transport heat away...

Well the problem with that is "this is a curve that roughly fits the data" is a bad way to go about constructing a model. It's a useful and neat observation, but that doesn't make it a good model. It might model random accesses to memory reasonably well (such as traversing a linked list, the example used there), but it doesn't model a scan over an array well. That doesn't make for a useful model. In contrast, the external memory model assumes a fast internal memory of size M (e.g., cache), which can be accessed in constant time, and a slow external memory of infinite size (e.g., RAM) which can only be accessed in blocks of size B (e.g., a cache line). Then you count how many blocks the algorithm needs to read or write. Now, scanning an array of size N takes O(N/B) I/Os, whereas scanning a linked list of the same size takes O(N) I/Os. The complexity of sorting is O(N/B log(N/B)/log(M/B)) I/Os. This models the same behaviour in a much cleaner way, applies equally to all levels of the memory hierarchy (you can also view RAM as the internal memory and a hard disk/SSD as the external memory), and is widely used in what you called "serious academic work" :) See also https://en.wikipedia.org/wiki/External_memory_algorithm
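
For anyone who wants to play with the I/O counting described above, here is a minimal simulation sketch. `BlockCache`, `scan_array_ios` and `scan_linked_list_ios` are names I made up. LRU eviction and randomly scattered list nodes are my assumptions, since the external memory model itself lets the algorithm choose which blocks to keep.

```python
import random
from collections import OrderedDict


class BlockCache:
    """Internal memory of M words, filled in blocks of B words (LRU eviction)."""

    def __init__(self, M, B):
        self.B = B
        self.capacity = M // B       # number of blocks that fit internally
        self.blocks = OrderedDict()  # resident blocks, least recently used first
        self.ios = 0                 # block transfers from external memory

    def access(self, address):
        block = address // self.B
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: no I/O
            return
        self.ios += 1                        # miss: one block transfer
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def scan_array_ios(N, M, B):
    """I/Os for reading N consecutive words."""
    cache = BlockCache(M, B)
    for address in range(N):
        cache.access(address)
    return cache.ios


def scan_linked_list_ios(N, M, B, seed=0):
    """I/Os for following N list nodes scattered randomly over N word slots."""
    order = list(range(N))
    random.Random(seed).shuffle(order)  # assumption: random node placement
    cache = BlockCache(M, B)
    for address in order:
        cache.access(address)
    return cache.ios
```

With N=10000, M=256 and B=16 the array scan costs exactly N/B = 625 I/Os, while the scattered list costs close to N, because almost every node lands in a block that is not resident.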

Furthermore, the introduction in the article you linked misunderstands Big-O notation so incredibly fundamentally that I don't think the author has done their background reading on machine models and Big-O notation.

It is not obvious that a two-level model is a cleaner way to think about today's memory access, which has 4-5 levels of caches before you even hit possibly NUMA RAM, then an SSD, then an HDD, and then maybe big datasets that can only be accessed over the network.

But then, I am a physicist, not an engineer, so to me starting from empirical observations is actually a very good way to construct a model.

Well, you can apply it to any pair of (adjacent) levels of the memory hierarchy. But the main problem with the square root model is that it only models the time of a random access, not when such accesses are actually incurred and when the data is already in cache. (Also, there are 2-3 levels of caches; no architecture that I’m aware of has more than 3, maybe 4 if you count the CPU registers, but their allocation is usually fixed at compile time.)
> I am not sure if any serious academic work has been built on this model, but it's a nice short hand.

Not that model specifically, but cache-oblivious data structures are specifically designed to scale well in a hierarchical cache model, no matter the cache block size. So they scale excellently across L1 cache all the way down to hard disk.
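
A small illustration of the cache-oblivious idea, using an algorithm rather than a data structure: a recursive matrix transpose that keeps halving the larger dimension. This is a sketch with names of my own choosing. The `base` cutoff only limits recursion overhead and is not tuned to any cache size. In Python, lists of lists are not laid out contiguously, so this shows the access pattern and not a real speedup.

```python
def transpose(a, out, r0, r1, c0, c1, base=16):
    """Write the transpose of a[r0:r1][c0:c1] into out by recursive splitting."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small tile: at some recursion depth it fits in every cache level.
        for i in range(r0, r1):
            for j in range(c0, c1):
                out[j][i] = a[i][j]
    elif rows >= cols:
        mid = r0 + rows // 2          # split the longer (row) dimension
        transpose(a, out, r0, mid, c0, c1, base)
        transpose(a, out, mid, r1, c0, c1, base)
    else:
        mid = c0 + cols // 2          # split the longer (column) dimension
        transpose(a, out, r0, r1, c0, mid, base)
        transpose(a, out, r0, r1, mid, c1, base)


def cache_oblivious_transpose(a):
    """Return the transpose of a non-empty rectangular list-of-lists matrix."""
    n_rows, n_cols = len(a), len(a[0])
    out = [[None] * n_rows for _ in range(n_cols)]
    transpose(a, out, 0, n_rows, 0, n_cols)
    return out
```

The code never mentions B or M, which is the whole trick: the same recursion produces tiles that fit whatever block and cache sizes the hardware happens to have.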

An amusing, surprising, and in hindsight obvious lower bound for the average random access time in an array containing N "words" is N^(1/3) (the cube root of N). This holds for any realizable computer without new physics.
Yeah. I have been wondering for a while whether well-known algorithms like the binary heap are still efficient on modern architectures, because of their random memory access patterns.
For priority queues, which implementation is optimal depends a lot on your workload (do you need addressability, i.e., a decreaseKey operation? What is the typical ratio of insertions to deletions? Are your keys integers?). I found some slide decks that are mostly in English with some German in between that might go some way to answering your question:

[1] https://algo2.iti.kit.edu/sanders/courses/algen20/vorlesung.... slides 180-207; there are some up-to-date measurements on slide 204

[2] https://nms.kcl.ac.uk/stefan.edelkamp/lectures/ae/slide/AE-A...

There are also parallel priority queues which might be useful depending on your problem, especially if it can be reformulated to operate on batches ("give me the 20 smallest items", "insert these 50 items").
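
To show what a batch interface looks like, here is a purely sequential sketch on top of Python's `heapq`. `BatchPriorityQueue`, `insert_batch` and `delete_min_batch` are hypothetical names, and nothing here is parallel. It only illustrates the shape of the operations you would reformulate your problem around.

```python
import heapq


class BatchPriorityQueue:
    """Sequential min-priority queue with batch operations (illustration only)."""

    def __init__(self):
        self._heap = []

    def insert_batch(self, items):
        items = list(items)
        if len(items) >= len(self._heap):
            # Large batch: rebuilding the heap in O(n) beats repeated pushes.
            self._heap.extend(items)
            heapq.heapify(self._heap)
        else:
            for x in items:
                heapq.heappush(self._heap, x)

    def delete_min_batch(self, k):
        """Remove and return up to k smallest items in ascending order."""
        k = min(k, len(self._heap))
        return [heapq.heappop(self._heap) for _ in range(k)]

    def __len__(self):
        return len(self._heap)
```

Usage would be along the lines of `q.insert_batch(fifty_items)` followed by `q.delete_min_batch(20)`, where `fifty_items` stands for whatever iterable of keys you have.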

Look up cache-oblivious algorithms. Turns out that random memory access can often be improved upon with smart data structures.
The paper mentions huge tables but considers their use uncommon. That has changed since then, at least on Linux.