Hacker News new | ask | show | jobs
by twawaaay 1453 days ago
I actually used this algorithm a decade ago to implement a log-based, transactional database system for an embedded system with very low amount of memory and requirement that all memory be statically allocated.

To the frustration of the rest of the development team who first called me an idiot (I was new) then they could not make quicksort run as fast on inputs that were capped at something like 500 items. Apparently, algorithmic complexity isn't everything.

3 comments

You sure it was this algorithm and not Bubble Sort?

> algorithmic complexity isn’t everything

Yeah very true. Or at least hopefully everyone knows that complexity analysis only applies to large n, and small inputs can change everything. In console video games it was very common to avoid dynamic allocation and to use bubble sort on small arrays. Also extremely common to avoid a sort completely and just do a linear search on the (small) array while querying, that can end up being much faster than sorting and binary searching, especially when the total number of queries or the number of queries per frame is also low.

Oh yeah, linear search I did a lot to the same consternation of other devs.

The issue was the device was so low in memory (2MB of unified NV and flash for code, files and operating memory) that there simply did not exist enough space for a lot of things to be held to be any problem for the 20MHz ARM7 controller as long as you did not do anything stupid. 600kB of it was used by OpenSSL and further 300kB by operating system anyway (I am aware of how funny it sounds when OpenSSL takes twice as much space as OS).

That is pretty awesome if you came up with this exact algorithm a decade ago, ( it was published in 2021, but apparently was in the wild in 2015[1] and I assume discovered earlier, but no one thought it was interesting enough to publish). Why didn’t you use bubble sort / insertion sort? (The algorithm in the paper looks like bubble sort at first look but is basically a clever insertion sort) what was the benefit of using it this way?

[1] https://cs.stackexchange.com/questions/45348/why-does-this-s...

I am not sure how awesome it is. It looks like something you can stumble on your own by accident on a whiteboard on an interview and use successfully even if you don't know why it works exactly.

Honestly, I always thought it is a common knowledge and it is just too simple and for this reason gets omitted from books.

People in CS/IT tend to not spend a lot of time on algorithms with bad complexity and so I am used to people discounting algorithms with useful properties just because there is another that is a bit more efficient when n goes to infinity.

Yeah, this sorting algorithm is something you can come up with by accident if you do bubble sort wrong. I wouldn't be surprised if a lot of beginners came up with this one in intro courses. I think I saw it around 2006-2008 when I was a TA and one of my students couldn't figure out why his array got sorted in the wrong order (don't remember for sure though).

For example, from 2016, here's someone not getting the difference between the two: https://stackoverflow.com/questions/40786409/whats-the-diffe...

For a related eye-opening result, with both practical application and interesting research backing it, see this talk on "Quicksorts of the 21st century" at Newton Institute on Verified Software last year: https://www.newton.ac.uk/seminar/30504/

TLDR; practical complexity computation needs to take into account things like memory access latency on different architectures.

That’s not surprising, OpenSSL has terrible code quality and besides that was written in the era of thinking you should unroll all your loops so they go faster.
I fee like anyone who was surprised that algorithmic complexity isn't everything, probably didn't totally understand it. The assumptions (like ignoring constants) are straight out of calculus limits. That (+10000) on the end doesn't mean anything if you're sorting an infinite list, but it means a lot if you're sorting 15 (or in your case 500) entries.
Well, it actually is kinda worse (or better, depends how you look at it).

It is not necessarily +10000, it can also be something like x5000.

Because CPUs really, really, really like working short, simple, predictable loops that traverse data in a simple pattern and hate when it is interrupted with something like dereferencing a pointer or looking up a missing page.

So your super complex and super intelligent algorithm might actually be only good on paper but doing more harm to your TLB cache, prefetcher, branch predictor, instruction pipeline, memory density, etc.

So there is this fun question:

"You are generating k random 64 bit integers. Each time you generate the integer, you have to insert it in a sorted collection. You implement two versions of the algorithm, one with a singly-linked list and one with a flat array. In both cases you are not allowed to cheat by using any external storage (indexes, caches, fingers, etc.)

The question: in your estimation, both algorithms being implemented optimally, what magnitude k needs to be for the linked list to start being more efficient than array list."

The fun part of this question is that the answer is: NEVER.

> The fun part of this question is that the answer is: NEVER.

Are you claiming that it is faster in practice to insert an element at the beginning of an array with 1G items, than it is to insert it at the beginning of a linked list with 1G items?

Or that, on average, the time spent traversing the linked list (assuming an average number of cache misses) the time taken to move all elements in the array one to the right?

I am certain the first item will not be true, and the second may or may not be true depending on many many other details (e.g., in a language with a compacting GC, the linked list elements will often be stored approximately like in the array case, and you may not pay too high a cost for cache misses).

> Are you claiming that it is faster in practice to insert an element at the beginning of an array with 1G items, than it is to insert it at the beginning of a linked list with 1G items?

You missed that insertion point is selected at random and must be found as part of the operation. But in practice if you know data will be inserted preferentially at the wrong end -- you can just easily reverse the data structure to make insertions prefer the better end.

> I am certain the first item will not be true

I am used to it. Everybody gets it wrong until they spend some time thinking about it, get it explained and/or run a real life test.

Hint: think about the cost of iterating over half of the linked list versus performing binary search over the array AND moving half of it.

The cost of binary search goes to negligible pretty fast and cost of moving 8 bytes of memory is always going to be lower than the cost of iterating over one entry (16 bytes) of linked list. And you have statistically equal number of both assuming you are selecting values at random with equal chance for each 64 integer to be next choice.

CPU can be moving a lot of bytes at the same time but it has to dereference the pointer before it can get to perform comparison before it can dereference the next pointer...

Actual algorithmic complexity of both algorithms is exactly the same. But an array is much more efficient (instructions per item).

You are right and this is will know, but not only for the reason you state. Cache locality is so much better on an array/vector compared to linked list and that is really important.
Good point. Cache locality is another important reason that I should have mentioned. I have not run tests, but it may possibly be the most important reason of them all.

In many linked data structures you can mitigate a lot of this problem. For example, in the past I have used contiguous arrays where I would allocate linked elements, then I would try to exploit how the elements are produced to get them placed one after another, if possible.

When data is produced randomly you can try to create "buckets", where each bucket is backed by an array and attempts to keep consecutive elements relatively close to each other. You could then try to rewrite those buckets in a way that gets amortised. Or you can just straight write it as if it was an array of sorted linked list elements where on each delete or insert you have to move some of the elements, but you only need to reorganise one bucket and not others.

All a bit cumbersome and at the end of it you are still wasting at least half of the cache on pointers.

> The cost of binary search goes to negligible pretty fast and cost of moving 8 bytes of memory is always going to be lower than the cost of iterating over one entry (16 bytes) of linked list. And you have statistically equal number of both assuming you are selecting values at random with equal chance for each 64 integer to be next choice.

That makes a lot of sense, thank you for the more detailed explanation.

I think you misunderstood the question.

If you insert an element at the beginning, there's no list iteration cost.

Please, read the question.

It stipulates that elements are selected at random.

You select k integers at random.

Some of them might be inserted faster into linked list, but when you average it over k integers you will still be slower because inserting into array will be faster, on average.

Just think what happens if the integer is inserted at the END of the list (same probability...) You need to slowly iterate over entire linked list. While you quickly land at the result with a binary search on an array and then pay almost no cost inserting it there.

If you think about it, ON AVERAGE, you have to search through half of the list (if you are linked list) or move half of the array (if you are an array).

Well, there is one case where a linked list may be faster: when due to memory fragmentation you are unable to allocate contiguous k elements (hence efficiency would drop to zero) but you would still able to allocate k nodes (or k/n nodes with n elements each). It can therefore make sense to use a linked list of arrays with some sensible limit on the array size (e.g. in the gigabyte range). As long as a smaller number of elements is used, the linked list will only have one node and degenerate to the array solution.
If that was on a very old operating system (think MS-DOS?)

On new operating systems process address space is pieced together independently from physical space. So you can have contiguous, multi-GB array even if you don't have contiguous, multi-GB physical region.

As you read/write your virtual address space the CPU detects that there is no TLB mapping and interrupts into operating system which behind the scenes finds a page, returns the mapping and continues as if nothing ever happened.

This is why a process nowadays can allocate any amount of memory -- more than there is physical memory on the machine. It only starts grinding to a halt when you actually try to use it all. (This actually is one of my interview questions -- Can a process allocate more memory than is physically available on the machine? Why does it work?)

If I understand correctly, you would just run the n+1 th outer loop to insert the new item each time?