| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by conradludgate 1021 days ago
	My immediate thought before reading the article is that shuffling is the inverse of sorting. To put another way, you can take a sorted list and apply a merge sort on it, but the comparison operation is random. Merge sort can work on data sets that are too large to fit in memory since you sort small sized blocks that do fit in memory, then merge using streaming

2 comments

CodesInChaos 1021 days ago

You can't just randomize the comparison operation, since that'd violate the requirements of a comparison function. MS made that mistake in their browser download dialog, which led to significant biases. You'd need to assign a random number to each item. But then you'd need to figure out a way to store this random number.

Also, for in-memory data comparison-based sorting is much slower than shuffling. Both asymptotically (O(n log n) vs O(n)) and in practice. I did not investigate if sorting algorithms not based on comparisons (e.g. radix sort) have competitive performance.

https://www.computerworld.com/article/2762287/microsoft-s-eu...

link

ndriscoll 1021 days ago

If you wanted to go down that line of thinking, you should be able to use a hash function to uniformly shuffle the keys within a larger space, and then apply your sort to that. No need to store the randomized keys.

Probably performance wouldn't be as good as the OP article though.

link

anonymoushn 1021 days ago

The technique in TFA is very similar to radix sort.

link

rcme 1021 days ago

Shuffling and sorting don’t seem related at all. Naively, I would expect shuffle to be linear in time complexity.

link

anonymoushn 1021 days ago

Sorting too :)

link