Hacker News
by antender 2230 days ago
Was there any external requirement to sort this file in memory only? Why not just split the file into chunks (around 100 MB each), sort them as usual, and then k-way merge them afterwards? In theory this can be faster than allocating lots of RAM for a radix tree, especially if you use an SSD instead of an HDD.

After consulting the GNU sort manual: sort has a -m option specifically for merging already-sorted files, so you can test this by using 'split -l' to cut the file into chunks, then 'xargs -P' to run sort on the chunks in parallel, then 'sort -m' to merge them.
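A minimal sketch of that split / sort / merge pipeline as a POSIX shell function, assuming GNU coreutils and findutils. The function name `external_sort`, the default of 1,000,000 lines per chunk and the default of 4 parallel jobs are hypothetical values chosen for illustration, not numbers from the thread. Note that `xargs` only runs commands concurrently when given `-P`, and each chunk is sorted in place with `sort -o`, which is safe even when the output file is also the input.

```shell
# external_sort INPUT OUTPUT [LINES_PER_CHUNK] [JOBS]
# Hypothetical helper: chunked external sort built from split, sort and sort -m.
external_sort() {
    input=$1
    output=$2
    lines=${3:-1000000}   # hypothetical default chunk size, in lines
    jobs=${4:-4}          # hypothetical default number of parallel sorts
    tmpdir=$(mktemp -d) || return 1

    # 1. Split the input into line-aligned chunks: chunk_aa, chunk_ab, ...
    split -l "$lines" "$input" "$tmpdir/chunk_" &&
    # 2. Sort each chunk in place, running up to $jobs sort processes at once.
    find "$tmpdir" -type f -name 'chunk_*' -print0 |
        xargs -0 -P "$jobs" -I {} sort -o {} {} &&
    # 3. k-way merge the presorted chunks into the output file.
    sort -m -o "$output" "$tmpdir"/chunk_*
    status=$?

    # Remove the temporary chunks whether or not the pipeline succeeded.
    rm -rf "$tmpdir"
    return "$status"
}
```

For example, `external_sort big.txt sorted.txt 5000000 8` would cut the input into 5,000,000-line chunks and sort up to 8 of them at a time before merging. All three steps run in the same shell, so they share one locale and collation order, which the final merge relies on.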
I agree that a significant proportion of the time is spent on IO: only 8m38s (out of 19m37s) is actually spent sorting. However, my past experiments have shown that sorting in chunks and merging them with `sort -m` is much slower than using `sort -S100%`.