Hacker News
by websiteguy 5069 days ago
Unix sort is a merge sort, using fixed RAM
uniq -c is line by line
sed is line by line

A superior solution in every way

If you were going to do this in Python (or a similar language), you would need to write a sort function that operates on a file rather than on a Python collection, since a collection is always bounded by RAM. At best, that amounts to re-implementing the sort command. Once you can sort a file, everything else is trivial, as the counts can be done line by line or, more simply, via uniq -c.
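
To make that concrete, here is a minimal sketch of a file-based sort in Python, followed by a line-by-line count in the spirit of uniq -c. It is not how the sort command is implemented, only an illustration of the same idea: sort fixed-size chunks in memory, spill each to a temporary file, then merge the sorted runs lazily. The names sorted_lines, count_sorted, and max_lines are hypothetical, and the sketch assumes newline-terminated text and that the number of runs stays below the open-file limit.

```python
import heapq
import itertools
import os
import tempfile


def _write_run(lines):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(lines)
    return path


def sorted_lines(path, max_lines=100_000):
    """Yield the lines of `path` in sorted order.

    Hypothetical helper: at most `max_lines` lines are held in memory
    while building each run; the merge keeps one line per run.
    """
    runs = []
    with open(path) as src:
        while True:
            chunk = list(itertools.islice(src, max_lines))
            if not chunk:
                break
            # Give a final unterminated line a newline so it compares
            # the same way as every other line.
            chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
            chunk.sort()
            runs.append(_write_run(chunk))

    files = [open(r) for r in runs]
    try:
        # heapq.merge reads the sorted runs lazily, one line at a time.
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()
        for r in runs:
            os.remove(r)


def count_sorted(lines):
    """Count adjacent duplicate lines, one line at a time (like uniq -c)."""
    for key, group in itertools.groupby(lines):
        yield key.rstrip("\n"), sum(1 for _ in group)
```

Usage would look like `for line, n in count_sorted(sorted_lines("words.txt")): print(n, line)`, where words.txt is a placeholder input file. Note that Python compares strings by code point, whereas sort orders lines according to the current locale, so the two orderings can differ.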

Of course, if you only care about things that fit in memory, you can do it in Python, but it is still far easier to use the command line for these types of problems.

1 comment

> Of course, if you only care about things that fit in memory, you can do it in Python, but it is still far easier to use the command line for these types of problems.

I agree; I was not trying to claim that my Python solution should be used in preference to the shell pipeline solution in any kind of "production" environment. As I noted in another comment, Knuth's Pascal solution appears to be open to the same criticism.