by websiteguy
5069 days ago
Unix sort is a merge sort that runs in a bounded amount of RAM (it spills sorted runs to temporary files when the input is large).
uniq -c works line by line.
sed works line by line.

A superior solution in every way. If you were going to do this in Python (or a similar language), you would need to write a sort function that operates on a file rather than on a Python collection, because a collection is always bound by RAM. At best, that would be re-implementing the sort command. Once you can sort a file, everything else is trivial: counts can be done line by line or, more simply, via uniq -c. Of course, if you only care about data that fits in memory, you can do it in Python, but it is still far easier to use the command line for this type of problem.
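To make the "sort a file, not a collection" point concrete, here is a minimal Python sketch of what that re-implementation involves. It is not the author's code, and it simplifies a lot. The names `external_sort`, `uniq_count`, `_write_run` and the `max_lines` chunk size are hypothetical. The sketch assumes newline-terminated text lines and Python's code-point ordering, which is roughly `LC_ALL=C sort`, not a locale-aware sort. It sorts fixed-size chunks in memory, writes each sorted run to a temporary file, and lazily k-way merges the runs with `heapq.merge`. The `uniq -c` step then streams the sorted file with `itertools.groupby`, holding only the current line.

```python
import heapq
import itertools
import os
import tempfile
from typing import Iterator, List, Tuple


def _write_run(lines: List[str], tmp_dir: str) -> str:
    """Hypothetical helper: sort one in-memory chunk and write it to a temp 'run' file."""
    lines.sort()
    fd, path = tempfile.mkstemp(dir=tmp_dir, text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(lines)
    return path


def external_sort(in_path: str, out_path: str, max_lines: int = 100_000) -> None:
    """Sketch of an external merge sort: at most max_lines lines are held in RAM
    while building runs, plus one buffered line per run during the merge."""
    run_paths: List[str] = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        with open(in_path) as src:
            chunk: List[str] = []
            for line in src:
                if not line.endswith("\n"):
                    line += "\n"  # normalise a missing final newline
                chunk.append(line)
                if len(chunk) >= max_lines:
                    run_paths.append(_write_run(chunk, tmp_dir))
                    chunk = []
            if chunk:
                run_paths.append(_write_run(chunk, tmp_dir))
        # Single-pass k-way merge of all sorted runs; heapq.merge is lazy,
        # so the merged output never has to fit in memory.
        runs = [open(p) for p in run_paths]
        try:
            with open(out_path, "w") as dst:
                dst.writelines(heapq.merge(*runs))
        finally:
            for r in runs:  # close before the temp directory is removed
                r.close()


def uniq_count(sorted_path: str) -> Iterator[Tuple[int, str]]:
    """Rough analogue of `uniq -c`: stream a sorted file and yield
    (count, line) for each run of identical adjacent lines."""
    with open(sorted_path) as f:
        for line, group in itertools.groupby(f):
            yield sum(1 for _ in group), line.rstrip("\n")
```

Even this sketch skips things a real sort handles, such as capping how many run files are merged at once, locale collation, and key fields. That gap is the commenter's point about how much simpler the shell pipeline is.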
I agree; I was not trying to claim that my Python solution should be used in preference to the shell pipeline solution in any kind of "production" environment. As I noted in another comment, Knuth's Pascal solution appears to be open to the same criticism.