by zahlman 238 days ago
How often is "count the unique lines of a file" a realistic task for others out there, and how big of files do y'all need to process and why?
2 comments

Shows up a lot in bioinformatics, actually: finding the sequences that contain a specific subsequence (grep) and counting how many of each unique sequence there are. The files here can be massive (on the order of 1 GB to tens of GB).

You don't really end up using these results in any specific analysis, but it's super helpful for troubleshooting tools or edge cases.
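
For anyone curious what that looks like outside a shell pipeline, here is a rough Python sketch of the "filter, then count each unique line" step. The function name `count_matching_lines` and the one-sequence-per-line input are my own assumptions for illustration, not a description of any particular bioinformatics tool.

```python
from collections import Counter
from typing import Iterable


def count_matching_lines(lines: Iterable[str], subsequence: str) -> Counter:
    """Count how often each distinct line containing `subsequence` occurs.

    Hypothetical helper: assumes one sequence per line and a plain
    substring match (no regex, no reverse complements).
    """
    counts: Counter = Counter()
    for line in lines:
        seq = line.rstrip("\n")   # drop the trailing newline before comparing
        if subsequence in seq:    # the "grep" step
            counts[seq] += 1      # the "count each unique line" step
    return counts
```

Passing an open file handle works as-is, because iterating over a file yields one line at a time. Memory use then depends on the number of distinct matching lines rather than on the total file size, which matters at the multi-GB scale mentioned above.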

Reasonably often in ETL-type tasks.