by donatj 238 days ago
I created "unic" a number of years ago because I needed to get the unique lines from a giant file without losing the order in which they first appeared. It achieves this using a Cuckoo Filter, so it's pretty dang quick about it, and certainly faster than sorting a large file in memory.

https://github.com/donatj/unic
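
For anyone who wants the idea in code, here is a minimal sketch of order-preserving deduplication. It is not unic's implementation: it swaps in an exact in-memory `set` where unic uses a Cuckoo Filter, and the function name `unique_in_order` is made up for illustration.

```python
from typing import Iterable, Iterator


def unique_in_order(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first appearance.

    Assumption: an exact set stands in for the Cuckoo Filter, so memory
    grows with the number of distinct lines seen.
    """
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)  # remember the line so later repeats are skipped
            yield line


sample = ["b", "a", "b", "c", "a"]
print(list(unique_in_order(sample)))  # ['b', 'a', 'c']
```

The trade-off is roughly this: the set is exact but stores every distinct line, while a cuckoo filter stores only small fingerprints and so uses far less memory, at the cost of a small false-positive rate (a never-seen line can occasionally be treated as a repeat).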

2 comments

I've actually added a benchmark for this specific task and added `unic` to it.

It may not be the fairest comparison: with these random FASTQs I'm generating, the vast majority of the input is unique, so it could be overloading the cuckoo filter.

Nice tool!