by donatj
238 days ago
I created "unic" a number of years ago because I needed to get the unique lines from a giant file without losing the order in which they first appeared. It does this with a Cuckoo filter, so it's pretty dang quick about it, certainly faster than sorting a large file in memory. https://github.com/donatj/unic
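For anyone who hasn't seen order-preserving deduplication before, here is a minimal Python sketch of the general idea. It is not unic's actual code: it swaps the Cuckoo filter for an exact in-memory `set`, and the function name `unique_in_order` is made up for illustration. A Cuckoo filter is a probabilistic structure that uses less memory than an exact set, at the cost of occasional false positives, which in this setting would mean a genuinely new line now and then being treated as already seen.

```python
import sys
from typing import Iterable, Iterator


def unique_in_order(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first appearance.

    Assumption: an exact set stands in for the Cuckoo filter, so memory
    grows with the number of distinct lines but nothing is dropped wrongly.
    """
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


if __name__ == "__main__":
    # Stream stdin to stdout so the whole input never has to be sorted.
    sys.stdout.writelines(unique_in_order(sys.stdin))
```

Saved under a hypothetical name such as `dedup.py`, it could be run as `python dedup.py < input.txt > output.txt`. A single pass like this avoids the sort entirely, which is where the speedup over a sort-based approach would come from.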
It may not be the fairest comparison: with the random FASTQ files I'm generating, the vast majority of the input is unique, so it could be overloading the Cuckoo filter.