Efficiently grouping similar DNA sequences to remove duplicates (2019)

I am a high school student, and this blog post summarizes a research project I did. The full published paper is available here: https://peerj.com/articles/8275/.

This is not really explained in the blog post, but the "naive" method is an O(N^2) brute-force search over all pairs of UMIs (unique molecular identifiers), and the "combos" method recursively enumerates every possible UMI sequence within a certain edit distance of a given UMI and checks whether it is present. Some other variants are also evaluated.
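To make the difference concrete, here is a rough Python sketch of the two ideas. It is only an illustration, not the code from the paper: it assumes equal-length UMIs over the alphabet ACGT, uses Hamming distance (substitutions only) as the edit distance, and the function names are made up for this example.

```python
ALPHABET = "ACGT"  # assumption: UMIs contain only these four bases


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def naive_neighbors(umis, query, k):
    """Brute force: compare the query against every UMI.

    Running this once per UMI gives the O(N^2) total cost.
    """
    return {u for u in umis if hamming(u, query) <= k}


def combos_neighbors(umi_set, query, k):
    """Enumerate every string within k substitutions of the query
    and keep the ones that are actually present in the set."""
    found = set()
    chars = list(query)

    def recurse(start, edits_left):
        candidate = "".join(chars)
        if candidate in umi_set:
            found.add(candidate)
        if edits_left == 0:
            return
        # Only edit positions at or after `start`, so each set of
        # edited positions is visited once.
        for i in range(start, len(chars)):
            original = chars[i]
            for c in ALPHABET:
                if c != original:
                    chars[i] = c
                    recurse(i + 1, edits_left - 1)
            chars[i] = original  # undo the edit before moving on

    recurse(0, k)
    return found


umis = {"AAAA", "AAAT", "AATT", "TTTT", "ACGA"}
print(sorted(naive_neighbors(umis, "AAAA", 1)))
print(sorted(combos_neighbors(umis, "AAAA", 1)))
```

Both functions return the same neighbors. The cost of the second one depends on the UMI length and k rather than on the number of UMIs, so it only pays off when k is small.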

If you have any questions, feel free to ask me!