Hacker News new | ask | show | jobs
by tanujjain 2441 days ago
As I pointed in other comments, the current implementation does not focus on the scale problem. However, using the 'scores' attribute of the 'find_duplicates' function, one could obtain the hamming distance/cosine similarity and then use that to sort. For more, please refer the docs.