Hacker News new | ask | show | jobs
by kordlessagain 618 days ago
Good to see a mention of the Jaccard index for sets here. I did a lot of work on generating keyterm sets using knowledge transfer from the texts and calculating similarity of texts based on their Jaccard similarity, or the Jaccard distance, and their cosine similarity. I used Ada and Instructor. In the tests I ran, the values were frequently similar and useful for ranking where one or the other resulted in similar values for the result set, allowing them to be reweighted slightly if needed.

Code: https://github.com/FeatureBaseDB/DoctorGPT