Hacker News
by ashvardanian 648 days ago
SciPy's distance module has problems of its own. It's pretty slow, and it constantly overflows in mixed-precision scenarios. It also raises the wrong type of error when it overflows, and it uses the general-purpose `math` package instead of `numpy` for square roots. So use it with caution.

I've outlined some of the related issues here: https://github.com/ashvardanian/SimSIMD#cosine-similarity-re...
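To make the overflow concern concrete, here is a minimal NumPy-only sketch. It is not SciPy's or SimSIMD's implementation; the function name `cosine_distance_upcast` and the choice to raise on zero vectors are assumptions made for illustration. It shows a float16 dot product overflowing to infinity, and the same inputs giving a sane result once they are upcast to float64 before accumulating.

```python
import numpy as np

def cosine_distance_upcast(a, b):
    """Cosine distance with inputs upcast to float64 before any arithmetic.

    Hypothetical helper for illustration only; not SciPy's implementation.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    if denom == 0.0:
        # Assumption: treat zero vectors as an error rather than pick a convention.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - np.dot(a, b) / denom

# Low-precision input: each product is 200 * 200 = 40000, and the sum of
# eight of them (320000) exceeds the float16 maximum of 65504.
x = np.full(8, 200.0, dtype=np.float16)

with np.errstate(over="ignore"):
    naive_dot = np.dot(x, x)  # result stays float16 and overflows to inf

safe_distance = cosine_distance_upcast(x, x)  # identical vectors

print("float16 dot product:", naive_dot)
print("upcast cosine distance:", safe_distance)
```

The upcast costs extra memory and time, which is the trade-off behind mixed-precision kernels that keep low-precision inputs but accumulate in a wider type.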

1 comment

Noted, and thanks for your great work. My experience with it is limited to working with LLM embeddings, which I believe have stayed cleanly between 0 and 1. As such, I have yet to encounter these issues.

Regarding the speed, yes, I wouldn't use it with big data. Up to a few thousand items has been fine for me, or perhaps a few hundred if pairwise.
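For a sense of why the pairwise case runs out of room sooner, here is a rough NumPy-only sketch of all-pairs cosine distance. The function name `pairwise_cosine_distance`, the 300 x 64 random "embeddings", and the assumption that no row is all zeros are all made up for illustration. The output is an n x n matrix, so memory and work grow quadratically with the number of items.

```python
import numpy as np

def pairwise_cosine_distance(X):
    """All-pairs cosine distance for the rows of X (assumes no all-zero rows).

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / norms                  # normalize each row to unit length
    sim = unit @ unit.T               # n x n cosine similarities
    np.clip(sim, -1.0, 1.0, out=sim)  # guard against rounding just past +/-1
    return 1.0 - sim

rng = np.random.default_rng(0)
emb = rng.random((300, 64))  # stand-in data with values in [0, 1)
D = pairwise_cosine_distance(emb)

print(D.shape)   # n x n result
print(D.nbytes)  # bytes held by the full distance matrix
```

At a few hundred items the full matrix is small, but at tens of thousands it reaches gigabytes in float64, which matches the intuition that pairwise work needs a much lower item count.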