I interacted with the authors of these models quite a bit!
These are very interesting models.
The tradeoff here is that you get even faster inference, but lose on retrieval accuracy [0].
Specifically, inference will be faster because essentially you are only doing tokenization + a lookup table + an average. So even though their largest model has 32M params, you can expect its inference speed to be higher than ours, which has 23M params but is transformer-based.
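To make "tokenization + a lookup table + an average" concrete, here is a minimal sketch of that style of inference. This is not their implementation: the toy vocabulary, the whitespace tokenizer, and the random embedding table are all placeholders I made up for illustration. A real static embedding model would use a subword tokenizer and a learned table.

```python
import random

# Hypothetical toy setup (illustration only): a 4-word vocabulary and a
# random embedding table with one DIM-sized row per vocabulary entry.
DIM = 4
VOCAB = {"[UNK]": 0, "fast": 1, "static": 2, "embeddings": 3}
random.seed(0)
TABLE = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]

def tokenize(text):
    # Whitespace splitting stands in for a real subword tokenizer.
    # Unknown words map to the [UNK] row.
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text.lower().split()]

def embed(text):
    # Step 1: tokenize. Step 2: look up one row per token.
    # Step 3: average the rows. No attention, no matrix multiplications.
    ids = tokenize(text)
    if not ids:
        return [0.0] * DIM
    rows = [TABLE[i] for i in ids]
    return [sum(col) / len(rows) for col in zip(*rows)]

print(embed("fast static embeddings"))
```

The cost per text is one table lookup and one addition per token, which is why this can be much faster on a CPU than a transformer forward pass even when the parameter count is larger.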
I am not sure about typical CPU inference speeds for their models, but with ours you can expect ~22 docs per second and ~120 queries per second on a standard 2 vCPU server.
As far as retrieval accuracy goes, on BEIR we score 53.55, all-MiniLM-L12-v2 (a widely adopted compact text embedding model) scores 42.69, while potion-8M scores 30.43.
I can't find their larger models but you can generally get an idea of the power level of different embedding models here: https://huggingface.co/spaces/mteb/leaderboard
If you want to run them on a CPU it may make sense to filter for smaller models (e.g., <100M params). On the other hand, our models achieve higher retrieval scores.
[0] "accuracy" in layman's terms, not in the accuracy vs recall sense. The correct word here would be "effectiveness".