|
|
|
|
|
by alexgarcia-xyz
774 days ago
|
|
Author here - ya "binary vectors" means quantizing to one bit per dimension. Normally it would be 4 * dimensions bytes of space per vector (where 4=sizeof(float)). Some embedding models, like nomic v1.5[0] and mixedbread's new model[1] are specifically trained to retain quality after binary quantization. Not all models do tho, so results may vary. I think in general for really large vectors, like OpenAI's large embeddings model with 3072 dimensions, it kindof works, even if they didn't specifically train for it. [0] https://twitter.com/nomic_ai/status/1769837800793243687 [1] https://www.mixedbread.ai/blog/binary-mrl |
|