Hacker News
by Androider 1014 days ago
Remember the "MongoDB is web scale" craze from over a decade ago? That's the current state of vector DBs, despite vector DBs having been around forever.

Do you have more than a terabyte of embeddings? No? Then do yourself a favor and use pgvector with an HNSW index; it'll do just fine. Operationally, it is very hard to beat Postgres.
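
For anyone who hasn't used it, the Postgres setup suggested here is only a few statements. A minimal sketch, assuming a hypothetical `items` table with a 1536-dimension `embedding` column and cosine distance. The SQL is kept in Python strings so it can be handed to whichever Postgres driver you use; no driver is imported, and the `%s` placeholders assume a psycopg-style driver.

```python
# Hypothetical schema: the table name `items`, the column name `embedding`,
# and the index name are illustrative assumptions.
DIM = 1536

SETUP_SQL = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    f"CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector({DIM}))",
    # HNSW index on cosine distance; vector_l2_ops and vector_ip_ops also exist.
    "CREATE INDEX IF NOT EXISTS items_embedding_hnsw "
    "ON items USING hnsw (embedding vector_cosine_ops)",
]

# <=> is pgvector's cosine-distance operator; smaller means more similar.
QUERY_SQL = "SELECT id FROM items ORDER BY embedding <=> %s::vector LIMIT %s"


def to_vector_literal(vec):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"
```

With a driver, you would execute each entry of `SETUP_SQL` once, then run `QUERY_SQL` with `(to_vector_literal(query_embedding), 5)` as parameters.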

2 comments

An OpenAI embedding vector is 1536 4-byte floats. 1 TiB is roughly 174K such embedding vectors.
Your math is wrong. 100K such vectors of 32-bit floats is about 600 MB.

I think your point is right, though. Searching through these requires an index of some sort at any reasonable scale (not Google scale).

Hmmm, yeah, I used 2^30 instead of 2^40. Should not comment before caffeinating.
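
The arithmetic in this subthread is easy to check. A quick sketch, assuming raw float32 storage with no index or row overhead: the 174K figure is what fits in 1 GiB, while 1 TiB holds roughly 179 million vectors.

```python
DIMS = 1536            # floats per embedding, as stated in the thread
BYTES_PER_FLOAT = 4    # 32-bit float

bytes_per_vector = DIMS * BYTES_PER_FLOAT      # 6144 bytes

vectors_per_gib = 2**30 // bytes_per_vector    # 174_762
vectors_per_tib = 2**40 // bytes_per_vector    # 178_956_970
mb_for_100k = 100_000 * bytes_per_vector / 1e6 # 614.4 MB

print(bytes_per_vector, vectors_per_gib, vectors_per_tib, mb_for_100k)
```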
And if you just have tens of thousands, straight up SQL can work pretty well, too.
If you have tens of thousands, a `for` loop over an array in memory is plenty fast.
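
To make the `for` loop concrete, here is a minimal pure-Python sketch of exact nearest-neighbor search by cosine similarity. The function names and the toy data are illustrative assumptions, and it assumes no zero-length vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Return (score, index) pairs for the k vectors most similar to query."""
    scored = []
    for i, vec in enumerate(vectors):   # the plain loop over an in-memory array
        scored.append((cosine_similarity(query, vec), i))
    scored.sort(reverse=True)           # highest similarity first
    return scored[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
results = top_k([1.0, 0.1], vectors, k=2)
print([i for _, i in results])  # → [0, 2]
```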

I've worked on a project that searched through 50 million vectors with linear brute-force search, just with a sprinkle of AVX (and it used the brute-force search, because the curse of dimensionality killed all the smarter approaches).
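
This is not the code from that project, but the same idea (exact linear scan, with the inner loop vectorized) can be sketched with NumPy. The assumption here is that NumPy hands the matrix-vector product to a BLAS backend that typically uses SIMD instructions such as AVX where the CPU supports them; the function name and the random data are illustrative.

```python
import numpy as np


def brute_force_top_k(query, matrix, k=5):
    """Exact top-k by inner product; matrix has shape (n, d), query has shape (d,)."""
    scores = matrix @ query                     # one pass over every row
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]   # the k best, in no particular order
    return top[np.argsort(-scores[top])]        # sort just those k by descending score


rng = np.random.default_rng(0)
matrix = rng.standard_normal((10_000, 128)).astype(np.float32)
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)  # unit rows: inner product == cosine

query = matrix[42]
print(brute_force_top_k(query, matrix, k=3))  # row 42 ranks first, since it matches itself
```

Every query touches all `n * d` floats, so the cost grows linearly with the collection, but the result is exact and there is no index to build or tune.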