| HN Mirror



	by Androider 1014 days ago
	Remember the "MongoDB is web scale" craze from over a decade ago? That's the current state of vector DBs, despite vector DBs having been around forever. Do you have more than a terabyte of embeddings? No? Do yourself a favor and use pgvector with an HNSW index, it'll do just fine. Operationally, it is very hard to beat Postgres.

2 comments

osigurdson 1014 days ago

An OpenAI embeddings vector is 1536 4 byte floats. 1 TiB is roughly 174K such embeddings vectors.

brigadier132 1014 days ago

Your math is wrong. 100k vectors of 1536 32-bit floats is about 600 MB.

I think your point is right though. Searching through these requires an index of some sort at any reasonable scale (not Google scale).

osigurdson 1014 days ago

Hmmm, yeah, I used 2^30 instead of 2^40. Should not comment before caffeinating.
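
For anyone following along, the arithmetic in this subthread can be checked with a few lines of Python. This is only a sketch: it assumes 1536 dimensions stored as 4-byte floats, as stated above, and ignores any index or row overhead. The variable names are made up for illustration.

```python
# Storage arithmetic for embedding vectors (assumes 1536 dims, 4-byte floats,
# and no per-row or index overhead).
DIMS = 1536
BYTES_PER_FLOAT = 4
bytes_per_vector = DIMS * BYTES_PER_FLOAT  # 6144 bytes

# The "174K" figure comes from dividing 2^30 (1 GiB), not 2^40 (1 TiB).
vectors_per_gib = 2**30 // bytes_per_vector
vectors_per_tib = 2**40 // bytes_per_vector

# Size of 100k vectors in decimal megabytes.
size_100k_mb = 100_000 * bytes_per_vector / 1e6

print(f"{bytes_per_vector} bytes per vector")
print(f"{vectors_per_gib:,} vectors per GiB")
print(f"{vectors_per_tib:,} vectors per TiB")
print(f"{size_100k_mb:.1f} MB for 100k vectors")
```

So 1 TiB holds roughly 179 million such vectors, and 100k of them come to about 614 MB.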

drittich 1014 days ago

And if you just have tens of thousands, straight up SQL can work pretty well, too.

pornel 1014 days ago

If you have tens of thousands, a `for` loop over an array in memory is plenty fast.

I've worked on a project that searched through 50 million vectors with linear brute-force search, just with a sprinkle of AVX (and it used the brute-force search, because the curse of dimensionality killed all the smarter approaches).
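
A minimal sketch of the `for` loop approach, in pure Python for illustration. It is not the AVX implementation described above. The function names, the choice of cosine similarity, and the random 64-dimensional test data are all assumptions. A real setup would more likely use NumPy or SIMD code for speed.

```python
import heapq
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors, k=5):
    """Score every vector against the query; return the top k as (score, index)."""
    scored = []
    for index, vector in enumerate(vectors):
        scored.append((cosine_similarity(query, vector), index))
    # nlargest returns the k highest-scoring pairs, best first.
    return heapq.nlargest(k, scored)


# Hypothetical data: 10,000 random 64-dimensional vectors.
random.seed(0)
corpus = [[random.gauss(0, 1) for _ in range(64)] for _ in range(10_000)]

# Searching for a vector that is in the corpus should return it first.
query = corpus[1234]
results = brute_force_search(query, corpus, k=3)
for score, index in results:
    print(f"index={index} score={score:.4f}")
```

The scan is linear in the number of vectors, so there is no index to build or maintain, which is the appeal at this scale.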
