| HN Mirror


	by lettergram 897 days ago
	As someone who just indexed 6M documents with pgvector, I can say it’s a massive time sink: on the order of days, even with a 32-core, 64 GB RDS instance.

2 comments

cyanydeez 897 days ago

What were the token sizes, for comparison?

lettergram 897 days ago

I've done a few: 384, 762, and 512. All take a few days.

Index creation is not a big deal, though; what I want is good, fast queries for cheap. So IMO RDS with pgvector is the easiest approach.
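
For context, a minimal sketch of what building such an index could look like, assuming pgvector's IVFFlat index type and the `lists` rule of thumb from the pgvector README (rows / 1000 for up to 1M rows, sqrt(rows) beyond that). The table and column names (`documents`, `embedding`) and the helper functions are hypothetical, and nothing in this thread says which index settings were actually used.

```python
import math


def ivfflat_lists(row_count: int) -> int:
    """Rule-of-thumb IVFFlat list count (heuristic from the pgvector README)."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, math.isqrt(row_count))


def create_index_sql(table: str, column: str, row_count: int,
                     opclass: str = "vector_cosine_ops") -> str:
    """Build a CREATE INDEX statement for a pgvector IVFFlat index.

    `table` and `column` are illustrative names, not taken from the thread.
    """
    lists = ivfflat_lists(row_count)
    return (
        f"CREATE INDEX ON {table} "
        f"USING ivfflat ({column} {opclass}) WITH (lists = {lists});"
    )


if __name__ == "__main__":
    # Roughly the 6M-document case mentioned above.
    print(create_index_sql("documents", "embedding", 6_000_000))
```

For 6M rows the heuristic gives `lists = 2449`; the statement only describes the index, so build time still depends on the instance and on settings such as `maintenance_work_mem`.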

jn2clark 897 days ago

That sounds much longer than it should be. I am not sure of your exact use case, but I would encourage you to check out Marqo (https://github.com/marqo-ai/marqo - disclaimer, I am a co-founder). All inference and orchestration are included (no API calls), and many open-source or fine-tuned models can be used.

chatmasta 897 days ago

> That [pgvector index creation time] sounds much longer than it should... I would encourage you to check out Marqo

Your comment makes it sound like Marqo is a way to speed up pgvector indexing, but to be clear, Marqo is just another vector database and is unrelated to pgvector.

jn2clark 897 days ago

Fair enough, apologies for the confusion!

code_biologist 897 days ago

The reason I would use pgvector is because I am uninterested in another piece of infrastructure.
