| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by aabhay 583 days ago
	What systems would you recommend for larger deployments?

2 comments

rekoros 583 days ago

I ended up breaking up/sharding HNSW across multiple tables, but I'm dealing with many distinct datasets, each one just small enough to make HNSW effective in terms of index build/rebuild performance.

The article suggests IVF for larger datasets - this is the direction I'd certainly explore, but I've not personally had to deal with it. HNSW sharding/partitioning might actually work even for a very large - sharded/partitioned - dataset, where each query is a parallelized map/reduce operation.

link

VoVAllen 582 days ago

Hi, I'm the author of the article. Please check out our vector search extension in postgres, VectorChord [1]. It's based on RabitQ (a new quantization method) + IVF. It achieves 10ms-level latency for top-10 searches on a 100M dataset and 100ms-level latency when using SSD with limited memory.

[1] https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1...

link

nostrebored 582 days ago

What are you using for HNSW? Is the implementation handwritten? I’ve seen people using it well over XXm at full precision vectors with real time updates

link

rekoros 582 days ago

pgvector - I wasn't able to get HNSW to build/rebuild quickly [enough] with a few million vectors. Very possibly I was doing something wrong, but fun research time ran out and I needed to get back to building features.

link

redskyluan 582 days ago

why not try milvus? you get multiple index types, SIMD based brute force search, IVF, HNSW and DiskANN and you never bother to scale

link