Hacker News
by jt_b 479 days ago
I thought about this some more, did some research, and found an indexing approach that builds an HNSW index, serializes it to parquet, and queries it from the browser:

https://github.com/jasonjmcghee/portable-hnsw

This opens up efficient query patterns over larger datasets for RAG projects where you may not have the resources to run an expensive vector database.

1 comment

Hey, that's my little research project. lmk if you're interested in chatting about this stuff.

As others have mentioned in other threads, parquet isn't a great tool for the job here, but you could theoretically build a different file format that lends itself better to the problem of static file(s) representing a vector database.
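
To make the "different file format" idea a bit more concrete, here is a rough sketch of one possible layout. It is not what portable-hnsw does and not a proposed standard. The assumption is a single static file with a small header followed by fixed-width records, one per graph node, each holding the vector and a padded neighbor list. Every name in it (`MAGIC`, `write_index`, `record_range`, `read_node`) is made up for illustration.

```python
import struct

# Hypothetical layout: 16-byte header, then one fixed-width record per node.
MAGIC = b"VEC0"
HEADER = struct.Struct("<4sIII")  # magic, node count, dimension, max neighbors


def record_struct(dim, max_neighbors):
    # One record: `dim` float32 values, then `max_neighbors` int32 neighbor ids.
    return struct.Struct(f"<{dim}f{max_neighbors}i")


def write_index(vectors, neighbors, max_neighbors):
    """Serialize vectors and their neighbor lists into one bytes blob."""
    dim = len(vectors[0])
    rec = record_struct(dim, max_neighbors)
    out = bytearray(HEADER.pack(MAGIC, len(vectors), dim, max_neighbors))
    for vec, nbrs in zip(vectors, neighbors):
        # Pad short neighbor lists with -1 so every record has the same size.
        padded = list(nbrs) + [-1] * (max_neighbors - len(nbrs))
        out += rec.pack(*vec, *padded)
    return bytes(out)


def record_range(node_id, dim, max_neighbors):
    """Byte range [start, end) of a node's record, computed without reading the file."""
    size = record_struct(dim, max_neighbors).size
    start = HEADER.size + node_id * size
    return start, start + size


def read_node(blob, node_id):
    """Decode one node's vector and neighbor ids from the blob."""
    magic, count, dim, max_neighbors = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a VEC0 file")
    if not 0 <= node_id < count:
        raise IndexError(node_id)
    start, end = record_range(node_id, dim, max_neighbors)
    fields = record_struct(dim, max_neighbors).unpack(blob[start:end])
    vector = list(fields[:dim])
    nbrs = [n for n in fields[dim:] if n >= 0]  # drop the -1 padding
    return vector, nbrs


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = [[1, 2], [0], [0]]
blob = write_index(vectors, neighbors, max_neighbors=2)
print(read_node(blob, 1))
```

Because every record has the same width, a client can work out the byte range for any node from the header alone and fetch just that slice, for example with an HTTP Range request against a static file host. A graph search then touches only the nodes it actually visits instead of downloading the whole index.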