by noogle 1404 days ago

I actually built a similar solution supporting similar operations (including filtering by meta-data) using open-source libraries. Took me about 2 weeks net.

I can see a clientele for such a database (people who want a turnkey solution), but honestly it looks like an attempt to use a dev-ops solution to address deeper issues with problem formulation, e.g.:

1. Is there really a need to search all items in the database? Can subsampling make simple similarity comparison feasible?

2. Do the embeddings really need to have that many dimensions? Can we reduce their dimensionality and fit them in RAM?

3. Is embedding accurate enough compared to pairwise comparison? Can we formulate the problem to make the latter feasible?
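To make questions 1 and 2 concrete, here is a rough sketch of what "subsample, shrink the vectors, then just compare everything" could look like. It uses a Gaussian random projection as the dimensionality reduction, which is only one option (PCA would be another), and a plain linear scan with cosine similarity. All function names, sizes, and seeds are made up for illustration, and whether the accuracy loss is acceptable depends entirely on the data.

```python
import math
import random


def random_projection_matrix(in_dim, out_dim, seed=0):
    """Gaussian random projection matrix with out_dim rows and in_dim columns."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(vec, matrix):
    """Map a vector to the lower-dimensional space (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_small_index(items, sample_size, matrix, seed=0):
    """Keep a random subsample of (item_id, vector) pairs, projected down."""
    rng = random.Random(seed)
    sample = rng.sample(items, min(sample_size, len(items)))
    return [(item_id, project(vec, matrix)) for item_id, vec in sample]


def brute_force_top_k(query, index, k):
    """Exact top-k over the small index by scoring every entry."""
    scored = [(cosine(query, vec), item_id) for item_id, vec in index]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Hypothetical data: 1000 random 128-dimensional embeddings.
data_rng = random.Random(42)
items = [(i, [data_rng.gauss(0.0, 1.0) for _ in range(128)])
         for i in range(1000)]

matrix = random_projection_matrix(in_dim=128, out_dim=16)
index = build_small_index(items, sample_size=200, matrix=matrix)

# The query must be projected with the same matrix as the index.
query = project(items[0][1], matrix)
nearest = brute_force_top_k(query, index, k=5)
```

The point is that the subsample size and the output dimension are the two knobs: if both can be made small enough for the working set to fit in RAM, the linear scan may be all that is needed.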

I also could not find any explanation of the underlying algorithms or of their accuracy, especially around meta-data filtering, which is not solved by FAISS. (Happy to hear otherwise.)
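For reference, the simplest form of meta-data filtering I can think of is a pre-filter: apply the predicate first, then rank only the survivors by similarity. The sketch below is not a claim about how this product or any library does it; the record layout (`id`, `vector`, `meta`) and the function names are assumptions for illustration. It is exact but scans every record, which is precisely the cost an approximate index is supposed to avoid.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filtered_search(query, records, predicate, k):
    """Pre-filter on meta-data, then rank the remaining records exactly.

    records: list of dicts with hypothetical keys 'id', 'vector', 'meta'.
    predicate: function taking the 'meta' dict and returning True or False.
    """
    candidates = [r for r in records if predicate(r["meta"])]
    candidates.sort(key=lambda r: cosine(query, r["vector"]), reverse=True)
    return [r["id"] for r in candidates[:k]]


# Hypothetical records for illustration only.
records = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "c", "vector": [0.0, 1.0], "meta": {"lang": "en"}},
]

english_hits = filtered_search(
    [1.0, 0.0], records, lambda meta: meta["lang"] == "en", k=2
)
```

The alternative, post-filtering the top-k from an approximate index, can return fewer than k results when the filter is selective, so the interaction between filtering and accuracy is exactly the part I would want documented.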