Hacker News
by zyang 1151 days ago
> The problem is once there are 10, 20, 30 different-but-similar documents in the vectorstore

Sounds like a de-duping problem. Maybe use the vector embeddings to find near-identical documents and limit how many of them go into the context, i.e. maximize the vector distance between your context sources.
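
Rough sketch of what I mean, in plain Python. The names (`select_diverse`, `max_similarity`, `max_docs`) and the 0.95 cutoff are made up for illustration, and I'm assuming the candidates come back from the vectorstore already sorted by relevance, each with its embedding. It greedily keeps a document only if it isn't too similar to anything already kept, which is a cheap stand-in for "maximize the distance" rather than a true optimum.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_diverse(candidates, max_docs=4, max_similarity=0.95):
    """Greedy near-duplicate filter.

    candidates: list of (doc_id, embedding), assumed sorted best match first.
    A candidate is kept only if its similarity to every already-kept
    document is below max_similarity (a hypothetical threshold to tune).
    """
    selected = []
    for doc_id, emb in candidates:
        if len(selected) >= max_docs:
            break
        is_near_duplicate = any(
            cosine_similarity(emb, kept_emb) >= max_similarity
            for _, kept_emb in selected
        )
        if not is_near_duplicate:
            selected.append((doc_id, emb))
    return [doc_id for doc_id, _ in selected]


# Toy 2-d embeddings: "a_copy" is almost identical to "a".
candidates = [
    ("a", [1.0, 0.0]),
    ("a_copy", [0.999, 0.01]),
    ("b", [0.0, 1.0]),
    ("c", [0.7, 0.7]),
]
print(select_diverse(candidates))
```

With the toy data the near-duplicate `a_copy` gets dropped and the other three survive. The threshold would need tuning per embedding model, and if you want relevance and diversity traded off more smoothly, maximal marginal relevance is the usual name for that.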