by hannasanarion 302 days ago
The thing about a user needing access to only 10 documents is that creating a new index from scratch on those 10 documents takes basically zero time.

Vector databases intended for this purpose filter this way by default for exactly this reason. It doesn't matter how many documents are in the master index; it could be 100,000 or 100,000,000. Once you filter down to the 10 that your user is allowed to see, it takes the same tenth of a second or whatever to whip up a new bespoke index just for them, for this query.
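
To make that concrete, here is a rough sketch of the filter-first approach in plain Python. It is not how any particular vector database implements it: the names (`prefiltered_search`, `corpus`, `allowed_ids`) are made up for illustration, and the "bespoke index" is just an exhaustive cosine-similarity scan, which is plenty when only a handful of documents survive the filter.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def prefiltered_search(corpus, allowed_ids, query, k=3):
    """Filter first, then search only what the user may see.

    corpus      -- hypothetical master index: dict of doc_id -> embedding
    allowed_ids -- doc ids this user is permitted to read
    query       -- query embedding
    """
    # Step 1: the permission filter. Cost depends on len(allowed_ids),
    # not on how large the master corpus is.
    subset = {d: corpus[d] for d in allowed_ids if d in corpus}

    # Step 2: the throwaway "index" is a brute-force scan of the subset.
    ranked = sorted(subset, key=lambda d: cosine(subset[d], query), reverse=True)
    return ranked[:k]

# Toy data: "secret" is the closest match the user is allowed to see
# nothing of, so it must never appear in the results.
corpus = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
    "secret": [1.0, 0.01],
}
print(prefiltered_search(corpus, {"a", "b", "c"}, [1.0, 0.0], k=2))  # ['a', 'c']
```

The scan in step 2 is linear in the size of the filtered subset, so it only stays cheap while the filter is selective.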

Pre-search filtering is only a problem when your filter captures a large portion of the original corpus, which is rare. How often are you querying "all documents that Joe Schmoe isn't allowed to view"?