Hacker News new | ask | show | jobs
by not2b 778 days ago
I was thinking that something like this could be useful for discovery in legal cases, where a company might give up a gigabyte or more of allegedly relevant material in response to recovery demands and the opposing side has to plow through it to find the good stuff. But then I thought of a countermeasure: there could be messages in the discovery material that act as instructions to the LLM, telling it what it should not find. We can guarantee that any reports generated will contain accurate quotes, even where they are so that surrounding context can be found. But perhaps, if the attacker controls the input data, things can be missed. And it could be done in a deniable way: email conversations talking about LLMs that also have keywords related to the lawsuit.
1 comments

Those do-not-search here chunks wouldn’t be retrieved during vector search and reranking because it would likely have a very low cross-encoder score with a question like “Who are the business partners of X?”.