|
|
|
|
|
by moffkalast
1094 days ago
|
|
Well let me ask the obvious question: Won't this redact data that is obviously crucial to getting the given task done? Let's say in case of financial statements, it if can't read credit card numbers and names, then it can't tell you which days some credit card was used and by who. Maybe that's not the typical use case, but I would imagine it being very annoying, given the already high typical LLM failure rate. |
|
Many developers have gotten away from relying on LLMs for facts, toward providing LLMs with facts and having those facts repurposed.
For example, if you ask an LLM about a famous person, like Wayne Gretzky, it may give you a good answer but there is a chance it may hallucinate key details like the number of points he had in his NHL career.
To combat this, you can provide the LLM with a biography of Wayne Gretzky and you may get more factual answers, but the LLM may still hallucinate if you probe for facts that were not provided.
If you redact his name instead, for example asking “Who is [Name1]?” the LLM will be unable to answer the question without further context. But now, if you provide the redacted biography the LLM can answer the question while relying only on the provided context (the biography will contain information about [Name1]). If the question falls outside of the context the LLM will not be unable to answer, which is often the desired result. In other words, the LLM is unable to rely on the training data about Wayne Gretzky because it is only dealing with [Name1] along with redacted locations, organizations, occupations, etc from the biography about [Name1]. You force the model to rely on the provided facts.
The use cases we see are people providing legal contracts and financial statements where names and currencies get redacted, and the LLM must work with the redacted values and any other context provided.