| It's a great question. Redaction limits the LLMs ability to draw on the underlying training data on the subject. This can work to the developers benefit in many cases, like asking questions about your own provided context. Many developers have gotten away from relying on LLMs for facts, toward providing LLMs with facts and having those facts repurposed. For example, if you ask an LLM about a famous person, like Wayne Gretzky, it may give you a good answer but there is a chance it may hallucinate key details like the number of points he had in his NHL career. To combat this, you can provide the LLM with a biography of Wayne Gretzky and you may get more factual answers, but the LLM may still hallucinate if you probe for facts that were not provided. If you redact his name instead, for example asking “Who is [Name1]?” the LLM will be unable to answer the question without further context. But now, if you provide the redacted biography the LLM can answer the question while relying only on the provided context (the biography will contain information about [Name1]). If the question falls outside of the context the LLM will not be unable to answer, which is often the desired result. In other words, the LLM is unable to rely on the training data about Wayne Gretzky because it is only dealing with [Name1] along with redacted locations, organizations, occupations, etc from the biography about [Name1]. You force the model to rely on the provided facts. The use cases we see are people providing legal contracts and financial statements where names and currencies get redacted, and the LLM must work with the redacted values and any other context provided. |