by bilsbie
784 days ago
Has anyone tried removing an entire concept from a dataset and seeing whether the LLM can reason its way to that concept? I think that would be a really cool experiment. There are probably some really good candidate concepts that take only a small leap of reasoning to reach. Off the top of my head: maybe multiplication? Or the concept of zero? Maybe the wheel? Edit: if anyone is interested in doing this kind of stuff, hit me up (email in profile). I want to start doing these kinds of things as a side project.
Who's Harry Potter? Approximate Unlearning in LLMs https://arxiv.org/abs/2310.02238
See also The Boy Who Survived: Removing Harry Potter from an LLM is harder than reported https://arxiv.org/abs/2403.12082v1