Hacker News
by bilsbie 784 days ago
Has anyone tried removing an entire concept from a dataset and seeing if the LLM can reason its way into the concept?

I think that would be a really cool experiment.

There are probably some really good candidate concepts that just take a small leap of reasoning to reach.

But off the top of my head maybe multiplication? Or the concept of zero. Maybe the wheel?
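
A rough sketch of what the "remove the concept" step could look like, assuming plain keyword filtering is an acceptable first pass. The pattern list and the sample documents are made up for illustration; a real ablation would also have to catch paraphrases, symbols and worked examples, which this does not.

```python
import re

# Hypothetical keyword patterns for the concept "multiplication".
# This list is an assumption for illustration, not a complete filter.
CONCEPT_PATTERNS = [
    r"\bmultipl\w*",       # multiply, multiplication, multiplied, ...
    r"\btimes\b",          # "three times four" (also hits "three times a day")
    r"\bproduct\b",
    r"\d\s*[x×*]\s*\d",    # 6 x 7, 6*7, 6 × 7
]
CONCEPT_RE = re.compile("|".join(CONCEPT_PATTERNS), re.IGNORECASE)


def mentions_concept(text):
    """Return True if the text matches any of the concept patterns."""
    return CONCEPT_RE.search(text) is not None


def filter_corpus(docs):
    """Keep only the documents that never mention the concept."""
    return [d for d in docs if not mentions_concept(d)]


docs = [
    "Three plus four is seven.",
    "Multiply 3 by 4 to get 12.",
    "The cat sat on the mat.",
    "6 x 7 = 42",
    "She went there three times.",  # false positive: dropped anyway
]
kept = filter_corpus(docs)
```

The last document shows the obvious trade-off: a filter aggressive enough to remove the concept also throws away unrelated text, and a gentler one probably leaks it.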

Edit: if anyone is interested in doing this kind of stuff, hit me up (email in profile). I want to start doing these kinds of things as a side project.

2 comments

There was one where they tried to remove Harry Potter...

Who's Harry Potter? Approximate Unlearning in LLMs https://arxiv.org/abs/2310.02238

See also The Boy Who Survived: Removing Harry Potter from an LLM is harder than reported https://arxiv.org/abs/2403.12082v1

I want to see an LLM that generates answers without the letter 'e', like the novel Gadsby by Ernest Vincent Wright.
If you had one that was character-based (instead of the subword tokenization they tend to use), you could directly sample without 'e'.
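
Something like this, as a minimal sketch. `toy_logits` is a made-up stand-in for a real character-level model (it just prefers 'e' over everything else); the point is only the masking step before sampling.

```python
import math
import random

VOCAB = list("abcdefghijklmnopqrstuvwxyz ")
BANNED = {"e"}


def toy_logits(prefix):
    """Hypothetical stand-in for a character-level model.

    Returns one logit per character in VOCAB and deliberately favours 'e'.
    """
    return [2.0 if c == "e" else 1.0 for c in VOCAB]


def sample_without(prefix, banned, rng):
    """Sample the next character with the banned characters masked out."""
    logits = toy_logits(prefix)
    # Banned characters get -inf, so their weight below is exactly 0.
    masked = [-math.inf if c in banned else l for c, l in zip(VOCAB, logits)]
    top = max(masked)
    weights = [math.exp(l - top) for l in masked]  # exp(-inf) == 0.0
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(n, seed=0):
    """Generate n characters, none of which are banned."""
    rng = random.Random(seed)
    out = ""
    for _ in range(n):
        out += sample_without(out, BANNED, rng)
    return out


text = generate(200)
```

With a subword tokenizer you would presumably have to mask every token whose text contains an 'e', which is the same idea over a much larger banned set.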

Though I'm not sure its output would make much sense, and you might have to use beam search (or something like backtracking).
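
A minimal sketch of the beam search variant, assuming a toy bigram model built from one sentence (the corpus, the add-one smoothing and the beam width are all arbitrary choices for illustration). It keeps several 'e'-free candidates alive instead of committing to one character at a time.

```python
import math
from collections import Counter, defaultdict

# Toy training text for a character bigram model (illustrative only).
CORPUS = "the quick brown fox jumps over the lazy dog and the cat sat on a mat"
VOCAB = sorted(set(CORPUS))

# counts[a][b] = number of times character b follows character a.
counts = defaultdict(Counter)
for a, b in zip(CORPUS, CORPUS[1:]):
    counts[a][b] += 1


def log_probs(prev):
    """Add-one smoothed log P(next char | previous char)."""
    total = sum(counts[prev].values()) + len(VOCAB)
    return {c: math.log((counts[prev][c] + 1) / total) for c in VOCAB}


def beam_search(start, length, banned, beam_width=3):
    """Extend `start` by `length` characters, never emitting a banned one."""
    beams = [(0.0, start)]  # (cumulative log-probability, text so far)
    for _ in range(length):
        candidates = []
        for score, text in beams:
            for ch, lp in log_probs(text[-1]).items():
                if ch in banned:
                    continue  # prune any continuation that uses 'e'
                candidates.append((score + lp, text + ch))
        # Keep only the highest-scoring partial sequences.
        beams = sorted(candidates, reverse=True)[:beam_width]
    return beams[0][1]


result = beam_search("t", 12, banned={"e"})
```

This is not true backtracking: a dead end is only avoided if some other beam survived, so a wider beam trades compute for a better chance of coherent output.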

I wonder how you would train a model to directly speak without 'e'. Perhaps you use the general model like above with beam search, and then train a new model to directly predict the first model's beam-searched predictions.