| HN Mirror



	by twayt 1111 days ago
	> They probably can

	No, actually they probably can’t. There is no verifiable way to remove the data from the model apart from removing every instance of that information from the training data. The project you linked only describes a selective finetuning approach.

2 comments

xnx 1111 days ago

It's an area of active research: https://ai.googleblog.com/2023/06/announcing-first-machine-u...


twayt 1111 days ago

Until you get models with completely disentangled feature spaces such that you know that the influence of a piece of data is completely removed (at the limit this is something like an embedding DB), there is absolutely no way you can claim you’ve removed the data from the model.

At most, these efforts will amount to data laundering: they will make it impossible to prove that a piece of data was used to train the model, rather than providing conclusive proof that it was removed.


NBJack 1111 days ago

Which means we are probably at least 5-10 years away from verifiable action that a court of law will recognize.


nomel 1111 days ago

This assumes it's possible. I naively assume it's not, at least not without harming the model beyond just its knowledge of the book's content.


brucethemoose2 1111 days ago

They can probably prevent LLaMA from spitting out verbatim quotes from the books well enough to make proof difficult.

... But yeah, fundamentally the only way to throw out the books is to throw out the weights.
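
To make the "prevent verbatim quotes" idea concrete, here is a minimal sketch of an output-side filter, assuming a simple whitespace tokenizer and an n-gram overlap check. The function names (`build_index`, `contains_verbatim_quote`) and the window size are hypothetical choices for illustration, not anything LLaMA or its maintainers are known to use.

```python
def ngrams(tokens, n):
    """Return the set of all n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(protected_texts, n=8):
    """Index every n-gram of the protected texts (hypothetical helper)."""
    index = set()
    for text in protected_texts:
        # Assumption: lowercase + whitespace split stands in for a real tokenizer.
        index |= ngrams(text.lower().split(), n)
    return index


def contains_verbatim_quote(output, index, n=8):
    """True if the output shares at least one n-gram with the protected texts."""
    return not ngrams(output.lower().split(), n).isdisjoint(index)


if __name__ == "__main__":
    book = "it was the best of times it was the worst of times"
    index = build_index([book], n=5)
    print(contains_verbatim_quote("he said it was the best of times", index, n=5))
    print(contains_verbatim_quote("the era was both good and bad", index, n=5))
```

A filter like this only inspects what the model emits. The weights are untouched, and a light paraphrase would slip straight through, which is why it makes proof harder rather than removing anything.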
