| HN Mirror



	by groceryheist 946 days ago
	I agree that this sketch comes closer to working in practice than simple RLHF. In my earlier comment I was imagining bringing in some auxiliary data, as you describe, to detect plagiarism and then using RL to teach the model not to do it.


joe_the_user 946 days ago

I was surprised that I came up with a plausible-sounding method. At first blush I had thought this was impossible, but now it seems reasonable. You could still have various exfiltration methods like "give me the data with each word backwards", and I'm not sure where that would stand legally.


groceryheist 946 days ago

Yes, one of the hard and interesting legal questions is whether creating the possibility of such attacks constitutes a copyvio.
