HN Mirror



	by mark4 331 days ago
	ELI5 on this, please? I don't get a good understanding from a quick read.


ACCount36 331 days ago

1. You train a model to exhibit a certain behavior

2. You use it to generate synthetic data that is completely unrelated to that behavior, and then fine-tune a second model on that data

3. The second model begins to exhibit the same behavior as the first one

This transfer seems to require the two models to be substantially similar, i.e. based on the exact same base model.
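A toy sketch of the three steps, assuming linear "models" (plain weight matrices) instead of LLMs. Everything here is hypothetical and for illustration only, not the paper's actual setup: the names `distill_step` and `subliminal_demo`, the `trait` matrix standing in for the fine-tuned behavior, and the dimensions and learning rate. It checks whether one small distillation step on random, trait-unrelated inputs moves the student's weights toward the teacher's fine-tuning update.

```python
import random

def random_matrix(rows, cols, rng, scale=1.0):
    """Matrix (list of rows) with i.i.d. Gaussian entries."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def add(A, B, alpha=1.0):
    """Return A + alpha * B as a new matrix."""
    return [[a + alpha * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def frobenius_inner(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def distill_step(student, teacher, inputs, lr):
    """One gradient step on the mean squared error between student and teacher
    outputs, using only the given inputs (the 'unrelated' synthetic data)."""
    grad = [[0.0] * len(student[0]) for _ in student]
    for x in inputs:
        err = [s - t for s, t in zip(matvec(student, x), matvec(teacher, x))]
        for i, e in enumerate(err):
            for j, xj in enumerate(x):
                grad[i][j] += 2.0 * e * xj / len(inputs)
    return add(student, grad, alpha=-lr)

def subliminal_demo(seed, shared_init=True, d_in=8, d_out=4, n_inputs=16, lr=0.01):
    """Hypothetical toy: return how far one distillation step moves the student
    along the teacher's 'trait' direction (positive = toward the trait)."""
    rng = random.Random(seed)
    base = random_matrix(d_out, d_in, rng)              # shared base model
    trait = random_matrix(d_out, d_in, rng, scale=0.1)  # step 1: teacher = base + trait
    teacher = add(base, trait)
    # Step 2: random inputs carry no information about the trait;
    # the teacher "labels" them inside distill_step.
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(n_inputs)]
    student = base if shared_init else random_matrix(d_out, d_in, rng)
    updated = distill_step(student, teacher, inputs, lr)
    # Step 3: project the student's parameter change onto the trait direction.
    return frobenius_inner(add(updated, student, alpha=-1.0), trait)

shared = [subliminal_demo(s, shared_init=True) for s in range(50)]
unrelated = [subliminal_demo(s, shared_init=False) for s in range(50)]
print(all(v > 0 for v in shared))  # True
print(sum(v < 0 for v in unrelated), "of 50 unrelated-init runs moved away from the trait")
```

With a shared starting point the student's change is always aligned with the trait: the inner product works out to a positive multiple of the sum of squared norms of `trait @ x` over the inputs. From an unrelated starting point the sign varies by seed, which loosely mirrors the same-base-model requirement mentioned above.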


mark4 331 days ago

Thank you!


tomaskafka 331 days ago

1. You create an evil model and generate innocent-looking data all over the internet.

2. Some other model is trained on the internet data, including yours.

3. The other model becomes evil (or owl-loving).


mark4 331 days ago

Thank you! Great explanation! As I guessed, this is much less alarmist and sensational than what the paper seems to claim.


alienbaby 330 days ago

OK, interesting. How did you come to that conclusion? It seems to me this could introduce serious issues in multiple ways.
