by mark4 331 days ago
ELI5 on this, please. I couldn't get a good understanding from a quick read.

1. You train a model to exhibit a certain behavior

2. You use it to make synthetic data that's completely unrelated to that behavior, and then fine-tune a second model on that data

3. The second model begins to exhibit the same behavior as the first one

This transfer seems to require the two models to be substantially similar, i.e. to be based on the exact same base model.
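
If it helps, here is a toy numpy sketch of the three steps above. It is not the paper's setup. Everything in it is an assumption made for illustration: a tiny linear "model" with a trainable trunk (`U0`) and a frozen head (`V`), output 0 standing in for the behavior, the remaining outputs standing in for unrelated data, and one gradient step each for teacher and student. It only covers the case where both models start from the same base, and does not try to model what happens with a different base model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, k = 5, 4, 9  # input size, hidden size, number of outputs

# Hypothetical shared base model: trainable trunk U0, frozen head V.
# Output 0 plays the "behavior"; outputs 1..k-1 are unrelated auxiliary outputs.
U0 = rng.normal(size=(h, d))
V = rng.normal(size=(k, h))
v_trait, V_aux = V[0], V[1:]

def trait(U, x):
    """Behavior output (output 0) of the model with trunk U on input x."""
    return v_trait @ U @ x

# Step 1: the teacher takes one gradient step on 0.5 * (trait - target)^2,
# pushing its behavior output toward `target` on a single probe input.
x_probe = rng.normal(size=d)
target = 3.0
err = trait(U0, x_probe) - target
U_teacher = U0 - 0.01 * err * np.outer(v_trait, x_probe)

# Step 2: synthetic data = the teacher's auxiliary outputs on random noise.
# Neither the probe input nor the behavior output appears in this data.
X = rng.normal(size=(d, 200))
Y = V_aux @ U_teacher @ X

# Step 3: the student starts from the same base and takes one gradient step
# on half the mean squared error between its auxiliary outputs and the teacher's.
residual = V_aux @ U0 @ X - Y
grad = V_aux.T @ residual @ X.T / X.shape[1]
U_student = U0 - 0.01 * grad

# How far each model's behavior output moved on the probe, relative to the base,
# and how well the student's weight update lines up with the teacher's.
teacher_shift = trait(U_teacher, x_probe) - trait(U0, x_probe)
student_shift = trait(U_student, x_probe) - trait(U0, x_probe)
alignment = np.sum((U_student - U0) * (U_teacher - U0))

print(f"teacher behavior shift: {teacher_shift:+.4f}")
print(f"student behavior shift: {student_shift:+.4f}")
print(f"student/teacher update alignment: {alignment:+.6f}")
```

The student never sees the probe input or the behavior output, yet its behavior output moves in the same direction as the teacher's, because matching the teacher on the auxiliary outputs pulls the shared trunk weights toward the teacher's. In this linear toy that sign agreement is guaranteed by the math; for real language models it is an empirical observation, not something this sketch demonstrates.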

Thank you!
1. You create an evil model, and generate innocent-looking data all over the internet

2. Some other model is trained on the internet data, including yours

3. The other model becomes evil (or owl-loving)

Thank you! Great explanation! As I guessed, this is much less alarmist and sensational than what the paper seems to be claiming.
Ok interesting, how did you come to that conclusion? It seems to me this could introduce serious issues in multiple ways.