Hacker News
by musicale 711 days ago
If it's trained on data generated by humans, then all bets are off.

Neither "alignment" fine-tuning nor output filters are likely to be 100% effective, and a single failure can be disastrous.