Hacker News
by mattchamb 711 days ago
Honestly, this is a concern for me in a non-boogeyman way. I joined a company to work on their edtech product for kids and got assigned to their AI product. I have no idea how to be confident, let alone prove, that the gen AI won’t tell the kids harmful things. We can try all kinds of things, but I don’t know how to PROVE it won’t.
1 comment

If it's trained on data generated by humans, then all bets are off.

Neither "alignment" fine-tuning nor output filters are likely to be 100% effective, and a single failure can be disastrous.