Hacker News new | ask | show | jobs
by majormajor 641 days ago
How does a model "know full well" that it output a fake ISBN?

It's been trained that sources look like plausible-titles + random numbers.

It's been trained that when challenged it should say "oh sorry I can't do this."

Are those things actually distinct?