by bunderbunder 190 days ago
It’s because that’s what most closely resembles the bulk of the tasks it was optimized for during pre-training.