Hacker News
by yalok 490 days ago
I wonder if there’s similar research on reducing the amount of data (by improving its quality) for pretraining.
1 comment

Yeah, that was the idea behind the Phi series of models. They get good benchmark results, but you can still tell something is missing when you actually try to use them for anything.