Hacker News
by npmipg 317 days ago
Note that distilling a general model is several orders of magnitude more expensive than distilling a task-specific model, and the task-specific case is what I'm trying to promote here. Smart general models make it far easier to distill great task-specific models without any expert labelers.