Hacker News
by moffkalast 702 days ago
Makes sense, right? Otherwise, why build a model so large that nobody could realistically run it, if not to squeeze the most performance out of a limited dataset or compute budget? It was always meant as a source model for distillation, not a production one.