Hacker News
by moffkalast, 702 days ago
Makes sense, right? Otherwise, why make a model so large that nobody can conceivably run it, if not to optimize for performance on a limited dataset/compute? It was always a distillation source model, not a production one.