by slaucon
562 days ago
> “By sourcing and filtering only the highest-quality and most representative data for LLM use cases, we reduced the pretraining set to just 13 billion tokens—drastically cutting the environmental impact of further training while preserving performance.”

Would love to know more about how they filtered the training set down here and what heuristics were involved.

I think that the models we use now are enormous for the use cases we’re using them for. Work like this and model distillation in general is fantastic and sorely needed, both to broaden price accessibility and to decrease resource usage. I’m sure frontier models will only get bigger, but I’d be shocked if we keep using the largest models in production for almost any use case.