Hacker News
by gwern 5 hours ago
Ensembling is not compute- or parameter-efficient, so compression per se is a terrible application. (This is related to why people train ever-larger LLMs, such as one 10T-parameter LLM, rather than 100 GPT-3-scale LLMs.)
1 comment

Yeah.