|
|
|
|
|
by evrydayhustling
1115 days ago
|
|
That essay works in a context of specific datasets and tasks, which are referenced in the surrounding sentences and paragraphs. They are saying that for a particular "emergent" capability you might reach with a giant LLM, you might get there more efficiently with distillation / LoRa. My comment is about generality, which is the remaining advantage of giant models. |
|