by filterfiber
916 days ago
I don't understand why they're comparing parameter counts to lines of code. AFAIK you can scale a 1B model up just by increasing the layer count and layer widths in its config. The difference between a 1B and a 175B model can be a few changed numbers, with no added LOC at all. LOC has never been the limiting factor for large models; compute and training data are. Most of the LOC goes into optimization, and they don't address MoE or anything fancy like that.
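To make the "just changing a few numbers" point concrete, here is a rough sketch, assuming a plain GPT-style decoder where each block holds about 12 * d_model^2 weights (attention plus a 4x-wide MLP) and ignoring biases and layer norms. The function name `approx_params` is made up for illustration; the layer counts and widths are the ones published for the GPT-3 1.3B and 175B configurations.

```python
def approx_params(n_layer: int, d_model: int,
                  vocab_size: int = 50257, n_ctx: int = 2048) -> int:
    """Rough parameter count for a GPT-style decoder (hypothetical helper).

    Assumes ~12 * d_model^2 weights per block: 4 * d_model^2 for the
    attention projections and 8 * d_model^2 for an MLP with a 4x hidden
    width. Biases and layer-norm parameters are ignored.
    """
    blocks = 12 * n_layer * d_model ** 2
    token_embeddings = vocab_size * d_model
    position_embeddings = n_ctx * d_model
    return blocks + token_embeddings + position_embeddings


# Same code path for both models; only two config numbers differ.
small = approx_params(n_layer=24, d_model=2048)    # ~1.3B-class config
large = approx_params(n_layer=96, d_model=12288)   # ~175B-class config

print(f"small: {small / 1e9:.2f}B parameters")
print(f"large: {large / 1e9:.2f}B parameters")
```

The model definition doesn't grow by a single line between the two calls. What grows is the memory, compute, and data needed to actually train the thing.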