by lebek
1114 days ago
> nobody at this point expects a 13B parameter model to succeed with the same accuracy at the broad range of tasks supported by what may be a 1T parameter model

I think a lot of people believe exactly that. To take one example from the "We Have No Moat" essay:

> "It doesn’t take long before the cumulative effect of all of these fine-tunings overcomes starting off at a size disadvantage. Indeed, in terms of engineer-hours, the pace of improvement from these models vastly outstrips what we can do with our largest variants, and the best are already largely indistinguishable from ChatGPT."

- https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...

My comment is about generality, which is the remaining advantage of giant models.