by cma
590 days ago
> to the decreasing, logarithmic performance

In what measure, loss? Loss can't go below 0 plus the inherent entropy of the text. (With overfitting it could get nearer to 0, but not all the way if the objective is next-token prediction and the same prefix appears with more than one continuation.) With respect to hallucinations, 4 got incredibly better over 3.
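To make the loss-floor point concrete, here is a minimal sketch, assuming a toy dataset of (prefix, next token) pairs and a model that sees only the prefix. The function name `min_next_token_loss` and the example strings are made up for illustration. It computes the lowest average cross-entropy any such predictor could reach on that data, which is the empirical conditional entropy of the next token given the prefix.

```python
import math
from collections import Counter, defaultdict

def min_next_token_loss(pairs):
    """Lowest achievable average cross-entropy (nats per example) on `pairs`,
    a list of (prefix, next_token) tuples, for a model that only sees the prefix.

    The best possible prediction for a prefix is the empirical distribution of
    the tokens that follow it, so the floor is the conditional entropy.
    """
    by_prefix = defaultdict(Counter)
    for prefix, token in pairs:
        by_prefix[prefix][token] += 1

    total = len(pairs)
    loss = 0.0
    for counts in by_prefix.values():
        n = sum(counts.values())
        for c in counts.values():
            # c / total weights this (prefix, token) pair in the average;
            # c / n is the best achievable probability for that token.
            loss -= (c / total) * math.log(c / n)
    return loss

# Hypothetical toy data: every prefix has exactly one continuation.
unique = [("the cat", "sat"), ("the dog", "ran")]
# Hypothetical toy data: the same prefix has two different continuations.
repeated = [("the cat", "sat"), ("the cat", "ran")]

print(min_next_token_loss(unique))
print(min_next_token_loss(repeated))
```

With unique prefixes a model can memorize its way to zero loss. With a repeated prefix the floor is ln 2, about 0.69 nats, no matter how much the model overfits.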
|
The inputs (data, compute, and parameters) going into training these models have grown by many orders of magnitude from one gen to the next. There's a lot of fuzziness about how much better each gen has gotten, but clearly 4 is not many orders of magnitude better than 3 by any reasonable definition. This mental model isn't useful for saying how good each gen is, but it is quite useful for seeing the trend and making long-term predictions.