by spywaregorilla
849 days ago
People continue training existing models all the time (that is, they start from one model's already-trained weights and train further to turn it into something different). Usually this is done to make the model a better fit for a specific task, but it won't often make it generically better, assuming the first training run already used the model to its full potential (i.e., it wasn't "underfit"). The 75B-parameter model simply has more capacity to work with than the 5B model, in the same sense that
`y = mx + b` is just not as expressive as `y = ax^2 + bx + c`.
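
To make the line-vs-parabola analogy concrete, here is a rough sketch in Python with NumPy. The data and the `fit_mse` helper are made up purely for illustration: points are sampled from an arbitrary parabola, then fitted by least squares with a degree-1 and a degree-2 polynomial. However well the line is fitted, it keeps a large error that the quadratic does not have.

```python
import numpy as np

# Hypothetical toy data (an assumption for illustration):
# points on the parabola y = 2x^2 - 3x + 1.
x = np.linspace(-3.0, 3.0, 50)
y = 2.0 * x**2 - 3.0 * x + 1.0


def fit_mse(degree):
    """Least-squares polynomial fit of the given degree; returns the mean squared error."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.mean(residuals**2))


linear_mse = fit_mse(1)     # best possible y = mx + b
quadratic_mse = fit_mse(2)  # best possible y = ax^2 + bx + c

print(f"linear MSE:    {linear_mse:.4f}")
print(f"quadratic MSE: {quadratic_mse:.4f}")
```

The linear fit here is already the optimum for its model family, so more "training" cannot close the gap; only the extra parameter can. That is only an analogy for the 5B vs 75B case, not a claim about how large models actually scale.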