by TikiTDO
1055 days ago
It doesn't need to be two huge models. If there is an advantage to doing this, I'd expect you would see it even in a small test case. If there's something to it, I'm sure we'll see results by the end of the week, if not earlier.
|
One of the most important keys to the success of deep learning in the last couple of years has been the fact that emergent features appear only beyond certain scales [1], so I wouldn't be too quick to dismiss things that don't help at smaller scales, nor would I be certain that all the tricks that help in small data/parameter regimes will necessarily help in larger models. Unfortunately!
[1] https://timdettmers.com/2022/08/17/llm-int8-and-emergent-fea...