by jakderrida
727 days ago
Exactly what I was thinking. He also says:

> Paradoxically, smaller models require more training to reach the same level of performance.

It's not a paradox at all. If smaller models needed less training to reach the same level of performance, that would be the paradox: you could keep shrinking the model until it trained for under a nanosecond and still got the optimal performance/size payoff.