|
|
|
|
|
by jeremyjh
67 days ago
|
|
Karpathy - and others - consider the pre-training knowledge as much a liability as an asset. If we could just retain the emergent reasoning and language capability without the hazy recollections the models would likely be stronger. |
|