Hacker News new | ask | show | jobs
by jeremyjh 67 days ago
Karpathy - and others - consider the pre-training knowledge as much a liability as an asset. If we could just retain the emergent reasoning and language capability without the hazy recollections the models would likely be stronger.