by jiggawatts 68 days ago
> retain that capability forever

Not really. The base training data cutoff will quickly render models useless as they fail to keep up with developments.

Translating some Farsi news articles about the war was hilarious: Gemini Pro got into a panic, and ChatGPT either accused me of spreading fake news or assumed this was some sort of fantasy scenario.

3 comments

Karpathy, and others, consider the pre-training knowledge as much a liability as an asset. If we could just retain the emergent reasoning and language capability without the hazy recollections, the models would likely be stronger.
That's GPT-4 thinking. New models use tools to look up current events or the latest versions, and rely very little on weight knowledge.
You can pull new information into the context via RAG, but that is expensive and gives only a very shallow understanding compared to retraining.
Not really.

For coding, I care mostly about reasoning ability, which is uncorrelated with the cutoff.