Hacker News
by tartakovsky 1108 days ago
Same idea here? Larger models do a better job of forgetting their training data and dropping their semantic priors. Perhaps another way to think about this is that larger models learn new information, and drop old information, faster. https://arxiv.org/abs/2303.03846

Isn't that interesting? The idea of "mental liquidity", or "strong opinions weakly held"? https://news.ycombinator.com/item?id=36280772
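To make the linked paper's setup a bit more concrete: one of the experiments there uses "flipped-label" in-context learning, where the few-shot exemplars are shown with the opposite of their natural labels, so a model can only match them by overriding its semantic prior. A minimal sketch of such a prompt, assuming a two-class sentiment task; the helper `build_flipped_prompt` and the Input/Label template are hypothetical illustrations, not the paper's exact prompt format.

```python
# Hypothetical helper (not from the paper): builds a few-shot sentiment
# prompt whose exemplar labels are flipped, in the spirit of the
# flipped-label in-context learning setup in arXiv:2303.03846.
FLIP = {"positive": "negative", "negative": "positive"}


def build_flipped_prompt(exemplars, query):
    """exemplars: list of (text, true_label) pairs; query: unlabeled text."""
    blocks = []
    for text, label in exemplars:
        # Show the opposite of the semantically natural label.
        blocks.append(f"Input: {text}\nLabel: {FLIP[label]}")
    # Leave the query's label blank for the model to complete.
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


examples = [
    ("I loved this movie.", "positive"),
    ("The plot was a mess.", "negative"),
]
prompt = build_flipped_prompt(examples, "Great acting throughout.")
print(prompt)
```

A model that follows the flipped mapping would complete the query with "negative"; one that falls back on its semantic prior would say "positive". The paper's claim, roughly, is that larger models are more willing to do the former.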


Wouldn’t this be the equivalent of ranking? I thought LLMs were not supposed to be influenced by freshness.
By the freshness of training on some data?

Well, aren't they? I believe any kind of reinforcement learning is supposed to be biased toward the most recent training set.
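A toy illustration of that recency effect, with a loud caveat: this is plain stochastic gradient descent on a one-parameter linear model, not reinforcement learning and nothing like LLM-scale training. The model fits y = w·x on an "old" dataset (slope 2), then keeps training on a "new" dataset (slope -1). The helper `sgd`, the learning rate, and the data are all hypothetical choices made for the sketch.

```python
# Toy sketch: sequential SGD on y = w * x. Not RL; just shows that the
# final weight reflects whichever data the model saw last.


def sgd(w, data, lr=0.05, epochs=50):
    for _ in range(epochs):
        for x, y in data:
            # Gradient of the squared error (w*x - y)^2 with respect to w.
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w


# Deterministic inputs in [-1, 1], skipping 0 (it carries no signal).
xs = [i / 10 for i in range(-10, 11) if i != 0]
old_data = [(x, 2.0 * x) for x in xs]   # old task: slope 2
new_data = [(x, -1.0 * x) for x in xs]  # new task: slope -1

w_old = sgd(0.0, old_data)
w_new = sgd(w_old, new_data)
print(f"after old data: w = {w_old:.3f}")
print(f"after new data: w = {w_new:.3f}")
```

With a single weight there is no room to keep both slopes, so the new data simply overwrites the old fit. Larger models have far more capacity, which is part of why how much they "forget" is an open question rather than a given.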