Hacker News
by
colechristensen
23 days ago
No, they're actually training weights based on the context before it gets compacted. Context is still just context; this is splitting the model into persistent weights and malleable ones, and the malleable ones are periodically updated.
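To make the persistent/malleable split concrete, here is a toy sketch in plain Python. It is an illustration only, not the method of any particular system: the linear model, the names `predict` and `update_malleable`, the learning rate, and the data are all made up for the example. The idea it shows is that the frozen weights never change, while a small set of malleable weights is trained on the context before that context is dropped.

```python
# Toy sketch (hypothetical names and data): frozen "persistent" weights plus
# a "malleable" delta that is trained on context before compaction.

def predict(persistent, malleable, x):
    # The effective weight is the frozen part plus the adaptable part.
    return sum((p + m) * xi for p, m, xi in zip(persistent, malleable, x))

def update_malleable(persistent, malleable, context, lr=0.1):
    # Plain SGD on squared error over (x, y) pairs from the context.
    # Only the malleable weights are updated; persistent is read-only here.
    new = list(malleable)
    for x, y in context:
        err = predict(persistent, new, x) - y
        new = [m - lr * err * xi for m, xi in zip(new, x)]
    return new

persistent = [1.0, 0.0]   # stands in for the pretrained weights
malleable = [0.0, 0.0]    # starts empty, absorbs what the context teaches

# Stand-in for "the context so far": repeated (input, target) examples.
context = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0)] * 50

malleable = update_malleable(persistent, malleable, context)
context = []  # "compaction": the raw context is gone, the update remains

print(predict(persistent, malleable, [1.0, 0.0]))
```

After the update, the model still fits the examples even though the context list is empty, and `persistent` is untouched.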
delis-thumbs-7e
23 days ago
Wouldn’t that be extremely computationally expensive, considering how resource-intensive training is?
colechristensen
23 days ago
No, training a state-of-the-art model involves training on the order of 10 trillion tokens.
We're talking about a step that updates weights based on, say, between 10k and 1M tokens.
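For a sense of scale, here is the arithmetic as a quick Python snippet. The figures are just the rough ones quoted above, `token_ratio` is a name made up for the example, and token count is only a crude proxy for compute, since the cost per token can differ between pretraining and a small update.

```python
# Rough scale comparison using the ballpark figures from the comment above.
PRETRAINING_TOKENS = 10 * 10**12  # "on the order of 10 trillion tokens"

def token_ratio(update_tokens, pretraining_tokens=PRETRAINING_TOKENS):
    # How many times more tokens pretraining sees than one update step.
    return pretraining_tokens // update_tokens

for update_tokens in (10_000, 1_000_000):  # "between 10k and 1M tokens"
    ratio = token_ratio(update_tokens)
    print(f"{update_tokens:>9,} tokens: pretraining is {ratio:,}x larger")
```

So by token count alone, one update step is somewhere between ten million and a billion times smaller than a full pretraining run.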
delis-thumbs-7e
23 days ago
I learned something. Thank you!