Hacker News
by
ACCount37
27 days ago
Full distributions are a fucking pain to save - at this point just save the hidden states. But there are lossy compression tricks there.
rao-v
26 days ago
To the previous poster's point, soft distributions are useful: even saving just the top 10 logits gives significantly more training signal than the final token alone.
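A rough sketch of what saving the top-k logits and training against them could look like, in plain Python. The function names (`top_k_logits`, `soft_target_loss`), the toy logits, and the choice to renormalize the teacher distribution over only the saved tokens are all assumptions for illustration, not anything specified in this thread.

```python
import math

def top_k_logits(logits, k=10):
    """Return (indices, values) of the k largest logits, largest first."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return idx, [logits[i] for i in idx]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, saved_idx, saved_vals):
    """Cross-entropy of the student against the saved top-k teacher logits.

    Assumption: the teacher distribution is renormalized over the saved
    tokens only, so probability mass outside the top k is dropped.
    """
    teacher_p = softmax(saved_vals)
    # Log-partition of the student over the full vocabulary.
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in student_logits))
    return -sum(p * (student_logits[i] - log_z)
                for p, i in zip(teacher_p, saved_idx))

# Toy example: 6-token vocabulary, keep the top 3 teacher logits.
teacher_logits = [2.0, 0.5, -1.0, 3.0, 0.0, 1.5]
idx, vals = top_k_logits(teacher_logits, k=3)

# A student that is uniform over the vocabulary.
student_logits = [0.0] * 6
loss = soft_target_loss(student_logits, idx, vals)
print(idx, vals, loss)
```

Storage per position is k (index, value) pairs instead of a full vocabulary-sized vector, which is the trade being discussed: lossy, but still a soft target rather than a single token id.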