by spywaregorilla 784 days ago
Even if the model correctly got 20%/80% on the very last layer of its token prediction for just these two tokens, the design of how the model leverages these probabilities would not choose them at that ratio.
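To make that concrete, here's a rough sketch of one way this could happen. The comment doesn't say which decoding mechanism it has in mind, so temperature scaling and greedy decoding are my assumptions here, and the helper names (`apply_temperature`, `greedy_choice`) are made up for illustration. The idea is that the decoding step sits between the final-layer probabilities and the token that actually gets emitted.

```python
def apply_temperature(probs, temperature):
    """Rescale probabilities the way softmax(logits / T) would.

    Dividing logits by T is equivalent to raising each probability to the
    power 1/T and renormalizing. (Hypothetical helper, not from any library.)
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_choice for T -> 0")
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


def greedy_choice(probs):
    """Return the index of the most likely token (argmax decoding)."""
    return max(range(len(probs)), key=lambda i: probs[i])


# The 20%/80% split from the comment, for two candidate tokens.
model_probs = [0.2, 0.8]

# T = 1.0 leaves the distribution as the model produced it.
unchanged = apply_temperature(model_probs, 1.0)

# T = 0.5 sharpens it: the 20% token is now sampled far less than 20% of the time.
sharpened = apply_temperature(model_probs, 0.5)

# Greedy decoding ignores the ratio entirely and always emits the 80% token.
always_picked = greedy_choice(model_probs)

print(unchanged, sharpened, always_picked)
```

Under these assumptions, only plain sampling at temperature 1.0 with no truncation would reproduce the 20/80 split in the output; anything sharper pushes the output toward the 80% token.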