Hacker News
by spywaregorilla 784 days ago
Even if the model correctly got 20%/80% on the very last layer of its token prediction for just these two tokens, the way the model then uses those probabilities to pick a token means it would not choose them at that ratio.
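A rough sketch of what that could look like, assuming the selection step is something like temperature scaling or greedy decoding (the comment doesn't name a mechanism, so both are just illustrative assumptions, and the function names here are made up):

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a distribution as softmax(log(p) / T). Assumed mechanism, for illustration."""
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_frequencies(probs, temperature, n=100_000, seed=0):
    """Return how often each token is actually picked over n draws."""
    if temperature == 0:
        # Greedy decoding (assumed): always pick the most likely token.
        best = max(range(len(probs)), key=probs.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(probs))]
    rng = random.Random(seed)
    adjusted = apply_temperature(probs, temperature)
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=adjusted, k=n):
        counts[i] += 1
    return [c / n for c in counts]


if __name__ == "__main__":
    model_probs = [0.2, 0.8]  # the 20%/80% from the final layer
    for t in (1.0, 0.5, 0):
        print(t, sample_frequencies(model_probs, t))
```

Only at temperature 1.0 does the observed pick rate track the 20%/80% the model computed. At 0.5 the rescaled distribution is roughly 6%/94%, and with greedy decoding the 20% token is never chosen at all.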