Hacker News
by leodriesch 768 days ago
The model is always wrong, since it predicts a probability distribution over all possible tokens, but the target assigns probability 1 to one token and 0 to all others.
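To make that concrete, here is a rough sketch in plain Python, assuming the usual softmax plus cross-entropy setup. The logits and the 4-token vocabulary are made up for illustration and the helper names are mine, not from any particular library.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over tokens."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Loss against a one-hot target: -log of the probability given to the correct token."""
    return -math.log(probs[target_index])

# Hypothetical logits for a 4-token vocabulary; token 0 is the "correct" next token.
logits = [4.0, 1.0, 0.5, -2.0]
probs = softmax(logits)
loss = cross_entropy(probs, target_index=0)

print(probs)  # every entry is strictly between 0 and 1
print(loss)   # strictly positive, even though token 0 is the model's top choice
```

With finite logits, softmax never outputs exactly 1 for any token, so the loss against a one-hot target can get close to zero but never reach it.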