by kiratp 901 days ago
I guess it comes down to whether your use case has a single correct answer or multiple possible ones. For example, a lot of what we do has one and only one correct sequence of tokens. You need to look at both, but so much of the learning material out there focuses only on loss. YMMV.
1 comment

That is already accounted for with categorical cross-entropy loss.
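
For what it's worth, here is a rough sketch of how the two views can disagree. It assumes "both" above means the loss plus an exact-match check on the whole sequence, which the thread does not spell out. The function names, the 3-token vocabulary, and the probabilities are all made up for illustration.

```python
import math

def cross_entropy(probs, target):
    """Categorical cross-entropy for one token: -log p(target)."""
    return -math.log(probs[target])

def sequence_loss(step_probs, targets):
    """Mean per-token cross-entropy over a sequence."""
    total = sum(cross_entropy(p, t) for p, t in zip(step_probs, targets))
    return total / len(targets)

def exact_match(step_probs, targets):
    """True only if the highest-probability token is correct at every position."""
    return all(
        max(range(len(p)), key=p.__getitem__) == t
        for p, t in zip(step_probs, targets)
    )

# Hypothetical target sequence over a 3-token vocabulary.
targets = [0, 1, 2]

# Model A: very confident on two tokens, picks the wrong token on the third.
model_a = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.45, 0.15, 0.40]]

# Model B: hesitant everywhere, but the top token is always the right one.
model_b = [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30], [0.30, 0.30, 0.40]]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, round(sequence_loss(probs, targets), 3), exact_match(probs, targets))
```

With these toy numbers, model A ends up with the lower mean loss but gets the sequence wrong, while model B has the higher loss and gets every token right. That is one reading of why a single-correct-sequence task might want an exact-match number next to the loss curve.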