by pelillian 811 days ago
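That’s why we use top-p and top-k! Both restrict sampling to the most likely tokens, ranked by probability: top-k keeps only the k most likely tokens, while top-p keeps the smallest set of tokens whose cumulative probability reaches p.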
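
For anyone who wants to see it concretely, here is a rough sketch of both filters in plain Python. The function names (`top_k_filter`, `top_p_filter`, `sample`) and the toy distribution are made up for illustration and are not taken from any particular library; real implementations usually work on logits over a full vocabulary, and details such as tie handling vary.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize (hypothetical helper)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches p, then renormalize (hypothetical helper)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


def sample(probs, rng=random):
    """Draw one token from the (already filtered) distribution."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution, invented for this example.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.04, "qux": 0.01}

top2 = top_k_filter(probs, k=2)         # only "the" and "a" survive
nucleus = top_p_filter(probs, p=0.9)    # "the", "a", "cat" survive
token = sample(nucleus)                 # never "zebra" or "qux"
```

Either way the unlikely tail ("zebra", "qux") gets zero probability before sampling, which is the whole point.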