Hacker News
by blackle 772 days ago
My understanding is that minimizing perplexity (what LLMs are generally optimized for) is equivalent to finding a good probability distribution over the next token.
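
To make that concrete, here is a rough sketch of how perplexity is usually computed: the exponential of the average negative log-probability the model assigned to the tokens that actually occurred. The `perplexity` function name and the probability lists are made up for illustration, not taken from any particular library or model.

```python
import math

def perplexity(token_probs):
    """exp(mean negative log-probability of the observed tokens)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical probabilities two models assigned to the tokens that actually came next.
confident = [0.9, 0.8, 0.95]     # puts most of its mass on the right token
uniform = [0.25, 0.25, 0.25]     # uniform guess over a 4-token vocabulary

print(perplexity(confident))     # lower: the distribution fits the data better
print(perplexity(uniform))       # about 4, i.e. the vocabulary size
```

A model that puts more probability on the token that really follows gets lower perplexity, so pushing perplexity down is the same thing as pushing the predicted next-token distribution toward the true one.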