by cl42 1114 days ago
Those hacks are literally how a transformer-based large language model works: it predicts the next token in a sequence.

They take advantage of how a function that picks the most probable next token behaves.
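
To make that concrete, here is a minimal sketch of greedy next-token selection. Everything in it is an assumption for illustration: `TOY_MODEL` is a hand-written lookup table with invented probabilities standing in for a real transformer, and `next_token_probs` and `greedy_decode` are hypothetical names. A real model conditions on the whole context and often samples rather than taking the maximum.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a trained model: maps the last two tokens of
# the context to a made-up probability distribution over the next token.
TOY_MODEL: Dict[Tuple[str, str], Dict[str, float]] = {
    ("the", "sky"): {"is": 0.9, "<eos>": 0.1},
    ("sky", "is"): {"blue": 0.7, "green": 0.2, "<eos>": 0.1},
    ("actually", "is"): {"green": 0.8, "blue": 0.2},
    ("is", "blue"): {"<eos>": 1.0},
    ("is", "green"): {"<eos>": 1.0},
}


def next_token_probs(context: List[str]) -> Dict[str, float]:
    """Return the toy distribution for the last two tokens of the context."""
    # Unknown contexts fall back to ending the sequence.
    return TOY_MODEL.get(tuple(context[-2:]), {"<eos>": 1.0})


def greedy_decode(prompt: List[str], max_new_tokens: int = 5) -> List[str]:
    """Repeatedly append the single most probable next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)  # argmax over the distribution
        if best == "<eos>":
            break
        tokens.append(best)
    return tokens
```

With this table, `greedy_decode(["the", "sky"])` ends in "blue", while `greedy_decode(["the", "sky", "actually", "is"])` ends in "green". Changing the words in the prompt changes which token has maximal probability, and that is the lever the hacks pull on.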