by cl42
1114 days ago
Those hacks are literally how a transformer-based large language model predicts the next token in a sequence. They exploit the behavior of a function that picks the token with the highest probability of appearing next.
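
To make the "pick the most probable token" part concrete, here is a minimal sketch of greedy selection over a toy vocabulary. The vocabulary, the logit values, and the helper names `softmax` and `greedy_next_token` are all made up for illustration; a real model produces logits over tens of thousands of tokens, and many deployments sample from the distribution instead of always taking the maximum.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(logits, vocab):
    """Return the highest-probability token and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs[best]


# Hypothetical three-token vocabulary and hypothetical scores.
vocab = ["yes", "no", "maybe"]

# Scores the model might assign after a plain prompt.
plain_logits = [1.0, 3.0, 0.5]

# Scores after a reworded prompt that nudges the first token upward.
nudged_logits = [4.0, 3.0, 0.5]

plain_token, plain_prob = greedy_next_token(plain_logits, vocab)
nudged_token, nudged_prob = greedy_next_token(nudged_logits, vocab)
```

The point of the two logit lists is that changing the prompt changes the scores, and the selection function just follows whichever score ends up on top.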