by maxbond 314 days ago
That's a super interesting hypothesis. From an information theory perspective, rarer tokens are more informative (they have higher surprisal). Maybe this results in the all-caps tokens being weighted more heavily by the attention mechanism.
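To make "rarer tokens are more informative" concrete, here is a minimal sketch of self-information (surprisal), -log2 p(token), computed from token counts. The corpus counts are hypothetical toy numbers, not real tokenizer statistics. The sketch only illustrates the information-theory point. It does not model attention, and it says nothing about whether attention actually weights rare tokens more.

```python
import math
from collections import Counter


def self_information(counts: Counter) -> dict:
    """Return the surprisal -log2 p(token), in bits, for each token in counts."""
    total = sum(counts.values())
    return {tok: -math.log2(n / total) for tok, n in counts.items()}


# Hypothetical toy corpus: lowercase "hello" is common, all-caps "HELLO" is rare.
counts = Counter({"hello": 63, "HELLO": 1})
bits = self_information(counts)
print({tok: round(b, 2) for tok, b in bits.items()})  # → {'hello': 0.02, 'HELLO': 6.0}
```

Here the rare all-caps variant (p = 1/64) carries 6 bits, while the common lowercase form carries only about 0.02 bits.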