Hacker News
by intalentive, 918 days ago
Even small models (e.g. hidden dims = 32) should be able to handle token ambiguity with attention. The information is not so much in the token itself as in the context.
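
To make that concrete, here is a minimal sketch (not a trained model) of one self-attention head with hidden size 32. The vocabulary ("bank", "river", "money"), the orthonormal embeddings, and the identity query/key/value projections are all assumptions picked for illustration. The point is only that the same input token ends up with a different vector depending on what sits next to it.

```python
import numpy as np

D = 32  # hidden size, as in the comment above

# Hypothetical toy vocabulary. Embeddings are rows of a random orthogonal
# matrix, so distinct tokens start out exactly orthogonal (an assumption).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))
emb = {"bank": Q[0], "river": Q[1], "money": Q[2]}


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Query, key, and value projections are the identity here (an assumption;
    a real model learns them).
    """
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x


def contextual_bank(context_word):
    """Return the attention output at the 'bank' position of [context_word, bank]."""
    x = np.stack([emb[context_word], emb["bank"]])
    return self_attention(x)[1]


bank_river = contextual_bank("river")
bank_money = contextual_bank("money")

# Same input token, different output vectors:
print(np.allclose(bank_river, bank_money))
# Each output carries a component of its own context word and none of the other:
print(bank_river @ emb["river"], bank_river @ emb["money"])
print(bank_money @ emb["money"], bank_money @ emb["river"])
```

The input embedding for "bank" is identical in both sequences; only the attention-weighted mix of neighbors differs. A trained model would learn projections that make this mixing useful, but the mechanism itself does not need a large hidden size.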