by intalentive 918 days ago
Even small models (e.g. hidden dim = 32) should be able to handle token ambiguity with attention. The information is not so much in the token itself as in its context.
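
To make the point concrete, here is a rough sketch of a single self-attention head with a hidden dim of 32, in plain Python. Everything in it is an assumption for illustration: the four-word vocabulary, the random untrained embeddings, the random `Wq`/`Wk`/`Wv` matrices, and the `attend_last` helper. Since nothing is trained, it only shows that the output for an ambiguous token ("bank") depends on its neighbours, not that the model picks the right sense.

```python
import math
import random

random.seed(0)
D = 32  # hidden dim, matching the example in the comment


def rand_vec():
    return [random.gauss(0, 1) for _ in range(D)]


def rand_mat():
    # Scaled by 1/sqrt(D) so projections keep roughly unit variance.
    return [[random.gauss(0, 1) / math.sqrt(D) for _ in range(D)] for _ in range(D)]


# Hypothetical toy vocabulary with random, untrained embeddings.
emb = {w: rand_vec() for w in ("the", "river", "money", "bank")}
Wq, Wk, Wv = rand_mat(), rand_mat(), rand_mat()


def project(x, W):
    # Row vector x times matrix W.
    return [sum(x[i] * W[i][j] for i in range(D)) for j in range(D)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend_last(tokens):
    """Output of one attention head for the final token in `tokens`."""
    xs = [emb[t] for t in tokens]
    q = project(xs[-1], Wq)
    ks = [project(x, Wk) for x in xs]
    vs = [project(x, Wv) for x in xs]
    scores = [dot(q, k) / math.sqrt(D) for k in ks]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of value vectors: the context-mixed representation.
    return [sum(w * v[j] for w, v in zip(weights, vs)) for j in range(D)]


river_bank = attend_last(["the", "river", "bank"])
money_bank = attend_last(["the", "money", "bank"])

# Same input embedding for "bank", different output once context is mixed in.
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(river_bank, money_bank)))
print(f"distance between the two 'bank' outputs: {distance:.4f}")
```

The input vector for "bank" is identical in both runs; only the attention-weighted mix of neighbouring value vectors differs. A trained model would learn projections that push those two outputs toward the appropriate senses.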