HN Mirror



	by intalentive 918 days ago
	Even small models (e.g. hidden dims = 32) should be able to handle token ambiguity with attention. The information is not so much in the token itself as in the context.
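
	A minimal sketch of the mechanism, not a trained model: one self-attention head with hidden size 32, written in plain Python. The vocabulary, the random weights, and the names `self_attention`, `make_matrix`, and `gap` are all assumptions made for illustration, and there is no positional encoding. It only shows that the same input token ("bank") ends up with a different output vector depending on its neighbours; whether that difference is useful is something training would have to supply.

```python
import math
import random

D = 32  # hidden size, matching the "hidden dims = 32" figure in the comment


def make_matrix(rows, cols, rng):
    """Random projection matrix (untrained, illustrative only)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, embed, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token list."""
    xs = [embed[t] for t in tokens]
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    vs = [matvec(wv, x) for x in xs]
    out = []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in ks]
        weights = softmax(scores)
        # Each output is a weighted mix of every token's value vector.
        out.append([sum(w * v[i] for w, v in zip(weights, vs)) for i in range(D)])
    return out


rng = random.Random(0)  # fixed seed so the run is repeatable
vocab = ["the", "river", "money", "bank"]  # hypothetical toy vocabulary
embed = {w: [rng.gauss(0.0, 1.0) for _ in range(D)] for w in vocab}
wq, wk, wv = (make_matrix(D, D, rng) for _ in range(3))

# "bank" has the same input embedding in both sentences.
bank_river = self_attention(["the", "river", "bank"], embed, wq, wk, wv)[2]
bank_money = self_attention(["the", "money", "bank"], embed, wq, wk, wv)[2]

# Euclidean distance between the two contextualised "bank" vectors.
gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(bank_river, bank_money)))
print(f"distance between the two 'bank' outputs: {gap:.4f}")
```

	The distance is nonzero because each output row mixes in the value vectors of the surrounding tokens, so the context, not the token embedding alone, determines the representation.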