| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by qcnguy 238 days ago
	OpenAI have talked about it. The neural architecture needs to let the model handle the case where there's nothing worth attending to, as softmax requires attention to be allocated to all tokens but sometimes there's nothing worth it.