| HN Mirror


	by knuppar 427 days ago
	One could argue TF-IDF is a special case of an attention layer... just not quadratic in inference/training, and kinda just a quotient. Yeah, maybe we should go back.
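
For anyone who wants the "just a quotient" part spelled out, here is a minimal sketch of one common TF-IDF variant (length-normalized term frequency times unsmoothed `log(N / df)`). The function name `tfidf_weights` and the toy corpus are made up for illustration, and other variants smooth or rescale these terms differently. The point is only that each weight comes from corpus counts, with no pairwise token-to-token scores.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: tf is count / document length,
    idf is log(N / df) with no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, assumed for illustration only.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
weights = tfidf_weights(docs)
```

With this corpus, "the" appears in every document, so its idf is `log(3 / 3)` and its weight drops to zero, while a term unique to one document such as "mat" gets the largest idf.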