| HN Mirror



	by microtonal 647 days ago
	It’s quite interesting that we end up using cosine similarity. Most networks are trained with a softmax layer at the end (e.g. next-word prediction). Given the close relation between softmax and logistic regression, it might make more sense to use σ(u·v) as the similarity function.
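To make the comparison concrete, here is a rough sketch of the two scores side by side. The function names and the toy vectors are made up for illustration and are not from any particular library. It assumes plain Python lists as embeddings.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by the product of the norms; range [-1, 1]."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def sigmoid_similarity(u, v):
    """Logistic function of the raw dot product; range (0, 1)."""
    x = dot(u, v)
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# Hypothetical toy embeddings, chosen only to show the difference.
u = [1.0, 2.0, 0.5]
v = [2.0, 4.0, 1.0]    # same direction as u, twice the norm
w = [-1.0, 0.5, 0.0]   # orthogonal to u (dot product is 0)

for name, other in (("v", v), ("w", w)):
    print(name, cosine_similarity(u, other), sigmoid_similarity(u, other))
```

One thing the sketch shows: cosine ignores vector length, while σ(u·v) grows with the norms, so scaling v changes the sigmoid score but not the cosine. If the embeddings are normalized to unit length, u·v equals the cosine and σ is just a monotone rescaling of it, so rankings come out the same.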