by microtonal
647 days ago
It’s quite interesting that we end up using cosine similarity. Most networks are trained with a softmax layer at the end (e.g. next-word prediction). Given the close relation between softmax and logistic regression, it might make more sense to use σ(u·v), the logistic sigmoid of the dot product, as the similarity function.
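
To make the comparison concrete, here is a minimal Python sketch of the two candidate similarity functions. The function names and the toy vectors are made up for illustration and are not taken from any particular model or library.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by the product of the norms; range [-1, 1]."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def sigmoid(x):
    """Logistic function, written to avoid overflow for very negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_similarity(u, v):
    """sigma(u . v): the logistic function of the raw dot product; range (0, 1)."""
    return sigmoid(dot(u, v))

# Toy vectors (hypothetical): v is just u scaled by 2.
u = [1.0, 2.0]
v = [2.0, 4.0]

print(cosine_similarity(u, v))   # parallel vectors, so the angle term is maximal
print(sigmoid_similarity(u, v))  # depends on the magnitudes as well as the angle

# Orthogonal vectors: cosine gives 0.0, sigma(u . v) gives 0.5.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(sigmoid_similarity([1.0, 0.0], [0.0, 1.0]))
```

The practical difference is that cosine similarity throws away vector norms, while σ(u·v) keeps them: rescaling either vector leaves the cosine unchanged but moves σ(u·v) toward 0 or 1. The two also sit on different scales, so "unrelated" is 0 for cosine and 0.5 for σ(u·v).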