Y
Hacker News
new
|
ask
|
show
|
jobs
by
sooheon
399 days ago
Interesting read. But to me, describing attention as "a weighted mean of a continous learned k-v map lookup" is direct and descriptive, while saying "it's just kernel smoothing" is opaque and referential.
1 comments
ablob
399 days ago
Maybe take the best of both worlds: "It's just kernel smoothing, which is a weighted mean of a continuous learned kv-map lookup."
Now you got the definition, and the alternative naming you can use to search for resources.
link
Now you got the definition, and the alternative naming you can use to search for resources.