Hacker News new | ask | show | jobs
by sooheon 399 days ago
Interesting read. But to me, describing attention as "a weighted mean of a continous learned k-v map lookup" is direct and descriptive, while saying "it's just kernel smoothing" is opaque and referential.
1 comments

Maybe take the best of both worlds: "It's just kernel smoothing, which is a weighted mean of a continuous learned kv-map lookup."

Now you got the definition, and the alternative naming you can use to search for resources.