by causal 798 days ago
I like this, but think there is some crucial motivation missing in steps 10.1-10.3 regarding what query/key weights are and why they're needed.
3 comments

They are like "continuous" databases. See slides 4-5 here [1], from a talk I gave a while ago.

[1] https://drive.google.com/file/d/12uHo9QIfS-jBpVTs3lmQ3BEpxhD...

this post made sense to me https://teltam.github.io/posts/soft-dictionary-keys.html

It helps to think of kqv (key/query/value) as a form of lookup.
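To make the lookup idea concrete, here is a rough sketch in plain Python. It is my own toy illustration, not code from the linked post or slides: `soft_lookup`, the `scale` parameter, and the example keys and values are all made up. A normal dict returns the one value whose key matches exactly; this version scores the query against every key with a dot product, turns the scores into weights with a softmax, and returns the weighted mix of the values.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_lookup(query, keys, values, scale=1.0):
    """Hypothetical helper: a 'soft' dictionary lookup.

    Scores the query against every key (dot product), then returns the
    softmax-weighted average of the values along with the weights.
    """
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return mixed, weights


# Toy "database": three keys, each paired with a one-dimensional value.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
query = [1.0, 0.0]  # most similar to the first key

# Soft lookup: every value contributes, the best-matching key contributes most.
soft, soft_w = soft_lookup(query, keys, values, scale=1.0)

# With a large scale the weights become nearly one-hot, so the result
# approaches what an exact dictionary lookup would return.
sharp, sharp_w = soft_lookup(query, keys, values, scale=50.0)
```

In an actual transformer the queries, keys, and values are not hand-written like this: they are produced by multiplying token embeddings by learned weight matrices, which is where the query/key weights come in. The sketch only shows the lookup step that those projections feed into.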

yes, same issue in all transformer tutorials
I suspect this is because most people (including people writing these tutorials) don't have a strong grasp of this piece either.
The 2b1b video was the first to make it click for me
You mean 3b1b (three blue one brown)?
Ah that's right, miscounted the blues