Hacker News
by evolvingstuff 797 days ago
You are correct; that is an error in an otherwise great video. The (k+1)th token is not merely a function of the kth vector, but rather of all prior vectors (combined using attention). There is nothing "special" about the kth vector.
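To make that concrete, here is a minimal sketch, assuming a single attention head with identity query/key/value projections (a simplification; real models use learned projections and many stacked layers). The name `causal_attention` and the toy vectors are made up for illustration. It changes only the first input vector and checks whether the output at the last position changes too:

```python
import math

def causal_attention(vectors):
    """Toy single-head causal self-attention.

    Assumption: queries, keys, and values are the input vectors themselves
    (identity projections), which a real transformer would not do.
    """
    d = len(vectors[0])
    outputs = []
    for k, query in enumerate(vectors):
        # Causal mask: position k can only see positions 0..k.
        visible = vectors[: k + 1]
        scores = [
            sum(q * x for q, x in zip(query, key)) / math.sqrt(d)
            for key in visible
        ]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # The output at k is a weighted mix of every visible vector.
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, visible)) for i in range(d)]
        )
    return outputs

base = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
changed = [[-1.0, 2.0], [0.0, 1.0], [1.0, 1.0]]  # only the first vector differs

out_base = causal_attention(base)
out_changed = causal_attention(changed)

# The last input vector is identical in both runs, yet its output differs,
# because attention mixes in the earlier vectors.
last_output_differs = out_base[-1] != out_changed[-1]
print(last_output_differs)  # prints True
```

So whatever is used to predict token k+1 already carries information from every earlier position, not just position k.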