by evolvingstuff
797 days ago
You are correct, that is an error in an otherwise great video. The (k+1)th token is not merely a function of the kth vector, but of all prior vectors (combined using attention). There is nothing "special" about the kth vector.
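To make that concrete, here is a rough single-head sketch in plain Python. It is only an illustration under simplifying assumptions: the toy vectors are made up, the same vectors serve as queries, keys and values (no learned projections), and there is one layer and one head. The helper names `softmax` and `attend` are mine, not from the video or any library.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Scaled dot-product attention for a single query position.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

# Hypothetical toy vectors for positions 1..k (here k = 3).
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# The query comes from position k, but the result mixes positions 1..k.
context, weights = attend(vectors[-1], vectors, vectors)

# Changing only the first vector changes what is computed at position k.
changed = [[5.0, 0.0]] + vectors[1:]
context_changed, _ = attend(vectors[-1], changed, changed)

print(weights)
print(context, context_changed)
```

Every prior position gets a nonzero weight, and editing the first vector alone shifts the result, so the prediction for token k+1 depends on the whole prefix rather than on the kth vector by itself.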