|
|
|
|
|
by janalsncm
491 days ago
|
|
I think another argument is that the CoT is simply unrolling the recurrent loop that this method uses, and doing an unembedding -> embedding -> unembedding during the decoding process. So at best, using a recurrent loop is only saving you from doing the embedding -> unembedding at each token which is relatively small compared with the height of the decoder blocks. |
|