by _giorgio_
912 days ago
> the paper that was most helpful for me was [Formal Algorithms for Transformers](https://arxiv.org/abs/2207.09238)

Interesting, but hard to read, since it uses quite unique notation for matrix indexing and multiplication. Why???