And also a related blog post: https://news.ycombinator.com/item?id=34726115
Though note that this is for a decoder-only transformer (aka GPT) and doesn't include the encoder part.