
What motivated me to write it was that, while trying to learn about the transformer myself, I couldn't find a very simple reference implementation. The Annotated Transformer (https://nlp.seas.harvard.edu/2018/04/03/attention.html), for instance, is quite convoluted in my opinion and uses hard-to-understand syntax.

Thanks for the link. I hadn't seen this before, but it looks quite simple and nice.

I think the one in PyTorch itself isn't too bad either, but there's a huge chunk of block comments, and the fact that it is entangled with other modules makes it intimidating to break down, test out, or use right away.

Yes, I currently have no decoding besides argmax on the decoder logits (so no beam search, etc.).

I suppose if you just mean sequence generation, there is a small function for it in the dataset class, but it might be good to put it somewhere more visible.
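For anyone wondering what "argmax on the decoder logits" amounts to at generation time, here is a minimal sketch of greedy decoding in plain Python. It is not the code from the repository: `next_token_logits`, `bos_id`, `eos_id`, and `max_len` are hypothetical names, and `toy_logits` is a made-up stand-in for a trained decoder that just returns a score for every vocabulary entry.

```python
from typing import Callable, List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i
    return best


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int,
) -> List[int]:
    """Greedy decoding: at each step, append the single highest-scoring token.

    `next_token_logits` is a hypothetical stand-in for the decoder: given the
    tokens generated so far, it returns logits over the vocabulary for the
    next position.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        next_id = argmax(next_token_logits(tokens))
        tokens.append(next_id)
        if next_id == eos_id:  # stop as soon as the end-of-sequence token appears
            break
    return tokens


def toy_logits(prefix: List[int]) -> List[float]:
    """Toy 'decoder' (assumption, for illustration only): always prefers last token + 1."""
    vocab_size = 5
    target = (prefix[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    print(greedy_decode(toy_logits, bos_id=0, eos_id=4, max_len=10))  # [0, 1, 2, 3, 4]
```

Beam search would differ by keeping the k best partial sequences at every step instead of committing to one token, which is exactly the extra machinery beyond the paper's architecture that this sketch leaves out.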

I don't want to include beam search, so as not to introduce anything beyond the core architecture (the innovative contribution) of the original paper, and most other implementations already have it. But it is a good suggestion nonetheless.

Thanks a lot for the comment! :)