by domenicrosati
1606 days ago
Cool! What motivated you to write it? I find the d2l implementation nice and simple as well: https://d2l.ai/chapter_attention-mechanisms/transformer.html The one in PyTorch itself isn't so bad either. It would be nice to see some examples of decoding in your repo (forgive me if I missed them). I remember when I first implemented a transformer from scratch: generating sequences using greedy or beam search after training and testing turned out to be harder than I thought, but it turns out I had made a mistake with teacher forcing in the beginning, so BOS tokens were meaningless to the decoder lol
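For anyone curious what a teacher forcing setup usually looks like, here is a minimal sketch. The exact mistake isn't spelled out above, so this only shows the common convention: the decoder input starts with BOS and the labels are the same sequence shifted left by one, ending in EOS. The function name and the token ids are hypothetical, made up for illustration.

```python
from typing import List, Tuple


def make_teacher_forcing_pair(
    target_ids: List[int], bos_id: int, eos_id: int
) -> Tuple[List[int], List[int]]:
    """Build (decoder_input, labels) for one target sequence.

    Hypothetical helper, not taken from any repo discussed here.
    decoder_input[t] is what the decoder sees at step t,
    labels[t] is the token it should predict at step t.
    """
    decoder_input = [bos_id] + list(target_ids)  # starts with BOS
    labels = list(target_ids) + [eos_id]         # shifted left, ends with EOS
    return decoder_input, labels


# Assumed ids for illustration only: BOS = 1, EOS = 2.
decoder_input, labels = make_teacher_forcing_pair([5, 6, 7], bos_id=1, eos_id=2)
print(decoder_input)  # [1, 5, 6, 7]
print(labels)         # [5, 6, 7, 2]
```

If the decoder never sees BOS in position 0 during training, feeding it BOS as the first token at generation time gives it an input it has no learned meaning for, which is one way decoding can go wrong even when train/test loss looks fine.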
Thanks for the link, I hadn't seen this before, but it looks quite simple and nice.
I think the one in PyTorch itself isn't too bad either, but there's a huge chunk of block comments, and the fact that it is entangled with other modules makes it intimidating to break down and test out or use immediately.
Yes, I currently don't have any decoding besides argmax on the decoder logits (so no beam search etc.).
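To make "argmax on the decoder logits" concrete, here is a rough sketch of a greedy decoding loop in plain Python. It is not the code from the repo: `greedy_decode`, `next_token_logits` and `toy_logits` are hypothetical names, and the toy model stands in for a real decoder forward pass.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 20,
) -> List[int]:
    """Generate tokens one at a time, always picking the highest logit.

    `next_token_logits` is an assumed callable: given the tokens so far,
    it returns one logit per vocabulary entry for the next position.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        logits = next_token_logits(tokens)
        # argmax over the vocabulary
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_logits(prefix: List[int]) -> List[float]:
    """Toy stand-in for a decoder: always prefers (last token + 1)."""
    vocab_size = 5
    target = min(prefix[-1] + 1, vocab_size - 1)
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_decode(toy_logits, bos_id=0, eos_id=4))  # [0, 1, 2, 3, 4]
```

Beam search would replace the single argmax with keeping the top few partial sequences at each step; the loop structure stays the same.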
I suppose if you just mean sequence generation, there is a small function that can be found in the dataset class. But it might be good to put that somewhere more visible.
I don't want to include beam search, so as not to introduce anything beyond the core architecture (the innovative contribution) of the original paper, and most other implementations already have it. But it is a good suggestion nonetheless.
Thanks a lot for the comment! :)