Hacker News
by
karmasimida
1483 days ago
If they use BPE dropout, then the same text can be split into different token sequences, so the tokenization is not unique.
And for the record, they did use BPE dropout for DALL-E 1, see
https://arxiv.org/pdf/2102.12092.pdf
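To illustrate what "not unique" means here, below is a rough sketch of BPE dropout on a toy example. The merge table, the function name `bpe_encode`, and the 0.5 dropout rate are all made up for illustration; this is not DALL-E's vocabulary or code, and it merges one pair per step rather than every occurrence, which is a simplification.

```python
import random

# Hypothetical toy merge table, highest priority first (not a real vocabulary).
MERGES = [("b", "i"), ("r", "d"), ("bi", "rd")]


def bpe_encode(word, merges, dropout=0.0, rng=None):
    """Split `word` into subword tokens, skipping each candidate merge
    with probability `dropout` (0.0 gives plain deterministic BPE)."""
    rng = rng or random.Random()
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)  # start from single characters
    while len(tokens) > 1:
        # Adjacent pairs that are in the merge table and survive dropout.
        candidates = [
            (rank[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in rank and rng.random() >= dropout
        ]
        if not candidates:
            break  # nothing left to merge in this pass, so stop
        _, i = min(candidates)  # best-ranked surviving merge, leftmost on ties
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


# Collect the distinct splits of one word over many random draws.
splits = {
    tuple(bpe_encode("bird", MERGES, dropout=0.5, rng=random.Random(seed)))
    for seed in range(200)
}
for split in sorted(splits):
    print(split)
```

With `dropout=0.0` the word always comes out the same way; with dropout on, the same string can show up as several different token sequences, which is the point above.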
DalasNoin
1483 days ago
I believe they only apply it during training.
karmasimida
1483 days ago
Right, that is my point. Since the model saw many different splits during training, it is hard to know which combination of tokens causes the current tokenization to be interpreted as a bird.