by karmasimida 1483 days ago
If they use BPE dropout, then the same text can be split into tokens in more than one way, so the tokenization is not unique.

And for the record, they use BPE dropout for DALL-E 1; see https://arxiv.org/pdf/2102.12092.pdf
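
To make the non-uniqueness concrete, here is a rough sketch of BPE dropout on a toy vocabulary. The merge table `MERGES` and the function `bpe_encode` are made up for illustration and are not DALL-E's actual tokenizer or merge list; the only idea carried over is that each candidate merge is randomly skipped with probability `dropout` at every step.

```python
import random

# Hypothetical toy merge table, highest priority first (not a real vocabulary).
MERGES = [("b", "i"), ("r", "d"), ("bi", "rd")]

def bpe_encode(word, merges, dropout=0.0, rng=None):
    """Greedy BPE; each candidate merge is skipped with probability `dropout`."""
    rng = rng or random.Random()
    ranks = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)  # start from single characters
    while len(tokens) > 1:
        # Adjacent pairs that have a merge rule and survive dropout this step.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in ranks and not (dropout > 0 and rng.random() < dropout)
        ]
        if not candidates:
            break  # nothing left to merge (or everything was dropped)
        _, i = min(candidates)  # apply the highest-priority surviving merge
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens

# Without dropout the split is deterministic; with dropout it varies per call.
rng = random.Random(0)
splits = {tuple(bpe_encode("bird", MERGES, dropout=0.5, rng=rng)) for _ in range(200)}
```

With `dropout=0.0` the word always comes back as the same split, while `splits` collects several different segmentations of the same string, which is the point above.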

1 comment

I believe they only apply it during training.
Right, that is my point. It is hard to know which combination of splits causes the current tokenization to be interpreted as "bird".