by sanxiyn 1100 days ago
This is wrong: byte-level models work fine, even if not quite as well as word-level models. From comparisons of byte-level and word-level models, we know that tokenization accounts for only a minuscule part of performance.