Hacker News new | ask | show | jobs
by ben_w 1191 days ago
My wild guess is that if it could get things done by tokenising like that all the time, they wouldn't need to also have word-like tokens.

If that is a inference time performance or training time performance or a model size issue or just total nonsense, I wouldn't know.