Hacker News
by cschmidt 353 days ago
And regarding UTF-8 being a shitty, biased tokenizer, here's a recent paper that tries to design a better style of encoding: https://arxiv.org/abs/2505.24689
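To make the "biased" part concrete, here's a quick illustration of my own (not taken from the paper). A byte-level model reading raw UTF-8 sees ASCII text at one byte per character, but other scripts cost 2 to 4 bytes per character, so the same amount of text turns into a longer token sequence. The helper name `bytes_per_char` and the sample strings are just arbitrary picks for the demo.

```python
def bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes (i.e. byte-level tokens) per Unicode code point."""
    return len(text.encode("utf-8")) / len(text)


# Arbitrary sample strings, chosen only to show the spread across scripts.
samples = {
    "English": "hello",
    "Greek": "γεια",
    "Chinese": "你好",
    "Emoji": "🙂🙂",
}

for script, text in samples.items():
    n_bytes = len(text.encode("utf-8"))
    print(f"{script:8} code points={len(text)}  bytes={n_bytes}  bytes/char={bytes_per_char(text):.1f}")
```

This prints 1.0 bytes/char for English, 2.0 for Greek, 3.0 for Chinese and 4.0 for the emoji, which is the length penalty non-ASCII text pays under a plain UTF-8 byte vocabulary.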