Hacker News
by
cschmidt
353 days ago
And in regard to UTF-8 being a shitty, biased tokenizer, here is a recent paper that tries to design a better style of encoding:
https://arxiv.org/abs/2505.24689
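For anyone wondering what "biased" means here, a quick sketch, assuming the complaint is about byte-level models that consume raw UTF-8 bytes: the same greeting costs a different number of bytes depending on the script. The helper name `utf8_bytes_per_char` and the sample words are just illustrative choices, not anything taken from the linked paper.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point in `text`."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(text.encode("utf-8")) / len(text)


# Illustrative samples (hypothetical choice of words, each roughly "hello").
samples = {
    "English": "hello",
    "Russian": "привет",
    "Hindi": "नमस्ते",
    "Chinese": "你好",
}

for language, word in samples.items():
    n_bytes = len(word.encode("utf-8"))
    ratio = utf8_bytes_per_char(word)
    print(f"{language}: {len(word)} code points, {n_bytes} bytes, "
          f"{ratio:.1f} bytes per code point")
```

ASCII text comes out at 1 byte per code point, Cyrillic at 2, and Devanagari and Chinese at 3, so a model that sees raw bytes spends two to three times as many sequence positions per code point on those scripts as it does on English.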