by jltsiren 1161 days ago
It's more like a few big languages receive special treatment, while everything else is interpreted as a byte stream. In Finnish, the tokens seem to be arbitrary substrings averaging 3-4 characters, and they rarely correspond to any semantically or grammatically meaningful units.
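
To make the "special treatment vs. byte stream" point concrete, here is a toy sketch, not how any real tokenizer is implemented. It assumes a made-up vocabulary (`VOCAB`) that only covers a few English words, and a hypothetical `tokenize` helper that does greedy longest-match and falls back to raw UTF-8 bytes for everything else. It exaggerates the effect: real tokenizers learn their merges from data, so Finnish ends up with short substrings rather than pure bytes.

```python
# Hypothetical vocabulary that only "knows" a few English pieces.
VOCAB = {"the", " the", " cat", " is", " in", " house"}


def tokenize(text, vocab):
    """Greedy longest-match against vocab, with a UTF-8 byte fallback."""
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in vocab:
                tokens.append(piece)
                i += n
                break
        else:
            # Nothing matched: emit one token per UTF-8 byte of this character.
            tokens.extend(bytes([b]) for b in text[i].encode("utf-8"))
            i += 1
    return tokens


def chars_per_token(text, vocab):
    """Average number of characters covered by each token."""
    return len(text) / len(tokenize(text, vocab))


english = "the cat is in the house"
finnish = "kissa on talossa"  # roughly the same sentence in Finnish

print(len(tokenize(english, VOCAB)), chars_per_token(english, VOCAB))  # 6 tokens
print(len(tokenize(finnish, VOCAB)), chars_per_token(finnish, VOCAB))  # 16 tokens
```

The same sentence costs 6 tokens in the covered language and 16 in the uncovered one, and none of the Finnish tokens line up with a word stem or a case ending.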