by sojuz151 232 days ago
If rendering text and feeding it in as an image works better than tokenising it, then current tokenisers are bad and something better is needed.
1 comment

Humans recognise two vastly different types of language input (auditory and visual). I doubt that one type of tokeniser is inherently superior.