Hacker News new | ask | show | jobs
by thatjoeoverthr 117 days ago
While that's true, the tokenizer is half the problem. The important fault demonstrated is it doesn't _know_ it can't see the letters, and won't express this unless it has been trained or instructed to. "I can't see letters through the tokenizer" never appears in a corpus of human writing.