by mikewarot 300 days ago
Because it never sees raw ASCII or Unicode during training.

Everything in its input is tokenized. Asking it to count characters is like asking a person born blind to paint and complaining they didn't get the colors quite right.

You could train an AI on ASCII or Unicode, but it would likely take 100 times the compute resources for similar performance on everything else. Tokenized input is really efficient.
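
To make that concrete, here is a toy sketch, not how any real model does it. The vocabulary and the greedy longest-match rule are made up for illustration; real tokenizers (BPE and friends) learn their pieces from data and will split words differently.

```python
# Hypothetical vocabulary: sub-word pieces mapped to integer IDs.
# These pieces and IDs are invented for this example.
VOCAB = {
    "str": 0, "aw": 1, "berry": 2,
    "s": 3, "t": 4, "r": 5, "a": 6, "w": 7, "b": 8, "e": 9, "y": 10,
}


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization; returns a list of token IDs."""
    ids = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate piece first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


word = "strawberry"
token_ids = tokenize(word)

print(token_ids)                  # [0, 1, 2]
print(len(word), len(token_ids))  # 10 3
print(word.count("r"))            # 3
```

The model is handed `[0, 1, 2]`, not the ten characters. Nothing in those three IDs says how many times "r" appears, so the count has to be memorized per token rather than read off the input. The same example shows the efficiency side: three positions to process instead of ten.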

1 comment

So they're also complete crap with old-fashioned ASCII art?

I wonder if that could be useful, to make AI-resistant CAPTCHAs...