Hacker News new | ask | show | jobs
by Majromax 1107 days ago
> However, some characters require more than 1 byte in UTF-8; those characters might end up with as much as 4 tokens.

This would seem to raise an interesting "prompt golf" challenge: find a reasonable-sounding prompt that causes the language model to generate invalid UTF-8 in its output.