by lifthrasiir 1586 days ago
The number of valid UTF-8 strings that are exactly x bytes long is given by the following recurrence relation:

    f(x) = 0 for x < 0
    f(0) = 1
    f(x) = 0x80 * f(x-1) + 0x780 * f(x-2) + 0xf400 * f(x-3) + 0x100000 * f(x-4)
(Replace 0x80 with 0x7c to account for ES6 template literals.) The characteristic polynomial of this recurrence has a single positive root, approximately 144.61 (or 141.12 for template literals), so f(x) grows roughly like 144.61^x. This means you can fit somewhat more than 7 bits per byte, about log2(144.61) ≈ 7.18, into valid JS code, provided that your decoder is small enough to be negligible. Indeed, there is an encoding that achieves exactly 7 bits per byte by using a two-byte UTF-8 sequence as an escape code [1].
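
For anyone who wants to check those numbers, here is a minimal Python sketch. The coefficients are copied verbatim from the recurrence above. The names `count_strings` and `growth_rate` are hypothetical, and bisection is just one convenient way to find the positive root, not necessarily how the figures above were obtained.

```python
import math

# Coefficients copied from the recurrence above: the weight on f(x-k)
# stands for the number of k-byte UTF-8 sequences.
COEFFS = (0x80, 0x780, 0xF400, 0x100000)
# Variant from the parenthetical note: 0x7c one-byte choices for template literals.
COEFFS_TEMPLATE = (0x7C, 0x780, 0xF400, 0x100000)


def count_strings(n, coeffs=COEFFS):
    """Return f(n) from the recurrence, with f(0) = 1 and f(x) = 0 for x < 0."""
    f = [1]  # f[0]
    for x in range(1, n + 1):
        # Terms with a negative index contribute 0, so they are skipped.
        f.append(sum(c * f[x - k]
                     for k, c in enumerate(coeffs, start=1) if x - k >= 0))
    return f[n]


def growth_rate(coeffs=COEFFS, lo=1.0, hi=1024.0, iters=100):
    """Bisect for the positive root of r^4 = c1*r^3 + c2*r^2 + c3*r + c4."""
    degree = len(coeffs)

    def p(r):
        return r ** degree - sum(c * r ** (degree - k)
                                 for k, c in enumerate(coeffs, start=1))

    # p(lo) < 0 and p(hi) > 0 for both coefficient sets, so the root is bracketed.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if p(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for name, coeffs in (("plain", COEFFS), ("template literal", COEFFS_TEMPLATE)):
        r = growth_rate(coeffs)
        print(f"{name}: root = {r:.2f}, bits per byte = {math.log2(r):.3f}")
```

The upper bound on bits per byte is log2 of the root, since f(x) grows roughly like root^x for large x.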

[1] http://blog.kevinalbs.com/base122

1 comment

Huh. JavaScript is significantly more accepting of non-printing characters in its strings than I was expecting. I guess I should have known better.