by joelburget 1274 days ago
It works on all human languages, just inefficiently. I ran it over a sample I found on Wikipedia:

    # `enc` is the tokenizer's encoding object, set up earlier (not shown)
    sample = "ฟองมันฟันหนู, ฟันหนูฟองมัน, ฝนทองฟองมัน"
    len(sample), len(enc.encode(sample))  # (characters, tokens)
This returns `39, 40`: 39 characters become 40 tokens, so it's encoding roughly one character at a time. It's probably like this for almost all non-English text.
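Some context on why the count lands near one token per character. The comment doesn't say which encoding `enc` is. Assuming it's a byte-level BPE tokenizer like GPT-2's, it starts from UTF-8 bytes rather than characters. Each Thai character takes 3 bytes in UTF-8, while ASCII takes 1. Here is a stdlib-only sketch comparing the character count with the byte count. The helper `utf8_stats` is hypothetical, written just for this illustration.

```python
# Stdlib-only sketch: compare character count with UTF-8 byte count.
# Assumption: the tokenizer behind `enc` is byte-level BPE (like GPT-2's),
# so its starting units are UTF-8 bytes, not characters.

def utf8_stats(text: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for text (hypothetical helper)."""
    return len(text), len(text.encode("utf-8"))

sample = "ฟองมันฟันหนู, ฟันหนูฟองมัน, ฝนทองฟองมัน"
chars, nbytes = utf8_stats(sample)
print(chars, nbytes)  # 39 109
print(f"{nbytes / chars:.2f} bytes per character")
```

The sample has 35 Thai characters at 3 bytes each, plus 4 ASCII bytes. That adds up to 109 bytes. Under the byte-level assumption, a tokenizer with no Thai merges would need up to 109 tokens. Getting 40 suggests it does merge some Thai byte sequences, but far less aggressively than for English, where a single token often covers several characters.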

Yeah, it does the same with Russian, at least.