But apparently, they either just emit a [UNK] token or fall back to encoding the unrecognized character as its raw UTF-8 bytes.
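
To make the two behaviours concrete, here is a toy sketch, not any real tokenizer's implementation. It assumes a character-level vocabulary for simplicity, and the `encode` function, the `byte_fallback` flag, and the `<0xNN>` token spelling are all illustrative choices made for this example.

```python
def encode(text, vocab, byte_fallback=False):
    """Toy character-level encoder showing two ways to handle unknown characters.

    `vocab` is a set of known single-character tokens (an assumption of this
    sketch; real tokenizers work on subword units).
    """
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        elif byte_fallback:
            # Strategy 2: emit one token per raw UTF-8 byte of the character.
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
        else:
            # Strategy 1: collapse the character into a single unknown token.
            tokens.append("[UNK]")
    return tokens


vocab = {"h", "i", " "}
unk_result = encode("hi ☃", vocab)
byte_result = encode("hi ☃", vocab, byte_fallback=True)

print(unk_result)   # ['h', 'i', ' ', '[UNK]']
print(byte_result)  # ['h', 'i', ' ', '<0xE2>', '<0x98>', '<0x83>']
```

The practical difference is that the [UNK] route throws the original character away, while the byte route keeps enough information to reconstruct it, at the cost of spending several tokens on one character.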