Hacker News
by pmyteh 3208 days ago
Not reliably, no. You can detect whether a string is invalid in the encoding you're currently assuming (a byte value > 127 for ASCII, an unpaired surrogate for UTF-16), but there are lots of byte sequences that produce valid (but semantically meaningless) output in multiple encodings. Choosing between them programmatically requires your algorithm to understand the meaning of the string as well as decode it, which might be possible in limited domains but is a very hard problem in general.
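To make that concrete, here's a rough Python sketch (the helper name `valid_encodings` and the list of candidate encodings are just my choices for illustration). It tries a strict decode under each candidate and keeps every one that doesn't raise. It shows the validity check working, and also why it isn't enough: the same two bytes come back as valid text under several encodings.

```python
def valid_encodings(data, candidates=("ascii", "utf-8", "latin-1", "cp1252", "utf-16-le")):
    """Return {encoding: decoded_text} for every candidate that decodes `data` without error.

    Hypothetical helper for illustration: it only tests validity,
    it cannot tell which of the valid decodings was the intended one.
    """
    results = {}
    for name in candidates:
        try:
            # errors="strict" raises UnicodeDecodeError on invalid input
            results[name] = data.decode(name, errors="strict")
        except UnicodeDecodeError:
            # Invalid in this encoding, e.g. a byte > 127 for ASCII
            # or an unpaired surrogate for UTF-16.
            pass
    return results


if __name__ == "__main__":
    # The UTF-8 encoding of "é" is also valid Latin-1, cp1252 and UTF-16-LE.
    for name, text in valid_encodings(b"\xc3\xa9").items():
        print(f"{name}: {text!r}")

    # A lone high surrogate followed by "A" is rejected as UTF-16-LE.
    print(valid_encodings(b"\x00\xd8\x41\x00", ("utf-16-le",)))
```

ASCII gets ruled out for the first input, but nothing in the bytes themselves says whether "é" or "Ã©" was meant. That part needs knowledge about what the text is supposed to say.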