|
|
|
|
|
by zbentley
2810 days ago
|
|
I don't think so. If you want to detect and operate on only the data that could represent ASCII characters, you could, certainly process it as a byte string if you wanted, but you'd have to track the presence of non-ASCII-range character codes yourself, and keep state around to represent whether you were in the middle of a multibyte character as you read through the bytes. If done right, it would be a (probably much slower) re-implementation of what happens when you use the latin1 trick mentioned. You have to get it right, though (sneaky edge cases abound--what if the file starts in the middle of an incomplete multibyte character?). TL;DR this could technically work but is a poor idea. |
|