| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by eesmith 2318 days ago

If it's a fixed-width encoding, nudge the read size to a multiple of that encoding size.

If it's utf-8, keep the block reads in byte space, search for the terminator as a byte sequence, and only decode after you find the terminator.

Otherwise, throw your hands up in the air and give up?

Catch the UnicodeDecodeError, use err.start, and see if it's close to the end of the block? If it is, then do another read?

BTW, you can mitigate some Python overhead by using a larger read size.