Hacker News new | ask | show | jobs
by davvid 4088 days ago
rather than converting things silently and possibly losing or corrupting data

Exactly. Python3 went down the "silently converting" route, and it's not pretty[1]. I would go so far as to call it harmful.

http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

I understand the difficulty in this space; much of it is caused by forcing the Windows unicode filesystem API onto python as its world-view, rather than sticking to the traditional Unix bytes world-view. I'm unixy, so I'm completely biased, but I think adopting the Windows approach is fundamentally broken.

1 comments

The problem there is overblown - it's basically all due to the idea that sys.stdin or sys.stdout might get replaced with streams that don't have a binary buffer. The simple solution is just not to do that (and it's pretty easy; instead of replacing with a StringIO, replace it with a wrapped binary buffer). Then the code is quite simple

    import sys
    import shutil

    for filename in sys.argv[1:]:
        if filename != '-':
            try:
                f = open(filename, 'rb')
            except IOError as err:
                msg = 'cat.py: {}: {}\n'.format(filename, err)
                sys.stderr.buffer.write(msg.encode('utf8', 'surrogateescape'))
                continue
        else:
            f = sys.stdin.buffer

        with f:
            shutil.copyfileobj(f, sys.stdout.buffer)
Python's surrogateescape'd strings aren't the best solution, but I personally believe that treating unicode output streams as binary ones is even worse.