

by wting 4724 days ago

This is an excessively long blog post that basically states: do stream processing when your data set doesn't fit into memory.

    with open('in.txt') as input, open('out.txt') as out:
        for line in input.readlines():
            out.write(foo(line))

Python users are used to loading everything into memory at once, while in C everything is done in small chunks whenever possible.

Python 3 is also moving in this direction by replacing many built-in functions with their lazy equivalents (map, range, etc.).

You might think that this means forcing everything into one big context manager, but that's not necessarily true. For example:

    import csv
    from itertools import imap

    def read_file(filename):
        with open(filename, 'r') as f:
            reader = csv.reader(f)
            for line in reader:
                yield line

    def write_file(filename, data):
        with open(filename, 'w') as f:
            writer = csv.writer(f)
            map(writer.writerow, data)

    write_file(
        filename='out.txt',
        data=imap(foo, read_file('in.txt')))
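
One caveat tied to the Python 3 point above: because `map` is lazy there, a bare `map(writer.writerow, data)` writes nothing until something consumes the iterator. A minimal sketch, assuming Python 3 and an in-memory buffer in place of a real file (the `rows` data is made up for illustration):

```python
import csv
import io

buf = io.StringIO()            # stands in for an open output file
writer = csv.writer(buf)
rows = [['a', 'b'], ['c', 'd']]  # illustrative data only

lazy = map(writer.writerow, rows)  # Python 3: builds an iterator, runs nothing
written_before = buf.getvalue()    # still empty at this point

for _ in lazy:                     # consuming the iterator triggers the writes
    pass
written_after = buf.getvalue()     # now holds both rows
```

In Python 2 the same `map` call runs eagerly and builds a throwaway list of return values, so the code above only behaves as intended there.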

1 comment

d0mine 4724 days ago

Don't use `for line in file.readlines():`, do `for line in file:` instead.

Don't use `map(writer.writerow, data)`, do `writer.writerows(data)` instead. Use binary mode for csv files in Python 2, use newline='' in Python 3.

You forgot to open the first output file in write mode.
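
Put together, the suggestions above might look roughly like this. It is a sketch for Python 3 only; `transform_line` and `transform_row` are hypothetical stand-ins for the parent comment's `foo`, and the function names are made up for illustration:

```python
import csv

def transform_line(line):
    # Hypothetical per-line transform (stand-in for `foo`).
    return line.upper()

def transform_row(row):
    # Hypothetical per-row transform (stand-in for `foo`).
    return [cell.upper() for cell in row]

def copy_lines(src, dst):
    # Iterate over the file object directly instead of calling readlines(),
    # and open the output in write mode.
    with open(src) as fin, open(dst, 'w') as fout:
        for line in fin:
            fout.write(transform_line(line))

def read_rows(filename):
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(filename, newline='') as f:
        yield from csv.reader(f)

def write_rows(filename, rows):
    with open(filename, 'w', newline='') as f:
        # writerows consumes the iterable one row at a time.
        csv.writer(f).writerows(rows)

def convert(src, dst):
    # map is lazy in Python 3, so rows stream through one by one.
    write_rows(dst, map(transform_row, read_rows(src)))
```

In Python 2 the csv files would be opened in binary mode (`'rb'` / `'wb'`) instead of passing `newline=''`.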
