Hacker News
by ejwhite 1994 days ago
Interesting.

I need to open a very large CSV file in Python, which is around 25GB in .zip format. Any idea how to do this in a streaming way, i.e. stopping after reading the first few thousand rows?

2 comments

> I need to open a very large CSV file in Python, which is around 25GB in .zip format. Any idea how to do this in a streaming way, i.e. stopping after reading the first few thousand rows?

Replace the `file_paths` list in my proof of concept with your large file(s), delete the rest (lines 61-68, 77-79) and it should just work.

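This works with Python's standard library alone. `ZipFile.open()` returns a file-like object that decompresses as you read it, so a file inside the archive can be read in a streaming manner. There is no need to extract it first or to hold all the data in memory.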

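    import csv
    import io
    import zipfile

    max_lines = 10
    with zipfile.ZipFile("data.zip") as z:
        for info in z.infolist():
            if info.is_dir():
                continue  # skip directory entries
            # z.open() returns a file-like object that decompresses on demand
            with z.open(info) as f:
                # newline="" is what the csv module expects; utf-8 is an assumption, adjust to your data
                reader = csv.reader(io.TextIOWrapper(f, encoding="utf-8", newline=""))
                for i_line, line in enumerate(reader):
                    if i_line >= max_lines:
                        break
                    print(line)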
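
If you want to reuse this, the same idea can be wrapped in a generator. This is only a minimal sketch: `head_rows` is a made-up helper name, and the UTF-8 default is an assumption about your data. It uses `itertools.islice` to stop after `max_rows` rows per file, so only as much of each member is decompressed as the reader actually consumes.

```python
import csv
import io
import zipfile
from itertools import islice


def head_rows(zip_path, max_rows, encoding="utf-8"):
    """Yield (member_name, row) for the first max_rows rows of each file in the archive.

    Hypothetical helper for illustration; the encoding default is an assumption.
    """
    with zipfile.ZipFile(zip_path) as z:
        for info in z.infolist():
            if info.is_dir():
                continue  # directory entries hold no data
            with z.open(info) as raw:
                # Decode bytes to text lazily; newline="" lets csv handle line endings itself
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                # islice stops pulling rows from the reader once max_rows have been yielded
                for row in islice(csv.reader(text), max_rows):
                    yield info.filename, row
```

Usage would look something like `for name, row in head_rows("data.zip", 5000): ...`, where `data.zip` stands in for your archive.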

This is true when writing to a file. The goal of my PoC was to not write a file and instead to stream to the web browser.