I need to open a very large CSV file in Python, which is around 25GB in .zip format. Any idea how to do this in a streaming way, i.e. stopping after reading the first few thousand rows?
> I need to open a very large CSV file in Python, which is around 25GB in .zip format. Any idea how to do this in a streaming way, i.e. stopping after reading the first few thousand rows?
Replace the `file_paths` list in my proof of concept with your large file(s), delete the rest (lines 61-68, 77-79) and it should just work.
This works fine with Python's standard library alone. `ZipFile.open()` returns a file-like object that decompresses as you read from it, so a file inside the archive can be streamed and there is no need to hold all the data in memory.
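
    import io, csv, zipfile

    max_lines = 10  # stop after this many rows per file

    with zipfile.ZipFile("data.zip") as z:
        for info in z.infolist():
            # z.open() yields a binary stream that is decompressed on demand
            with z.open(info.filename) as f:
                # wrap the bytes as text; newline="" lets csv handle line endings
                reader = csv.reader(io.TextIOWrapper(f, newline=""))
                for i_line, line in enumerate(reader):
                    if i_line >= max_lines:
                        break
                    print(line)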
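
If you want the same idea as a reusable helper, here is a minimal sketch using `itertools.islice` to cap the number of rows. The function name `head_rows` and its parameters (`member`, `n`, `encoding`) are my own illustrative choices, and it assumes the archive's first entry is the CSV you want and that it is UTF-8 encoded; adjust both if that doesn't hold for your data.

```python
import csv
import io
import itertools
import zipfile


def head_rows(zip_path, member=None, n=1000, encoding="utf-8"):
    """Yield at most n parsed CSV rows from one member of a zip archive.

    head_rows, member, n and encoding are hypothetical names for this sketch.
    """
    with zipfile.ZipFile(zip_path) as z:
        # Assumption: if no member is named, the first entry is the CSV.
        name = member if member is not None else z.namelist()[0]
        with z.open(name) as raw:
            # Decode bytes lazily; newline="" leaves line endings to csv.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            # islice stops pulling rows after n, so the remainder of the
            # member is never read or decompressed.
            yield from itertools.islice(csv.reader(text), n)
```

Usage would be something like `for row in head_rows("data.zip", n=5000): print(row)`. Because it is a generator, rows are produced one at a time and the archive is closed once the loop finishes.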