by lairv
146 days ago
I would agree if not for the fact that polars is not compatible with Python multiprocessing when using the default fork start method. The following script hangs forever (the pandas equivalent runs):

  import polars as pl
  from concurrent.futures import ProcessPoolExecutor

  # Write a small parquet file for the workers to read.
  pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}).write_parquet("test.parquet")

  def read_parquet():
      x = pl.read_parquet("test.parquet")
      print(x.shape)

  with ProcessPoolExecutor() as executor:
      futures = [executor.submit(read_parquet) for _ in range(100)]
      r = [f.result() for f in futures]

Using a thread pool or the "spawn" start method works, but it makes polars a pain to use inside e.g. a PyTorch DataLoader.
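
For reference, a minimal sketch of the "spawn" workaround mentioned above, using only the standard `mp_context` argument of `ProcessPoolExecutor`. The helper names (`read_parquet_shape`, `make_spawn_executor`, `main`) are made up for illustration, and I have not benchmarked the startup cost of spawning workers:

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def read_parquet_shape(path):
    # Lazy import: this module can be loaded without polars installed,
    # and each worker imports polars the first time it runs a task.
    import polars as pl

    return pl.read_parquet(path).shape


def make_spawn_executor(max_workers=None):
    # mp_context overrides the start method for this pool only,
    # so workers are started fresh instead of forked from the parent.
    return ProcessPoolExecutor(
        max_workers=max_workers,
        mp_context=mp.get_context("spawn"),
    )


def main(path="test.parquet", n_tasks=100):
    import polars as pl

    # Same toy file as in the script above.
    pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}).write_parquet(path)
    with make_spawn_executor() as executor:
        futures = [executor.submit(read_parquet_shape, path) for _ in range(n_tasks)]
        return [f.result() for f in futures]
```

With spawn the workers re-import the main module, so `main()` has to be called from under an `if __name__ == "__main__":` guard in a real script file. PyTorch's `DataLoader` takes a `multiprocessing_context` argument that can be set to `"spawn"` in the same spirit, though everything passed to the workers then has to be picklable.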