by culebron21
1060 days ago
That's the ideal situation that many parallelization packages in the Python ecosystem take for granted. In my case, I had big datasets that wouldn't fit into memory (or I didn't want to demand that much peak RAM on a server), so I had to write a parallelizer that read the data in chunks and fed them to workers. It's something very simple, yet unavailable in the Pandas ecosystem: most packages assume you've already loaded the entire dataset, that it isn't that large, and that it's easy to hand the data to process-based workers.
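For anyone wondering what such a chunked parallelizer could look like, here is a minimal sketch. It is not the code I described above, just an illustration using only the standard library. The names `read_chunks`, `map_chunks`, `sum_value_column` and the `max_in_flight` parameter are all made up for this example, and the `value` column is an assumed input format. With Pandas, `pd.read_csv(path, chunksize=n)` could play the role of `read_chunks`, since it also yields the data piece by piece.

```python
import collections
import concurrent.futures as cf
import csv
import io
import itertools


def read_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any row iterator."""
    rows = iter(rows)
    while True:
        chunk = list(itertools.islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def sum_value_column(chunk):
    """Hypothetical worker: total the assumed 'value' column of one chunk."""
    return sum(float(row["value"]) for row in chunk)


def map_chunks(executor, worker, chunks, max_in_flight=4):
    """Yield worker(chunk) in input order.

    At most max_in_flight chunks are submitted and not yet consumed,
    so the reader never runs far ahead of the workers.
    """
    pending = collections.deque()
    for chunk in chunks:
        pending.append(executor.submit(worker, chunk))
        if len(pending) >= max_in_flight:
            # Block on the oldest chunk before reading any more input.
            yield pending.popleft().result()
    while pending:
        yield pending.popleft().result()
```

`map_chunks` accepts any `concurrent.futures` executor. Passing a `cf.ProcessPoolExecutor()` gives process-based workers, provided the worker is a picklable module-level function and the call sits under an `if __name__ == "__main__":` guard. A `cf.ThreadPoolExecutor()` is enough to try it out on a small in-memory source such as `csv.DictReader(io.StringIO(...))`. Peak memory is bounded by roughly `max_in_flight` chunks rather than by the size of the whole dataset.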