	by culebron21 1060 days ago
	That's the perfect situation many parallelization packages in the Python ecosystem take for granted. In my case, I had big datasets that wouldn't fit into memory (or I didn't want to demand that much peak RAM on a server), so I had to write a parallelizer that read data in chunks and fed it to workers. It's something very simple, but unavailable in the Pandas ecosystem: most packages assume you have already loaded the entire dataset, that it isn't that large, and that it's easy to hand the data to process-based workers.
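For illustration, here is a rough sketch of that kind of chunked parallelizer. It is not the code I actually wrote, just a standard-library version under a few assumptions: the input is a CSV file with a numeric column named `value`, and `read_chunks`, `map_chunks`, `sum_values`, and `run` are made-up names. The idea is to cap the number of chunks submitted to workers, so the reader never runs far ahead of them.

```python
import csv
import itertools
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait


def read_chunks(path, chunk_size):
    """Yield lists of at most chunk_size CSV rows (as dicts), one chunk at a time."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(itertools.islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk


def map_chunks(func, chunks, executor, max_in_flight=4):
    """Apply func to each chunk, keeping at most max_in_flight chunks submitted.

    Results are yielded as workers finish, so their order is not guaranteed.
    """
    pending = set()
    for chunk in chunks:
        pending.add(executor.submit(func, chunk))
        if len(pending) >= max_in_flight:
            # Block until some worker finishes before reading the next chunk.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                yield future.result()
    # Drain whatever is still running once the input is exhausted.
    for future in wait(pending).done:
        yield future.result()


def sum_values(chunk):
    """Example worker (hypothetical): sum the 'value' column of one chunk."""
    return sum(float(row["value"]) for row in chunk)


def run(path, chunk_size=10_000):
    """Total the 'value' column using process-based workers."""
    with ProcessPoolExecutor() as pool:
        return sum(map_chunks(sum_values, read_chunks(path, chunk_size), pool))
```

Peak memory in the parent is roughly `max_in_flight` chunks rather than the whole file. The worker has to be a top-level function so it can be pickled, and on platforms that spawn worker processes `run` should be called under an `if __name__ == "__main__":` guard. Passing the executor in also means you can swap in a `ThreadPoolExecutor` for I/O-bound work.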