| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by akhong 2072 days ago
	Yes, I agree with you in Pandas' case. However, for other libraries, a good chunk of the run time comes from reading the parquet files and concatenating the partial datasets. Pandas and Spark are particularly really good with reading a directory of 12 Parquet files with no noticeable performance penalty.