HN Mirror



	by alex-korr 1041 days ago
	Maybe I am missing something, but would there ever be a scenario where taking a single, albeit large, SQL statement and rewriting it as several PySpark scripts would result in a faster runtime for your data pipeline? In most cases, this will be much, much slower.

1 comment

0cf8612b2e1e 1040 days ago

It greatly depends on your environment. Thankfully, I work in an area with very modest timeliness requirements, so improving the speed of a job means little to me. Improving debuggability, or adding checkpoints to recover from when things go wrong, is always valuable.
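
To make the checkpointing point concrete, here is a rough sketch in plain Python rather than PySpark. Everything in it is an assumption made up for illustration: the `run_stage` helper, the stage names, the sample rows, and the choice of JSON files as checkpoints. The idea is only that each stage saves its intermediate result, so a failed run can resume from the last completed stage and each intermediate can be inspected on its own.

```python
import json
import tempfile
from pathlib import Path


def run_stage(name, fn, inputs, checkpoint_dir):
    """Run fn(inputs) unless a checkpoint for this stage already exists."""
    path = Path(checkpoint_dir) / f"{name}.json"
    if path.exists():
        # Resume: reuse the saved intermediate result instead of recomputing.
        return json.loads(path.read_text())
    result = fn(inputs)
    # Persist the intermediate result so it can be inspected or reused later.
    path.write_text(json.dumps(result))
    return result


def keep_positive(rows):
    """Hypothetical stage 1: drop rows with non-positive amounts."""
    return [r for r in rows if r["amount"] > 0]


def total_per_customer(rows):
    """Hypothetical stage 2: sum amounts per customer."""
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return totals


def run_pipeline(rows, checkpoint_dir):
    """Two small stages in place of one large query."""
    filtered = run_stage("filtered", keep_positive, rows, checkpoint_dir)
    return run_stage("totals", total_per_customer, filtered, checkpoint_dir)


# Usage: made-up sample data, checkpoints written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = [
        {"customer": "a", "amount": 10},
        {"customer": "b", "amount": 7},
        {"customer": "a", "amount": -3},
        {"customer": "a", "amount": 5},
    ]
    print(run_pipeline(sample, tmp))
```

The trade-off is the one raised in the parent comment: writing every intermediate result to storage costs runtime, which only pays off if being able to restart or debug a single stage matters more than raw speed.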
