Hacker News
by alex-korr 1041 days ago
Maybe I am missing something, but would there ever be a scenario where taking a single, albeit large, SQL statement and rewriting it as several PySpark scripts would result in a faster runtime for your data pipeline? In most cases, this will be much slower.
1 comment

It depends greatly on your environment. Thankfully, I work in an area with very modest timeliness requirements, so improving the speed of a job means little to me. However, improving debuggability or checkpointing for when things go wrong is always valuable.
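
To make the checkpointing point concrete, here is a rough sketch of what splitting one big query into stages can buy you. It is plain Python standing in for PySpark, and everything in it (the stage names, the `run_stage` helper, the JSON checkpoint files, the sample rows) is made up for illustration, not taken from any real pipeline. Each stage saves its output, so a rerun after a failure can skip whatever already finished, and the intermediate files can be inspected when debugging.

```python
import json
import tempfile
from pathlib import Path


def run_stage(name, func, inputs, checkpoint_dir):
    """Run one stage, or reload its saved output if it already finished.

    Returns (result, was_cached). The JSON file per stage is a hypothetical
    checkpoint format chosen only to keep the sketch small.
    """
    path = Path(checkpoint_dir) / f"{name}.json"
    if path.exists():
        return json.loads(path.read_text()), True
    result = func(inputs)
    path.write_text(json.dumps(result))
    return result, False


def filter_paid(rows):
    # Stage 1: roughly the WHERE clause of the original statement.
    return [r for r in rows if r["status"] == "paid"]


def total_by_customer(rows):
    # Stage 2: roughly the GROUP BY / SUM of the original statement.
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return totals


def run_pipeline(rows, checkpoint_dir):
    paid, paid_cached = run_stage("filter_paid", filter_paid, rows, checkpoint_dir)
    totals, totals_cached = run_stage(
        "total_by_customer", total_by_customer, paid, checkpoint_dir
    )
    return totals, [paid_cached, totals_cached]


def demo():
    rows = [
        {"customer": "a", "status": "paid", "amount": 10},
        {"customer": "a", "status": "paid", "amount": 20},
        {"customer": "b", "status": "refunded", "amount": 7},
        {"customer": "b", "status": "paid", "amount": 5},
    ]
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        first = run_pipeline(rows, checkpoint_dir)   # computes both stages
        second = run_pipeline(rows, checkpoint_dir)  # reloads both stages
    return first, second
```

The staged version does extra I/O, so it will usually be slower than a single statement. The trade is that a failure in the second stage does not force the first one to run again.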