by robertkoss
281 days ago
You were talking about data engineering. If you do not write tests as a data engineer, what are you doing then? Just hoping that you don't fuck up editing a 1000+ line SQL script? If you use Athena you still have to worry about shuffling and joining, it is just hidden. It is Trino / Presto under the hood, and if you click Explain you can see the execution plan, which is essentially the same as looking into the Spark UI. Who cares about JVM versions nowadays? No one is hosting Spark themselves. Literally every tool now supports DataFrame AND SQL APIs, and to me there is no reason to pick up SQL if you are familiar with a little bit of Python.
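To make the testing point concrete, here is a rough sketch of what a test for a SQL transformation could look like. Everything in it is made up for illustration: the `customers` and `orders` tables, the query, and the helper names are hypothetical, and the in-memory SQLite database is only a stand-in for a real engine. Athena / Trino SQL differs from SQLite, so treat this as the shape of the idea and not as something you could point at a production query unchanged.

```python
import sqlite3

# Hypothetical transformation under test: total revenue per customer.
REVENUE_PER_CUSTOMER = """
    SELECT c.name AS name, SUM(o.amount) AS revenue
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
"""


def make_fixture():
    """Build a tiny in-memory database with known rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount INTEGER
        );
        INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO orders VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7);
        """
    )
    return conn


def run_revenue_query(conn):
    """Run the transformation and return its rows as a list of tuples."""
    return conn.execute(REVENUE_PER_CUSTOMER).fetchall()


def test_revenue_per_customer():
    conn = make_fixture()
    # carol has no orders, so the inner join drops her from the result.
    assert run_revenue_query(conn) == [("alice", 15), ("bob", 7)]
```

Run it with pytest or just call `test_revenue_per_customer()` directly. The fixture is small enough that you can work out the expected rows by hand, which is the point: when someone edits the join or the grouping, the test fails instead of the dashboard.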
https://ludic.mataroa.blog/blog/get-me-out-of-data-hell/