by adammarples 1042 days ago

I'm not sure about this. The big SQL script is annoying, but broken down into 4 parts it's very easy to understand, and that is precisely the strength of dbt. It would decompose into 4 dbt models, which could be deployed as views, tables, CTEs, etc. Instead, it seems you've developed your own DSL for data transforms to take its place, in the form of special classes declared in YAML.
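To make the decomposition point concrete, here is a toy Python sketch using the standard-library `sqlite3` module. It is not dbt and not the original script: the model names (`cleaned`, `totals`), the `raw_orders` table, and the helper functions are all made up for illustration, and it uses two steps rather than four. It only shows that the same named SELECTs can be deployed either as views or inlined as CTEs with no change to the transform SQL.

```python
import sqlite3

# Hypothetical models: each one is a named SELECT that may read from earlier ones.
MODELS = {
    "cleaned": "SELECT customer, amount FROM raw_orders WHERE amount > 0",
    "totals": "SELECT customer, SUM(amount) AS total FROM cleaned GROUP BY customer",
}
FINAL = "SELECT customer, total FROM totals ORDER BY customer"


def make_conn():
    """Create an in-memory database with a small made-up source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [("a", 1), ("a", 2), ("b", -4), ("b", 5)],
    )
    return conn


def as_views(conn):
    """Materialize every model as a view, then query the last one."""
    for name, sql in MODELS.items():
        conn.execute(f"CREATE VIEW {name} AS {sql}")
    return conn.execute(FINAL).fetchall()


def as_ctes(conn):
    """Inline every model as a CTE in one statement; same SQL, different deployment."""
    ctes = ", ".join(f"{name} AS ({sql})" for name, sql in MODELS.items())
    return conn.execute(f"WITH {ctes} {FINAL}").fetchall()


if __name__ == "__main__":
    print(as_views(make_conn()))
    print(as_ctes(make_conn()))
```

Both functions return the same rows; only the materialization differs, which is the property that makes a pile of small models easier to live with than one large script.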

It might be better? But SQL is so well understood and covers so much functionality that I'd expect it to be a long time before you hit parity with it.

It would be nice if dbt could interface with buckets etc., but if they're wrapped in an external table or whatever then that problem goes away.

One thing I noticed is that the example misses a killer feature of dbt: you're specifying your database targets right in the class config. With dbt you just specify the transforms, then point them at different environments using a target flag and a profiles file, so deploying to different envs is easy. I would definitely separate location/env/credential config from transform logic, or make it variable.
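A rough Python sketch of that separation, again using only the standard-library `sqlite3` module. The `PROFILES` dict stands in for a profiles file and the `target` argument for a target flag; the profile keys, table names, and seed rows are assumptions made up for the example, not dbt's actual config schema.

```python
import sqlite3

# Hypothetical stand-in for a profiles file: where each environment lives.
PROFILES = {
    "dev": {"database": ":memory:", "prefix": "dev_"},
    "prod": {"database": ":memory:", "prefix": "prod_"},
}

# Transform logic only: no locations, environments, or credentials in here.
TRANSFORM_SQL = (
    "SELECT customer, SUM(amount) AS total FROM {source} GROUP BY customer"
)


def run(target):
    """Run the same transform against whichever environment `target` names."""
    profile = PROFILES[target]
    conn = sqlite3.connect(profile["database"])
    source = profile["prefix"] + "orders"
    output = profile["prefix"] + "customer_totals"

    # Seed a made-up source table so the sketch is self-contained.
    conn.execute(f"CREATE TABLE {source} (customer TEXT, amount INTEGER)")
    conn.executemany(
        f"INSERT INTO {source} VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 5)]
    )

    # The only environment-specific parts are the resolved table names.
    conn.execute(
        f"CREATE TABLE {output} AS " + TRANSFORM_SQL.format(source=source)
    )
    rows = conn.execute(
        f"SELECT customer, total FROM {output} ORDER BY customer"
    ).fetchall()
    conn.close()
    return output, rows


if __name__ == "__main__":
    print(run("dev"))
    print(run("prod"))
```

Switching environments is a one-argument change, and `TRANSFORM_SQL` never has to know where it runs.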

Given that SQL is a totally valid language for declaring transforms in Spark, I would probably rather see Spark as a materialization backend for dbt somehow than an entirely new thing.