by rpedela
1733 days ago
I think using a data warehouse as your data lake or lakehouse is optimal, even for data that isn't relational. Storage is so cheap now, and several providers decouple it from compute costs, that I don't even give it a thought. You get a fast, scalable SQL interface, which is still nice and useful for non-relational data. Then all, or most, of the transformations needed for analysis can be pure SQL using a tool like DBT. In my experience, it greatly simplifies the entire pipeline.
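To make the "transformations in pure SQL" point concrete, here is a rough sketch. It assumes an in-memory SQLite database standing in for the warehouse, and the table, view, and column names (`raw_events`, `daily_purchases`, and so on) are made up for illustration. The transformation itself is just a SELECT statement, which is roughly the kind of thing a tool like DBT keeps as a model; DBT itself is not used here.

```python
import sqlite3

# SQLite is only a stand-in for a real warehouse; all names are hypothetical.
RAW_ROWS = [
    ("2021-03-01", "signup", 1),
    ("2021-03-01", "purchase", 1),
    ("2021-03-02", "purchase", 2),
    ("2021-03-02", "purchase", 1),
]

# The whole transformation is a plain SELECT over the raw table.
DAILY_PURCHASES_SQL = """
    SELECT event_date,
           COUNT(*) AS purchases,
           COUNT(DISTINCT user_id) AS buyers
    FROM raw_events
    WHERE event_type = 'purchase'
    GROUP BY event_date
"""


def build_daily_purchases(conn):
    """Load raw rows as-is, then expose the transformed data as a view."""
    conn.execute(
        "CREATE TABLE raw_events (event_date TEXT, event_type TEXT, user_id INTEGER)"
    )
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", RAW_ROWS)
    # Materialize the SELECT as a view; no application-side transformation logic.
    conn.execute("CREATE VIEW daily_purchases AS " + DAILY_PURCHASES_SQL)
    return conn.execute(
        "SELECT * FROM daily_purchases ORDER BY event_date"
    ).fetchall()


conn = sqlite3.connect(":memory:")
result = build_daily_purchases(conn)
print(result)  # [('2021-03-01', 1, 1), ('2021-03-02', 2, 2)]
```

The Python here only loads the rows and runs the statements; the analysis logic lives entirely in the SQL string.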
I don't get it... It looks to me like DBT is a Python wrapper around SQL, a big library that among other things includes an SQL generator or something like that, but not "pure" SQL?