by carlineng
574 days ago
I agree 100% that this needs to be more of a thing. For data engineers building data pipelines, queries are like functions and table schemas are like types. There needs to be a way to write a query against an abstract interface rather than a concrete table. Today most folks rely on string templating in Python or Jinja to do this, which makes development really cumbersome. As a result, most teams either end up with data pipelines that are a big mess of spaghetti SQL, or get stuck maintaining complex frameworks that abstract away common logic but are inscrutable to the average user. I wrote a longer blog post about this recently: https://carlineng.com/?postid=holy-grail-data-engineering#bl...
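To make the "queries are functions, schemas are types" analogy concrete, here is a rough sketch using Python's built-in sqlite3 module. Everything named here (EVENT_INTERFACE, conforms, events_per_user, the clicks and misc tables) is made up for illustration and is not taken from PRQL, Malloy, or the blog post. Note that it still interpolates the table name into a string under the hood, which is exactly the workaround described above; it only shows what checking a table against an interface could look like.

```python
import sqlite3

# Hypothetical "interface": the columns (and declared types) a query needs.
EVENT_INTERFACE = {"user_id": "INTEGER", "ts": "TEXT"}


def conforms(conn, table, interface):
    """Return True if `table` has every column the interface requires."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    actual = {row[1]: row[2].upper() for row in rows}
    return all(actual.get(col) == typ for col, typ in interface.items())


def events_per_user(conn, table):
    """A 'query as function': accepts any table matching EVENT_INTERFACE."""
    if not conforms(conn, table, EVENT_INTERFACE):
        raise TypeError(f"{table} does not satisfy EVENT_INTERFACE")
    sql = (
        f'SELECT user_id, COUNT(*) FROM "{table}" '
        "GROUP BY user_id ORDER BY user_id"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
# `clicks` has an extra column (url) but still satisfies the interface.
conn.execute("CREATE TABLE clicks (user_id INTEGER, ts TEXT, url TEXT)")
# `misc` lacks the required columns, so it should be rejected.
conn.execute("CREATE TABLE misc (id INTEGER)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [(1, "2024-01-01", "/a"), (1, "2024-01-02", "/b"), (2, "2024-01-01", "/a")],
)

result = events_per_user(conn, "clicks")
```

The check here happens at run time against a live database; the point of the comment above is that a language with real support for this would let you express and verify the same contract before the query ever runs.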
Seeing that people working on both PRQL and Malloy replied, and that both of you recognize this pain, makes me feel a lot better about the future of these tools! When I talk about this with people who are not that deep into the problem, it is often hard to convey the difference between this kind of composability and the composability that the tools offer today, and the implications that come with it.
At a past startup I had the good fortune to work on a system similar to what I am looking for: packageable, reusable relational algebra inspired by Substrait. Its downside was that the implementation was quite tied to RDF and SPARQL, and now I'm chasing something similar in the SQL world :D