by nevi-me 3173 days ago
I share your sentiment here. Having a common base to work from also has the benefit that end users, the BI or data science practitioners (whom we empower as data engineers), can expect a stable SQL dialect that they'll be able to use everywhere*.

I haven't followed Spark in recent months, but what I recall was that the SQL DSL had some caveats at first, because certain things weren't yet implemented.

My experience has been that projects that add SQL support after a while typically don't deliver the whole thing in the initial release. I imagine it's often quite a lot of work.

The more projects that use Calcite, the more upstream contributions there would be ...

1 comments

Spark actually has probably the best standard SQL support among all open source big data frameworks. It can run all of the TPC-DS queries without modification.
TPC-DS evaluates batch processing, and it is great to hear that Spark supports all of its queries.

I think the stream processing space is still evolving quickly. Many features, like stream-table joins, CTEs, and streaming joins with late-arriving data, are unimplemented or do not even have clear semantics yet. It would be great to see a benchmark like TPC-DS in that domain.
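
To make the "no clear semantics" point a bit more concrete, here is a minimal Python sketch of a stream-table join with one late event. It is not tied to Spark, Calcite, or any other engine. The data, the function names (`lookup_as_of`, `processing_time_join`, `event_time_join`), and the two join interpretations are all assumptions made up for illustration. The only point is that the same inputs can reasonably produce two different answers.

```python
from bisect import bisect_right

# Hypothetical table changelog: (update_time, key, value), sorted by update_time.
table_updates = [
    (1, "user1", "free"),
    (5, "user1", "paid"),
]

# Hypothetical stream events: (arrival_time, event_time, key).
# The second event is late: it happened at t=3 but arrived at t=7.
events = [
    (2, 2, "user1"),
    (7, 3, "user1"),
]


def lookup_as_of(updates, key, ts):
    """Return the latest value for `key` with update_time <= ts, or None."""
    versions = [(t, v) for t, k, v in updates if k == key]
    idx = bisect_right([t for t, _ in versions], ts)
    return versions[idx - 1][1] if idx > 0 else None


def processing_time_join(events, updates):
    """Join each event with the table as it looked when the event arrived."""
    return [
        (event_time, key, lookup_as_of(updates, key, arrival_time))
        for arrival_time, event_time, key in events
    ]


def event_time_join(events, updates):
    """Join each event with the table as it looked when the event happened."""
    return [
        (event_time, key, lookup_as_of(updates, key, event_time))
        for _arrival_time, event_time, key in events
    ]


pt_result = processing_time_join(events, table_updates)
et_result = event_time_join(events, table_updates)

print("processing-time:", pt_result)
print("event-time:     ", et_result)
```

The on-time event gets the same row either way, but the late event is joined with "paid" under the processing-time reading and "free" under the event-time reading. A SQL dialect for streams has to pick one (or expose both), which is part of why a TPC-DS-style benchmark is harder to define here.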