by haohui 3173 days ago
It is great to see SQL becoming one of the de facto standards for building streaming analytics applications.

Personally I'm not a big fan of KSQL, given that:

(1) KSQL is built on top of Kafka Streams. There are use cases for which we don't think Kafka Streams is a good fit. Please see the explanations above.

(2) Inventing yet another SQL dialect is a bad idea in practice. Not only does it impose an additional learning curve, but more importantly you have little hope of winning on development velocity. Calcite has been used by Flink, Storm, Beam, Dremio, etc. Its community is simply far bigger than even the total number of engineers at Confluent.

Again, this is just my personal take; it does not reflect the stance of Uber.

1 comment

I share your sentiment here. Having a common base to work from also has the benefit that end users, the BI or data science practitioners (whom we, as data engineers, empower), can expect a stable SQL dialect that they'll be able to use everywhere*.

I haven't followed Spark in recent months, but what I recall was that the SQL DSL had some caveats at first, because certain things weren't yet implemented.

My experience has been that projects that add SQL support later on typically don't deliver the whole thing in the initial release. I imagine it's often quite a lot of work.

The more projects that use Calcite, the more upstream contributions there would be ...

Spark probably has the best standard SQL support among all open source big data frameworks. It can run all of the TPC-DS queries without modification.

TPC-DS evaluates batch processing, and it is great to hear that Spark supports all of its queries.

I think the stream processing space is still evolving quickly. Many features, such as stream-table joins, CTEs, and streaming joins with late-arriving data, are unimplemented or do not even have clear semantics yet. It would be great to see a benchmark like TPC-DS in this domain.