by ptrik
1271 days ago
This depends on the use case. SQL is king for batch processing - queries are declarative, and decades of effort have gone into optimization. For real-time / streaming use cases, however, there is not yet a mature solution in SQL. Flink SQL / Materialize are getting there, but the state-of-the-art is still the Flink / Kafka Streams approach - put your state in memory / on local disk, and mutate it as you consume messages. This actually echoes the "Operate on data where it resides" principle in the article.
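To make the "state in memory, mutated as you consume messages" idea concrete, here is a minimal sketch in plain Python. It is not Flink or Kafka Streams code; the event shape `(key, amount)` and the `consume` function are made up for illustration, and a plain list stands in for the message stream.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (key, amount). Not taken from any real system.
Event = Tuple[str, int]


def consume(events: Iterable[Event], state: Dict[str, int]) -> Dict[str, int]:
    """Fold each event into local state, one message at a time."""
    for key, amount in events:
        # The state lives next to the computation and is updated in place,
        # instead of being re-queried from a database for every message.
        state[key] = state.get(key, 0) + amount
    return state


# A list stands in for a topic or stream here.
events = [("alice", 5), ("bob", 3), ("alice", 2)]
state = consume(events, {})
print(state)
```

A real stream processor adds checkpointing, partitioning, and recovery on top of this loop, but the core pattern is the same fold over incoming messages.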
|
Kafka-in-SQL if you wish. Or, homegrown Flink.
(There are many different uses for the events inside our SQL processing pipelines, and we have to store the ingested events in SQL anyway.)
I am sure real Kafka+Flink has some advantages, but...what we do works really well, is simple, and feels right for our scale.
There is enough batching in SQL to get real speed/CPU benefits on inserts/updates (vs. e.g. hitting SQL once per consumed event, which would be far worse). And with Azure SQL the infra is extremely simple compared to running a Kafka cluster in our context.
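A rough sketch of the batching idea, for anyone curious. This is not our actual pipeline: it uses Python's built-in `sqlite3` as a stand-in for Azure SQL, and the `events` table, `batches` helper, and `ingest` function are hypothetical names. The point is only that each batch goes to the database as one multi-row statement inside one transaction, rather than one round trip per event.

```python
import sqlite3
from itertools import islice


def batches(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def ingest(conn, events, batch_size=500):
    """Insert events in batches, with one transaction per batch."""
    for chunk in batches(events, batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO events (id, payload) VALUES (?, ?)", chunk
            )


# In-memory SQLite database standing in for the real SQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
ingest(conn, ((i, f"event-{i}") for i in range(1234)), batch_size=500)
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```

The right batch size depends on the database and the workload, so 500 here is an arbitrary placeholder.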