by scaleout1
3172 days ago
Interesting project, although I can't say I am happy to see SQL being used in streaming systems like this. In my last two jobs I had to write frameworks and tools to enable "data scientists" and "analysts" to write production jobs. The problem I have run into with exposing SQL to this class of user is that every job ends up being its own special snowflake, with deeply nested SQL and custom UDFs mixed in for good measure. Because each job is "unique", the support and maintenance cost goes up significantly. I have come to the conclusion that a type-safe API with map/filter/flatMap is a much better API to expose than stringly typed SQL. I am curious to know whether Uber is running into similar support issues.
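To make the type-safe vs. stringly typed contrast concrete, here is a minimal sketch. It uses plain Scala collections as a stand-in for a streaming API (the Scala APIs of frameworks like Flink and Spark expose similar map/filter/flatMap combinators), and the `TripEvent` record, its fields, and the `trips` table are hypothetical, made up for illustration. The point is that a misspelled field in the typed pipeline fails at compile time, while the same mistake in the SQL string only surfaces when the query engine parses it.

```scala
// Hypothetical record type standing in for a schema'd stream of events.
final case class TripEvent(city: String, fareUsd: Double, tags: Seq[String])

object TypedPipeline {
  // Typed pipeline: field names and types are checked by the compiler,
  // so a typo such as `_.fareUSD` is a build error, not a production incident.
  def tagCounts(events: Seq[TripEvent]): Map[String, Int] =
    events
      .filter(_.fareUsd > 0.0) // keep only paid trips
      .flatMap(_.tags)         // emit one element per tag
      .map(tag => (tag, 1))    // pair each tag with a count of one
      .groupBy(_._1)           // group the pairs by tag
      .map { case (tag, pairs) => tag -> pairs.map(_._2).sum } // sum the counts

  // Roughly equivalent stringly typed query (illustrative; exact syntax varies
  // by engine). The compiler only sees a String, so a wrong column name or
  // type is caught at runtime, if at all.
  val tagCountsSql: String =
    "SELECT tag, COUNT(*) FROM trips CROSS JOIN UNNEST(tags) AS t(tag) " +
      "WHERE fare_usd > 0 GROUP BY tag"

  def main(args: Array[String]): Unit = {
    val events = Seq(
      TripEvent("SF", 12.5, Seq("airport", "pool")),
      TripEvent("NYC", 0.0, Seq("airport")),
      TripEvent("SF", 8.0, Seq("pool"))
    )
    println(tagCounts(events))
  }
}
```

The trade-off cuts both ways: the typed version needs a build and a developer who can write Scala, which is exactly the barrier SQL is meant to remove for analysts.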
(1) There was a significant consulting load when users had to implement their own jobs in Java / Scala and run them in production. Sometimes it turned into co-development, as the users lacked expertise in the streaming analytics frameworks.
(2) We consciously encourage our users to write good SQL by: (a) enforcing schemas on all analytical Kafka topics, and (b) setting up a team dedicated to helping them use SQL in big data systems (e.g., Hive, Presto, AthenaX).
For UDFs, we provide general guidance and ask our users to be on call for the jobs that use UDFs. The support costs are definitely not zero, but it is still much better than teaching users to write a Samza / Flink / Storm job from scratch.