by dmitriid
1981 days ago
I can kinda agree on GCP with one exception: Dataflow. I have no idea what the future holds for it.

It is a managed Apache Beam service and is very useful for certain scenarios (like "hey, we have a million incoming PubSub messages that we need to transform into a dozen different branching streams of data"). It looks like even BigQuery actually transforms SQL statements into a bunch of Dataflow jobs.

But... But...

- Minor version updates to the Google Dataflow SDK once every couple of months while deprecating most other minor versions? Check.
- No visible contributions to Apache Beam itself? Check. In 2021 I still don't know if I can use any Java version beyond Java 8 to develop for and run on Dataflow. And Google is arguably one of the biggest users of Apache Beam, and definitely the user with the largest pile of money to throw at the problem.
- They've recently sent out a questionnaire about Dataflow to some of their customers that feels like a "hey, we're definitely considering deprecating this, we're gauging the potential impact".
|
Sorry if you're getting mixed messages. Dataflow is here to stay. Google, Spotify, Twitter, and many other large customers heavily depend on it. Twitter moved their entire ad revenue pipeline to it [1] last year.
A quick perusal of https://github.com/apache/beam/commits/master shows decent Googler activity, though. Can you highlight where you were looking when you saw "no visible contributions"? (Maybe we do a bad job of being visible?)
[1] https://cloud.google.com/blog/products/data-analytics/modern...