Hacker News
by vgt 3738 days ago
Your points resonate with me very strongly, but I do have a counterargument.

I work on Google Cloud, where we have closed source (BigQuery), open source (Dataproc), and closed source that makes open source rock (Dataflow/Beam). There are merits to each.

BigQuery is serverless and multi-tenant and can't really exist outside of Google Cloud due to its intrinsic dependency on low-level services that don't exist elsewhere (and for the same reasons we couldn't directly externalize Borg, choosing to create an OSS clone in Kubernetes).

It is not unusual for me to hear from folks that they spent 6+ months building a performant and sizable cluster of, say, Presto. Then there are the continuous management, tinkering, configuration, and optimization projects. I hear this from companies that one would consider technologically sophisticated.

By contrast, 40% of all of BigQuery's petabyte customers scale to these levels without ever talking to us. We just find them on consumption reports. On multiple occasions we've had "surprise" load tests of millions of rows per second streamed into BigQuery, and it just works. BigQuery is also HA out of the box at no additional cost, which is a great luxury.

So sometimes, if you need to scale analytics to petabytes, the choice is between just consuming a managed, cost-effective service and tinkering with OSS, where there are significant operational tradeoffs. On the other hand, as you said, you build pride and culture. It's also far from an automatic shoo-in that OSS gives you better TCO (against the old closed-source guard, yes, but not so much against BQ). Thus, the relationship between company size and value of technology can invert at higher levels.

With all that, I'd love to see a Caravel-BQ plugin :)

(PS: Kudos to Druid for introducing streaming ingest. BigQuery also sees value in streaming ingest: our own Streaming API went GA in March of 2015. Kudos to Airflow, too.)

1 comment

Roughly how many petabyte customers are you talking about?