by caravel
3734 days ago
Druid is definitely part of the equation. Larger, data-driven companies with significant engineering teams prefer not to rely on third-party, closed-source vendors. That can represent a significant risk, and it can block deeper integration with other internal applications when needed. Not that building always wins over buying, but the balance shifts relative to the size of the company. Also, when you are on the receiving end of open source, you want to be a good citizen and contribute back to the ecosystem. It ties into pride and passion, and reflects a strong engineering culture, which can help with recruiting.
I work on Google Cloud, where we have closed source (BigQuery), open source (Dataproc), and closed source that makes open source rock (Dataflow/Beam). There are merits to each.
BigQuery is serverless and multi-tenant, and it can't really exist outside of Google Cloud because of its intrinsic dependency on low-level services that don't exist elsewhere. (For the same reasons, we couldn't directly externalize Borg and chose instead to create an OSS clone in Kubernetes.)
It is not unusual for me to hear from folks that they spent 6+ months building a performant, sizable cluster, say with Presto. Then come ongoing management, tinkering, configuration, and optimization projects. I hear this from companies that one would consider technologically sophisticated.
By contrast, 40% of BigQuery's petabyte-scale customers grow to those levels without ever talking to us; we just find them on consumption reports. On multiple occasions we've had "surprise" load tests of millions of rows per second streamed into BigQuery, and it just works. BigQuery is also highly available (HA) out of the box at no additional cost, which is a great luxury.
So if you need to scale analytics to petabytes, the choice is sometimes between consuming a managed, cost-effective service and tinkering with OSS, which carries significant operational tradeoffs. On the other hand, as you said, you build pride and culture. It's also far from a given that OSS gives you better TCO (against the old closed-source guard, yes, but not so much against BQ). Thus, the relationship between company size and the value of a technology can invert at the high end.
With all that, I'd love to see a Caravel-BQ plugin :)
(P.S. Kudos to Druid for introducing streaming ingest. BigQuery also sees value in streaming ingest; we took our own Streaming API to GA in March 2015. And kudos to Airflow.)