Hacker News new | ask | show | jobs
by pjot 906 days ago
Apache Arrow and Substrait have been working towards making this a reality. I see a future where executing a query can/will send plans to many different engines distributed across the cloud, but also locally on your on machine.
2 comments

Real-time Bidding on query execution? The more I think about it, I believe you actually have a viable business model here.
That’s a wildly interesting idea.

It open up another market too: compatible, scalable storage. Sell shovels in a gold-rush, and what better shovel than the substrate infrastructure that those bidding query engines would probably depend on.

If the queries can be executed by any provider, you are talking about a commodity product.

The business model of selling a commodity is wildly unlike the business model tech is in today.

The query execution might be commodity, but the purchasers will still need to store their data somewhere, and this somewhere will need to be able to service the bandwidth and requirements of the query execution providers.
It feels like you could just as well pack the runtime/engine into the job you are requesting? Am I wrong?
The point is more so in aim of creating interoperability between systems and making them in turn composable.

When there’s a common intermediate representation you can pass around those compute instructions and execute. And when there’s shared memory formats data can pass from storage to engine without serialization/deserialization.

So it wouldn’t matter if data is here or there, in this or that format, because the instructions are the same the specific interface (snowflake, MySQL, a local parquet file, etc) is irrelevant mitigating the need for glue code.