Hacker News
by fifilura 814 days ago
Was this before BigQuery/Presto/Trino? To me it seems like those technologies would have been a good fit.

They don't really rely on indexes; instead they read regular files stored in partitions (where date is typically one of the partition keys).

This means that they only have to worry about the data (e.g. dates) that you are actually querying. And they scale up to the number of CPUs that particular calculation needs. They rarely choke on big query sizes. And big tables are not really an issue as long as you query only the partitions you need.
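To make the partition idea concrete, here is a minimal sketch of partition pruning in plain Python. It is not how BigQuery, Presto, or Trino implement it; the `date=` directory layout, the file names, and the helper functions are all assumptions made up for illustration.

```python
from datetime import date


def partition_date(path: str) -> date:
    """Extract the value of the (hypothetical) `date=` partition key from a path."""
    for segment in path.split("/"):
        if segment.startswith("date="):
            return date.fromisoformat(segment[len("date="):])
    raise ValueError(f"no date partition in {path!r}")


def prune_partitions(paths, start, end):
    """Keep only the files whose partition date falls within [start, end]."""
    return [p for p in paths if start <= partition_date(p) <= end]


# Made-up file listing for a table partitioned by date.
files = [
    "events/date=2013-01-01/part-0.parquet",
    "events/date=2013-01-02/part-0.parquet",
    "events/date=2013-01-02/part-1.parquet",
    "events/date=2013-01-03/part-0.parquet",
]

# A query for a single day only needs to open that day's files.
selected = prune_partitions(files, date(2013, 1, 2), date(2013, 1, 2))
```

The point is that the files for other dates are never opened, so the total table size matters much less than the size of the partitions the query touches.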

1 comment

Those technologies were brand new at the time; the discussions about the problem started in 2013. The company (I had zero input) chose a more established vendor with an older product. Given the time and the institutional customers that were trusting us with their data, I suspect any cloud-based offerings were a nonstarter, and open source felt like a liability.

Of course, with 20/20 hindsight that decision is easy to criticize. I suspect their primary concerns were to minimize risk and costs while meeting our customers' requirements. Even today, making a brand new Google product or a Facebook-backed open source project a hard dependency would be too much risk for an established business.