by bitsondatadev 1410 days ago
For managing different workloads, check out these blog posts and videos from Salesforce, Shopify, Goldman Sachs, and Electronic Arts, respectively:

- https://engineering.salesforce.com/how-to-etl-at-petabyte-sc...
- https://shopify.engineering/faster-trino-query-execution-inf...
- https://trino.io/episodes/33.html
- https://www.youtube.com/watch?v=-5mlZGjt6H4

All of them use the Lyft "Presto but really Trino"-Gateway project to run separate clusters for different workloads, and each goes into detail on how this is achieved.

https://github.com/lyft/presto-gateway
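
To make the idea concrete, here is a minimal sketch of routing by workload in front of several clusters. It is not the gateway's actual code or API: the `Gateway` class, the workload labels, and the cluster URLs are all hypothetical, and round-robin within a pool is just an assumption for illustration.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Gateway:
    """Toy router (hypothetical): maps a workload label to a pool of cluster URLs."""
    pools: dict[str, list[str]]
    default_pool: str = "adhoc"
    _cursors: dict = field(default_factory=dict, init=False)

    def route(self, workload: str) -> str:
        # Unknown workload labels fall back to the default pool.
        name = workload if workload in self.pools else self.default_pool
        # Round-robin across the clusters in the chosen pool.
        if name not in self._cursors:
            self._cursors[name] = cycle(self.pools[name])
        return next(self._cursors[name])


# Hypothetical cluster layout: two ad hoc clusters, one ETL cluster.
gateway = Gateway(pools={
    "adhoc": ["http://adhoc-1.example:8080", "http://adhoc-2.example:8080"],
    "etl": ["http://etl-1.example:8080"],
})
```

Clients would talk to one endpoint, and something like `gateway.route("etl")` would pick the backing cluster, so heavy batch jobs never land on the clusters serving interactive queries.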

Regarding the Trino/Presto split, I recommend reading these blog posts to better understand why the two communities aren't merging. TL;DR: Presto is a Facebook-driven project that mainly considers running on Facebook's infrastructure. Trino is a community-driven project that works on running well on all clouds and on the infrastructure common in the Trino community, which is why you see a higher velocity there.

https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-...
https://trino.io/blog/2020/12/27/announcing-trino.html

We anticipate that Trino will soon become the common name in the community space, but we'll always love that the Trino project originated as Presto.

2 comments

Ali here, with a perspective about the split. Disclosure - I work at Ahana and am an active member of the Presto Foundation. When I see things like this, it appears that Trino/Starburst wants to keep pushing the narrative that Presto is a Facebook-driven project in order to keep the communities fractured, which is pretty unfortunate. In reality, Presto is a community-based open source project housed under The Linux Foundation, with dozens of companies actively contributing to it and using it - Uber, Bytedance, Intel, Twitter, Tencent, and many more. There's no reason why the two communities can't coexist peacefully.

For all intents and purposes, both projects are active and lively. It seems that Trino is more focused on federation and building out connectors, while Presto is more focused on being the engine for the data lake/lakehouse. Both projects are doing well and solving different problems. There have been a lot of innovative features in the Presto project over the last year that exist only in Presto, like Presto-on-Spark, disaggregated coordinator, Project Aria, etc. In fact, we just hosted a fantastic user conference a few weeks ago that showcased a lot of that innovation and how companies are using Presto at massive scale today (if interested, check out the sessions: https://www.youtube.com/watch?v=Gi8i7eHqwyw&list=PLJVeO1NMmy...)

Long story short, Presto is alive and well, is not solely backed by one company (quite the opposite of Trino/Starburst), and has a lot of tech innovation on the roadmap. We're excited about the future of Presto.

Yes, running multiple clusters can definitely help. However, there are also many scenarios where we don't want to maintain multiple clusters. For example, on a SaaS platform, multi-tenancy is pretty typical: different tenants may have different workloads, and workload management is needed across different users, or even within the same tenant. So "built-in" workload management (alongside other multi-tenant features) would be a big plus.
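
As a rough illustration of what per-tenant workload management inside a single cluster could look like, here is a minimal sketch of admission control with a concurrency limit and a queue per tenant. Everything in it is hypothetical (the `TenantAdmission` class, the limits, the tenant names); it is not how Trino or Presto implement this.

```python
from collections import deque


class TenantAdmission:
    """Toy per-tenant admission control for one shared cluster (hypothetical)."""

    def __init__(self, limits, default_limit=1):
        self.limits = limits                # tenant -> max concurrent queries
        self.default_limit = default_limit  # limit for tenants not listed
        self.running = {}                   # tenant -> number of running queries
        self.queued = {}                    # tenant -> deque of waiting query ids

    def submit(self, tenant, query_id):
        """Return "running" if the query is admitted now, else "queued"."""
        limit = self.limits.get(tenant, self.default_limit)
        if self.running.get(tenant, 0) < limit:
            self.running[tenant] = self.running.get(tenant, 0) + 1
            return "running"
        self.queued.setdefault(tenant, deque()).append(query_id)
        return "queued"

    def finish(self, tenant):
        """Mark one query done; return the queued query id that takes its slot, or None."""
        waiting = self.queued.get(tenant)
        if waiting:
            # The freed slot goes straight to the oldest waiting query.
            return waiting.popleft()
        self.running[tenant] = max(0, self.running.get(tenant, 0) - 1)
        return None
```

With limits like `{"tenant_a": 2}`, a third concurrent query from `tenant_a` waits in that tenant's queue without affecting other tenants, which is the kind of isolation a multi-tenant SaaS setup needs from a single cluster.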