
by bitsondatadev 1901 days ago

Yeah, this is a common misconception. Trino and Presto were designed to replace and speed up the Hive engine.

As you say gwittel, adding Trino to an RDBMS itself won't speed things up. However, if you have operational data sitting in that RDBMS and data sitting in a data lake somewhere on something like S3, then you can quickly join those datasets together.

Trino does its best to take advantage of any existing indexes that the RDBMS has by doing a pushdown, but it won't return that data any faster than the underlying database could. It's the joining with datasets from other data sources that makes the RDBMS connector worthwhile.

If you have a 1GB customer dataset in MySQL and a 100TB dataset in S3 of all your orders, then Trino will first run a quick query against your MySQL database, get a list of customer ids that meet the query, and then use that list to filter the orders by customer id.

SELECT * FROM mysql.db_name.customer AS c JOIN s3.db_name.orders AS o ON c.id = o.customer_id WHERE c.credit_card_num = 123456789;
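To make the order of operations concrete, here's a rough Python sketch of the idea. It is not how Trino is implemented, just a toy model of "query the small side first, then use the ids to filter the big side". The in-memory lists, the order_id column, and the function name join_small_side_first are all made up for illustration.

```python
# Toy illustration only: in-memory lists stand in for the MySQL and S3 tables.
customers = [
    {"id": 1, "credit_card_num": 123456789},
    {"id": 2, "credit_card_num": 987654321},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 1},
]


def join_small_side_first(customers, orders, credit_card_num):
    """Hypothetical helper: filter the small table, then use its ids on the large one."""
    # Step 1: run the selective predicate against the small table
    # and keep the matching customers keyed by id.
    matching = {
        c["id"]: c for c in customers if c["credit_card_num"] == credit_card_num
    }
    # Step 2: scan the large table, keeping only rows whose customer_id
    # is in the set collected in step 1, and merge the two rows.
    return [
        {**matching[o["customer_id"]], **o}
        for o in orders
        if o["customer_id"] in matching
    ]


result = join_small_side_first(customers, orders, 123456789)
```

The point is that the big orders scan only keeps rows for the handful of customer ids found in step 1, which is where the win over joining everything blindly comes from.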