by ktamura 3486 days ago
The issue isn't Presto per se; it's running any data processing workload against unoptimized data formats.

Then again, both BigQuery and Snowflake require that you move data into their storage engine (Redshift too), and that's an additional step whose cost is proportional to the size and complexity of your data. At the same time, it's stupid to store your logs in OLAP-optimized formats and completely lose legibility. In sum, Athena trades off performance for convenience.
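
To make the trade-off concrete, here is a rough sketch in plain Python, not how Athena, Presto, or any of these engines actually work internally. The log records and field names are made up for illustration. Scanning raw, legible log lines means parsing every full record to read one field, while a columnar layout needs a one-time conversion step but then touches only the column it needs.

```python
import json

# Hypothetical log records; the field names are invented for this example.
RAW_LOG_LINES = [
    '{"ts": 1, "status": 200, "bytes": 512, "path": "/a"}',
    '{"ts": 2, "status": 404, "bytes": 128, "path": "/b"}',
    '{"ts": 3, "status": 200, "bytes": 256, "path": "/c"}',
]


def total_bytes_from_rows(lines):
    """Row-oriented scan: each line is fully parsed just to read one field."""
    total = 0
    for line in lines:
        record = json.loads(line)
        total += record["bytes"]
    return total


def to_columns(lines):
    """One-time conversion step: pivot the rows into per-field lists."""
    columns = {}
    for line in lines:
        for key, value in json.loads(line).items():
            columns.setdefault(key, []).append(value)
    return columns


def total_bytes_from_columns(columns):
    """Columnar scan: only the 'bytes' column is read."""
    return sum(columns["bytes"])


columns = to_columns(RAW_LOG_LINES)
print(total_bytes_from_rows(RAW_LOG_LINES), total_bytes_from_columns(columns))
```

Both paths give the same answer; the difference is where the work happens. The conversion in `to_columns` stands in for the load step that grows with the size of your data, and the row scan stands in for querying the logs where they sit.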

No matter what database vendors say, you can't defy the principles of computer science.

2 comments

Note that BigQuery has been able to read files straight from GCS, Drive, and even Google Spreadsheets for a while:

https://cloud.google.com/bigquery/federated-data-sources

(I'm Felipe Hoffa and I work for Google https://twitter.com/felipehoffa)

You don't replace them with an OLAP format; you pair them with an OLAP engine to aggregate, filter, or analyze. Elasticsearch and Splunk are one approach; SQL query engines are another.

Apache Drill takes a schema-discovery-on-read approach that can handle some of this. It's not perfect, but it does simplify some of the process where its capabilities fit the task at hand.
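
For anyone unfamiliar with the idea, a minimal sketch of schema discovery on read in plain Python follows. This is not Drill's implementation or API; the function name and the sample records are hypothetical. The point is only that the schema is inferred from the records at query time instead of being declared up front.

```python
import json


def discover_schema(lines):
    """Infer a mapping of field name -> set of observed Python type names."""
    schema = {}
    for line in lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


# Hypothetical records whose fields and types drift between lines.
SAMPLE = [
    '{"user": "a", "latency_ms": 12}',
    '{"user": "b", "latency_ms": 7.5, "region": "eu"}',
]

schema = discover_schema(SAMPLE)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

Fields that show up late (`region`) or change type (`latency_ms`) are exactly the cases where this kind of approach is convenient but imperfect, since the engine has to decide how to reconcile them.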