

by ktamura 3486 days ago

The issue isn't Presto per se; it's running any data processing workload against unoptimized data formats.

Then again, both BigQuery and Snowflake (and Redshift too) require that you move data into their storage engine, and that's an additional step whose cost is proportional to the size and complexity of your data. At the same time, it's stupid to store your logs in OLAP-optimized formats and completely lose legibility. In sum, Athena trades off performance for convenience.

No matter what database vendors say, you can't defy the principles of computer science.
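A rough illustration of the format trade-off, as a minimal Python sketch rather than anything Athena or Presto actually does internally. The log records and the function names (`sum_bytes_row_oriented`, `to_columns`, `sum_bytes_columnar`) are made up for the example. Reading one field from line-delimited JSON means parsing every field of every record; a columnar layout pays a one-time conversion cost and then touches only the column it needs, but the result is no longer a log you can read by eye.

```python
import json

# Hypothetical log records, one JSON object per line (legible, row-oriented).
raw_log = "\n".join([
    '{"ts": 1, "path": "/a", "status": 200, "bytes": 512}',
    '{"ts": 2, "path": "/b", "status": 404, "bytes": 128}',
    '{"ts": 3, "path": "/a", "status": 200, "bytes": 256}',
])


def sum_bytes_row_oriented(text):
    """Query the raw log directly: every record is fully parsed to read one field."""
    return sum(json.loads(line)["bytes"] for line in text.splitlines())


def to_columns(text):
    """One-time load step: pivot the records into one list per field."""
    columns = {}
    for line in text.splitlines():
        for key, value in json.loads(line).items():
            columns.setdefault(key, []).append(value)
    return columns


def sum_bytes_columnar(columns):
    """Query the converted data: only the 'bytes' column is touched."""
    return sum(columns["bytes"])


columns = to_columns(raw_log)
print(sum_bytes_row_oriented(raw_log), sum_bytes_columnar(columns))
```

Both queries return the same answer; the difference is where the parsing work is paid, which is the convenience-versus-performance trade mentioned above.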

2 comments

fhoffa 3485 days ago

Note that BigQuery has been able to read files straight from GCS, Drive, and even Google Spreadsheets for a while:

https://cloud.google.com/bigquery/federated-data-sources

(I'm Felipe Hoffa and I work for Google https://twitter.com/felipehoffa)

bsg75 3485 days ago

You don't replace them with an OLAP format; you pair them with an OLAP engine to aggregate, filter, or analyze. Elasticsearch and Splunk are one approach; SQL query engines are another.

Apache Drill takes a schema-discovery-on-read approach that can handle some of this. It's not perfect, but it does simplify some of the process where its capabilities fit the task at hand.
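For anyone unfamiliar with the idea, here is a toy Python sketch of schema discovery on read. It is not Drill's implementation or API, and the records and the helper names (`infer_schema`, `select`) are hypothetical. The point is only that no schema is declared up front: field names and types are discovered while scanning, and records missing a field yield a null.

```python
import json


def infer_schema(lines):
    """Discover field names and the value types seen for each, while reading."""
    schema = {}
    for line in lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


def select(lines, fields):
    """Project the requested fields; absent fields come back as None."""
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(field) for field in fields)


# Hypothetical logs whose shape drifts from one record to the next.
logs = [
    '{"user": "a", "latency_ms": 12}',
    '{"user": "b", "latency_ms": 7.5, "region": "eu"}',
]

schema = infer_schema(logs)
rows = list(select(logs, ["user", "region"]))
print(schema)
print(rows)
```

The mixed types recorded for `latency_ms` hint at where this approach gets imperfect: the engine has to decide what to do when the discovered schema is inconsistent.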
