by jakozaur 3484 days ago
Looks very similar to Google Big Query.

Even the pricing is same: $5 / TB of data scanned.
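To make the per-scan pricing concrete, here is a rough sketch of the arithmetic. The $5 / TB figure is the one quoted above; treating a terabyte as 10^12 bytes and ignoring any per-query minimum or rounding are assumptions of this sketch, not something stated in the thread.

```python
PRICE_PER_TB_USD = 5.0   # figure quoted in the comment above
BYTES_PER_TB = 10**12    # assumption: decimal terabytes


def scan_cost_usd(bytes_scanned: int) -> float:
    """Return the estimated cost of a query that scans `bytes_scanned` bytes."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# A query that scans 200 GB of a larger dataset is billed for 200 GB only.
cost = scan_cost_usd(200 * 10**9)
print(f"${cost:.2f}")
```

The point is that you pay for what a query reads, not for what you store, so anything that reduces bytes scanned reduces the bill.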


When I tried it, it was slower than BigQuery. Plus you've got to mess about creating Hive schemas.
I don't know why you are getting downvoted. For all those data formats you have to painstakingly make table schemas for them before you can query them. Not like Snowflake or BigQuery. One of the biggest strikes against Presto IMHO.
Apache Drill might have been a better basis if they wanted to build a "query everything easily" service on top of an existing project.
It's not Presto per se, but running any data processing workload against unoptimized data formats is the issue.

Then again, both BigQuery and Snowflake require that you move data into their storage engine (Redshift too), and that's an additional step that's proportional to the size and complexity of your data. At the same time, it's stupid to store your logs in OLAP-optimized formats and completely lose legibility. In sum, Athena trades off performance for convenience.

No matter what database vendors say, you can't defy the principles of computer science.

Note that BigQuery has been able to read files straight from GCS, Drive, and even Google Spreadsheets for a while:

https://cloud.google.com/bigquery/federated-data-sources

(I'm Felipe Hoffa and I work for Google https://twitter.com/felipehoffa)

You don't replace them with an OLAP format; you pair them with an OLAP engine to aggregate, filter, or analyze. Elasticsearch and Splunk are one approach, SQL query engines are another.

Apache Drill takes a schema-discovery-on-read approach that can handle some of this. It's not perfect, but it does simplify some of the process where its capabilities fit the task at hand.
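For anyone unfamiliar with the idea, here is a toy illustration of schema discovery on read: the field names and types are worked out from the records as they are read, instead of being declared up front. This is not how Drill is implemented; the function name and the sample records are made up for the example.

```python
import json


def infer_schema(json_lines):
    """Map each field name to the set of Python type names seen for it."""
    schema = {}
    for line in json_lines:
        record = json.loads(line)
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


# Hypothetical log records; note the second one adds a field and changes a type.
rows = [
    '{"user": "a", "latency_ms": 12}',
    '{"user": "b", "latency_ms": 7.5, "error": null}',
]
schema = infer_schema(rows)
for field, types in sorted(schema.items()):
    print(field, sorted(types))
```

Even this tiny case shows why it is "not perfect": a field can appear halfway through the data or change type between records, and the engine has to decide what to do about it.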

TFA states: "Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, and Parquet."
"Amazon Athena uses Apache Hive DDL to define tables."
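To show what "define tables with Hive DDL" means in practice, here is a small sketch that renders a Hive-style CREATE EXTERNAL TABLE statement for comma-delimited files. The table name, columns, and bucket path are hypothetical, and the helper itself is only an illustration, not part of any Athena tooling.

```python
def hive_create_table(table: str, columns: dict[str, str], location: str) -> str:
    """Render a Hive-style CREATE EXTERNAL TABLE statement for CSV files."""
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{cols}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table over log files in a made-up bucket.
ddl = hive_create_table(
    "access_logs",
    {"request_time": "string", "status": "int", "bytes_sent": "bigint"},
    "s3://example-bucket/logs/",
)
print(ddl)
```

Every column has to be spelled out by hand before the first query runs, which is the chore the comments above are complaining about.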
> Q: What data formats does Amazon Athena support?

> Amazon Athena supports a wide variety of data formats like CSV, TSV, JSON, or Textfiles and also supports open source columnar formats such as Apache ORC and Apache Parquet. Athena also supports compressed data in Snappy, Zlib, and GZIP formats. By compressing, partitioning, and using columnar formats you can improve performance and reduce your costs.

https://aws.amazon.com/athena/faqs/
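A back-of-the-envelope sketch of why compressing, partitioning, and columnar formats cut the bill when you pay per byte scanned. All the ratios below are invented for illustration, and the model assumes columns and partitions are equally sized, which real data rarely is. The $5 / TB price is the one mentioned earlier in the thread.

```python
PRICE_PER_TB_USD = 5.0   # price quoted earlier in the thread
BYTES_PER_TB = 10**12    # assumption: decimal terabytes


def estimated_bytes_scanned(raw_bytes, compression_ratio=1.0,
                            column_fraction=1.0, partition_fraction=1.0):
    """Estimate bytes read after compression, column pruning, and partition pruning."""
    return raw_bytes / compression_ratio * column_fraction * partition_fraction


raw = 10**12  # 1 TB of uncompressed row-oriented logs

# Baseline: every byte of the raw files is read.
baseline = estimated_bytes_scanned(raw)

# Hypothetical optimized layout: 4:1 compression, 2 of 10 columns read,
# 1 of 30 daily partitions touched.
optimized = estimated_bytes_scanned(raw, compression_ratio=4.0,
                                    column_fraction=2 / 10,
                                    partition_fraction=1 / 30)

for label, scanned in (("baseline", baseline), ("optimized", optimized)):
    print(label, f"${scanned / BYTES_PER_TB * PRICE_PER_TB_USD:.4f}")
```

The three factors multiply, so under these made-up numbers the optimized query reads a small fraction of a percent of the raw data.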