HN Mirror



	by jakozaur 3530 days ago
	Looks very similar to Google Big Query. Even the pricing is same: $5 / TB of data scanned.
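Under pricing like this, the cost of a query is simply proportional to the bytes it reads. Here is a minimal sketch of that arithmetic. It assumes 1 TB = 10^12 bytes and ignores any per-query minimum or rounding either service may apply; neither detail comes from the thread.

```python
# Simplified scan-based pricing: cost = bytes scanned / bytes per TB * price per TB.
# Assumptions (not from the thread): 1 TB = 10**12 bytes; no per-query minimum or rounding.
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 10**12


def scan_cost_usd(bytes_scanned: int,
                  price_per_tb: float = PRICE_PER_TB_USD,
                  bytes_per_tb: int = BYTES_PER_TB) -> float:
    """Return the approximate cost of a query that reads `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / bytes_per_tb * price_per_tb


if __name__ == "__main__":
    # A query scanning 200 GB (2 * 10**11 bytes) costs about $1.00 under these assumptions.
    print(f"${scan_cost_usd(200 * 10**9):.2f}")
```

At identical per-TB prices, the comparison between services comes down to how many bytes each one has to scan for the same query.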


estefan 3530 days ago

When I tried it, it was slower than BigQuery. Plus, you've got to mess about creating Hive schemas.


spullara 3530 days ago

I don't know why you are getting downvoted. For all those data formats, you have to painstakingly create table schemas before you can query them, unlike with Snowflake or BigQuery. It's one of the biggest strikes against Presto, IMHO.


bsg75 3530 days ago

Apache Drill might have been a better basis if they wanted to build a "query everything easily" based on an existing project.


ktamura 3530 days ago

It's not Presto per se, but running any data processing workload against unoptimized data formats is the issue.

Then again, both BigQuery and Snowflake (and Redshift too) require that you move data into their storage engine, and that's an additional step whose cost grows with the size and complexity of your data. At the same time, it's stupid to store your logs in OLAP-optimized formats and completely lose legibility. In sum, Athena trades off performance for convenience.

No matter what database vendors say, you can't defy the principles of computer science.


fhoffa 3530 days ago

Note that BigQuery has been able to read files straight from GCS, Drive, and even Google Spreadsheets for a while:

https://cloud.google.com/bigquery/federated-data-sources

(I'm Felipe Hoffa and I work for Google https://twitter.com/felipehoffa)


bsg75 3530 days ago

You don't replace them with an OLAP format; you can pair them with an OLAP engine to aggregate, filter, or analyze them. Elasticsearch and Splunk are one approach; SQL query engines are another.

Apache Drill takes a schema-discovery-on-read approach that can handle some of this. It's not perfect, but it does simplify some of the process where its capabilities fit the task at hand.
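Here is a toy sketch of what "schema discovery on read" means. It scans JSON-lines records and infers a column-to-type mapping, so no table has to be declared up front. This is not Apache Drill's implementation. The type names and the widening rules are assumptions chosen for illustration: bigint combined with double becomes double, and any other conflict falls back to string.

```python
# Toy schema-on-read inference over JSON lines (illustrative only, not Drill's algorithm).
import json
from typing import Dict, Iterable


def _type_name(value) -> str:
    # bool is checked before int because bool is a subclass of int in Python.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "struct"
    return "unknown"


def infer_schema(lines: Iterable[str]) -> Dict[str, str]:
    """Infer a flat column -> type mapping from JSON-lines text."""
    schema: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError("each line must be a JSON object")
        for key, value in record.items():
            seen, new = schema.get(key), _type_name(value)
            if seen is None or seen == "null":
                schema[key] = new  # first concrete type observed
            elif new == "null" or new == seen:
                continue  # nulls and repeats don't change the type
            elif {seen, new} == {"bigint", "double"}:
                schema[key] = "double"  # widen mixed numerics
            else:
                schema[key] = "string"  # fall back when types disagree
    return schema


if __name__ == "__main__":
    sample = ['{"id": 1, "price": 9, "tag": "a"}',
              '{"id": 2, "price": 9.5, "tag": null}',
              '{"id": 3, "note": "x"}']
    print(infer_schema(sample))
```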


guywithabike 3530 days ago

TFA states: "Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, and Parquet."


bsg75 3530 days ago

"Amazon Athena uses Apache Hive DDL to define tables."
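The schema step complained about upthread is a Hive `CREATE EXTERNAL TABLE` statement that points at files already in storage. This hypothetical Python helper generates one to show roughly what that looks like. The table name, columns, and `s3://example-bucket/logs/` location are illustrative placeholders. Only comma-delimited text and Parquet are covered; JSON and other formats need a SerDe clause that is not shown here.

```python
# Hypothetical helper: build a Hive-style CREATE EXTERNAL TABLE statement.
# Names and locations used below are placeholders, not from the thread.
from typing import List, Tuple


def hive_external_table_ddl(table: str,
                            columns: List[Tuple[str, str]],
                            location: str,
                            fmt: str = "csv") -> str:
    """Return a CREATE EXTERNAL TABLE statement for CSV or Parquet files."""
    if not columns:
        raise ValueError("at least one column is required")
    # Each column is declared by hand: this is the "painstaking" part.
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    if fmt == "csv":
        storage = ("ROW FORMAT DELIMITED\n"
                   "  FIELDS TERMINATED BY ','\n"
                   "STORED AS TEXTFILE")
    elif fmt == "parquet":
        storage = "STORED AS PARQUET"
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return (f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
            f"{cols}\n"
            f")\n"
            f"{storage}\n"
            f"LOCATION '{location}'")


if __name__ == "__main__":
    print(hive_external_table_ddl(
        "access_logs",
        [("ts", "STRING"), ("status", "INT"), ("bytes_sent", "BIGINT")],
        "s3://example-bucket/logs/",
    ))
```

Every new data source needs a statement like this, with the column list kept in sync with the files by hand.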


jackmaney 3530 days ago

> Q: What data formats does Amazon Athena support?

> Amazon Athena supports a wide variety of data formats like CSV, TSV, JSON, or Textfiles and also supports open source columnar formats such as Apache ORC and Apache Parquet. Athena also supports compressed data in Snappy, Zlib, and GZIP formats. By compressing, partitioning, and using columnar formats you can improve performance and reduce your costs.
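This sketch makes the FAQ's point about partitioning concrete. With Hive-style `key=value` directory layouts, an engine can skip whole objects whose partition value fails the query's filter, so fewer bytes are scanned and billed. The object keys and sizes (in MB) are made up. Compression and columnar column pruning, which can reduce bytes further, are not modeled.

```python
# Sketch: partition pruning over Hive-style "key=value" paths.
# Object keys and sizes are made-up examples, not real data.
from typing import Dict, List, Set, Tuple


def partition_values(key: str) -> Dict[str, str]:
    """Parse 'logs/dt=2016-11-29/region=eu/file.gz' into {'dt': ..., 'region': ...}."""
    values: Dict[str, str] = {}
    for segment in key.split("/")[:-1]:  # last segment is the file name
        if "=" in segment:
            name, _, value = segment.partition("=")
            values[name] = value
    return values


def bytes_scanned(objects: List[Tuple[str, int]], column: str, wanted: Set[str]) -> int:
    """Sum sizes of only those objects whose partition `column` value is in `wanted`."""
    return sum(size for key, size in objects
               if partition_values(key).get(column) in wanted)


if __name__ == "__main__":
    objects = [
        ("logs/dt=2016-11-28/part-0.gz", 400),
        ("logs/dt=2016-11-29/part-0.gz", 350),
        ("logs/dt=2016-11-30/part-0.gz", 250),
    ]
    total = sum(size for _, size in objects)
    pruned = bytes_scanned(objects, "dt", {"2016-11-30"})
    print(total, pruned)  # 1000 250
```

A filter on the partition column here reads a quarter of the data. Under per-byte pricing, that also means a quarter of the cost.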

https://aws.amazon.com/athena/faqs/
