Hacker News


	by hayd 3454 days ago
	See also AWS Athena: https://aws.amazon.com/athena/ ?

3 comments

dajohnson89 3454 days ago

That seems cool, but paying per query (per TB scanned) frightens me. I imagine I'd have to fret about how efficient my queries are...

illumin8 3454 days ago

It's not that bad. You can store the data on S3 compressed in ORC or Parquet format, and you only pay for the compressed bytes you read, so 1TB of raw data can shrink to about 130GB. Plus, these formats store summary metadata (such as row counts), so queries like SELECT COUNT don't need a full table scan; they can read just a few KB of metadata to get the result.
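To make the cost argument concrete, here is a rough sketch of the arithmetic in Python. The price is an assumption ($5 per TB scanned, Athena's list price around the time of this thread; check the current AWS pricing page), the 0.13 ratio is just the 1TB to 130GB example from the comment, and query_cost is a hypothetical helper. It models a full scan only and ignores metadata-only queries and column pruning, both of which can cut the bill further.

```python
# Rough per-query cost estimate for a pay-per-TB-scanned service such as Athena.
# ASSUMPTION: this price is illustrative; check current AWS pricing.
PRICE_PER_TB_USD = 5.00


def query_cost(raw_bytes_scanned: float, compression_ratio: float = 1.0,
               price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the USD cost of a full scan over raw_bytes_scanned of logical
    data stored at compression_ratio of its original size (0 < ratio <= 1)."""
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    billed_tb = raw_bytes_scanned * compression_ratio / 1e12  # decimal terabytes
    return billed_tb * price_per_tb


ONE_TB = 1e12
uncompressed = query_cost(ONE_TB)                          # raw CSV
compressed = query_cost(ONE_TB, compression_ratio=0.13)    # 1TB stored as ~130GB
print(f"CSV full scan:     ${uncompressed:.2f}")  # prints $5.00
print(f"Parquet full scan: ${compressed:.2f}")    # prints $0.65
```

Under these assumptions the compressed full scan costs roughly an eighth of the raw one, which is the whole of the "it's not that bad" argument.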

dajohnson89 3447 days ago

But that's a lot of work just to get sane costs for reading your data.

illumin8 3437 days ago

It's actually just two commands:

1. hive
2. INSERT INTO parquet_table SELECT * FROM csv_table;
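Note that the INSERT assumes both tables are already defined in Hive (csv_table over the raw files, parquet_table declared with a Parquet storage format). Conceptually, the statement reads every row of the CSV-backed table and rewrites it into a columnar file that carries metadata. The toy, stdlib-only Python sketch below illustrates why that rewrite lets a COUNT skip the data; csv_to_columnar and count_rows are hypothetical names, and the JSON layout with a length-prefixed footer at the end is made up for illustration. It is NOT the real Parquet or ORC encoding.

```python
import csv
import io
import json


def csv_to_columnar(csv_text: str) -> bytes:
    """Toy row-to-column rewrite: the body stores one list per column, then a
    footer with summary metadata (here just the row count), then the footer's
    length as an 8-byte trailer, loosely mirroring how Parquet keeps its
    metadata at the end of the file."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}
    body = json.dumps(columns).encode()
    footer = json.dumps({"num_rows": len(rows)}).encode()
    return body + footer + len(footer).to_bytes(8, "little")


def count_rows(blob: bytes) -> int:
    """Answer SELECT COUNT(*) by decoding only the trailer and footer,
    never touching the column data in the body."""
    footer_len = int.from_bytes(blob[-8:], "little")
    footer = json.loads(blob[-8 - footer_len:-8])
    return footer["num_rows"]


blob = csv_to_columnar("id,name\n1,alice\n2,bob\n3,carol\n")
print(count_rows(blob))  # prints 3
```

However large the body grows, count_rows reads only a few bytes from the end of the file, which is the effect the earlier comment describes for SELECT COUNT.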

wellsjohnston 3454 days ago

I did not know about this. Looks like Amazon's version of BigQuery. Fantastic!

nstj 3454 days ago

Somehow I'd overlooked this too. Nice find!