by illumin8 3406 days ago
It's not that bad. You can store the data on S3 compressed in ORC or Parquet format, and you only pay for the compressed data you read, so 1TB can be 130GB after compression. Plus, these formats store summary data, so queries like SELECT COUNT(*) don't have to do a full table scan; they can read just a few KB of summary data to get the result.
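
To put rough numbers on that, here is a minimal sketch of the arithmetic. The 1TB and 130GB figures are the ones from the comment above; the per-TB price is a made-up placeholder (an assumption, not a quoted rate), so plug in whatever your service actually charges per TB scanned.

```python
# Assumed, illustrative price per TB scanned; NOT taken from the comment.
PRICE_PER_TB_USD = 5.0
GB_PER_TB = 1000  # decimal units, to keep the arithmetic simple


def scan_cost_usd(gb_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Cost of a query that reads gb_scanned gigabytes."""
    return gb_scanned / GB_PER_TB * price_per_tb


raw_gb = 1000        # 1TB of uncompressed data, as in the comment
compressed_gb = 130  # the same data stored as ORC/Parquet, as in the comment

raw_cost = scan_cost_usd(raw_gb)
compressed_cost = scan_cost_usd(compressed_gb)
savings = 1 - compressed_gb / raw_gb  # fraction of bytes no longer read

print(f"full scan, raw:        ${raw_cost:.2f}")
print(f"full scan, compressed: ${compressed_cost:.2f}")
print(f"bytes read reduced by: {savings:.0%}")
```

This only covers a full scan of the compressed data. Queries that can be answered from the formats' summary data, or that touch only a few columns, would read less than that.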

But that's a lot of work... just to have sane costs for reads of your data.
It's actually just two commands:

1. hive
2. INSERT INTO parquet_table SELECT * FROM csv_table;