Athena is probably my best bet tbh, especially if I can do a few clicks and just get smaller files. Processing smaller files is a no-brainer and could be handed off to Lambda.
Yeah the big benefit is that it requires very little setup.
You create a new partitioned table/location from the originally mapped file using a CTAS like so:
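CREATE TABLE new_table_name
WITH (
  format = 'PARQUET',
  parquet_compression = 'SNAPPY',
  external_location = 's3://your-bucket/path/to/output/',
  -- Athena sets partitioning here, not in a PARTITIONED BY clause
  partitioned_by = ARRAY['partition_column_name']
) AS
-- the partition column must be the last column in the SELECT list
SELECT *
FROM original_table_name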
If you don't already have a dimension to partition by, you can probably hash a key column and partition by the last hex character of the hash, which gives you 16 roughly evenly sized partitions.
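Rough sketch of the hash idea, in case it helps. Assumptions: `id` is a made-up key column (swap in whatever high-cardinality column you have), `hash_bucket` is a name I picked for the derived partition column, and the table names and S3 path are the same placeholders as in the CTAS above. Not tested against your data, so treat it as a starting point.

```sql
-- Sketch only: id and hash_bucket are hypothetical names.
CREATE TABLE new_table_name
WITH (
  format = 'PARQUET',
  parquet_compression = 'SNAPPY',
  external_location = 's3://your-bucket/path/to/output/',
  partitioned_by = ARRAY['hash_bucket']
) AS
SELECT
  t.*,
  -- md5 over the key's UTF-8 bytes, rendered as hex; the last hex digit
  -- takes one of 16 values (0-9, A-F), so this yields up to 16 partitions.
  -- The partition column is listed last, as Athena requires.
  substr(to_hex(md5(to_utf8(CAST(t.id AS varchar)))), -1) AS hash_bucket
FROM original_table_name AS t
```

The partitions only come out even if the key has lots of distinct values; a column with a handful of values will hash into a handful of lopsided buckets. 16 partitions is also comfortably under Athena's limit of 100 partitions per CTAS query, and the output location has to be empty before you run it.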