by iamsomewalrus 2698 days ago
I’ll have a blog post out on the AWS Big Data blog in a few weeks, but the parent is right that it’s not terribly difficult.

It should take about an hour, including testing, to put together a PySpark script that reads in a DynamoDB table and writes it out to S3.

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programm...
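For anyone who wants a starting point, here is a rough sketch of what such a Glue PySpark script might look like. It assumes Glue's built-in "dynamodb" connection type and Parquet output; the function names, the 0.5 read percentage, and the choice of Parquet are my own placeholders, not something from the linked docs or the upcoming post.

```python
def dynamodb_read_options(table_name, read_percent=0.5):
    """Build connection options for Glue's DynamoDB reader.

    read_percent is an assumed default: the fraction of the table's
    read capacity the job is allowed to consume.
    """
    return {
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": str(read_percent),
    }


def export_table(table_name, s3_path):
    """Read a DynamoDB table into a DynamicFrame and write it to S3."""
    # These modules only exist inside the Glue job runtime, so import lazily.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read the whole table through the DynamoDB connector.
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options=dynamodb_read_options(table_name),
    )

    # Write it out under the given S3 prefix (Parquet is an assumption here).
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": s3_path},
        format="parquet",
    )
```

Inside a Glue job you would call something like export_table("my_table", "s3://my-bucket/dumps/my_table/"), where both names are made up for the example.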

You’ll then need to crawl the S3 data to add it to your Glue catalog and then you can query it with Athena.
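The crawl-then-query step can also be scripted. This is only a sketch using boto3's Glue and Athena clients; it assumes the crawler already exists and points at the S3 prefix, and the crawler name, database, SQL, and output location are all hypothetical values you would supply.

```python
import time


def crawl_and_wait(glue, crawler_name, poll_seconds=15):
    """Start a Glue crawler and block until it returns to the READY state."""
    glue.start_crawler(Name=crawler_name)
    while True:
        state = glue.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":
            return
        time.sleep(poll_seconds)


def start_query(athena, sql, database, output_location):
    """Submit an Athena query and return its execution id."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

You would pass in boto3.client("glue") and boto3.client("athena"). Taking the clients as arguments keeps the functions easy to exercise with stand-ins.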

1 comment

If you're running this regularly (e.g. once an hour) to dump tables, the cost of Glue can really add up because each job run is billed for a minimum runtime.
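To put a rough number on that, here is a back-of-the-envelope calculation. Every input is an assumption for illustration (a 10-minute billing minimum, 2 DPUs per run, $0.44 per DPU-hour, a 30-day month); check the current Glue pricing page for real figures.

```python
def monthly_glue_cost(runs_per_month, billed_minutes_per_run, dpus, usd_per_dpu_hour):
    """Estimate monthly cost when every run is billed for a fixed minimum."""
    dpu_hours = runs_per_month * (billed_minutes_per_run / 60) * dpus
    return dpu_hours * usd_per_dpu_hour


# Assumed inputs: hourly runs for 30 days, each billed at the minimum.
hourly_runs = 24 * 30
estimate = monthly_glue_cost(
    hourly_runs,
    billed_minutes_per_run=10,  # assumed minimum billed duration
    dpus=2,                     # assumed job size
    usd_per_dpu_hour=0.44,      # assumed price
)
print(f"~${estimate:.2f}/month")
```

With those assumed inputs the estimate lands a little over $100 a month, even if each dump only needs a minute or two of actual work.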

We switched to a scheduled Fargate task that dumps data from DynamoDB into S3 as Parquet files. It's really reliable, costs us ~$4/month, and is completely configurable.
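Not our exact code, but the core of a task like this can be quite small. The sketch below assumes boto3 and pyarrow are installed in the container image and that the table fits in memory; the function names and the /tmp path are placeholders.

```python
def scan_all(table):
    """Yield every item in a DynamoDB table, following scan pagination."""
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def dump_table(table_name, bucket, key, local_path="/tmp/dump.parquet"):
    """Scan a table, write it to a local Parquet file, and upload it to S3."""
    # Third-party imports kept local; assumed to be present in the image.
    import boto3
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = boto3.resource("dynamodb").Table(table_name)
    items = list(scan_all(table))

    # boto3 returns numbers as Decimal; items with mixed shapes or types
    # may need normalizing before this conversion.
    pq.write_table(pa.Table.from_pylist(items), local_path)
    boto3.client("s3").upload_file(local_path, bucket, key)
```

Run dump_table(...) as the container's entrypoint and trigger the task on a schedule; the schedule and task definition are outside what this sketch covers.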