by candre717 5263 days ago
I did some work with the public data sets.

The data is stored (free of charge) via EBS (look at the EC2 instance), which persists to S3 but is not visible in or directly usable from your own S3 buckets. If you decide to transfer the data or run computations (e.g. via EMR), you'll then pay for the resources used.

I didn't find the documentation clear enough to use the public data sets efficiently, which had financial consequences.

If anyone is adept at using the public data sets, I'd love to speak with you.

1 comment

WTF? I had assumed that it was a simple sort of file access which allowed anyone in EC2 to read the data without having to import all of the storage. Then again, PubChem is only about 25 GB and inbound data transfer is free, so this is only about US$4/month.
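
As a rough check on that figure, here is a minimal sketch of the arithmetic. Only the 25 GB size and the roughly US$4/month total come from the comment above; the per-GB rate is backed out from those two numbers rather than taken from any AWS price list, and the function names are hypothetical.

```python
def implied_rate_per_gb(monthly_cost_usd: float, size_gb: float) -> float:
    """Per-GB monthly rate implied by a total monthly cost and a data size."""
    return monthly_cost_usd / size_gb


def monthly_storage_cost(size_gb: float, rate_per_gb_month: float) -> float:
    """Monthly cost of keeping size_gb stored at a flat per-GB rate."""
    return size_gb * rate_per_gb_month


# Figures quoted in the comment: about 25 GB, about US$4/month.
rate = implied_rate_per_gb(4.0, 25.0)
print(f"implied rate: ${rate:.2f}/GB-month")
print(f"cost for 25 GB: ${monthly_storage_cost(25.0, rate):.2f}/month")
```

Swapping in a real per-GB price for whatever storage holds the copy would give an actual estimate; the flat-rate model here ignores request and compute charges.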