by Denzel
1238 days ago
Thanks for the response. I was responding to the comment more so than advocating for adjusting your log retention. :) Looking forward to part 2.

Are you able to reconcile some of the numbers and calculations in the article for me? (Understanding that you don't want to reveal any confidential info.) I see:

- 31 PB data + 10 PB application logs = 41 PB logs (uncompressed JSON), costs 7 figures (say ~$5M)
- 41 PB logs * 5% ORC compression = ~2 PB logs (compressed ORC), costs low 6 figures (say ~$300k)

I don't know what timeframe that cost is measured over. But that brings us to $300k / 2 PB = $0.15 / GB, which is far above S3's quoted costs, so I must be missing something.
|
I reckon you may be looking at the monthly cost of storage per gigabyte which is why the number doesn’t seem to make sense. Our retention policy started off at about 2 years, so the remaining lifetime per file amortizes out to much more than 1 month.
Also worth considering that we have a custom AWS contract, so none of our actual numbers are the publicly advertised rates and probably won’t entirely math out if you try to ballpark with those numbers.
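
To make the amortization point concrete, here is a rough sketch of the arithmetic. It reuses the ballpark guesses from the parent comment (~$300k for ~2 PB of compressed ORC), which are not our real figures. It also assumes decimal units (1 PB = 1,000,000 GB) and, as a simplification, treats the full ~24-month retention window as the amortization period. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the storage cost figures discussed above.
# All inputs are the parent comment's guesses, not actual billing data.

GB_PER_PB = 1_000_000  # assumption: decimal units (1 PB = 10^6 GB)


def cost_per_gb(total_cost_usd: float, size_pb: float) -> float:
    """Total cost divided by total size, with no time dimension."""
    return total_cost_usd / (size_pb * GB_PER_PB)


def cost_per_gb_month(total_cost_usd: float, size_pb: float, months: float) -> float:
    """Spread the same cost evenly over the retention window."""
    return cost_per_gb(total_cost_usd, size_pb) / months


# 41 PB of raw JSON at an assumed 5% ORC compression ratio: about 2 PB.
compressed_pb = 41 * 0.05

# The parent comment's figure: roughly $0.15 per GB over the whole lifetime.
lifetime_per_gb = cost_per_gb(300_000, 2)

# Amortized over an assumed 24 months: well under one cent per GB-month.
monthly_per_gb = cost_per_gb_month(300_000, 2, 24)

print(f"compressed size:  ~{compressed_pb:.2f} PB")
print(f"lifetime cost/GB: ${lifetime_per_gb:.4f}")
print(f"monthly cost/GB:  ${monthly_per_gb:.5f}")
```

Under those assumptions, the $0.15/GB figure is a lifetime cost rather than a monthly rate, so it is not directly comparable to a per-GB-month price list.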