by nitsky 812 days ago
Is the “data processing fee” any different from an egress fee in practice? Seems a little deceptive.
4 comments

At least with magnetic disks, which are IOPS-constrained, lower-IOPS loads conceivably allow higher density, or let different load patterns be packed onto the same devices. Say an 8 TB / 100 IOPS disk reserves 90 IOPS for a 1 TB database service: that leaves 7 TB (87.5% of the disk's capacity) sitting free, with only 10 IOPS to serve it. Adding what is effectively an IOPS tax to discourage frequent reads is one way to make a mixture like this work (or, another way to think of it, subtracting an IOPS discount).
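
A quick way to check the arithmetic in that contrived example. This is a minimal Python sketch; the 8 TB / 100 IOPS disk and the 1 TB / 90 IOPS database are the comment's hypothetical figures, not real drive specs, and `leftover` is a made-up helper.

```python
# Hypothetical disk from the example above (not a real drive spec).
DISK_TB = 8.0
DISK_IOPS = 100.0

# A hot database service reserves most of the IOPS for a small slice of capacity.
DB_TB = 1.0
DB_IOPS = 90.0


def leftover(disk_tb, disk_iops, used_tb, used_iops):
    """Return (free capacity fraction, free IOPS, free IOPS per free TB)."""
    free_tb = disk_tb - used_tb
    free_iops = disk_iops - used_iops
    return free_tb / disk_tb, free_iops, free_iops / free_tb


frac, iops, iops_per_tb = leftover(DISK_TB, DISK_IOPS, DB_TB, DB_IOPS)
print(f"{frac:.1%} of capacity free, {iops:.0f} IOPS left, {iops_per_tb:.2f} IOPS/TB")
print(f"database gets {DB_IOPS / DB_TB:.0f} IOPS/TB")
```

The free 7 TB gets roughly 1.4 IOPS per TB versus 90 IOPS per TB for the database, which is why whatever is placed there has to be read rarely.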

Obviously the example above is contrived, but the same principle applies to a pool of 1000 disks as to 1. You also don't escape this issue with regular hot storage: there is still a (((iops * replication count) / average traffic) / max latency) type problem lurking, which still necessitates either limiting density or increasing redundancy according to the expected IO rate. This is one reason why some S3 alternatives with weaker latency bounds (not naming names, they're great but it's just not the same service) can often be made substantially cheaper, and why at least one of S3's storage classes may be implemented entirely as an accounting trick, with no data movement or hardware changes at all.
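
The pool-scale version can be illustrated with a toy sizing model. This is not the formula from the comment, just a simplified stand-in under stated assumptions: reads can be served by any replica, writes are ignored, each disk is kept at a hypothetical 70% of its rated IOPS to bound latency, and `disks_needed` plus the workload numbers are illustrative.

```python
import math


def disks_needed(data_tb, reads_per_sec, replicas, disk_tb, disk_iops, target_util=0.7):
    """Toy sizing model: a pool must satisfy both a capacity and an IOPS budget.

    Capacity: every byte is stored `replicas` times.
    IOPS: reads are spread across replicas, so read demand is counted once,
    but each disk is only driven to `target_util` of its rated IOPS
    (an assumed headroom to keep queueing latency bounded). Writes are ignored.
    Returns (disks required, capacity-bound count, IOPS-bound count).
    """
    capacity_bound = math.ceil(data_tb * replicas / disk_tb)
    iops_bound = math.ceil(reads_per_sec / (disk_iops * target_util))
    return max(capacity_bound, iops_bound), capacity_bound, iops_bound


# Same hypothetical 8 TB / 100 IOPS disks, 1000 TB of data, 3 replicas.
cold = disks_needed(1000, 5_000, 3, 8, 100)
hot = disks_needed(1000, 100_000, 3, 8, 100)
print("cold workload:", cold)
print("hot workload:", hot)
```

With 5,000 reads/s the 375 disks needed for capacity already cover the IOPS; with 100,000 reads/s the pool needs 1,429 disks, so each one ends up only about a quarter full. That lost density is the cost a read fee is trying to recover.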

Yes. You can process it once to the standard tier, and egress as much as you want for free.

The differences stack up for say, a 1GB video that becomes viral and triggers terabytes in egress. You pay for 1GB, not terabytes.

It’s also an optional tier.
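
A rough cost model for the viral-video case. The prices are assumptions for illustration (they are not given in this thread, so check the provider's current price list), request fees are ignored, and the "moved" case assumes the object is copied to the standard class, paying retrieval on its size once, before the traffic arrives. `monthly_cost` is a hypothetical helper.

```python
# Assumed prices in USD, for illustration only.
STANDARD_STORAGE = 0.015  # per GB-month
IA_STORAGE = 0.010        # per GB-month
IA_RETRIEVAL = 0.010      # per GB read out of the Infrequent Access class


def monthly_cost(object_gb, gb_read, moved_to_standard):
    """Storage + retrieval cost for one object over one month (request fees ignored)."""
    if moved_to_standard:
        # Retrieval is paid once on the object's size, then egress from standard is free.
        return object_gb * IA_RETRIEVAL + object_gb * STANDARD_STORAGE
    # Left in Infrequent Access: every GB served incurs the retrieval fee.
    return object_gb * IA_STORAGE + gb_read * IA_RETRIEVAL


viral_reads_gb = 5_000  # hypothetical: a 1 GB video served 5,000 times
print(f"moved once:   ${monthly_cost(1, viral_reads_gb, True):.3f}")
print(f"left in IA:   ${monthly_cost(1, viral_reads_gb, False):.3f}")
```

Moved in time, the month costs a few cents; left in Infrequent Access, the per-GB retrieval fee dominates and behaves much like an egress charge.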

> The differences stack up for say, a 1GB video that becomes viral and triggers terabytes in egress. You pay for 1GB, not terabytes.

Only if you actively monitor usage and manage to "process it once" in time (and then "process it back"). Otherwise you pay for terabytes anyway, not in egress fees but in processing fees. Or am I missing something?

The whole point of IA is cheaper storage for data that is infrequently accessed, and there is a price to accessing it. If you need or want frequent access, just use the regular storage class.

All the object stores out there have some flavor of IA class with an access fee, and in the scenarios where you would even consider using it, that fee should be far lower than the storage savings. If you don't want or understand this cost optimization, you simply don't use it.

Yes, because in a well-designed setup, files that are frequently accessed would be restored to the standard tier. Ideally you'd only pay the data processing fee once, when files transition from infrequently accessed to frequently accessed. The breakeven point is an access rate of about one full read every two months.
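
The two-month breakeven follows from dividing the monthly storage saving per GB by the per-GB retrieval fee. The prices below are assumed for illustration and are not stated in this thread; the sketch only shows the calculation, and `breakeven_reads_per_month` is a hypothetical helper.

```python
# Assumed prices in USD; only the difference and the ratio matter here.
STANDARD_STORAGE = 0.015  # per GB-month
IA_STORAGE = 0.010        # per GB-month
IA_RETRIEVAL = 0.010      # per GB retrieved from Infrequent Access


def breakeven_reads_per_month(standard, ia, retrieval):
    """Full reads per month at which IA storage savings are eaten by retrieval fees."""
    return (standard - ia) / retrieval


rate = breakeven_reads_per_month(STANDARD_STORAGE, IA_STORAGE, IA_RETRIEVAL)
print(f"breakeven: {rate:.2f} full reads/month, i.e. one read every {1 / rate:.1f} months")
```

Data read less often than the breakeven rate is cheaper in IA; data read more often is cheaper in the standard class.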

Maybe the cold-to-hot migration "tax" is partially to prevent abuse?

> "Data retrieval is charged per GB when data in the Infrequent Access storage class is retrieved and is what allows us to provide storage at a lower price. It reflects the additional computational resources required to fetch data from underlying storage optimized for less frequent access."

I like the "automatic storage classes" idea as well.

> "…you can define an object lifecycle policy to move data to Infrequent Access after a period of time goes by and you no longer need to access your data as often. In the future, we plan to automatically optimize storage classes for data so you can avoid manually creating rules and better adapt to changing data access patterns."

AWS already gives you Intelligent-Tiering for this. It's a very nice product, but it's also just a nice way of hiding the same fees. Your $0.004/GB-month becomes $0.023/GB-month for the first month after a read, then $0.0125/GB-month for the next two months, so the average cost of storing it over those three months becomes $0.016/GB-month, and that's before considering monitoring fees.
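
Checking that Intelligent-Tiering average with the per-GB-month prices quoted in the comment (monitoring and request fees excluded; `average_after_read` is a hypothetical helper):

```python
# Per-GB-month tier prices quoted in the comment (USD).
FREQUENT = 0.023
INFREQUENT = 0.0125
ARCHIVE_INSTANT = 0.004


def average_after_read(months_frequent=1, months_infrequent=2):
    """Average storage price over the window in which a read pulls an object back
    to the frequent tier and it then ages down again (monitoring fees excluded)."""
    total = FREQUENT * months_frequent + INFREQUENT * months_infrequent
    return total / (months_frequent + months_infrequent)


avg = average_after_read()
print(f"${avg:.4f}/GB-month vs ${ARCHIVE_INSTANT}/GB-month if never read")
```

That $0.016 is four times what the object would cost if it had stayed cold.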

You could also implement tiering yourself, depending on your workload of course. If you know you're storing objects for long-term archival reasons (or backups), you could opt for S3 Glacier Instant Retrieval at $0.004/GB-month.
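
A do-it-yourself version for backups could be a lifecycle rule that transitions objects under a prefix to Glacier Instant Retrieval. This is a sketch: the `backups/` prefix, the 30-day delay, the rule ID and the bucket name are placeholders, and the dictionary follows the shape boto3's `put_bucket_lifecycle_configuration` expects. The call itself is left commented out so the snippet runs without AWS credentials.

```python
def glacier_ir_lifecycle(prefix: str, days: int = 30) -> dict:
    """Build an S3 lifecycle configuration that moves objects under `prefix`
    to S3 Glacier Instant Retrieval (storage class "GLACIER_IR") after `days` days."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER_IR"}],
            }
        ]
    }


config = glacier_ir_lifecycle("backups/")
print(config)

# With boto3 installed and credentials configured, it could be applied with:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-backup-bucket",  # placeholder bucket name
#     LifecycleConfiguration=config,
# )
```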