by hodgesrm 2124 days ago
Sure! Sorry to be so obscure; it was not a good explanation. To take the above example, let's say you have a database with 1TB of tabular data on Amazon.

1. You start out storing it on Amazon gp2 Elastic Block Store, which is fast network-attached block storage. It costs about $0.10 US per GB per month, so for 1TB (1,024 GB) that's $102.40 per month.

2. Data (sadly) has a habit of getting destroyed in accidents so we normally replicate to at least one other location. Let's say we just replicate once. You are now up to $204.80 per month.
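
The baseline arithmetic so far can be sketched in a few lines of Python. This is only an illustration: the price is the rough gp2 figure quoted above, not current AWS pricing, and the function name `monthly_cost` is made up for the example.

```python
GB_PER_TB = 1024  # binary terabyte, which is what yields the $102.40 figure

def monthly_cost(size_gb, price_per_gb_month, copies=1):
    """Monthly storage bill in US dollars for `copies` full copies of the data."""
    return size_gb * price_per_gb_month * copies

# Assumed price: about $0.10 per GB per month for gp2 block storage.
single = monthly_cost(GB_PER_TB, 0.10)
replicated = monthly_cost(GB_PER_TB, 0.10, copies=2)

print(f"one copy:   ${single:.2f}")
print(f"two copies: ${replicated:.2f}")
```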

Now we have a couple of ways of reducing costs.

1. We could make the block storage itself cheaper thanks to inside knowledge of how it works plus clever financial engineering. However, the _most_ that can get us is about 5x savings, because prices for similar classes of storage are not that different. The real discount is more like 2x if we want to make money and be reasonably speedy. Even that is not free: you likely have to do engineering work, like implementing blended storage, to get there. So, we're back to $102.40 per month.

2. Or, we could build a better database.

2a.) Let's first build a database that can store data in S3 object storage instead of block storage. Now our storage costs about $0.02 per GB per month. Plus S3 is already replicated, so we can maybe just keep a single copy. We're down to $20.48 per month, but we had to rewrite the database to get it, because S3 behaves very differently from block storage and we have to build clever caches to work on it.

2b.) But wait! There's more. We could also arrange tabular data in columns rather than rows, which allows us to apply very efficient compression. Let's say the compression reduces size by 90% overall. We're now down to just $2.05 per month. Again, we had to rewrite the database, but we got a huge savings in return, like 100x compared to the $204.80 we started with.
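
To compare the options side by side, here is a small Python sketch of the same back-of-the-envelope math. The prices, the 2x discount, and the 90% compression ratio are the rough figures from this comment, treated as assumptions rather than real quotes, and the scenario names are just labels for the example.

```python
SIZE_GB = 1024      # 1 TB of tabular data
EBS_PRICE = 0.10    # $/GB-month, rough gp2 block storage figure (assumption)
S3_PRICE = 0.02     # $/GB-month, rough S3 object storage figure (assumption)
COMPRESSION = 0.90  # fraction of bytes removed by columnar compression (assumption)

def cost(size_gb, price_per_gb_month, copies=1):
    """Monthly storage bill in US dollars."""
    return size_gb * price_per_gb_month * copies

# Starting point: block storage, replicated once (two full copies).
baseline = cost(SIZE_GB, EBS_PRICE, copies=2)

scenarios = {
    "block storage, 2 copies": baseline,
    "block storage, 2x discount": baseline / 2,
    "S3, single copy": cost(SIZE_GB, S3_PRICE),
    "S3 + columnar compression": cost(SIZE_GB * (1 - COMPRESSION), S3_PRICE),
}

for name, dollars in scenarios.items():
    factor = baseline / dollars  # how many times cheaper than the starting point
    print(f"{name:28s} ${dollars:7.2f}  ({factor:.0f}x cheaper)")
```

Changing the constants at the top is enough to rerun the comparison with current prices or a different compression ratio.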

The moral is that clever arrangement of data just about always beats financial shenanigans, usually by a wide margin. A big part of the reason Amazon has done well in data services like Redshift and Aurora is that they have been extremely smart about how they engineer those services, not any inherent advantage as platform owners.

Edit: fixed math error