by tienshiao 5930 days ago
I'd like to think users with large data sets are already deduping and/or compressing their data. Of course, the users still win by eating into the margins of the company performing the arbitrage.

Deduping and compressing at a higher level may be easier, though less efficient. I once built a prototype mail system using a Twisted POP server, qpsmtpd, and S3 as the mail store; it deduped and compressed email bodies.
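
For anyone curious what that kind of application-level dedup looks like, here is a minimal sketch. It is not the prototype's actual code: the `DedupStore` class and its method names are made up for illustration, and a plain dict stands in for the S3 bucket. The idea is to key each body by its SHA-256 digest, so identical bodies are stored once, and to compress the bytes with zlib before storing them.

```python
import hashlib
import zlib


class DedupStore:
    """Hypothetical content-addressed store; a dict stands in for S3."""

    def __init__(self):
        # Maps hex digest of the raw body -> zlib-compressed body.
        self.blobs = {}

    def put(self, body: bytes) -> str:
        """Store a body and return its key; duplicates are not stored twice."""
        key = hashlib.sha256(body).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = zlib.compress(body)
        return key

    def get(self, key: str) -> bytes:
        """Fetch and decompress the body stored under key."""
        return zlib.decompress(self.blobs[key])
```

Each mailbox would then only need to keep the returned key alongside the message headers. Whole-body hashing only catches exact duplicates, which is part of why this approach is easier but less efficient than block-level dedup.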