by glup 2956 days ago
Academic lab in computational cognitive science / computational linguistics: we haven’t transitioned fully because of storage costs. ~$10/TB/month, even for infrequent-access S3 storage, is way too much when we have lots of 10+ TB datasets. Otherwise it’s great to be able to scale compute (the number of machines / cores / GPUs as necessary) and to maintain different images for different projects (NVIDIA driver, cuDNN, TensorFlow version). Open to solutions for the storage problem!
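To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It takes the ~$10/TB/month figure from the comment above; the dataset sizes and the function name `monthly_storage_cost` are made up for illustration, and real pricing varies by provider, region, and storage tier.

```python
def monthly_storage_cost(dataset_sizes_tb, price_per_tb_month=10.0):
    """Return the total monthly storage cost in dollars.

    price_per_tb_month defaults to the ~$10/TB/month figure quoted in the
    comment; treat it as an assumption, not a current price.
    """
    return sum(dataset_sizes_tb) * price_per_tb_month


# Hypothetical lab holding three datasets of 10, 12, and 15 TB.
sizes_tb = [10, 12, 15]
monthly = monthly_storage_cost(sizes_tb)
yearly = 12 * monthly
print(f"${monthly:,.0f}/month, ${yearly:,.0f}/year")
```

The cost scales linearly with total size, so every additional 10 TB dataset adds about $100 a month at that rate, before any request or egress charges.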

Azure Blob Storage can be way cheaper than that. https://azure.microsoft.com/en-us/services/storage/blobs/
Storing on your own hardware will always be cheaper (Backblaze has a great blog post explaining why they built out their own data storage nodes at rented colo space because of this).

https://www.backblaze.com/blog/petabytes-on-a-budget-how-to-...

https://www.backblaze.com/blog/wp-content/uploads/2009/08/co... (Cost of a Petabyte by service vs DIY)

If the one Backblaze data center gets hit by a meteor, all your data is toast. I use Backblaze for backups, but I wouldn't trust them for primary storage.
Same with every other cloud provider. They don't provide georedundancy unless you design for it and pay for extra copies of your data to be stored.
You don't have to "design for it". With the default storage class for S3, your data is automatically copied across three data centers. You have to explicitly specify "reduced redundancy". Yes, you pay for it, but you don't have to do anything special.
Not three data centers. Different zones in the same geographic datacenter. Significant difference.