by throwaway858 1356 days ago
The one big thing missing from dedicated hosts is an S3 equivalent. Sure, you can get a huge hard drive for cheap, but it will not meet the durability requirements for storing your precious data.

And if you try to use AWS just for S3, you will pay a lot extra in bandwidth charges for bringing the data from S3 to your server (something that is free if you use EC2 or other AWS services in the same region).

3 comments

Lots of web hosts also have S3-compatible offerings. They typically market them as "object storage".
That same logic works up and down the line. If what you need is disk storage, then you are limited to dedicated hosts that provide something akin to S3. There are some! But that's also true when you want a database that isn't SQLite. Now you need a dedicated host that provides something akin to DynamoDB, and will manage it for you. Then you decide you need queueing, and you can either install and manage that yourself or look for a dedicated host that provides something akin to SNS/SQS. And so on...
Right -- that's the one implementation detail in the OP that was interesting. It sounds like they ultimately used MinIO to replace S3. I've seen people use Ceph, but it's apparently a nightmare to operate a Ceph cluster. If you're on k8s I think the "cloud native" way might be Rook, but I haven't looked into that. Anyway, running an object store is painful.

Their notes here are a bit vague:

> When the migration reached mid-June, we had 300 servers running very smoothly with a total 200 million cached pages. We used Apache Cassandra nodes on each of the servers that were compatible with AWS S3.

> We broke the online migration into four steps, each a week or two apart. After testing whether Prerender pages could be cached in both S3 and minio, we slowly diverted traffic away from AWS S3 and towards minio. When the writes to S3 had been stopped completely, Prerender saved $200 a day on S3 API costs and signaled we were ready to start deleting data already cached in our Cassandra cluster.

> However, the big reveal came at the end of this phase around June 24th. In the last four weeks, we moved most of the cache workload from AWS S3 to our own Cassandra cluster. The daily cost of AWS was reduced to $1.1K per day, projecting to 35K per month, and the new servers’ monthly recurring cost was estimated to be around 14K.

It says (briefly, in passing) that they used Cassandra to implement the S3 API for their nodes, but maybe just to replicate the S3 API that they were previously using? That's an interesting choice I'd not heard of before. Perhaps all of their individual files are quite small?

Then they moved to MinIO, which would be the S3 equivalent that you are looking for.
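
For what it's worth, the "cache in both, then slowly divert traffic" step they describe could look roughly like the sketch below. This is only my guess at the shape of it, not their code: `MigratingCache`, `new_read_fraction` and `write_old` are names I made up, and the two stores are plain dicts standing in for S3 and MinIO.

```python
import random


class MigratingCache:
    """Hypothetical dual-write wrapper; `old` and `new` are dict-like
    stand-ins for the S3 bucket and the MinIO bucket."""

    def __init__(self, old, new, new_read_fraction=0.0, write_old=True, rng=None):
        self.old = old
        self.new = new
        # Share of reads that try the new store first (0.0 to 1.0).
        self.new_read_fraction = new_read_fraction
        # Set to False once writes to the old store should stop.
        self.write_old = write_old
        self._rng = rng or random.Random()

    def put(self, key, value):
        # Always write to the new store; keep writing to the old one
        # until the migration is far enough along to turn it off.
        self.new[key] = value
        if self.write_old:
            self.old[key] = value

    def get(self, key):
        # Send a configurable share of reads to the new store and fall
        # back to the old store on a miss.
        if self._rng.random() < self.new_read_fraction:
            hit = self.new.get(key)
            if hit is not None:
                return hit
        return self.old.get(key)
```

You would start with `new_read_fraction=0.0`, raise it in steps, and flip `write_old` to `False` at the point where they say writes to S3 stopped.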

Well, their layout is essentially a map from URL to HTML, so Cassandra would work well here.
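
To make the "map from URL to HTML" point concrete, here is a minimal in-memory sketch. It is an assumption about the shape of the data, not their schema: `PageCache` and the choice to hash the URL into a fixed-size key are mine, and a dict stands in for the Cassandra table.

```python
import hashlib


class PageCache:
    """Hypothetical URL -> HTML cache; a dict stands in for the table."""

    def __init__(self):
        self._rows = {}

    @staticmethod
    def key(url):
        # Hash the URL so every key has the same fixed size, whatever
        # the length of the original URL.
        return hashlib.sha256(url.encode("utf-8")).hexdigest()

    def put(self, url, html):
        self._rows[self.key(url)] = html

    def get(self, url):
        # Returns None on a cache miss.
        return self._rows.get(self.key(url))
```

Every access is a single lookup by key with no range scans or joins, which is the access pattern a partition-keyed store like Cassandra is built for.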

MinIO is AGPL-3 or commercially licensed, though. I'm pretty sure using it as a cache would be considered a combined work?