As someone who just recently went through CDN hell and rebuilt our entire CDN network from the ground up (software and hardware), I was wondering: why did you pick RoR?
It’s what I know best and what I’m most productive in. The project is to get something running and learn a handful of new things, and learning a new framework would be a detriment to that first goal.
The manager app is not in the hot path with this design so performance doesn’t matter all that much.
Just be careful: if you do a pull-only CDN, you're not going to gain big benefits. If you do want a pull-only CDN, have a background task runner retrieve the files and update them locally.
Sure. I have Nginx set to keep files around for a long time, serve stale, and refresh in the background, but proactively refreshing periodically is a good idea.
Personally I would build it in something like Go. I've done a lot of work in Rails, and I would probably have the signup/profile/interface built in Rails 5.2 but use a high-performance Go framework for the really intensive stuff.
I've been considering building my own, but getting a gossip protocol up and running to share data between nodes isn't the easiest thing in the world to code.
It’s a pain to code. I’ve done it, and I hated every second of it. Keeping dynamic data in sync in near real-time is terrible.
I wrote the CDN in Go, with Redis and a smaller Go-powered daemon to retrieve assets every 20 seconds, sync them to a local storage drive, and after 5 days retrieve again - or, if there are no requests within 48 hours, clear the unused items.
Then I set up a system so that if one edge requests an “unpopular” file, it’ll ping a simple REST API and have all the other edges pull that file, thus allowing the edges to stay “one step ahead” of the user load.
Yeah, when I think it through, it comes down to a hard math problem, because you have to maintain the state of the local files and decide whether they should live in memory vs SSD vs another node. Did you use an LRU cache for expunging less utilized resources?
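For readers unfamiliar with the term, this is roughly what an LRU cache looks like in Go, using the standard `container/list` package. It is a generic sketch bounded by item count, not the eviction scheme of the CDN discussed here; the `LRU` type and the file names are made up for the example.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element holds.
type entry struct {
	key   string
	value []byte
}

// LRU is a fixed-capacity cache that evicts the least recently used item.
type LRU struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the value and marks the key as most recently used.
func (c *LRU) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates a key, evicting the oldest item when over capacity.
func (c *LRU) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := NewLRU(2)
	c.Put("a.css", []byte("a"))
	c.Put("b.js", []byte("b"))
	c.Get("a.css")              // a.css is now the most recently used
	c.Put("c.png", []byte("c")) // over capacity: evicts b.js
	_, ok := c.Get("b.js")
	fmt.Println(ok) // false
}
```

This version is not safe for concurrent use; a real edge server would need a mutex around `Get` and `Put`, and would likely bound the cache by bytes rather than item count.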
State is much less important to track, and it’s easier to do. The real challenge is garbage collection - you need it, but you don’t want to collect too much in memory. That’s why Redis is a great tool for our edge servers.
And I never realized just how slow the speed of light is until I started looking into the CAP theorem and distributed databases like CockroachDB.