Hacker News
by michaelgv 2853 days ago
As someone who recently went through CDN hell and rebuilt our entire CDN network from the ground up (software and hardware), I was wondering: why did you pick RoR?

It’s what I know best and what I’m most productive in. The goal of the project is to get something running and to learn a handful of new things, and learning a new framework would be a detriment to that first goal.

The manager app is not in the hot path with this design, so performance doesn’t matter all that much.

Are you designing this CDN to pull from origin and cache temporarily? Or to pull from a local file and put a strong cache on it?

If you need a hand, let me know; I’ve built pretty large CDNs before (10M req/s at peak).

The former to start, but I want to add push zones and/or “s3sync” zones that proactively sync an S3 bucket to local disk.

Thanks for the offer! I might just take you up on it :)

Just be careful: understand that if you build a pull-only CDN, you're not going to gain big benefits. If you do want a pull-only CDN, have a background task runner retrieve the files and update them locally.
> understand that if you do a PULL only CDN, you're not going to gain big benefits.

This statement makes no sense. A CDN edge node is just a cache; its size and your access patterns determine the hit ratio.

At $dayjob we get Nginx cache hit ratios on our edge in excess of 99% with an “origin fetch” setup. That is a very large benefit.
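As a worked example of what a 99% hit ratio buys: with hit ratio h and edge request rate R, only the misses reach the origin. The 10,000 req/s rate below is a hypothetical round number, not a figure from the thread.

```latex
R_{\text{origin}} = (1 - h)\,R = (1 - 0.99) \times 10\,000\ \text{req/s} = 100\ \text{req/s}
```

In other words, the origin sees 1/100 of the edge traffic, a 100x reduction in origin load.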

Cloudflare works entirely on origin fetch. They seem to be doing okay.

Sure. I have Nginx set to keep files around for a long time and to serve stale content while refreshing in the background, but refreshing proactively on a schedule is a good idea.
What would you suggest and why? Not a loaded question for all the people eager to downvote.
Personally, I would build it in something like Go. I've done a lot of work in Rails, and I would probably build the signup/profile/interface in Rails 5.2 but use a high-performance Go framework for the really intensive stuff.

I've been considering building my own, but getting a gossip protocol up and running so the nodes can share data isn't the easiest thing in the world to code.
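To illustrate only the data-merging half of a gossip protocol, here is a hedged Go sketch of a push-pull anti-entropy exchange. Each node keeps a versioned map, and an exchange keeps the higher version of every key (last-writer-wins). The logical version counters and the value-based tie-break are illustrative assumptions. Membership, failure detection, random peer selection, and networking are all left out, and those are most of the real work.

```go
package main

import "fmt"

// Versioned is a value tagged with a logical version number.
type Versioned struct {
	Value   string
	Version uint64
}

// Node is one edge's view of the shared metadata.
type Node struct {
	Name  string
	State map[string]Versioned
}

func NewNode(name string) *Node {
	return &Node{Name: name, State: map[string]Versioned{}}
}

// Set records a local write one version above what this node has seen for key.
func (n *Node) Set(key, value string) {
	n.State[key] = Versioned{Value: value, Version: n.State[key].Version + 1}
}

// newer reports whether a should replace b. Equal versions are broken by
// comparing values so that every node picks the same winner.
func newer(a, b Versioned) bool {
	if a.Version != b.Version {
		return a.Version > b.Version
	}
	return a.Value > b.Value
}

// GossipWith is one push-pull exchange: afterwards both nodes hold the
// newest version of every key that either of them knew about.
func (n *Node) GossipWith(peer *Node) {
	for k, v := range peer.State { // pull
		if cur, ok := n.State[k]; !ok || newer(v, cur) {
			n.State[k] = v
		}
	}
	for k, v := range n.State { // push
		if cur, ok := peer.State[k]; !ok || newer(v, cur) {
			peer.State[k] = v
		}
	}
}

func main() {
	nodes := []*Node{NewNode("ams"), NewNode("nyc"), NewNode("sfo")}
	nodes[0].Set("/logo.png", "etag-a")
	nodes[1].Set("/logo.png", "etag-c") // conflicting write at the same version
	nodes[2].Set("/app.js", "etag-b")

	// A deterministic ring stands in for random peer selection.
	for round := 0; round < 2; round++ {
		for i := range nodes {
			nodes[i].GossipWith(nodes[(i+1)%len(nodes)])
		}
	}
	for _, n := range nodes {
		fmt.Println(n.Name, n.State["/app.js"].Value, n.State["/logo.png"].Value)
	}
	// every node prints: <name> etag-b etag-c
}
```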

It’s a pain to code. I’ve done it, and I hated every second of it. Keeping dynamic data in sync in near real time is terrible.

I wrote the CDN in Go, with Redis and a smaller Go-powered daemon that retrieves assets every 20 seconds, syncs them to a local storage drive, and retrieves them again after 5 days; if an item gets no requests within 48 hours, the daemon clears it.

Then I set up a system where, if one edge requests an “unpopular” file, it pings a simple REST API that has all the other edges pull that file, allowing the edges to stay “one step ahead” of the user load.

Yeah, when I think it through, it comes down to a hard math problem, because you have to maintain the state of the local files: whether they should live in memory, on an SSD, or on another node. Did you use an LRU cache for expunging less-utilized resources?
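For reference, a textbook LRU in Go built on container/list: a map gives O(1) lookup and the list keeps recency order, so the back of the list is always the eviction candidate. This is a generic sketch, not what the commenter used (they lean on Redis). Capacity is counted in entries rather than bytes for simplicity, it is not safe for concurrent use, and the OnEvict hook for demoting an item from memory to SSD is a hypothetical addition.

```go
package main

import (
	"container/list"
	"fmt"
)

// LRU is a fixed-capacity cache that evicts the least recently used key.
type LRU struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element holding *lruEntry
	OnEvict  func(key string, value []byte)
}

type lruEntry struct {
	key   string
	value []byte
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the value and marks the key as most recently used.
func (c *LRU) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*lruEntry).value, true
}

// Put inserts or updates a key, evicting the least recently used one if full.
func (c *LRU) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruEntry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		e := oldest.Value.(*lruEntry)
		delete(c.items, e.key)
		if c.OnEvict != nil {
			c.OnEvict(e.key, e.value)
		}
	}
}

func main() {
	ram := NewLRU(2)
	ram.OnEvict = func(key string, _ []byte) { fmt.Println("demote to SSD:", key) }
	ram.Put("/a.css", []byte("a"))
	ram.Put("/b.js", []byte("b"))
	ram.Get("/a.css")              // /a.css becomes most recently used
	ram.Put("/c.png", []byte("c")) // prints "demote to SSD: /b.js"
}
```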
State is much less important to track, and it’s easier to do. The real challenge is garbage collection: you need it, but you don’t want to accumulate too much in memory. That’s why Redis is a great tool for our edge servers.
And nothing made me realize just how slow the speed of light is until I started looking into the CAP theorem and distributed databases like CockroachDB.
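To put a number on it: light in optical fiber travels at roughly c/1.5, about 200,000 km/s, so distance alone sets a floor on round-trip time. The 10,000 km distance below is an illustrative assumption, not a figure from the thread.

```latex
t_{\text{RTT}} \ge \frac{2d}{v}, \qquad v \approx \frac{c}{1.5} \approx 2 \times 10^{5}\ \text{km/s}

t_{\text{RTT}} \ge \frac{2 \times 10\,000\ \text{km}}{2 \times 10^{5}\ \text{km/s}} = 0.1\ \text{s} = 100\ \text{ms}
```

Any write that has to wait for an acknowledgment from a replica that far away pays at least that 100 ms per round trip, before any processing or queuing time.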