by OmarAssadi 972 days ago
> The whole thing runs on regular servers. Hetzner has become our cloud of choice, along with Backblaze B2 and SQS. It is written in Go. From an architecture perspective I try to keep things simple - want folks to make economical use of their servers.

Cool, glad to see Hetzner, at least presumably for compute, rather than the almost routine, absurdly expensive, mega cloud providers.

I have a few questions if you've got time.

1. What made you pick Hetzner in particular, and did you evaluate any of their primary competitors (e.g., OVH)?

2. In your $100/month figure, did you decide to go with dedicated servers or the "cloud" VPS line? If the latter, was there any particular reason over going with the bare-metal offerings?

3. Are you making use of Hetzner's U.S. servers as well or is everything currently in Europe (or vice-versa)?

4. Was there any particular reason for choosing B2 and SQS as opposed to self-hosting object-storage on the SX servers?

Normally, I wouldn't even wonder why someone wouldn't want the burden of more infrastructure. But given the choice of going with relatively unmanaged Hetzner servers, presumably self-hosting clickhouse, etc, and then with your compute provider also happening to offer fairly large storage servers on the cheap, I might've been tempted to cut out the additional providers and DIY it:

- less costly for large amounts of data

- zero lock-in [1]

- fewer companies to deal with

  - likely better negotiating power with Hetzner when the time comes if a bigger percentage of your overhead is with them as opposed to spread out across three providers

  - fewer points of failure; if the Hetzner servers are down, I would assume you're in trouble anyway, so perhaps keeping [most] of your eggs on the same network might not be as bad as it sounds

  - presumably better latency and bandwidth + the ability to communicate over a private network [2]

5. I see the license is AGPL. But I don't see the usual "you must dual-license all contributions under MIT/BSD/ISC as well [so that only we can re-license the project]" nor "before contributing, sign this agreement transferring copyright [and your first born child]".

Was this just an oversight, or do you intend to be one of the few SaaS companies that really, truly is open-source, rather than "open-source" [until people are locked in] and then going "open"-core? If the latter, then awesome -- cool to see.

6. Any regrets, disasters, or lessons learned so far? Usually, I find these stories the most interesting but unfortunately too few are willing to share.

---

[1]: I know B2 provides a relatively standard, at this point, S3-compatible API and everything as well. But I think there is also still something to be said about a somewhat Juche-esque approach to infrastructure, wherein should prices rise, contracts change, service degrades, or whatever else, you'd have the ability to almost immediately switch at a moment's notice to literally anyone else who can lease you a box with some hard drives or any colo provider.

[2]: This goes out the window somewhat if you're using the VPS line and American servers, though.

1 comment

Really interesting question(s)! Will try to answer the gist.

First, the system is designed for users to be able to drop in whatever hardware they want. So your questions are around my initial deployment options. These will certainly change in the future.

I chose Hetzner because of cost. I will probably end up using other providers in the future but Hetzner let me begin this experiment without burning my runway.

I use metal servers for Clickhouse, and small cloud boxes and LBs for the API. It happens I'm using US-based cloud servers because that's where my users are.

I'm using B2 + SQS because I did not want to take on sysadmin for those components. They are not performance-sensitive. Using B2 at $0.006/GB vs managing minio on an SX server at $0.001/GB was acceptable to me :)
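To put the two quoted prices side by side, here is a rough sketch of the arithmetic. It assumes both figures are per GB per month (the comment does not say), and the function and constant names are made up for illustration. It also ignores the sysadmin time that the self-hosted option would cost.

```go
package main

import "fmt"

// Per-GB storage prices in USD as quoted in the comment above.
// Treating them as monthly prices is an assumption.
const (
	b2PricePerGB    = 0.006 // Backblaze B2
	minioPricePerGB = 0.001 // self-managed MinIO on an SX server
)

// monthlyCost returns the storage bill in USD for gb gigabytes
// at the given per-GB price.
func monthlyCost(gb, pricePerGB float64) float64 {
	return gb * pricePerGB
}

func main() {
	// Compare the two options at a few illustrative data volumes.
	for _, gb := range []float64{1_000, 10_000, 100_000} {
		b2 := monthlyCost(gb, b2PricePerGB)
		diy := monthlyCost(gb, minioPricePerGB)
		fmt.Printf("%8.0f GB: B2 $%.2f, MinIO $%.2f, difference $%.2f\n",
			gb, b2, diy, b2-diy)
	}
}
```

At 1 TB the gap works out to roughly $5 a month, which is consistent with the view that the premium is small at this scale.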

Why use regular servers for the API instead of fly.io? Because the API writes data to disk, and then bulk loads it to clickhouse. This means I needed durable and reliable disk, which you can only have with actual VMs. I didn't want the process to be randomly SIGKILL'd since the process shuts down safely to avoid data loss, and I didn't want the risk of ephemeral storage. So to control all this I have to run the HTTP servers myself.
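The write-to-disk-then-shut-down-safely pattern described above might look roughly like the following. This is a minimal sketch, not the project's actual code: the `spool` and `flushAndSync` names, the file path, and the sample records are all hypothetical, and the bulk load into ClickHouse is left out. It only shows the part that motivates running on a real VM: catching SIGTERM, flushing buffered writes, and fsyncing before exit.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"log"
	"os"
	"os/signal"
	"path/filepath"
	"syscall"
)

// spool appends each record to the file at path until records is closed
// or done fires, then flushes the buffer and fsyncs before returning.
// It returns the number of records written during this call.
func spool(path string, records <-chan string, done <-chan struct{}) (int, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	n := 0
	for {
		select {
		case rec, ok := <-records:
			if !ok {
				return n, flushAndSync(w, f)
			}
			if _, err := w.WriteString(rec + "\n"); err != nil {
				return n, err
			}
			n++
		case <-done:
			return n, flushAndSync(w, f)
		}
	}
}

// flushAndSync pushes buffered bytes to the file and asks the OS to
// commit them to disk.
func flushAndSync(w *bufio.Writer, f *os.File) error {
	if err := w.Flush(); err != nil {
		return err
	}
	return f.Sync()
}

func main() {
	// Cancel on Ctrl-C or SIGTERM. SIGKILL cannot be caught, which is
	// why an environment that kills processes abruptly is a problem.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	// Stand-in producer; a real API would feed records from HTTP handlers.
	records := make(chan string)
	go func() {
		defer close(records)
		for i := 0; i < 3; i++ {
			select {
			case records <- fmt.Sprintf("event %d", i):
			case <-ctx.Done():
				return
			}
		}
	}()

	path := filepath.Join(os.TempDir(), "spool-example.log")
	n, err := spool(path, records, ctx.Done())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("spooled %d records to %s\n", n, path)
}
```

The sketch only helps if the process is given a chance to handle the signal and the disk outlives the process, which is the argument being made for self-managed servers here.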

I might experiment with a PaaS for this, but it was easy enough to just set up an init script to run the daemon.

re: negotiating power - that conversation only happens when I have volume, at which point I will be able to negotiate with any provider.

re: points of failure/latency - this will continually change: the deployment strategy for today will be different than when I'm managing 100s of TB of data across thousands of users. The main focus is to make the system flexible to handle different topologies and to be able to change providers by updating a configuration.

Thank you!