| Really interesting question(s)! I'll try to answer the gist. First, the system is designed so users can drop in whatever hardware they want, so your questions are really about my initial deployment choices, which will certainly change. I chose Hetzner because of cost. I will probably add other providers later, but Hetzner let me begin this experiment without burning my runway. I use metal servers for ClickHouse, and small cloud boxes and LBs for the API. I happen to be using US-based cloud servers because that's where my users are. I'm using B2 + SQS because I did not want to take on sysadmin work for those components, and they are not performance-sensitive. Paying $0.006/GB for B2 instead of managing MinIO on an SX server at $0.001/GB was acceptable to me :) Why use regular servers for the API instead of fly.io? Because the API writes data to disk and then bulk loads it into ClickHouse. That means I need durable, reliable disk, which you can only have with actual VMs. The process shuts down safely to avoid data loss, so I didn't want it to be randomly SIGKILL'd, and I didn't want the risk of ephemeral storage. To control all of this I have to run the HTTP servers myself. I might experiment with a PaaS for this, but it was easy enough to set up an init script to run the daemon. re: negotiating power - that conversation only happens once I have volume, at which point I will be able to negotiate with any provider. re: points of failure/latency - this will continually change: today's deployment strategy will be different from the one I use when I'm managing 100s of TB of data across thousands of users. The main focus is to make the system flexible enough to handle different topologies and to change providers by updating a configuration. Thank you! |
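To make the "write to disk, then bulk load" point concrete, here is a rough Python sketch of the pattern, not the project's actual code. `DiskSpool`, `load_batch`, `install_sigterm_handler` and the `current.ndjson` file name are all hypothetical; `load_batch` stands in for whatever performs the bulk insert into ClickHouse. It assumes a single writer process and local disk that survives restarts.

```python
import json
import os
import signal
from pathlib import Path
from typing import Callable


class DiskSpool:
    """Append events to a local file, then hand them to a bulk loader in batches."""

    def __init__(self, directory: Path, load_batch: Callable[[list], None],
                 max_rows: int = 1000) -> None:
        self.load_batch = load_batch  # hypothetical bulk-insert callable
        self.max_rows = max_rows
        self.rows = 0
        self.path = directory / "current.ndjson"
        # Append mode: rows left over from a previous run stay in the file
        # and are picked up by the next flush.
        self.file = open(self.path, "a", encoding="utf-8")

    def write(self, event: dict) -> None:
        # Persist to disk before acknowledging the request.
        self.file.write(json.dumps(event) + "\n")
        self.file.flush()
        os.fsync(self.file.fileno())
        self.rows += 1
        if self.rows >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        # Load everything spooled so far; truncate only if the load succeeded.
        self.file.close()
        try:
            with open(self.path, encoding="utf-8") as f:
                batch = [json.loads(line) for line in f if line.strip()]
            if batch:
                self.load_batch(batch)
            mode = "w"  # load succeeded: safe to truncate
            self.rows = 0
        except Exception:
            mode = "a"  # load failed: keep the rows for a retry
            raise
        finally:
            self.file = open(self.path, mode, encoding="utf-8")

    def close(self) -> None:
        # Drain remaining rows, then release the file handle.
        self.flush()
        self.file.close()


def install_sigterm_handler(spool: DiskSpool) -> None:
    """Drain the spool on SIGTERM. SIGKILL cannot be caught, so a platform
    that kills the process outright never gives this handler a chance to run."""

    def handler(signum, frame):
        spool.close()
        raise SystemExit(0)

    signal.signal(signal.SIGTERM, handler)
```

Usage would look roughly like `spool = DiskSpool(Path("/var/spool/api"), load_batch=my_loader)` followed by `install_sigterm_handler(spool)` at startup, where both the path and `my_loader` are placeholders. The sketch only works if the disk outlives the process and the process gets a chance to handle SIGTERM, which is why ephemeral storage and hard kills are a problem for this design.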