| So this is something I/we have a bit of experience in [1]. Firstly, I'd test what level of on-prem they are really asking for, because there are various shades of this: - servers actually in their office - their own servers in a datacenter (DC) near their office - someone else's servers of which they are the only users, in a DC anywhere - deployment into their own AWS account Which one they actually need has an effect on the kind of options available to you. > From experience, is it really necessary or does a bigger server do the trick? In our experience big servers do indeed do the trick. And if your
costs are minor on Lambda/SQS/etc, then I imagine any new server is going to have 10x-100x the capacity you actually need. > 3. Worse service: can SQS/SNS/lambda easily be replaced without feature loss? Broadly speaking: yes. SQS can go to Redis/Valkey queues (ideally), or something heavier if needed. SNS also has good options, depending on the features you are using (FanOut -> Redis Streams, or Kafka; Email -> pick your provider). Lambda I would just replace with Kubernetes pods, but I'm a Kubernetes guy so hey. > Also I am talking about on-premise but is it the best solution to mitigate the risk of service interruption? I think it depends on what you are deploying. It will most likely be easier for you to centralise more. What we (business hat on now) would suggest is a Kubernetes-managed cluster of physical servers, with the ability to isolate clients to their own server(s). This runs your enterprise clients separately from your public deployment, makes it easy to scale to new enterprise customers and to scale when individual customers grow, and provides easy migration of clients between servers in case of hardware failure. You could also offer your customers an HA option which runs your software over three servers, with HA failover of services. Our company would then run it all for you, but you'd still have full access. (Removes business hat) [1]: https://lithus.eu |
I can confirm that the on-premises deployments will go into AWS accounts. We can manage the resources ourselves.
We only have a few fanouts, and they can be refactored. So Redis/Valkey for SQS is OK, and hopefully it can also cover our SNS use case.
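To make the SQS replacement concrete, here is a minimal sketch of the common "reliable queue" pattern on Redis/Valkey lists: a message is moved to a processing list on receive and removed only on ack. `ListQueue` and `InMemoryLists` are hypothetical names for illustration. The stand-in only exists so the sketch runs without a server; a redis-py client exposes `lpush`, `lmove` and `lrem` with the same names, though it returns bytes unless created with `decode_responses=True`.

```python
import json
import uuid


class InMemoryLists:
    """Stand-in for the three Redis list commands used below (assumption:
    swap in a real redis-py client, which has methods with these names)."""

    def __init__(self):
        self.lists = {}

    def lpush(self, name, *values):
        # Push each value onto the head (left) of the list.
        items = self.lists.setdefault(name, [])
        for value in values:
            items.insert(0, value)
        return len(items)

    def lmove(self, first_list, second_list, src="LEFT", dest="RIGHT"):
        # Pop from one end of the source and push onto one end of the target.
        source = self.lists.get(first_list, [])
        if not source:
            return None
        value = source.pop(0) if src == "LEFT" else source.pop()
        target = self.lists.setdefault(second_list, [])
        if dest == "LEFT":
            target.insert(0, value)
        else:
            target.append(value)
        return value

    def lrem(self, name, count, value):
        # Simplified: removes at most one occurrence and ignores `count`.
        items = self.lists.get(name, [])
        if value in items:
            items.remove(value)
            return 1
        return 0


class ListQueue:
    """Hypothetical SQS-like wrapper: send, receive, ack."""

    def __init__(self, client, name):
        self.client = client
        self.pending = name
        self.processing = name + ":processing"

    def send(self, body):
        # A unique id keeps otherwise identical payloads distinguishable.
        message = json.dumps({"id": str(uuid.uuid4()), "body": body})
        self.client.lpush(self.pending, message)
        return message

    def receive(self):
        # Move the oldest message (tail of the list) to the processing list.
        return self.client.lmove(
            self.pending, self.processing, src="RIGHT", dest="LEFT"
        )

    def ack(self, raw):
        # Delete the message once the worker has finished with it.
        return self.client.lrem(self.processing, 1, raw) == 1


queue = ListQueue(InMemoryLists(), "jobs")
queue.send({"task": "resize", "n": 1})
raw = queue.receive()
print(json.loads(raw)["body"], queue.ack(raw))
```

What this does not give you is the SQS visibility timeout: a message whose worker crashes stays in the processing list until something requeues it, so a small periodic reaper would be needed for that.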
I am afraid Kubernetes is overkill for our Lambda needs.
If we manage to bundle our whole app onto one server and have only 1-2 clients on-premise, would you still suggest Kubernetes, or is a simpler rsync to all servers enough?
Also, should we have a separate database instance for each client, or a sharded Postgres cluster? The latter seems more manageable.
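For what it's worth, the two layouts can sit behind the same lookup, so the choice does not have to be baked into application code. A rough sketch, assuming a hypothetical `TenantRouter` that sends most clients to a shared database and a few to dedicated instances; the hostnames, database name and user below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantDatabase:
    host: str
    dbname: str

    def dsn(self, user):
        # Standard PostgreSQL connection URI, default port assumed.
        return f"postgresql://{user}@{self.host}:5432/{self.dbname}"


class TenantRouter:
    """Hypothetical lookup: dedicated instance if one is configured,
    otherwise the shared database."""

    def __init__(self, shared, dedicated=None):
        self.shared = shared
        self.dedicated = dict(dedicated or {})

    def database_for(self, tenant_id):
        return self.dedicated.get(tenant_id, self.shared)


router = TenantRouter(
    shared=TenantDatabase("db-shared.internal", "app"),
    dedicated={"acme": TenantDatabase("db-acme.internal", "app")},
)
print(router.database_for("acme").dsn("svc"))
print(router.database_for("someone-else").dsn("svc"))
```

Moving a client from the shared database to its own instance then becomes a data migration plus one entry in the `dedicated` mapping.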