by setquk 2917 days ago
I’m starting to favour buying physical rack space again and running everything 2005-style with a lightweight Ansible layer. As long as your workload is predictable, taking on the lock-in, unpredictability, navigation through the maze of billing, weird rules and what-the-fuckism you have to deal with on a daily basis is merely trading one vendor-specific hell for another. Your knowledge isn’t transferable between cloud vendors either, so I’d rather have a hell I’m totally in control of, where the knowledge has some retention value and moves between vendors with no problems. You can also span vendors, thus avoiding the whole all-eggs-in-one-basket problem.
5 comments

Hybrid is what you are looking for. Have a rack or two for your core and rent everything else from multiple cloud vendors, integrated with whatever orchestration you are running on your own racks (K8s? DC/OS? Ansible?).
Or just two DCs in active/active.

It still works out cheaper for workloads than AWS does at this point, even after factoring in staff.

AWS always turns into cost and administrative chaos as well unless it is tightly controlled, which in itself is costly and difficult the moment you have more than one actor. GCP is probably the same, but I have no experience with that. It is much more difficult to get into that kind of mess when you have physical constraints.

For a two-man startup, perhaps, but I think the transition should go:

VPS (Linode etc.) for the MVP, then a colo half rack, then active/active racks across two sites, then scale out however your workload requires.

More importantly, there is a wealth of competent labor in the relatively stable area of maintaining physical servers (both on the hardware and software side). The modern cloud services move fast and break things, leading to a general shortage of resources and competent people. As a business, even if slightly more expensive initially, it makes more sense to start lower and work up to the cloud services as the need presents itself.
You can federate Kubernetes across your own rack and one or more public cloud providers.
You can, but that’s another costly layer of complexity and distribution to worry about.

One of the failure modes I see a lot is failing to factor in latency in distributed systems. Mainly because most systems don’t benefit at all from distribution and do benefit from simplification.

The assumption on here is that a product is going to service GitHub- or Stack Overflow-class loads at least, but most literally aren’t. Even high-profile sites and web applications I have worked on tend to run on much smaller workloads than people expect. Optimising latency by flattening distribution and consolidating has higher benefits than adopting fleet management in the mid-term of a product.
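
To put rough numbers on the latency point, here is a minimal sketch. The hop counts and per-hop latencies are made up for illustration, not measurements from any real system; the only claim is that sequential hops add up.

```python
def request_latency_ms(hop_latencies_ms):
    """Total latency of one request that makes each hop sequentially."""
    return sum(hop_latencies_ms)


# Hypothetical per-hop latencies in milliseconds (illustrative assumptions only).
# Distributed: LB, gateway, service A, cross-site call to service B, database.
distributed_hops = [1, 2, 2, 30, 2]
# Consolidated: LB, one application process, local database.
consolidated_hops = [1, 2, 2]

distributed_total = request_latency_ms(distributed_hops)
consolidated_total = request_latency_ms(consolidated_hops)

print(f"distributed:  {distributed_total} ms")
print(f"consolidated: {consolidated_total} ms")
```

With these assumed numbers the single cross-site hop dominates the whole request, which is the kind of cost that is easy to miss when distribution is adopted before the workload needs it.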

Kubernetes is one of those things you pick when you need it, not before you need it. And then only if you can afford to burn time and money on it with a guaranteed ROI.

Sure. The idea is that you get the benefits of public cloud plus extra capacity at lower cost from BYO hardware. Of course, you're now absorbing hardware maintenance costs as well. I haven't seen a cost breakdown that really makes a strong case one way or the other, but my company is doing it anyway.
Have you actually done this, or are you repeating stuff off the website? Because everyone I've talked with about kubernetes federation says it's really not ready for production use.
The approach we have taken is to create independent clusters with a common LoadBalancer.

Basically, the LB decides which kubernetes cluster will serve your request and once you're in a k8s cluster, you stay there.

You don't have the control plane that federation provides, and there is a bit of overhead in managing clusters independently, but we have automated the majority of the process. On the other hand, debugging is way easier and we don't suffer from weird latencies between clusters (weird because sometimes a request will go to a different cluster without any apparent reason; I'm sure there is one, but none that you could see or expect, hence the debugging).
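
A minimal sketch of the "pick a cluster once, then stay there" idea, assuming the load balancer routes on some stable key such as a session ID. The cluster names and the `pick_cluster` helper are hypothetical; the comment above does not say how its load balancer actually decides.

```python
import hashlib

# Hypothetical cluster names, purely for illustration.
CLUSTERS = ["k8s-rack-a", "k8s-rack-b", "k8s-cloud-1"]


def pick_cluster(session_id: str, clusters=CLUSTERS) -> str:
    """Deterministically map a session to one cluster so it always lands there."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


# The same session always resolves to the same cluster.
first = pick_cluster("session-1234")
second = pick_cluster("session-1234")
print(first, first == second)
```

Plain modulo hashing like this remaps most sessions whenever the cluster list changes, so a real setup would more likely rely on a sticky cookie or consistent hashing. This is only meant to show the routing decision being made once at the edge rather than between clusters.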

My people's time is more important than your complex system.

Ha. It's in process. Not ready yet. I'll report back if we fail miserably.
Federation v1 is legacy now. The new architecture is called MultiCluster and designed to work on top of K8S rather than having a leader cluster: https://github.com/kubernetes/community/tree/master/sig-mult...
That's exactly what we are thinking too. We've looked HARD into AWS/GCP/Azure, but for all the reasons you mentioned we don't want to go that route. Owning the entire stack is so much cheaper, both money and time wise.
Have you looked at OCI bare metal shapes? [1] Oracle Cloud provides the server, and you control the stack end to end (including the hypervisor).

If you run into an issue, send me a note and I will get someone to reply to your issue.

1. https://cloud.oracle.com/compute/bare-metal/features

This needs more upvotes