by ex_amazon_sde 2106 days ago

> Is it any more complex than all the old ways?

> Was Apache, Asterisk, or loading and hardening a Linux host on bare metal easier?

Yes, and by far. Adding a layer on top of all the traditional Linux daemons, tools and libraries does not decrease the total complexity - quite the contrary.

When you have a bug in an application that is related to something in another layer, you have to walk through the whole stack.

Examples: A bug in a network card that affects only large UDP packets. A race condition in file access triggered by NFS or a storage device driver. A vulnerability based on a timing attack that exploits CPU caches.

The deeper the stack, the worse.


doteka 2106 days ago

But wouldn’t the point be that you don’t care about hardware level problems anymore? When I find a node with issues, I can just delete it from the pool and get a fresh one back. The bad network card? That’s for Google/Amazon/DigitalOcean to deal with.

I find the bog-standard Prometheus chart gives me a pretty incredible level of monitoring out of the box; usually it’s pretty easy to pick the bad node out of a graph.

Running your own VMs without something like k8s? Yeah this setup I can deploy and have working in an hour is gonna take you a week to set up properly. Standardization is valuable. Abstraction is valuable.

ex_amazon_sde 2105 days ago

> But wouldn’t the point be that you don’t care about hardware level problems anymore?

No. Read my post again: I did not write about hardware issues.

Most work around optimization, reliability or security requires digging through the whole stack, sometimes down to the kernel.

> When I find a node with issues, I can just delete it from the pool and get a fresh one back.

However, a lot of k8s deployments are on-premise, where you have to debug your own hardware.

> The bad network card? That’s for Google/Amazon/DigitalOcean to deal with.

First you have to pinpoint the root cause of that glitch affecting all the containers running on VMs using the same bad drivers. Often those same drivers could be in use on 50% of your fleet.

> is gonna take you a week to set up properly.

Most certainly not. I've been deploying large production fleets in minutes since 2005.