Hacker News
by NightMKoder 460 days ago
Why the additional SIGUSR1 vs just doing those (failing health, sleeping) on SIGTERM?
2 comments

Presumably because it'd be annoying to wait for lame duck mode when you actually do want the application to terminate quickly. SIGKILL usually needs special privileges/root and doesn't give the application any time to clean up, flush, etc. The other workaround I've seen is having the application clean up immediately upon a second signal, which I reckon could also work, but either solution seems reasonable.
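
For what it's worth, the second-signal workaround could look roughly like this in Python. This is only a sketch: `TwoStageShutdown` and its attribute names are made up for illustration, and the serving loop is assumed to poll the two flags itself.

```python
import signal


class TwoStageShutdown:
    """First signal starts lame duck mode; a second one asks for a fast exit."""

    def __init__(self):
        self.draining = False  # set by the first signal
        self.exit_now = False  # set by the second signal

    def handle(self, signum, frame):
        # Only flip flags here; the serving loop decides what to do with them.
        if self.draining:
            self.exit_now = True
        else:
            self.draining = True

    def install(self, signum=signal.SIGTERM):
        # Must be called from the main thread.
        signal.signal(signum, self.handle)
```

The handler only records state, so the slow work (draining, flushing) stays out of the signal handler.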
Yeah, there were a bunch of reasons.

Using SIGTERM is a problem because it conflicts with other behavior.

For instance, if you use SIGTERM for this, the app could quit during the preStop, which will be detected as a crash by Kube and so restart your app.

> which will be detected as a crash by Kube and so restart your app.

I don't think Kubernetes restarts pods that have been marked for termination.

We have a number of concurrent issues.

We don't want to kill in-flight requests - terminating while a request is outstanding will result in clients connected to the ALB getting some HTTP 5xx response.

The AWS ALB Controller inside Kubernetes doesn't give us a nice way to specifically say "deregister this target".

The ALB will continue to send us traffic while we return 'healthy' to its health checks.

So we need some way to signal the application to stop serving 'healthy' responses to the ALB Health Checks, which will force the ALB to mark us as unhealthy in the target group and stop sending us traffic.

SIGUSR1 was an otherwise unused signal that we can send to the application without impacting how other signals might be handled.
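
A minimal sketch of that idea in Python, assuming the health route simply reports a flag that the SIGUSR1 handler flips. `HealthState`, `on_sigusr1`, and `status_code` are hypothetical names, and 503 is just one plausible failing status; nothing here is taken from the actual application.

```python
import signal


class HealthState:
    """Tracks whether the health check route should report healthy."""

    def __init__(self):
        self.healthy = True

    def on_sigusr1(self, signum, frame):
        # Start failing health checks; keep serving normal requests.
        self.healthy = False

    def status_code(self):
        # What the (hypothetical) health route would return to the ALB.
        return 200 if self.healthy else 503

    def install(self):
        # SIGUSR1 exists on Unix-like systems only; call from the main thread.
        signal.signal(signal.SIGUSR1, self.on_sigusr1)
```

SIGTERM handling is left untouched here, which is the point of using a separate signal.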

I might be putting words in your mouth, so please correct me if this is wrong. It seems like you don’t actually control the SIGTERM handler code. Otherwise you could just write something like:

  sigterm_handler() {
    make_healthcheck_fail();
    sleep(20);
    stop_web_server();
    exit(0);
  }
Technically the server shutdown at the end doesn’t even need to be graceful in this case.
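
For comparison, a runnable take on that pseudocode in Python. It is a sketch under a few assumptions: `Health`, `make_sigterm_handler`, and the `stop_web_server` callback are hypothetical, and the web server is assumed to serve from worker threads, since Python runs signal handlers on the main thread and the sleep would otherwise stall request handling.

```python
import signal
import sys
import time


class Health:
    """Flag read by the health check route."""

    def __init__(self):
        self.ok = True


def make_sigterm_handler(health, stop_web_server, drain_seconds=20.0,
                         sleep=time.sleep, exit_fn=sys.exit):
    """Build a handler that fails health checks, waits, then shuts down."""

    def handler(signum, frame):
        health.ok = False      # health checks start failing
        sleep(drain_seconds)   # wait for the load balancer to stop sending traffic
        stop_web_server()      # does not need to be graceful at this point
        exit_fn(0)

    return handler


# Usage (not executed here):
#   signal.signal(signal.SIGTERM, make_sigterm_handler(health, server.shutdown))
```

The `sleep` and `exit_fn` parameters exist only so the handler can be exercised without actually waiting or exiting.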
Curious, which framework are you using? I've had no issues with NodeJS, Go, and Rust apps directly behind ALB with IP-Target.
I don't think the framework matters; it's an issue with the ALB controller itself, not the application.

The ALB controller doesn't handle gracefully stopping traffic (by ensuring target group de-registration is complete) before allowing the pod to terminate.

Without a preStop, Kube immediately sends SIGTERM to your application.
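
Roughly, a preStop step along these lines would give the ALB time to drain, since Kube only sends SIGTERM once the hook has finished. A sketch in Python, assuming the hook can run a script inside the container; `pre_stop`, the target PID, and the 20-second default are hypothetical and would need tuning against the ALB's health check interval and threshold.

```python
import os
import signal
import time


def pre_stop(app_pid, drain_seconds=20.0, kill=os.kill, sleep=time.sleep):
    """Ask the app to fail health checks, then wait before SIGTERM arrives."""
    # Tell the application to start returning 'unhealthy' (assumed SIGUSR1 handler).
    kill(app_pid, signal.SIGUSR1)
    # Hold the pod open while the ALB marks the target unhealthy and stops routing.
    sleep(drain_seconds)


# Usage (not executed here), assuming the app runs as PID 1 in the container:
#   pre_stop(1)
```

The pod's termination grace period has to be longer than `drain_seconds`, otherwise Kube kills the container before the drain completes.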