Load balancer health checks in EKS

Today's post is about setting up load balancer health checks for a Kubernetes cluster (EKS) on AWS. The setup consists of an Application Load Balancer, an Auto Scaling group, and a target group that contains the Kubernetes worker nodes.

The diagram shows an overview of the setup we are aiming for:

flowchart TB
    ALB[Load Balancer]--Redirects traffic to-->ide1
    asg[Autoscaling]--Considers ALB Health-Checks-->ALB
    subgraph ide1 [Target-Group]
        hs[Health-Check]--Port-30003-->n1
        hs[Health-Check]--Port-30003-->n2
        n1[EKS-Node-1]-->POD-echo-server-1
        n2[EKS-Node-2]-->POD-echo-server-2
    end

With the default configuration, the health checks configured in the target group did not work and reported every target as unhealthy. As a result, traffic was routed to all targets. By default, the Auto Scaling group considered only the EC2 status checks to determine instance health and replaced an instance only when those checks failed.

Sometimes we ran into the case where the EC2 status checks were fine, but the pods could not start on a specific instance until we rebooted or replaced it.

To replace an instance automatically in such cases, we decided to configure additional load balancer health checks that the Auto Scaling group can use to determine instance health.

To set up load balancer health checks with a Kubernetes cluster, we took the following steps:

Provide a pod in the cluster for the health check

We cannot use any kind of Service for the health checks, because Services work like an internal load balancer: a request sent to one node can be forwarded to a pod on a different node. It is therefore not guaranteed that a health check sent by the load balancer to a node is answered by a pod on that specific node.

To ensure that a health check sent by the load balancer to a specific node within the target group is answered by that node, we deployed a simple echo server as a DaemonSet with the hostPort 30003.
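A minimal sketch of what such a DaemonSet could look like, built as a Python dictionary and printed as JSON (which kubectl accepts as a manifest format). The image name, the resource name `echo-server`, the label, and the container port 8080 are assumptions for illustration and are not taken from our actual deployment; only the hostPort 30003 comes from the setup described here.

```python
import json


def echo_daemonset(host_port: int = 30003, container_port: int = 8080) -> dict:
    """Build a DaemonSet manifest that runs one echo pod per node via hostPort."""
    labels = {"app": "echo-server"}  # assumed label, any consistent label works
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "echo-server", "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "echo-server",
                            # Hypothetical image: substitute any HTTP echo server.
                            "image": "example.com/echo-server:latest",
                            "ports": [
                                {
                                    # Port the container listens on (assumed).
                                    "containerPort": container_port,
                                    # hostPort exposes the pod on the node's own
                                    # IP, so a request to a node is answered by
                                    # the pod running on that node.
                                    "hostPort": host_port,
                                    "protocol": "TCP",
                                }
                            ],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest so it can be piped into: kubectl apply -f -
    print(json.dumps(echo_daemonset(), indent=2))
```

Because a DaemonSet schedules one pod per node, every node in the target group gets its own listener on port 30003, which is what ties a health check result to a single instance.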

Target group health checks

The hostPort of the DaemonSet has to be configured as the port in the “Health checks” tab of the target group. Adjust the other settings, such as the healthy/unhealthy thresholds or the interval, to your needs.
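The same setting can also be applied through the API instead of the console. The sketch below assumes boto3 and its ELBv2 `modify_target_group` call; the protocol, path, interval, timeout, and threshold values are placeholder assumptions, not the values we used. The client is passed in as an argument so the function itself does not depend on AWS credentials.

```python
def health_check_params(target_group_arn: str, port: int = 30003) -> dict:
    """Build the keyword arguments for ELBv2 modify_target_group."""
    return {
        "TargetGroupArn": target_group_arn,
        "HealthCheckEnabled": True,
        "HealthCheckProtocol": "HTTP",      # assumed: the echo server speaks HTTP
        "HealthCheckPort": str(port),       # the API expects the port as a string
        "HealthCheckPath": "/",             # assumed path
        "HealthCheckIntervalSeconds": 30,   # placeholder, tune to your needs
        "HealthCheckTimeoutSeconds": 5,     # placeholder, tune to your needs
        "HealthyThresholdCount": 3,         # placeholder, tune to your needs
        "UnhealthyThresholdCount": 3,       # placeholder, tune to your needs
    }


def configure_health_check(elbv2_client, target_group_arn: str, port: int = 30003):
    """Point the target group's health check at the echo server's hostPort."""
    return elbv2_client.modify_target_group(
        **health_check_params(target_group_arn, port)
    )
```

With boto3 installed, this could be called as `configure_health_check(boto3.client("elbv2"), target_group_arn)`, where `target_group_arn` is the ARN of your own target group.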

Add Elastic Load Balancing (ELB) health checks to an Auto Scaling group

Use the following procedure to add Elastic Load Balancing (ELB) health checks to an Auto Scaling group (see the AWS docs):

  • Choose your Auto Scaling group.
  • On the Details tab, choose Health checks, Edit.
  • For Health check type, select Enable ELB health checks.
  • For Health check grace period (the amount of time until EC2 Auto Scaling performs the first health check on new instances after they are put into service), enter a value. We set it to 5 minutes to give the instance time to join the Kubernetes cluster and start the pod used for the health check.
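The console steps above can be sketched as a single API call, assuming boto3 and its Auto Scaling `update_auto_scaling_group` operation. The function name and the group name passed to it are hypothetical; the grace period of 300 seconds corresponds to the 5 minutes mentioned above.

```python
def enable_elb_health_checks(
    autoscaling_client,
    group_name: str,
    grace_period_seconds: int = 300,
):
    """Switch an Auto Scaling group to ELB health checks with a grace period."""
    return autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        # "ELB" makes the group take the load balancer's health checks into
        # account in addition to the EC2 status checks.
        HealthCheckType="ELB",
        # Time a new instance gets before its first health check counts,
        # so it can join the cluster and start the echo pod.
        HealthCheckGracePeriod=grace_period_seconds,
    )
```

With boto3 installed, a call could look like `enable_elb_health_checks(boto3.client("autoscaling"), "my-eks-nodes")`, where `my-eks-nodes` stands in for the name of your own group.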

By default, your Application Load Balancer periodically sends requests to its registered targets to test their status. You have to adjust the existing security groups of the load balancer and the nodes to allow traffic from/to the new hostPort of the echo server.
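As a rough sketch of the node-side rule, the following assumes boto3 and the EC2 `authorize_security_group_ingress` call, and allows TCP traffic on the hostPort from the load balancer's security group. The function names and the security group IDs passed in are hypothetical; whether you also need a matching outbound rule on the load balancer's security group depends on how restrictive your existing egress rules are.

```python
def health_check_ingress_rule(alb_security_group_id: str, port: int = 30003) -> dict:
    """Build an ingress permission allowing the ALB to reach the hostPort."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Reference the load balancer's security group instead of IP ranges.
        "UserIdGroupPairs": [
            {
                "GroupId": alb_security_group_id,
                "Description": "ALB health check to echo server hostPort",
            }
        ],
    }


def allow_health_check(
    ec2_client,
    node_security_group_id: str,
    alb_security_group_id: str,
    port: int = 30003,
):
    """Add the ingress rule to the worker nodes' security group."""
    return ec2_client.authorize_security_group_ingress(
        GroupId=node_security_group_id,
        IpPermissions=[health_check_ingress_rule(alb_security_group_id, port)],
    )
```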

Now everything should work as expected. While testing, I changed the DaemonSet to a Deployment and scaled it down and up to trigger an instance replacement due to failed health checks.
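To check by hand what the load balancer sees, a small probe against each node's hostPort can help. This is only a sketch using the Python standard library: it assumes the echo server answers plain HTTP with status 200 on the path `/`, and that you run it from a machine that is allowed to reach the nodes on port 30003.

```python
import urllib.request


def probe(host: str, port: int = 30003, path: str = "/", timeout: float = 5.0) -> bool:
    """Return True if the node answers the health check path with HTTP 200."""
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and socket timeouts
        # are all subclasses of OSError.
        return False
```

Calling `probe("10.0.1.23")` with the private IP of a node (the address here is made up) should return `True` while the echo pod is running on that node and `False` once it has been scaled away.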