Horizontal pod autoscaler

Kubernetes (also known as K8s) has been helping companies in the IT industry step up their game. This open-source system automates the deployment, scaling, discovery, and management of applications by grouping containers into logical units. Kubernetes combines 15 years of Google's experience running production workloads with best-of-breed ideas and practices from the community. Among the services K8s offers is the Horizontal Pod Autoscaler.

While the Horizontal Pod Autoscaler is an extremely useful service, it has certain issues that frustrate engineers and DevOps teams. Read on to find out what these issues are and how you can optimize the HPA and extend beyond its built-in capabilities.

What is a Horizontal Pod Autoscaler?

The Horizontal Pod Autoscaler (or HPA) is a service offered by Kubernetes that scales the number of pods up and down when certain metrics indicate an adjustment is needed. Scaling decisions are driven by a target value for a chosen metric, such as CPU utilization: when pods are busy and the metric rises above the target, the HPA creates more pods to distribute the load.

The HPA uses CPU metrics by default to decide when to scale up and down, but it is possible to use other metrics as well. These can be custom metrics, which still come from inside the cluster (typically from the pods themselves), or external metrics, which come from outside the cluster and require a metrics adapter. External metrics in particular are not straightforward and not well documented, so most teams need to be walked through their setup. The HPA can be configured with other metrics, but it takes effort.
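As a concrete sketch, a minimal HPA manifest targeting average CPU utilization might look like the following (the names and numbers here are placeholders, not recommendations):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa            # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # placeholder Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # the target value
```

Custom and external metrics use the same `metrics` list with `type: Pods` or `type: External` entries, but those require a metrics adapter to be installed in the cluster.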

How do I know when to scale up or down?

In order to know when to scale up and down, users must set a target for the Horizontal Pod Autoscaler. The HPA then uses this performance target to create or destroy pods: if the observed metric surpasses the target, the HPA creates new instances; if it drops below the target, the HPA destroys instances. In effect, the Horizontal Pod Autoscaler watches a sensitive "watermark" that indicates whether to scale up or down.
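Under the hood, the HPA's core rule is simple. The Python sketch below is a simplified illustration of the scaling rule described in the Kubernetes documentation; it ignores the tolerance band, pod readiness, and min/max clamping:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Simplified HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# 4 pods at 80% CPU against a 50% target -> scale up to 7
print(desired_replicas(4, 80, 50))
# 4 pods at 20% CPU against a 50% target -> scale down to 2
print(desired_replicas(4, 20, 50))
```

Because the rule is a simple ratio, everything hinges on choosing a sensible target value.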

Just how sensitive is this ‘watermark’? 

If the target is set too low, normal operations will routinely exceed it once the pods get to work. The "watermark" will keep triggering scale-ups until numerous pods are running, and now you're running at maximum capacity all the time. ("Watermark" is my own term; Kubernetes calls it the target.)

On the other hand, if the target is set too high, normal operations will never trigger a scale-up. The load may be very high, but to the Horizontal Pod Autoscaler it's "not high enough" to be worth scaling for. Consequently, the autoscaler will always run at the minimum pod count.

In order to set an appropriate value for your target, you have to understand how your app behaves.
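To make the two failure modes concrete, here is a hedged Python sketch based on the simplified scaling rule from the Kubernetes documentation, with min/max clamping added; the load figures are hypothetical:

```python
import math

def desired_replicas(current, metric, target, min_pods=2, max_pods=10):
    # Simplified HPA rule plus the min/max clamp; the real HPA also
    # applies a tolerance band and stabilization windows.
    raw = math.ceil(current * (metric / target))
    return max(min_pods, min(max_pods, raw))

NORMAL_CPU = 60  # hypothetical steady-state CPU utilization, in %

# Target far below normal load: pinned at max_pods all the time.
print(desired_replicas(4, NORMAL_CPU, target=10))
# Target far above normal load: never scales past min_pods.
print(desired_replicas(2, NORMAL_CPU, target=95))
```

With a target of 10%, a 60% steady-state load always demands more pods than the maximum allows; with a target of 95%, the same load never justifies growing beyond the minimum.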


Does this HPA scaling cause operational problems? 

It does, on two points: 

1. During a Horizontal Pod Autoscaler scale-up, there is a lag between the response times of the old pod and the new pod. Scaling up doesn't immediately relieve the increased load on the old pod.

Once the new pod gets deployed, requests will start coming in, but the pod still needs time before it can serve them. During that window, the old pod is still taking the heavy beating, forced to serve more than its capacity can handle. Performance suffers, which affects the end-user experience: service latency increases, and pages take much longer to load.

2. To solve this problem, HPA users resort to setting the target lower. This way the autoscaler scales up sooner, so there are always spare pods in case of upswings. But leaving your Horizontal Pod Autoscaler at high capacity means consuming more resources than you actually need, and that overprovisioning gets reflected in your cloud bill. Out of the frying pan, and into the fire.

Overall, the three main issues caused by the Horizontal Pod Autoscaler are:

  • The difficulty of defining resources beyond CPU utilization;
  • Setting the target ("watermark") value correctly; and
  • The inefficiencies caused when you're on the upswing or downswing.

These issues have a major impact on enterprises, and a solution that can help automate HPA optimization would be welcomed as nothing short of a miracle.

This is where Opsani comes in. Our AI-driven Cloud Optimization tool efficiently determines the most cost-effective settings for your applications, even for your Horizontal Pod Autoscaler. Find out more about how Opsani does that here.