Kubernetes is correctly considered a container orchestration engine, but this characterization often narrows the focus to its ability to manage applications. It overlooks Kubernetes' powerful ability to also automatically and dynamically scale your infrastructure as changing user demand alters application load. This ability is not enabled by default: you must activate the Kubernetes Cluster Autoscaler, and how you do so depends on your specific cloud provider (e.g. Alibaba Cloud, AWS, Azure, or Google (GCE/GKE)).
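As one illustration, on AWS the Cluster Autoscaler is typically run as a Deployment in the kube-system namespace whose container flags tie it to your node groups. The fragment below is a sketch, not a complete manifest; the node-group name and min/max sizes are placeholders you would replace with your own:

```
# Excerpt from a cluster-autoscaler Deployment spec (AWS shown as an example;
# "my-node-group" and the 2:10 size bounds are placeholder values)
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --nodes=2:10:my-node-group   # min:max:node-group-name
```

Other providers use the same binary with a different --cloud-provider value and provider-specific node-group discovery flags.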

This article will explore how the Kubernetes Cluster Autoscaler works to continuously match cluster size to your Kubernetes workload requirements.


A cluster is a group of computing resources (nodes) that Kubernetes manages. The individual machines that comprise a cluster are most commonly virtual machines, which serve as the nodes Kubernetes targets when scheduling Pods. Pods are the atomic unit of Kubernetes management and can be composed of one or more application containers, each of which can request resources like CPU and memory. The Cluster Autoscaler processes these Pod resource requests and automatically adds or removes nodes to meet demand, within the bounds set by your system configuration.
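For example, a Pod declares the requests the autoscaler reasons about on each of its containers. The names, image, and values below are purely illustrative:

```
apiVersion: v1
kind: Pod
metadata:
  name: web-app              # illustrative name
spec:
  containers:
    - name: web
      image: nginx:1.25      # example image
      resources:
        requests:            # what the scheduler and autoscaler act on
          cpu: "250m"
          memory: "256Mi"
        limits:              # hard ceilings for the container
          cpu: "500m"
          memory: "512Mi"
```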


When will the Kubernetes Cluster Autoscaler increase capacity?

When several pods remain pending due to resource shortages, the Kubernetes Cluster Autoscaler automatically kicks in and adds nodes to the cluster. You can set minimum and maximum machine counts per cluster to prevent it from scaling too high or too low. The steps below illustrate the Cluster Autoscaler's decision-making process when there is a need to increase capacity:

  1. The Cluster Autoscaler algorithm monitors for pods that remain pending.
  2. If the pending state is due to insufficient cluster resources, the autoscaler requests a newly provisioned node.
  3. The underlying cloud infrastructure (e.g. AWS, GCP) provisions a new node, which Kubernetes then detects.
  4. The Kubernetes scheduler assigns the pending pods to the new node(s).
  5. If the Cluster Autoscaler still detects pending pods, the process repeats.
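You can watch this loop in practice from the command line. The commands below assume a running cluster with the Cluster Autoscaler installed; the pod name is a placeholder:

```shell
# Find pods stuck in Pending (these are what the autoscaler watches)
kubectl get pods --field-selector=status.phase=Pending --all-namespaces

# Inspect why a specific pod is pending; a scale-up appears as a
# TriggeredScaleUp event from the cluster-autoscaler
kubectl describe pod <pending-pod-name>

# The autoscaler also publishes its view of the cluster in a ConfigMap
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml
```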

When will the Kubernetes Cluster Autoscaler reduce capacity?

The Kubernetes Cluster Autoscaler keeps track of the resource requests associated with every pod assigned to a node. By default, a node whose pods request less than 50% of its allocatable resources becomes a candidate for a scale-down event, although your operations team can tune this threshold. It is worth noting that this is based on the resource requests assigned to each Pod and not the actual resource utilization. The autoscaler will attempt to drain the node by moving any remaining pods to other nodes and, if successful, delete the node.
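That 50% default is exposed as a flag on the autoscaler binary. A fragment of the autoscaler's container command like the following (values shown are the defaults, included here as an illustration) is where the tuning happens:

```
command:
  - ./cluster-autoscaler
  # node becomes a scale-down candidate when the sum of its pods'
  # requests falls below this fraction of the node's allocatable capacity
  - --scale-down-utilization-threshold=0.5
  # how long a node must stay unneeded before it is removed
  - --scale-down-unneeded-time=10m
```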

Before starting a scale-down event, the autoscaler runs several checks on each pod in the node to verify that the node can be drained and deleted:

  • A pod will not be deleted if its resource constraints don’t allow it to be moved. Even if other nodes are underutilized, a pod must match a node’s capabilities to be scheduled on it.
  • Removing pods must not violate your Pod Disruption Budget. As long as your PDB can accommodate a pod being deleted from the node and rescheduled elsewhere in the cluster, it will be moved.
  • If pods have local storage attached, they will not be deleted to avoid data loss.
  • If the pod is part of a DaemonSet or is a mirror pod, rescheduling is not necessary and the pod will simply be deleted.

If any of these checks fail, the node cannot be drained and deleted.
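You can also opt a pod out of eviction explicitly with an annotation the Cluster Autoscaler honors, which blocks scale-down of that pod's node. The pod name and image below are illustrative:

```
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker       # illustrative name
  annotations:
    # "false" tells the Cluster Autoscaler never to evict this pod;
    # setting "true" instead marks it as always safe to move
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "infinity"]
```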

An engineer has got to know their limitations

As is the case with many forms of automation, the end functionality depends on correct setup. Here are some things to consider when using the Kubernetes Cluster Autoscaler.

The Cluster Autoscaler uses metrics provided by the metrics-server to determine node utilization. If the metrics-server is not gathering the correct metrics from all nodes, the system will not respond correctly. You can run kubectl top nodes to check that metrics are available for all nodes.

It is worth noting that how well your Kubernetes cluster responds to changes in load will depend on how your operations team has configured the Kubernetes system. This includes correctly setting each Pod’s resource requests. 

Because the autoscaler only looks at defined resource requests and limits, rather than actual resource use, overprovisioning pod requests is common practice. The result is excessive upscaling and hesitant downscaling that don't track real resource use very well.

Consider that when scaling up nodes, there will be a lag, which may surface as increased latency or even downtime. Although adding and removing pods can be measured in milliseconds, the node autoscaler provisions virtual (and even bare-metal) infrastructure resources. Issuing a scale-up request to a cloud provider takes between 30 seconds (up to 100 nodes) and 60 seconds (above 100 nodes), and once the request is issued, provisioning the new node can take three to ten minutes. Keep the best and worst-case scenarios in mind when configuring your system. Pause pods can be used to mask this lag, but at the cost of overprovisioning the system.

Earlier, I touched on several checks that can block the Kubernetes Cluster Autoscaler from scaling down even after the predefined scale-down threshold is crossed: nodes can contain non-transferable pods. Bloated pod resource settings can keep an otherwise movable pod from being evicted and rescheduled, and failing to set a sensible Pod Disruption Budget can likewise prevent reschedulable pods from being moved. In both cases, the result is that you keep paying for nodes that could have been deleted.
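A sensible Pod Disruption Budget leaves the autoscaler room to drain nodes while protecting availability. The sketch below assumes a workload labeled app: web-app with more than two replicas; the name, label, and count are illustrative:

```
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb        # illustrative name
spec:
  minAvailable: 2          # keep at least 2 replicas up during a node drain
  selector:
    matchLabels:
      app: web-app         # must match your workload's pod labels
```

With spare replicas above minAvailable, the autoscaler can evict a pod, reschedule it elsewhere, and delete the emptied node without violating the budget.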

A few best practices when using Kubernetes Cluster Autoscaler

Hopefully, by now you know to set Pod resource requests, with minima and maxima as close to actual utilization as possible. I'll just add that having pods or containers without assigned resource requests can throw off the autoscaler algorithm and reduce system efficiency. Beyond this fundamental principle, there are some additional best practices to consider when using your cluster autoscaler.

  • Be aware that the official service level for the Kubernetes Cluster Autoscaler is a cluster of 1000 nodes with 30 pods per node. If your system is approaching this scale, it is time to consider options to keep the cluster within service limits.
  • Ensure that the version of the cluster autoscaler is compatible with the version of Kubernetes you are using. This ensures that things don't break and also that you have the opportunity to take advantage of new features to improve your autoscaling behavior.
  • For kube-system pods and any other function-critical pods, have PDBs appropriately set to avoid deletion by the autoscaler. Doing so ensures that the cluster autoscaler will not scale down your application below functional minima, avoiding disruption and maintaining desired availability. This works well if you utilize equivalent nodes, so that critical pods do not inadvertently end up on, and prevent the deletion of, overly powerful (and expensive) nodes. If you do use nodes of varying capabilities, taints and tolerations can help ensure that system-critical pods end up on appropriately sized machines.
  • As you approach larger cluster sizes, ensure your cluster autoscaler pod resource request minimum is at least 1 CPU. If other resources are being scheduled on the node with the autoscaler, make sure that the node has enough resources to manage the load. Here, again, taints and tolerations can help you limit the pressure on your autoscaler.
  • Avoid utilizing local storage for pods. If you do have applications that utilize local storage, configure your system so that those pods are preemptible.
  • Pause pods – not quite a best practice. Pause pods are low-priority, preemptible pods used to reserve space so that if a higher-priority pod needs the space, the pause pods can be readily deleted and the priority pod scheduled in the vacated node. When the scheduler tries to respawn the pause pod, the lack of resources triggers the node scale-up. The drawback to this approach is a cluster that is always over-provisioned, and there is a fair bit of complexity in getting it right, especially if your system uses nodes of varying capabilities.
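A common way to implement pause pods is a low-priority Deployment running the pause image, as in the widely used overprovisioning pattern. The names, replica count, and reserved resources below are illustrative headroom choices, not recommendations:

```
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                  # below the default (0), so real workloads preempt these pods
globalDefault: false
description: "Priority class for placeholder (pause) pods"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2              # how many headroom reservations to hold
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: reserve
          image: registry.k8s.io/pause:3.9
          resources:
            requests:       # the capacity each placeholder pod reserves
              cpu: "500m"
              memory: "512Mi"
```

When a real workload preempts these placeholders, their rescheduling attempt is what triggers the node scale-up described above.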

Opsani can help

Configuring the Kubernetes Cluster Autoscaler for optimal function is complex. Opsani's revolutionary AI-powered optimization engine eliminates the manual toil of setting up and maintaining cluster autoscaling through intelligent automation. Opsani helps you discover ways to further improve the performance of your Kubernetes architecture and provides you with the insights needed to keep it running at an optimal level while keeping costs manageable. For more Kubernetes help, check out 10 Kubernetes Performance Tuning Tips and Best Practices.

Contact us today and let us take a look at your infrastructure and find the best ways to make it better.