How to Manage Requests and Limits for Kubernetes

Managing resource requests and limits is a fundamental step in optimizing both cluster performance and application performance. The Kubernetes scheduler handles the complexity of deciding where each Pod should run, based on the resources available on individual cluster nodes. How that placement plays out depends on the types of nodes available and the resources each application requires.

Kubernetes will do its best to keep your system up and running; that is one of its primary functions. However, default settings do not guarantee that your system is using available resources efficiently, and you should not assume that they will leave application performance unaffected. Setting requests and limits is one way to tune Kubernetes to address both of these issues.

What are Requests and Limits?

The two main compute resources that Kubernetes manages are CPU and memory, and requests and limits can be set for both. A request defines the minimum amount of a resource that a container needs, and it determines whether a Pod can be scheduled on a given node. Note that requests and limits are set on individual containers within a Pod, but the scheduler adds up the requests of all containers in a Pod and uses that total to decide whether the Pod fits on a node.
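To make the aggregation concrete, here is a minimal sketch, assuming requests have already been converted to plain numbers (CPU in millicores, memory in bytes). The `ContainerRequest` type, the `metrics-sidecar` container, and all values are hypothetical; the point is only that the Pod’s per-container requests are summed, and both totals must fit in a node’s unallocated capacity.

```python
from dataclasses import dataclass


@dataclass
class ContainerRequest:
    """Resource requests for one container (hypothetical helper type)."""
    name: str
    cpu_millicores: int  # 1000 millicores = 1 CPU
    memory_bytes: int


def pod_requests(containers: list[ContainerRequest]) -> tuple[int, int]:
    """Sum per-container requests into the Pod-level totals used for placement."""
    cpu = sum(c.cpu_millicores for c in containers)
    memory = sum(c.memory_bytes for c in containers)
    return cpu, memory


def fits(pod: tuple[int, int], free_cpu: int, free_memory: int) -> bool:
    """A node can take the Pod only if BOTH totals fit in its unallocated capacity."""
    cpu, memory = pod
    return cpu <= free_cpu and memory <= free_memory


GiB = 1024 ** 3
containers = [
    ContainerRequest("redis", cpu_millicores=500, memory_bytes=1 * GiB),
    ContainerRequest("metrics-sidecar", cpu_millicores=100, memory_bytes=GiB // 8),
]
total = pod_requests(containers)
print(total)  # (600, 1207959552)
print(fits(total, free_cpu=550, free_memory=4 * GiB))  # False: not enough CPU
```

Even though the Redis container alone would fit on a node with 550 millicores free, the Pod as a whole does not.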

A limit sets the maximum amount of a resource that a container can use, provided that much is available on the node it is running on. How Kubernetes enforces a limit when a container tries to exceed it differs for each resource. For memory, the container may be terminated. For CPU, the container is throttled. Choosing settings appropriate for your application is key both to making sure your app has enough resources to run efficiently and to letting Kubernetes pack applications onto appropriate nodes without wasting resources.

How Requests Work 

Along with ensuring that an application is allocated enough resources to run correctly, a container’s resource request helps the Kubernetes scheduler choose an appropriate node on which to place its Pod. Here is an example of what this might look like in your Pod spec:
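A minimal sketch of such a Pod spec, assuming the values described in the next paragraph: a single Redis container requesting 1 GB of memory (written here as `1Gi`) and half a CPU (written as `500m`, i.e. 500 millicores). It is built as a Python dictionary and serialized to JSON, which `kubectl apply -f` accepts just as it does YAML. The Pod name `redis-example` and the unpinned `redis` image are illustrative assumptions.

```python
import json

# Hypothetical Pod spec: one Redis container with CPU and memory requests.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "redis-example"},  # assumed name
    "spec": {
        "containers": [
            {
                "name": "redis",
                "image": "redis",  # assumed image; pin a tag in real use
                "resources": {
                    "requests": {
                        "memory": "1Gi",  # memory the scheduler must find unallocated
                        "cpu": "500m",    # 500 millicores = half a CPU
                    }
                },
            }
        ]
    },
}

# Write the manifest out as JSON, e.g. to save as pod.json for kubectl.
print(json.dumps(pod, indent=2))
```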

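In the example above, the Pod sets requests for a single application container (a Redis database), so it could be scheduled on any node that has at least 1 GB of memory and half a CPU unallocated. On an empty node with 4 GB of memory and one CPU, Kubernetes could therefore schedule two of these Pods: memory alone would allow four, but CPU runs out after two. In practice the scheduler works from each node’s allocatable capacity, which is lower than its raw capacity because some resources are reserved for the operating system and Kubernetes system daemons, so the real count may be lower.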
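The arithmetic above can be sketched as follows. `parse_quantity` is a hypothetical helper that handles only a few of the suffixes Kubernetes accepts in resource quantities (`m`, `k`, `M`, `G`, `Ki`, `Mi`, `Gi`), and the sketch counts Pods by requests alone, ignoring everything else the real scheduler considers. It assumes the 4 GB node and 1 GB request are expressed as `4Gi` and `1Gi`.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes handled by this sketch.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3,
    "k": 10 ** 3, "M": 10 ** 6, "G": 10 ** 9,
    "m": Decimal("0.001"),  # milli, as in 500m CPU
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as '500m', '1Gi' or '4G' to a plain number."""
    # Check two-character suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def pods_per_node(node_cpu: str, node_mem: str, pod_cpu: str, pod_mem: str) -> int:
    """How many identical Pods fit on an empty node, judged by requests alone."""
    by_cpu = parse_quantity(node_cpu) // parse_quantity(pod_cpu)
    by_mem = parse_quantity(node_mem) // parse_quantity(pod_mem)
    return int(min(by_cpu, by_mem))  # the scarcer resource decides


print(pods_per_node("1", "4Gi", "500m", "1Gi"))     # 2: CPU-bound, memory allows 4
print(pods_per_node("900m", "4Gi", "500m", "1Gi"))  # 1: some CPU reserved for the system
```

The second call illustrates the allocatable-capacity caveat: if even 100 millicores of the node’s CPU are reserved, only one Pod fits.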

How Limits Work

Limits ensure that a running container does not use more than a certain share of a node’s resources. What happens when a container reaches its limit differs between CPU and memory. When a container tries to use more CPU than its limit allows, it is throttled: through the cgroup CPU quota that Kubernetes configures, the Linux kernel stops scheduling the container’s processes until the next quota period begins. The application keeps running, but its performance degrades because its access to CPU is being restricted.
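The sketch below shows how a CPU limit translates into a quota and what that does to latency. It assumes the kubelet’s default CFS period of 100 ms, under which the quota is the limit in millicores times the period divided by 1000 (the values written to `cpu.cfs_quota_us`/`cpu.cfs_period_us` on cgroup v1 or `cpu.max` on cgroup v2). `wall_time_ms` is a deliberately simplified, hypothetical model of a single-threaded, CPU-bound task that starts at a period boundary.

```python
CFS_PERIOD_US = 100_000  # default CFS period used by the kubelet: 100 ms


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) the container may use in each period."""
    return cpu_limit_millicores * period_us // 1000


def wall_time_ms(cpu_work_ms: float, limit_millicores: int, period_ms: int = 100) -> float:
    """Wall-clock time for a single-threaded, CPU-bound task under a CPU limit.

    Simplified model: the task runs until the period's quota is used up,
    then waits (throttled) for the next period to start.
    """
    if not 0 < limit_millicores <= 1000:
        raise ValueError("model assumes one thread and a limit of at most 1 CPU")
    if cpu_work_ms <= 0:
        return 0.0
    quota_ms = limit_millicores * period_ms / 1000  # CPU time allowed per period
    full_periods, remainder = divmod(cpu_work_ms, quota_ms)
    if remainder == 0:
        # The final slice uses a whole quota; there is no throttled wait after it.
        return (full_periods - 1) * period_ms + quota_ms
    return full_periods * period_ms + remainder


print(cfs_quota_us(500))       # 50000: 50 ms of CPU per 100 ms period
print(wall_time_ms(200, 500))  # 350.0: 200 ms of work stretched across 4 periods
```

With a `500m` limit, a request that needs 200 ms of CPU takes roughly 350 ms to complete, which is the kind of degradation described above even though the process never stops running.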

Exceeding a memory limit results in an out-of-memory (OOM) event, and the container that exceeded its limit is killed (Kubernetes reports its termination reason as OOMKilled). In a multi-container Pod, only the offending container is killed, although the Pod as a whole may stop working properly if its other containers depend on it. Kubernetes will typically restart the container according to the Pod’s restart policy, but if the process hits its memory limit again, it will be terminated again, and each repeated failure lengthens the delay before the next restart. In this case, the end result is, again, degraded performance.
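The Kubernetes documentation describes the kubelet’s restart behavior as an exponential back-off (10 s, 20 s, 40 s, and so on) capped at five minutes. The sketch below models those documented defaults (it is not kubelet code, and newer releases may make them tunable) to show how quickly a container that keeps hitting its memory limit ends up spending most of its time waiting to be restarted.

```python
def restart_delays(restarts: int, initial_s: int = 10, cap_s: int = 300) -> list[int]:
    """Delays (seconds) before each successive restart of a crashing container."""
    delays = []
    delay = initial_s
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap_s)  # double each time, never above the cap
    return delays


delays = restart_delays(7)
print(delays)       # [10, 20, 40, 80, 160, 300, 300]
print(sum(delays))  # 910 seconds spent waiting across seven restarts
```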

Setting Requests and Limits

Because no one wants to see application performance degraded by running up against resource limits, requests and limits are frequently set by guesswork or intentionally overprovisioned. Unfortunately, this can drive up system costs significantly, because overprovisioned Pods reserve resources that they never use. Taking the time to set up monitoring and measure actual CPU and memory usage lets you set appropriate request and limit values instead. This avoids the performance hit of limits that are set too low and is one way to achieve much better resource utilization (bin packing). The same data can also inform the selection of nodes for your cluster to further tune application performance.
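As a minimal sketch of turning monitoring data into values, the function below derives a request and a limit from observed usage samples. The policy (request at the 90th percentile of observed usage, limit at the observed peak plus 20% headroom) and the sample numbers are assumptions chosen for illustration, not a Kubernetes recommendation; the right percentile and headroom depend on the workload.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100) of the observed samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend(samples: list[float], request_pct: float = 90,
              limit_headroom: float = 1.2) -> tuple[float, float]:
    """Hypothetical policy: request at a usage percentile, limit at peak plus headroom."""
    request = percentile(samples, request_pct)
    limit = max(samples) * limit_headroom
    return request, limit


# Hypothetical memory usage samples for one container, in MiB.
memory_mib = [410, 420, 415, 430, 500, 445, 425, 460, 470, 480]
print(recommend(memory_mib))  # (480, 600.0)
```

Compared with a guessed request of, say, 2 GiB, a measured request near 480 MiB frees most of that reservation for other Pods on the node.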

For completeness, there is also a hardware side to the equation: determining your application’s CPU and memory requirements, along with the expected overall scale of the system, will also help you choose the right infrastructure for your cluster. These days, the main constraint on node sizes and features is more often than not what your service provider offers, though many public clouds provide a fairly sizable menu of instance types, including options tuned for memory or CPU performance.
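A small sketch of how per-Pod requirements can guide the choice of node shape: for each candidate, count how many identical Pods fit and how much CPU and memory is left stranded once the node is full. The three node shapes are hypothetical (not real provider instance types), their figures stand in for allocatable capacity, and the comparison ignores price, per-node daemons, and mixed workloads.

```python
from typing import NamedTuple


class NodeShape(NamedTuple):
    name: str
    cpu_m: int    # allocatable CPU in millicores (hypothetical)
    mem_mib: int  # allocatable memory in MiB (hypothetical)


def packing(node: NodeShape, pod_cpu_m: int, pod_mem_mib: int) -> tuple[int, int, int]:
    """Pods per node, plus CPU (m) and memory (MiB) left stranded once it is full."""
    count = min(node.cpu_m // pod_cpu_m, node.mem_mib // pod_mem_mib)
    return count, node.cpu_m - count * pod_cpu_m, node.mem_mib - count * pod_mem_mib


shapes = [
    NodeShape("balanced", 4000, 16384),
    NodeShape("cpu-heavy", 8000, 16384),
    NodeShape("memory-heavy", 2000, 16384),
]
for shape in shapes:
    # Pods requesting 500m CPU and 1 GiB memory, as in the Redis example.
    print(shape.name, packing(shape, pod_cpu_m=500, pod_mem_mib=1024))
# balanced (8, 0, 8192)
# cpu-heavy (16, 0, 0)
# memory-heavy (4, 0, 12288)
```

Because these Pods need 1 CPU for every 2 GiB of memory, the shape with the same ratio packs them with nothing stranded, while the others leave memory paid for but unused.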

Kubernetes will do its best to bin pack efficiently, and setting appropriate requests and limits, along with selecting hardware suited to your application’s needs, can improve application performance and reduce costs. The challenge is that in a constantly changing cloud environment, with a large number of possible request, limit, and node configurations, the options quickly become overwhelming. For an AI/ML algorithm, however, this is a manageable task. Opsani applies AI-driven Kubernetes automation to give you back the time otherwise spent toiling to evaluate and adjust system behavior, so you can get back to more interesting and meaningful work. Opsani seamlessly integrates with Kubernetes to automate the optimization of cluster workloads, with benefits including increased productivity, more stable applications, more agile processes, and more. Contact Opsani to learn more about our technology and products that can further improve your Kubernetes performance.