
Kubernetes Cost Optimization Best Practices

Cost savings is one of the biggest drivers behind the adoption of cloud systems. Yet as companies increasingly adopt cloud-based technologies such as Kubernetes to support efficient, agile operations, some are unexpectedly seeing costs grow. This can occur for a number of reasons. A key problem is the tendency to pad application and infrastructure resource allocations in an effort to avoid instability in dynamic cloud environments. Over-provisioning made sense with traditional, non-elastic infrastructure, but cloud-native application orchestrators like Kubernetes have built-in features that allow systems to respond automatically to environmental changes. Over-provisioned CPU and memory incur costs that may provide no actual value for most of the day.

Kubernetes scaling models

Kubernetes provides several scaling models. The Kubernetes Cluster Autoscaler scales the number of worker nodes in a cluster in response to workload: the cluster automatically grows when demand increases, maintaining performance, and scales down as demand drops, avoiding payment for unnecessary nodes.
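To make this concrete, here is a sketch of how the Cluster Autoscaler is typically configured, as an excerpt of the container command in its Deployment spec. It assumes an AWS node group named "workers"; the provider, group name, and numbers are illustrative, not prescriptive.

```yaml
# Excerpt from a Cluster Autoscaler Deployment spec (illustrative values)
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=2:10:workers                    # min:max:node-group-name
  - --scale-down-utilization-threshold=0.5  # scale down nodes under 50% utilized
```

The `--nodes` flag bounds the scaling range, while the utilization threshold controls how aggressively under-used nodes are removed.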

Kubernetes autoscaling

Kubernetes also provides two Pod autoscaling options: the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA). The HPA adds or deletes Pod replicas in response to load (CPU by default). The VPA scales Pod size in response to load, specifically adjusting CPU and memory requests and limits. The ability to "right-size" resources, up or down, again provides for efficient resource use.
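A minimal HPA manifest might look like the following. It assumes a Deployment named "web"; the name, replica bounds, and CPU target are all illustrative.

```yaml
# HorizontalPodAutoscaler keeping average CPU utilization near 70%
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Note that the HPA scales replicas against the Pods' CPU *requests*, which is one reason accurate request values matter so much.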

While these tools will keep your app running smoothly, they are not particularly cost-focused. The container/Pod requests and limits feature is one place where optimization can pay serious dividends for both performance improvement and cost reduction. Pod request settings (CPU and memory) define the minimum resources that must be available on a node in order for Kubernetes to schedule a Pod on that node. Limits define resource maxima; exceeding them can result in Kubernetes throttling the Pod (CPU) or terminating it (memory).
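In a Pod spec, requests and limits sit on each container. The fragment below shows the shape of that configuration; the container name, image, and values are illustrative.

```yaml
# Requests drive scheduling; limits trigger CPU throttling or an
# out-of-memory kill if exceeded (illustrative values).
containers:
  - name: web
    image: example/web:1.0
    resources:
      requests:
        cpu: 250m        # minimum reserved for scheduling
        memory: 256Mi
      limits:
        cpu: 500m        # CPU is throttled above this
        memory: 512Mi    # the container is killed above this
```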

As you can probably guess, over-provisioning requests and limits is common, because developers do not want to deal with the pain of guessing too low and seeing performance issues. The unfortunate corollary of bloated but performant Pods is that more resources are required to deploy these applications, and thus higher overall cost. Large Pods are harder for Kubernetes to bin pack onto nodes, and if you have autoscaling enabled, clusters may end up typically running at the large end of the scaling range. Again, paying for nodes that are not being efficiently used is an unnecessary cost.

How can you solve this?

Opsani solves this challenge by using machine learning algorithms to discover the actual application resource requirements, then passing them on to Kubernetes to apply to the environment. By right-sizing the Pods that deploy your applications, Kubernetes can more efficiently bin pack those Pods onto fewer nodes. Performance is maintained while costs are dramatically reduced.

With an improved understanding of actual resource needs, it then becomes possible to adjust the types of nodes being provisioned in the cluster to better fit the known application requirements. This also opens up the opportunity to take advantage of reserved instances, which typically come with substantial cost savings.

Cost savings can be achieved by enabling Kubernetes scaling functions and taking advantage of the right node types. Appropriately tuning resource use is also foundational to effective, efficient scaling and to targeting the ideal node type. It is possible to tune application performance manually, although this is only realistic for very simple applications. Today's cloud-native microservices applications comprise many services, each with individual requirements.

Opsani automates this tuning process and quickly finds the optimal configurations, which Kubernetes then implements. The cost savings can be substantial: our case study of Google's Online Boutique (formerly the Hipster Shop) achieved an 80% reduction in cloud costs, while the Opsani optimization also doubled performance and provided an 800% improvement in efficiency. If you'd like to see what Opsani can do for yourself, check out the free trial.