[CASE STUDY]

Google’s Bank of Anthos

How Opsani Boosted Application Efficiency by Over 3X
for Google’s Bank of Anthos

Overview

The Opsani autonomous tuning service more than doubled the throughput of Google’s Bank of Anthos application while simultaneously cutting cloud costs by 38%.

Opsani’s algorithm efficiently searched a space of approximately 70 billion possible configurations, testing fewer than 115 of them to derive settings that are 3.5x more efficient.

The entire process was set up in less than 20 minutes with our straightforward onboarding tool.

The Setup

The setup consisted of running a single command, after which Opsani handled the rest:

kubectl apply -f anthos.yaml -n @@application_namespace@@

This created the servo on the Kubernetes cluster. The servo communicated with the Opsani SaaS, sending measurements of the app’s performance and receiving adjustments to make. The servo was created in four simple steps.

  • First, a ConfigMap was defined. This ConfigMap informed the servo which services were meant to be optimized, what settings were available to be tuned, and the acceptable range of values for those settings.
  • The ConfigMap then indicated which Prometheus queries needed to be made to measure Bank of Anthos throughput. The Opsani ML SaaS made decisions based on the results of these queries.
  • Next, the servo’s Deployment was created. The servo communicated with the Opsani API, sending information about the cluster and the app’s performance to the API.
  • The Opsani SaaS then decided on a resource setting to try and sent this information back to the servo.
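
The measure-and-adjust cycle described in these steps can be illustrated with a minimal sketch. Everything in it is hypothetical: the service names, the value ranges, the scoring function, and the random choice of settings are stand-ins for the ConfigMap contents, the Prometheus queries, and the Opsani SaaS, not the actual servo implementation.

```python
import random

# Hypothetical tunables, standing in for what the ConfigMap would declare:
# which services to tune and the allowed values for each setting.
TUNABLES = {
    "service-a": {"cpu": [0.25, 0.5, 1.0, 2.0], "mem_gib": [0.5, 1.0, 2.0]},
    "service-b": {"cpu": [0.25, 0.5, 1.0, 2.0], "mem_gib": [0.5, 1.0, 2.0]},
}

def measure_throughput(config):
    """Stand-in for the Prometheus queries; returns a made-up score."""
    return sum(min(s["cpu"], 1.0) * min(s["mem_gib"], 1.0) for s in config.values())

def next_adjustment(rng):
    """Stand-in for the optimizer backend: picks random in-range values."""
    return {
        service: {name: rng.choice(values) for name, values in settings.items()}
        for service, settings in TUNABLES.items()
    }

def servo_loop(steps, seed=0):
    """Receive settings to try, measure the app under them, record the result."""
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        config = next_adjustment(rng)        # settings sent back to the servo
        score = measure_throughput(config)   # measurement taken by the servo
        history.append((config, score))      # measurement reported upstream
    return history

history = servo_loop(steps=5)
best_config, best_score = max(history, key=lambda item: item[1])
print(best_score, best_config)
```

In this sketch the random picker plays the role of the optimizer only to keep the loop self-contained; the point is the shape of the cycle, in which every tried value stays inside the declared ranges.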

Opsani Optimization at Work

Opsani’s optimization starts with a series of calibrating steps, ensuring that the performance of the application is stable and consistent. Then, it begins the tuning process.

Opsani creates a machine learning model of the application’s performance characteristics and performs a series of tests on the Bank of Anthos to refine this internal model and locate the most efficient configuration. Using real data fed back after each test, the model corrects itself when it explores a bad configuration and explores more deeply the areas that produce improvements.

The Opsani SaaS automatically determines the best algorithm (and variant) to use, including the values of its hyperparameters. In the case of the Bank of Anthos, the optimizer chose a variant of Bayesian optimization, a sample-efficient, global black-box optimization method. Since optimizing the Bank of Anthos is a relatively high-dimensional problem, with 12 tunable parameters (CPU and memory for each of 6 microservices) whose relationships are mostly unknown a priori, black-box optimization is a sensible choice. The user does not need to decide which optimization algorithm to use, or even be aware of the different algorithms, as Opsani determines the best technique autonomously.
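
To make the idea of sample-efficient black-box search concrete, here is a minimal sketch of Bayesian optimization using a Gaussian-process surrogate with an upper-confidence-bound acquisition rule. It is not Opsani’s algorithm: the two-parameter grid, the `objective` function, the kernel length scale, and the `kappa` exploration weight are all assumptions chosen for illustration.

```python
import itertools
import math
import statistics

def rbf(a, b, length=2.0):
    """Squared-exponential kernel between two configurations."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))

def cholesky(A):
    """Lower-triangular L with L @ L.T == A, for a positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve(L, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def objective(config):
    """Hypothetical efficiency score with a single peak at (5, 2)."""
    cpu_idx, mem_idx = config
    return -((cpu_idx - 5) ** 2) - (mem_idx - 2) ** 2

def bayes_opt(candidates, budget, kappa=2.0, noise=1e-3):
    tried = [candidates[0], candidates[-1]]   # two initial probes
    scores = [objective(c) for c in tried]
    while len(tried) < budget:
        # Standardize scores so the zero-mean, unit-variance prior fits.
        mu = statistics.mean(scores)
        sd = statistics.pstdev(scores) or 1.0
        z = [(s - mu) / sd for s in scores]
        K = [[rbf(a, b) + (noise if i == j else 0.0)
              for j, b in enumerate(tried)] for i, a in enumerate(tried)]
        L = cholesky(K)
        alpha = solve(L, z)

        def ucb(c):
            # Posterior mean plus kappa posterior standard deviations.
            k = [rbf(t, c) for t in tried]
            mean = sum(ki * ai for ki, ai in zip(k, alpha))
            var = max(1.0 - sum(ki * vi for ki, vi in zip(k, solve(L, k))), 0.0)
            return mean + kappa * math.sqrt(var)

        untried = [c for c in candidates if c not in tried]
        choice = max(untried, key=ucb)        # most promising untested config
        tried.append(choice)
        scores.append(objective(choice))
    best = max(range(len(tried)), key=scores.__getitem__)
    return tried[best], scores[best], tried

# Two settings with 8 possible values each: 64 candidate configurations.
candidates = list(itertools.product(range(8), repeat=2))
best_config, best_score, tried = bayes_opt(candidates, budget=20)
print(best_config, best_score, len(tried))
```

The surrogate model is refit after every test, so each new measurement changes where the search looks next. That feedback is what lets this family of methods test only a small fraction of the candidates.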

The Results

Below is a table summarizing the performance and cost of the Bank of Anthos before optimization and after running the Opsani optimization algorithm.

[Table: Anthos Results]

Raising the performance-to-cost ratio from 3,500,000 to 12,500,000 is an improvement of over 250%!
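
The headline figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the performance-to-cost ratio is simply throughput divided by cost, which the case study does not state explicitly; under that assumption, the 38% cost reduction implies a throughput gain consistent with "more than doubled."

```python
# Performance-to-cost ratios quoted in the case study.
before = 3_500_000
after = 12_500_000

ratio = after / before                  # how many times more efficient
improvement_pct = (ratio - 1) * 100     # relative improvement in percent

# Assumption: ratio = throughput / cost, with cost cut by 38%.
cost_factor = 1 - 0.38
implied_throughput_factor = ratio * cost_factor

print(f"{ratio:.2f}x, +{improvement_pct:.0f}%, throughput x{implied_throughput_factor:.2f}")
# prints "3.57x, +257%, throughput x2.21"
```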

Consider the complexity of the problem. The Opsani SaaS attempted to optimize the CPU and memory of 6 microservices. The acceptable ranges are presented in the table below.

[Table: vcore chart]

Each setting has 8 possible values, creating a 12-dimensional problem space with 8^12, or nearly 70 billion, possible configurations. Finding the optimal solution is roughly ten times as hard as finding one specific person on the globe. This is a very difficult problem to attack, yet the Opsani algorithm arrived at its result in only a day and a half.
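
The size of the search space follows directly from the numbers above, as this short calculation shows. It also gives the fraction of the space covered by the test budget, using 115 as an upper bound since the case study reports fewer than 115 tests.

```python
values_per_setting = 8
settings = 6 * 2                       # CPU and memory for 6 microservices

total_configs = values_per_setting ** settings
tests_upper_bound = 115
fraction_tested = tests_upper_bound / total_configs

print(f"{total_configs:,}")            # prints "68,719,476,736"
print(f"{fraction_tested:.1e}")        # prints "1.7e-09"
```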

[Figure: resource chart]

This model highlights the intricacy required to properly tune a highly complex microservice cloud application. There is no clear pattern suggesting how to set these resource values, and no service obviously requires more or fewer resources than the others.

This model discovers that many components are over-provisioned by default and scales these back. While it is common to assume that each microservice is equally important and requires the same resources, these results show why that is very often bad practice. Even when each service’s resources are allocated individually, doing so effectively by hand is practically impossible. Through these adjustments, the Bank of Anthos realized significantly better performance at a significantly lower overall cost.


Want these results on your app?

If you would like to read more about Opsani or continuous optimization, read this guide. If you would like to request a free trial, CLICK HERE.