Adaptive Tuning for Load Profile Optimization

Initial application settings are generally derived from experience with the performance of similar systems, or are overprovisioned to head off anticipated bottlenecks. Once an application is running and actual performance metrics are available, it becomes possible to tune parameters so that resources are assigned to balance performance requirements against cost. In simple, stable systems, this cycle of measurement, evaluation, and improvement is relatively straightforward to apply.

To break these basic tuning steps out more explicitly:

  1. Establish values that define the minimum acceptable performance (a Service Level Objective, or SLO).
  2. Collect metrics on system performance.
  3. Identify the part of the system that is limiting performance (e.g., CPU, memory).
  4. Adjust the part of the system that is causing the bottleneck.
  5. Collect metrics on system performance again.
  6. If system performance improves, keep the modification; if it degrades, revert or try a different adjustment.
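
The steps above can be sketched as a small loop. This is a minimal illustration, not a production tuner: the latency model (toy_latency_ms), the adjustment proposer (toy_adjustments), and the parameter names cpu and memory_gb are all hypothetical stand-ins for real metric collection and bottleneck analysis.

```python
from typing import Callable, Dict, List

Config = Dict[str, int]


def toy_latency_ms(config: Config) -> float:
    """Hypothetical stand-in for steps 2 and 5: a made-up latency model."""
    return 400.0 / config["cpu"] + 100.0 / config["memory_gb"]


def toy_adjustments(config: Config) -> List[Config]:
    """Hypothetical stand-in for steps 3 and 4: propose one-unit increases."""
    return [
        {**config, "cpu": config["cpu"] + 1},
        {**config, "memory_gb": config["memory_gb"] + 1},
    ]


def tune(
    config: Config,
    measure: Callable[[Config], float],
    propose: Callable[[Config], List[Config]],
    slo_ms: float,
    max_rounds: int = 10,
) -> Config:
    """Repeat measure -> adjust -> re-measure until the SLO is met."""
    best = dict(config)
    best_latency = measure(best)          # step 2: collect metrics
    for _ in range(max_rounds):
        if best_latency <= slo_ms:        # step 1: the SLO is the stop condition
            break
        for trial in propose(best):       # steps 3-4: try an adjustment
            latency = measure(trial)      # step 5: collect metrics again
            if latency < best_latency:    # step 6: keep an improvement...
                best, best_latency = trial, latency
                break
            # ...otherwise the trial is discarded (a revert) and the next is tried
        else:
            break                         # no proposed adjustment helped
    return best


tuned = tune({"cpu": 1, "memory_gb": 1}, toy_latency_ms, toy_adjustments, slo_ms=150.0)
print(tuned, toy_latency_ms(tuned))
```

The loop is greedy: it keeps the first adjustment that helps, so it can settle on a configuration that meets the SLO without being the cheapest one. That limitation is part of why the search-based and model-based approaches discussed later exist.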

While this process is relatively easy to apply in simple, stable systems, the number of potential bottlenecks grows with system complexity, because overall performance depends on inter-service interactions. Process automation becomes important: relying on human intervention to maintain SLOs becomes overwhelming, and people may not be able to adjust the system quickly enough to meet SLOs in a dynamic environment.

Cloud computing systems and the common microservice architectures of cloud-native applications can scale automatically to maintain performance SLOs in the face of variable loads. Increasing loads can trigger the system to scale up resources to maintain performance. Decreasing loads can trigger a scale-down of resources to levels that still maintain performance while removing the cost burden of idle resources.
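
A threshold rule of this kind can be sketched in a few lines. The function name, the CPU-utilization thresholds, and the replica limits below are assumptions chosen for illustration; real autoscalers typically add cooldown periods and metric smoothing on top of a rule like this.

```python
def desired_replicas(
    current: int,
    cpu_utilization: float,
    scale_up_at: float = 0.75,
    scale_down_at: float = 0.30,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return the replica count a simple threshold rule would request.

    cpu_utilization is the average utilization across replicas (0.0 to 1.0).
    """
    if cpu_utilization > scale_up_at:
        target = current + 1      # load is high: add capacity
    elif cpu_utilization < scale_down_at:
        target = current - 1      # load is low: shed idle capacity
    else:
        target = current          # within the comfort band: do nothing
    # Never leave the configured bounds, whatever the metric says.
    return max(min_replicas, min(max_replicas, target))


for utilization in (0.90, 0.50, 0.10):
    print(utilization, desired_replicas(3, utilization))
```

The gap between the two thresholds is deliberate: without it, a load hovering around a single threshold would cause the system to scale up and down repeatedly.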

Database and big data applications have been at the forefront of understanding and automating the process of adaptive tuning. Herodotou and collaborators identify six approaches to performance optimization: 

  • Rule-based – An approach that assumes the system will behave as similar systems have in prior experience. It does not rely on metrics, logs, or a performance model; it provides initial settings to get started but is unlikely to provide optimal performance settings.
  • Cost modeling – An analytical (white-box) approach based on known cost functions and an understanding of the system’s internal functions. Some form of performance metric is required to develop the predictive model.
  • Simulation-based – A model to predict performance is generated from a set of experimental runs that simulate load scenarios (e.g., using a load generator), and is then used to evaluate optimal parameter settings.
  • Experiment-based – Search algorithm-led experiments with varying parameter settings are used to identify optimal settings.
  • Machine learning-based – A black-box approach that generates predictive performance models without relying on knowledge of the system’s internal functionality.
  • Adaptive – Configuration parameters are tuned on a running application using any number of the methods listed above.
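
As one concrete instance of the experiment-based category, the sketch below runs a seeded random search over a small parameter grid. Random search is only one of many possible search algorithms and is used here because it is short; the parameter space and the toy_score function (a made-up blend of latency and cost) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def toy_score(config: Config) -> float:
    """Hypothetical objective: made-up latency plus a made-up resource cost."""
    latency = 400.0 / config["cpu"] + 100.0 / config["memory_gb"]
    cost = 10.0 * (config["cpu"] + config["memory_gb"])
    return latency + cost


def random_search(
    space: Dict[str, List[int]],
    score: Callable[[Config], float],
    trials: int,
    seed: int = 0,
) -> Tuple[Config, float]:
    """Run `trials` experiments with random settings; keep the lowest score."""
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    best_config: Config = {}
    best_score = float("inf")
    for _ in range(trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        result = score(config)  # one "experiment"
        if result < best_score:
            best_config, best_score = config, result
    return best_config, best_score


space = {"cpu": [1, 2, 4, 8], "memory_gb": [1, 2, 4, 8]}
print(random_search(space, toy_score, trials=20))
```

In a real system each call to the scoring function would be a live or simulated run under load, which is why the number of trials, rather than compute time, is usually the budget that matters.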

While any one of the above approaches can be used to tune a system’s performance, doing so effectively will likely leverage several approaches, as the “Adaptive” category suggests. Rule-based methods can be a quick and dirty way to provide initial conditions, and if those rules include the ability to adjust application resources in response to workload changes (e.g., autoscaling thresholds), the result is an adaptive system. Combining AI methods with rule-based methods can improve adaptability by adding a level of predictive ability (e.g., AWS predictive scaling for EC2 Auto Scaling). Indeed, combining rule-based with ML-based approaches best addresses the need to adapt to both changing workloads and system changes.

While rule-based autoscaling can adapt to workload changes, the next question is whether the application profile is configured optimally. As the application, and possibly the supporting infrastructure, scales in response to load, are resource settings such as CPU, memory, and network configured to perform optimally as load requirements change? The challenge is that adaptively tuning system configurations becomes exponentially more complex as tunable parameters are added.

Increasingly, the ML approach to adaptive tuning is not only becoming practical to apply but is almost a prerequisite for achieving true optimization. Peter Nikolov, Opsani’s CTO and co-founder, gave a presentation, “Optimizing at Scale: Using ML to Optimize All Applications Across the Service Delivery Platform (Las Vegas 2020),” in which he pointed out that one application with eight settings for two tunable parameters across twenty-two component services would have 8^22 (about 74 quintillion) possible tuning permutations. This is beyond the human ability to search for a true optimum, but, in this case, the Opsani machine learning algorithm was able to rapidly search and identify the settings that provided both optimal performance and lowest cost.
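
The size of that search space is easy to check. The sketch below reads the figure as twenty-two independent choices with eight options each, which is an assumption about how the count was derived; the one-experiment-per-second rate is likewise only an illustrative assumption.

```python
def search_space_size(options_per_choice: int, independent_choices: int) -> int:
    """Number of distinct configurations when every choice is independent."""
    return options_per_choice ** independent_choices


total = search_space_size(8, 22)
print(f"{total:,}")  # 73,786,976,294,838,206,464

# Assumed rate: one experiment per second, running around the clock.
seconds_per_year = 60 * 60 * 24 * 365
print(f"{total / seconds_per_year:.2e} years to try every configuration")
```

Exhaustive search is clearly not an option at this scale, so any practical tuner has to sample the space selectively.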

If we now add variations in the workload itself, effective adaptive tuning with the machine learning approach starts to need not just an adaptive but also an autonomic approach. Oracle’s Autonomous Database and Opsani’s Continuous Optimization service are examples of continuous adaptive tuning in action. The ability to respond appropriately to changes in the system without human intervention removes the drudgery or toil of searching for and implementing (hopefully) optimal configuration settings; it also greatly reduces the response time in applying the appropriate optimum.

The six categories of performance optimization can be viewed as an evolutionary approach to adaptive tuning. Rule-based approaches will get you up and running and can be applied without any actual performance data. With the increasing ability to get performance metrics and apply appropriate modeling techniques, discerning and applying performance-improving changes becomes more rigorous and complex. Eventually, applying machine learning approaches to evaluating data and automating system tuning allows the rapid discovery of optimal settings even in the face of changing workload, application, and system requirements.

If you would like to experience AI-driven application optimization for yourself, Opsani offers a free trial.