Getting Ahead of the Curve with Proactive Auto Scaling

Auto-scaling automates away the manual toil involved in adding or removing resources to support application performance. There are three primary types of auto-scaling: "regular" or reactive auto-scaling, proactive auto-scaling, and predictive auto-scaling. In this article, we'll look at why auto-scaling is useful and where each type fits best.

Traditionally, responding to changes in load by provisioning additional application instances or infrastructure was a manual and time-consuming process. The common approach to avoiding performance issues, or even crashes, from not having enough resources to meet demand was to overprovision the system. While overprovisioning gave IT more runway to respond to growing loads, it also resulted in idle resources that, in some cases, might be provisioned, paid for, and never even used.

[Image: over-provisioning]

On-demand access to resources in cloud systems has made it possible to scale more rapidly and reliably in response to changing demand. Even so, some IT departments continued to scale their systems manually. Automation, in the form of auto-scaling, has since become a common and reliable way to handle scaling at any time of day, and far more quickly than relying on engineers to do so by hand. Currently, the two primary approaches are "regular" (reactive) auto-scaling and proactive auto-scaling, with predictive auto-scaling just starting to come online.

Reactive Scaling

When we talk about auto-scaling, the typical approach is to set a trigger based on a usage threshold for memory, CPU, or some other performance metric of importance. This is also known as reactive auto-scaling. When a limit is exceeded, say 85% CPU use for more than one minute, automation kicks in and additional application replicas or infrastructure are created. This process generally continues automatically up to an upper resource limit defined by the auto-scaling policy. Lower thresholds can similarly be used to scale resources back down.

Automating this response to demand is especially useful in environments where there is no clear pattern to changes in load. With auto-scaling automation in place, the system can respond appropriately to changes in load, ensuring performance SLOs are maintained, costs from idle infrastructure are avoided, and on-call engineers can get some well-deserved sleep. One caution with reactive scaling is that, depending on how quickly infrastructure or application replicas come online, the lag between the trigger and resource availability can cause performance issues if load increases very quickly, as illustrated below.
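The core of a reactive policy can be sketched as a simple control function. This is a minimal illustration, not any particular platform's API; the threshold values, replica bounds, and function name are all assumptions chosen for the example.

```python
def decide_replicas(cpu_percent, current, minimum=2, maximum=10,
                    scale_up_at=85, scale_down_at=30):
    """Return the desired replica count for the observed CPU usage.

    Hypothetical reactive policy: add a replica when usage breaches the
    upper threshold, remove one when it falls below the lower threshold,
    and always stay within the policy's min/max bounds.
    """
    if cpu_percent > scale_up_at and current < maximum:
        return current + 1   # upper threshold breached: scale out
    if cpu_percent < scale_down_at and current > minimum:
        return current - 1   # sustained low usage: scale in
    return current           # within the target band: no change


# A real autoscaler would call this periodically with a metric sampled
# over a window (e.g., 85% CPU for more than one minute), then apply
# the result to the deployment.
print(decide_replicas(90, 3))   # high load: scale out
print(decide_replicas(20, 3))   # low load: scale in
print(decide_replicas(50, 3))   # in band: hold steady
```

Note that real policies usually also add a cooldown period between scaling actions so the system does not oscillate around a threshold.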

[Image: reactive auto-scaling]

Proactive Scaling (Scheduled Scaling)

Proactive auto-scaling does not wait for a trigger; instead, it scales on a cycle (for example, each weekend or after business hours) or in anticipation of an upcoming event (e.g., a Cyber Monday sale or a new product release). Proactive scaling is appropriate when you have data showing typical, predictable load patterns that warrant scaling up or down. It is especially valuable when there is a startup lag in creating an instance: if demand for a service spikes quickly, it can be more efficient and performant to have the needed number of instances in place before the expected demand event.
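A scheduled policy like the one described above can be sketched as a lookup against a predefined calendar rather than a live metric. The schedule entries and replica counts below are hypothetical, assuming a service that is busiest during weekday business hours and quiet on weekends.

```python
from datetime import datetime

# Hypothetical schedule: each entry pairs a time predicate with the
# replica count to hold while it matches. Entries are checked in order.
SCHEDULE = [
    (lambda t: t.weekday() >= 5, 2),    # weekends: scale down
    (lambda t: 9 <= t.hour < 18, 8),    # weekday business hours: full capacity
]
DEFAULT_REPLICAS = 4                    # weekday nights: reduced capacity


def scheduled_replicas(now: datetime) -> int:
    """Return the replica count the schedule prescribes for `now`."""
    for matches, replicas in SCHEDULE:
        if matches(now):
            return replicas
    return DEFAULT_REPLICAS


# A scheduler would evaluate this ahead of each period and provision
# capacity before the load arrives, avoiding the startup lag of
# reactive scaling.
print(scheduled_replicas(datetime(2024, 1, 6, 12)))   # Saturday noon
print(scheduled_replicas(datetime(2024, 1, 3, 10)))   # Wednesday 10:00
print(scheduled_replicas(datetime(2024, 1, 3, 22)))   # Wednesday 22:00
```

In practice, scheduled and reactive policies are often combined: the schedule sets a capacity floor before an expected event, and reactive triggers handle anything the forecast missed.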