Autoscaling Your Cloud

Rightsizing an application is hard. Running small instances to save on infrastructure costs can lead to an application crash when the load exceeds capacity, which is not acceptable to most users, developers, or businesses (figure 1). To avoid this, many applications are overprovisioned, running on much more infrastructure than needed so they do not fail when workloads grow. The result, however, is unnecessary spending on idle resources during periods of low demand (figure 2).


Figure 1. An example of an underprovisioned infrastructure where the load (blue bars) exceeds available capacity (light blue). How an application responds will vary, but degraded performance or a complete application crash are possible outcomes.

Figure 2. An example of an overprovisioned infrastructure: the application load (dark blue) never exceeds capacity (light blue), which protects application performance. However, on the days where the load is lower than the capacity, unused infrastructure is still generating costs.

Scaling your infrastructure to better match the expected load is one way to reduce spend on idle resources and still provide a performant application. Scaling means adding or removing resources like network services, storage, and compute in relation to the anticipated demand from your workload. Unfortunately, this is an extremely difficult and time-consuming task when done manually.

Autoscaling can come to the rescue by automating the scaling process. This could consist of adjusting resources, such as the number of instances used for your workload, when a certain resource-use threshold is reached. For example, you could configure your system so that 80% CPU utilization for 30 minutes would automatically trigger the addition of another server and balance the load across the two instances. Similarly, an instance that was only using 10% of its CPU capacity might be deleted and its load routed to other servers with available capacity. Autoscaling is beneficial because it supplies your cloud with the right amount of resources for your workload. This saves costs, as you only pay for resources you actually use. It also frees employees to apply their skills to more interesting and valuable tasks.
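
As a rough illustration of the threshold rule described above, here is a minimal Python sketch. The names ScalingPolicy and decide are hypothetical, as is the assumption that CPU utilization arrives as one sample per minute; real autoscalers expose this logic through their own configuration rather than code like this.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalingPolicy:
    # Illustrative values taken from the example in the text, not recommendations.
    scale_out_cpu: float = 80.0  # add an instance at or above this CPU %
    scale_in_cpu: float = 10.0   # remove an instance at or below this CPU %
    window: int = 30             # consecutive one-minute samples that must agree


def decide(samples: Sequence[float], policy: ScalingPolicy) -> int:
    """Return +1 to add an instance, -1 to remove one, 0 to do nothing."""
    if len(samples) < policy.window:
        return 0  # not enough history yet to call the load "sustained"
    recent = samples[-policy.window:]
    if all(s >= policy.scale_out_cpu for s in recent):
        return 1
    if all(s <= policy.scale_in_cpu for s in recent):
        return -1
    return 0
```

Requiring every sample in the window to cross the threshold is one simple way to avoid reacting to short spikes; averaging over the window would be another reasonable choice.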


Figure 3. Autoscaling adjusts capacity to follow expected loads, balancing enough resources to keep the application running smoothly against the cost of running unused infrastructure.


Scaling Categories: Vertical or Horizontal?


Vertical scaling changes the size of a server, or even substitutes it completely with another. This could mean scaling down, so the server is given less memory, fewer CPUs, or less network capability, or scaling up to increase capacity.

A perk of vertical scaling is that it decreases operational overhead. Instead of provisioning many servers, you only need to focus on one, so there’s no requirement to distribute the workload and keep all servers in sync.

Still, it is important to keep in mind that vertical scaling is limited. There is only a certain amount of CPU and memory that can be added to an individual instance, so a vertically scaled application could still run out of resources. Conversely, an instance can have an adequate amount of memory and CPU that isn’t always being used, especially if scaling is an involved and manual process. Then you are paying for these resources while they are idle. Vertical scaling is best suited to applications that are complicated to distribute.


Horizontal scaling differs from vertical scaling in that you divide the workload across multiple servers, as opposed to resizing your app for a larger server. It works well for websites because their traffic can often be predicted from variables like the time of year: extra servers can be added to manage increased traffic and removed when demand drops. It is also a good fit because each user can be served by a single server. This is beneficial for back end microservices as well as front end applications.


How is Autoscaling Done in the Cloud?

Most cloud providers offer an autoscaling service. For example, Microsoft Azure offers Virtual Machine Scale Sets, AWS offers Auto Scaling groups, and Google Cloud uses managed instance groups. All of these services enable users to scale horizontally.


How to Select the Correct Size & Instance Types for Autoscaling

Your workload determines the size and type of instance you should use in your autoscaling group. To select the correct instance, you need to consider what combination of memory and CPU will support the demand of your workload without resulting in idle resources.

To design an optimal autoscaling group, you need to determine several parameters. These include the maximum and minimum number of instances in your cluster, and a threshold that triggers adding or removing an instance when necessary. The choice of parameter values is important because it dictates how much you will spend to run the cluster.
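
To make the role of these parameters concrete, the following Python sketch models a group with a minimum and maximum size and shows how those bounds limit both the instance count and the hourly spend. GroupConfig and its fields are hypothetical names, and the price used in the usage note is an invented figure rather than any provider's rate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GroupConfig:
    min_instances: int
    max_instances: int
    hourly_price: float  # assumed price per instance-hour (illustrative only)

    def clamp(self, desired: int) -> int:
        """Keep a requested instance count inside the group's bounds."""
        return max(self.min_instances, min(self.max_instances, desired))

    def hourly_cost_range(self) -> Tuple[float, float]:
        """Lowest and highest possible hourly spend for this group."""
        return (
            self.min_instances * self.hourly_price,
            self.max_instances * self.hourly_price,
        )
```

For instance, GroupConfig(2, 10, 0.5) would never run fewer than 2 or more than 10 instances, so its hourly spend stays between 1.0 and 5.0 in whatever currency the assumed price is quoted in.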

Your minimum number of instances needs to be able to handle the base application load without carrying extra capacity that goes unused. It may appear that the right choice is a single instance sized for the low end of demand, but this isn’t always true. You have to examine both the minimum resources needed and the most efficient way to add instances.
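
The trade-off between instance size and scaling granularity can be sketched with a small calculation. Everything here is hypothetical: the InstanceType fields, the capacities in requests per second, and the prices in cents are made-up numbers chosen so both types cost the same per unit of capacity.

```python
import math
from typing import NamedTuple


class InstanceType(NamedTuple):
    name: str
    capacity_rps: int        # requests/second one instance can serve (assumed)
    hourly_price_cents: int  # assumed price per instance-hour, in cents


def instances_needed(load_rps: int, itype: InstanceType) -> int:
    """Smallest instance count that covers the load (at least one)."""
    return max(1, math.ceil(load_rps / itype.capacity_rps))


def hourly_cost_cents(load_rps: int, itype: InstanceType) -> int:
    return instances_needed(load_rps, itype) * itype.hourly_price_cents


# Two made-up types with identical price per unit of capacity.
small = InstanceType("small", capacity_rps=100, hourly_price_cents=5)
large = InstanceType("large", capacity_rps=400, hourly_price_cents=20)

for load in (120, 450):
    for itype in (small, large):
        count = instances_needed(load, itype)
        cost = hourly_cost_cents(load, itype)
        print(f"load={load} rps: {count} x {itype.name} = {cost} cents/hour")
```

Under these assumed numbers, the smaller type follows the load in finer steps and so pays for less unused capacity, although in practice per-instance overhead and pricing differences can shift the balance the other way.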


Autoscaling with Opsani

Opsani intelligently autoscales massive infrastructures for optimal cost and performance. By leveraging machine learning algorithms, Opsani studies how the application performs under the volume of requests it is receiving. The AI algorithm then examines the hidden relationship between the pod and traffic to determine the best moment to scale up. Furthermore, Opsani can consider far more combinations of CPU and memory across a cloud provider’s offerings, and evaluate them far more quickly, than a human operator could.

To autoscale efficiently and avoid infrastructure (and funding) loss, you need Opsani. Get your demo right now.  Opsani customers experience a 40-70% decrease in costs or 2.5x increase in performance. Overnight.