Cloud resources

We hate to be the ones to break it to you, but manually managing your cloud resources and usage is a massive waste of time. Don’t worry, you are definitely not alone: many enterprises make this mistake. Here’s why trying to manually determine the best EC2 instance type is a bad idea.

Applications are constantly changing and evolving, and so are their resource requirements. To achieve SLA-level performance, the infrastructure must have adequate resources to meet shifting requirements, and applications must also be tuned with optimal configurations. But the possible configurations number in the trillions; pinpointing the best permutation is simply beyond the human mind.
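
To see how fast the search space explodes, here’s a back-of-the-envelope sketch in Python. The tuning dimensions and their sizes are illustrative assumptions, not a survey of any particular stack:

```python
import math

# Illustrative (assumed) tuning dimensions for a single service.
# Real stacks differ; the point is how fast the product grows.
dimensions = {
    "EC2 instance type": 400,        # rough count of current types
    "instance count (1-64)": 64,
    "JVM heap size steps": 50,
    "garbage collector + flags": 20,
    "thread pool sizes": 100,
    "connection pool sizes": 50,
    "OS / kernel settings": 30,
}

total = math.prod(dimensions.values())
print(f"Permutations for ONE service: {total:,}")  # ~3.8 trillion
```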

Manual tuning forces you to overprovision your cloud resources. In most cases, you end up with poorly managed or unused resources. Either way, they cost money.

So what’s wrong with doing this work manually?

In the past, capacity planning and resource allocation were simpler. All you had to do was determine the maximum load a server could possibly handle, then size that server on the assumption that peak load could hit at any moment. (And guess what? Even back then you were probably overspending, paying for extra resources you never actually used.)

With the cloud, you might assume that resources are provisioned only as traffic demands. Unfortunately, this is not the case.

Of course, AWS has autoscaling, but this is not enough. Keep in mind that AWS offers millions of possible EC2 configurations. The “General Purpose” instance types alone (the A, T, and M series) give you over 100 options. If you select manually, and loads aren’t predictable, you have to pick the instance that can handle your peak load.
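
If you want a feel for the size of even this one slice of the menu, you can enumerate instance types with boto3. A minimal sketch, assuming AWS credentials and a default region are already configured:

```python
import boto3

# Count EC2 instance types whose family starts with "a", "t", or "m"
# (a rough prefix filter for the General Purpose series).
ec2 = boto3.client("ec2")

general_purpose = []
paginator = ec2.get_paginator("describe_instance_types")
for page in paginator.paginate():
    for item in page["InstanceTypes"]:
        family = item["InstanceType"].split(".")[0]  # "m5" from "m5.large"
        if family.startswith(("a", "t", "m")):
            general_purpose.append(item["InstanceType"])

print(f"{len(general_purpose)} general-purpose instance types")
```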

We know: determining which EC2 instance type is right for your workload seems simple. Determining the best instance type for one workload might be doable, and performance may improve to some degree. But can you manually determine the ideal instance type for thousands of workloads? No.

Even with one fairly predictable workload, there is a difference, sometimes a huge and costly difference, between OK and optimal. Why? 

Because optimizing your cloud usage involves a lot of repetitive, time-consuming testing to see whether your adjustments improve or degrade performance. There are trillions of permutations to test, and evaluating each EC2 configuration variation individually is simply impossible.

If you overprovision resources as a buffer against uncertainty, you are almost certainly overspending. But you don’t want to underprovision either, degrading application performance to the point where frustrated customers leave your site. Manually matching resources to the optimal configuration is simply too difficult, so cautious engineers overprovision and pay for cloud resources that will never be used.
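
And that overspend adds up quickly. A rough sketch with purely hypothetical numbers:

```python
# Hypothetical fleet: 200 instances at $0.40/hr, averaging 45% CPU
# utilization when a well-tuned fleet could safely run at 70%.
instances = 200
hourly_rate = 0.40      # $/hr per instance (assumed)
actual_util = 0.45      # observed average utilization (assumed)
target_util = 0.70      # sustainable utilization after tuning (assumed)

needed = instances * actual_util / target_util            # ~129 instances
annual_overspend = (instances - needed) * hourly_rate * 24 * 365

print(f"Rightsized fleet: ~{needed:.0f} instances")
print(f"Annual overspend: ~${annual_overspend:,.0f}")     # ~$250,000
```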

Since manual planning isn’t feasible, what’s the solution? You’re probably thinking the next step is infrastructure-as-code.

While automated infrastructure-as-code lets you make policy-based decisions, it doesn’t grant enough control over your environment. Your infrastructure does become more responsive to changing conditions, and it selects instance types so you don’t have to.

However, unless you have given the automation the right policies to follow, it may not select the optimal instances. Policy-based automation might seem like the right choice, but mismatched tuning parameters baked into policies can drive even more cloud spend and operational issues.
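
For example, a target-tracking policy enforces exactly the target you hard-code into it, right or wrong. This sketch uses the real EC2 Auto Scaling API, but the group name and the deliberately too-conservative 30% CPU target are assumptions for illustration:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# A 30% CPU target (assumed here) keeps the fleet ~70% idle on
# average, so the automation itself locks in the overspend.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-tier-asg",   # hypothetical group name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 30.0,  # too conservative: scales out far too early
    },
)
```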

The solution is Machine Learning (ML) automation. ML lets you keep control over what your business needs: you set performance or cost objectives, and the infrastructure automatically adjusts itself to meet them.
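
As a toy illustration of what “set an objective, let the system optimize” means, here’s a sketch that picks the cheapest candidate configuration still meeting a latency SLO. The candidates, prices, and predicted latencies are hypothetical stand-ins for what an ML system would learn from live telemetry:

```python
# Toy objective: minimize cost subject to a p99 latency SLO.
# All numbers below are hypothetical.
SLO_P99_MS = 200.0

candidates = [
    {"instance": "m5.large",   "count": 8, "cost_hr": 0.096, "p99_ms": 240.0},
    {"instance": "m5.xlarge",  "count": 4, "cost_hr": 0.192, "p99_ms": 190.0},
    {"instance": "c5.xlarge",  "count": 4, "cost_hr": 0.170, "p99_ms": 175.0},
    {"instance": "m5.2xlarge", "count": 3, "cost_hr": 0.384, "p99_ms": 150.0},
]

feasible = [c for c in candidates if c["p99_ms"] <= SLO_P99_MS]
best = min(feasible, key=lambda c: c["cost_hr"] * c["count"])
print(f"Chosen: {best['count']}x {best['instance']} "
      f"(${best['cost_hr'] * best['count']:.2f}/hr, p99 {best['p99_ms']}ms)")
```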

Although ML is a big step, it needs to be paired with permutation analysis

Unlike a human engineer, an ML system can analyze myriad permutations to understand their cost and performance efficiency, and then automatically implement adjustments based on the predefined business objective. ML is necessary because these permutations are constantly changing and require continual re-evaluation of what optimal means for the system.
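
In practice this is a closed loop: measure, update what you believe about the system, try a new candidate, repeat. Here is a heavily simplified random-search version of that loop, with a stubbed-out measure() standing in for real load tests or production telemetry (real optimizers use smarter search, such as Bayesian optimization):

```python
import random

def measure(config):
    # Stub for a real load test: a hypothetical cost model that
    # penalizes undersized and oversized configs, plus noise.
    cpu, mem = config
    return abs(cpu - 8) + abs(mem - 16) / 4 + random.gauss(0, 0.5)

def random_config():
    return (random.randint(1, 16), random.choice([4, 8, 16, 32, 64]))

best_config, best_score = None, float("inf")
for _ in range(100):            # fixed evaluation budget
    cfg = random_config()
    score = measure(cfg)        # lower is better
    if score < best_score:
        best_config, best_score = cfg, score

print(f"Best found: {best_config[0]} vCPUs, {best_config[1]} GiB RAM "
      f"(score {best_score:.2f})")
```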

As requirements change and new permutations surface, your cloud optimization system continuously learns, updates, and implements optimal settings without human intervention. It accurately predicts your application requirements, automatically provisions the right amount of cloud resources when usage spikes, and shuts processes off when traffic slows, preventing waste.

Furthermore, ML can determine optimal configurations within a single cloud environment, and it can also manage the much greater complexity of optimizing across multiple clouds.