Introduction

A brief cloud optimization definition: 

(Noun.) A process or set of processes that enable a company to reduce cloud costs while maintaining or improving application performance.

 

The rise of DevOps has ushered in an era of high-velocity delivery and daily releases of new code. But despite the ever-growing complexity of cloud applications, the post-delivery portion of the Continuous Integration & Continuous Deployment pipeline has been woefully neglected. 

Most enterprises leave their apps running hot and only attempt to tune them for performance when they have failed to meet an SLO or SLA, or in response to downtime. Most application parameters are left untouched altogether, and enterprises massively overprovision to buy peace of mind.

This oversight is costing large enterprises tens of millions of dollars, and hampering their performance. The goal of cloud optimization is to reverse this trend. Cloud optimization frameworks seek to achieve absolute cost efficiency and optimal performance (while adhering to compliance policies).


Organizational Challenge of Pushing for Cloud Optimization

Despite the promise of cloud optimization, IT Operations (particularly CloudOps and DevOps teams) often reach an impasse when they try to implement such strategies within their companies.

The impasse arises because CloudOps or DevOps teams are typically stuck between two departments of the company.

The CFO and the Finance team would love nothing more than to save as much money and company resources as possible, and so they set strict standards. These standards are often based on previous (and outdated) on-prem infrastructure resource allocations.

The application owners, on the other hand, are afraid that too little in the way of resources will affect the performance of the application. And so they throw everything they’ve got at it just to make sure, and they take any reduction in their application’s resources as an affront.

IT Operations personnel frequently end up getting caught between a rock and a hard place. This is one reason why, as research reveals, enterprises are having serious problems when it comes to cloud cost optimization and application performance:

  • 80% of finance and IT leaders report that poor cloud financial management has had a negative impact on their business.
  • 69% regularly overspend their cloud budget by 25%.
  • 57% worry daily about cloud cost management.

This is why cloud optimization is so crucial, especially when turbulent times arrive. Cloud optimization allows CloudOps and DevOps teams to properly control resource allocation and application performance. It maximizes cloud value-per-spend while delivering optimal app performance. When this happens, the CFO and Finance department get what they want, and the application owners can be confident that they don’t need to overspend to get the job done. Best of all, end users are made happier.

Four Best Practices When Implementing Cloud Optimization

Here are four things IT Operations teams should consider when pushing for cloud optimization.

Teams should use reserved instances where appropriate, and schedule or autoscale underutilized resources, as these approaches can greatly reduce costs.
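As a rough illustration of why these approaches matter, the sketch below compares the monthly cost of one always-on instance billed on demand, the same instance with a reserved rate, and an on-demand instance that is scheduled to run only during working hours. The hourly rates and the schedule are hypothetical numbers chosen for the example, not prices from any cloud vendor.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)

# Hypothetical prices, for illustration only.
ON_DEMAND_RATE = 0.10  # dollars per hour, pay-as-you-go
RESERVED_RATE = 0.06   # dollars per hour, with a usage commitment


def monthly_cost(rate_per_hour: float, hours: float) -> float:
    """Cost of running one instance for the given number of hours."""
    return round(rate_per_hour * hours, 2)


# Always-on workload: billed for every hour of the month either way.
always_on_demand = monthly_cost(ON_DEMAND_RATE, HOURS_PER_MONTH)
always_reserved = monthly_cost(RESERVED_RATE, HOURS_PER_MONTH)

# Dev/test workload that is only needed 12 hours a day on 22 working days.
scheduled_hours = 12 * 22
scheduled_on_demand = monthly_cost(ON_DEMAND_RATE, scheduled_hours)

print(f"on-demand, always on: ${always_on_demand:.2f}")
print(f"reserved, always on:  ${always_reserved:.2f}")
print(f"on-demand, scheduled: ${scheduled_on_demand:.2f}")
```

Under these assumed numbers, a reservation suits a workload that really does run around the clock, while scheduling suits one that sits idle most of the time.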

Strategically adopting new cloud infrastructure technologies is a great way to save money. Aside from reducing operating costs, this can also provide a workaround for issues involving environment configuration.

As a company grows, it will need governance to keep its processes robust and well defined. This governance structure will also ensure that the cloud gets used more efficiently.

IT Operations should make it their business to be aware of live cloud costs and how these costs translate into ROI for the company. If they don’t, the Finance department will keep on giving them a hard time, as the application owners continue to overprovision and overspend.

Why DevOps Demands Cloud Cost Optimization

But why do companies overspend, to begin with? That’s because IT has changed. Before DevOps, this was how it went:


Your developers wrote the code, which went through a build phase. After that, there were manual tests on the app and, upon the discovery of bugs or areas of improvement, manual tuning. When the team was sure the app was 100% ready, they deployed it manually. This cycle was repeated for every new release, which came every month or two.

But with DevOps, it now looks like this:


Once the code is written, the CI/CD pipeline picks it up and carries out automatic builds, tests, and deployments. The processes happen rapidly, in short cycles, again and again. Within this agile setting, Continuous Integration and Continuous Deployment are the norm.

Most enterprises are already using a robust and effective CI/CD toolchain. They are operating a delivery pipeline where developers blend their work within a single repository, and new code reaches users quickly and safely, generating maximum value. 

But did you notice something missing from the DevOps paradigm? That’s right: there is no effective post-release optimization and tuning. The post-delivery portion of the CI/CD toolchain is totally neglected.

This explains why companies find themselves overspending. The lack of optimization approaches causes application owners to use overprovisioning as a Hail Mary pass to avoid downtime and service errors.

Why It’s Easy to be in Denial about Optimization

Lots of organizations hear about cloud optimization and they think: “No, that can’t be right. We can’t be non-optimized. We tune our applications!” Well, yes, and no.

In the CloudOps/DevOps paradigm, performance tuning and optimization do happen. But unfortunately, they happen only when things are running hot, or in response to downtime or failure to meet an SLO or SLA. People think of using their cloud optimization tools only when something in the app breaks, when a team starts to notice over-provisioning in the infrastructure, or when the application performs poorly and customers start complaining.

And on the rare occasion that they do tune their apps, teams bring in a set of siloed and only partially effective tools. Why don’t these tools get the job done properly? Because they focus on the code and the app layer (UI, database schema, etc.). APM systems monitor basic app transactions and only trigger alarms if something goes really wrong. At most, they might offer some broad recommendations about how to reduce bottlenecks. 

This isn’t true optimization at all.

Effective cloud optimization is not achieved through troubleshooting only when problems arise, or through relying heavily on APM systems. Neither approach will significantly affect your costs in the long run. Moreover, your DevOps teams run a great risk of eventually being overwhelmed by recurring issues while your end-user experience worsens.

Cloud Optimization is a Fine Art

True cloud optimization needs to go deeper than surface level recommendations that only focus on the resources. But why are most current cloud optimization services and tools not equipped to do this?

Look at it this way:

Even a simple five-container application can have more than 255 trillion permutations of resources and basic parameters.
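To get a feel for how quickly the numbers grow, here is a back-of-the-envelope sketch. It does not reproduce the figure quoted above; the number of settings per container and the number of values per setting are hypothetical assumptions picked to show that even a very coarse grid lands in the hundreds of trillions.

```python
import math

# Hypothetical assumption: each container exposes 6 tunable settings
# (for example CPU, memory, replicas, heap size, GC type, thread pool)
# and each setting has only 3 candidate values.
OPTIONS_PER_SETTING = [3, 3, 3, 3, 3, 3]
CONTAINERS = 5


def configurations(options_per_setting: list[int], containers: int) -> int:
    """Count the distinct configurations across all containers."""
    per_container = math.prod(options_per_setting)  # 3**6 = 729
    return per_container ** containers              # settings combine freely


total = configurations(OPTIONS_PER_SETTING, CONTAINERS)
print(f"{total:,} possible configurations")
```

With these assumed numbers the count is 3 to the power of 30, roughly 206 trillion, and every extra setting or extra candidate value multiplies it again.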

That is a vast number of possible configuration tweaks, available at any given moment. To really engage with this system, your DevOps teams would need comprehensive and flawless knowledge of the entire infrastructure.


And you would need this knowledge to cover layers across the application, data, and cloud infrastructure stack. On top of this, you would need deep familiarity with the application workload itself.

It is highly unlikely that any human staff member will possess this knowledge or visibility. The developer who wrote the code is unlikely to be savvy enough when it comes to infrastructure. Even the rare person who is comfortable with both kinds of knowledge (infrastructure and application workload) is likely to be something of a generalist who lacks the deep knowledge needed to carry out real optimization.

And even if they had the knowledge, they couldn’t move fast enough, because the measuring and tweaking needed to continuously optimize your app has to happen at lightning speed.

This is because modern app workloads are undergoing constant change. Round the clock, developers are releasing new features, middleware is getting updated, behaviour patterns are shifting, and cloud vendors are releasing new resource options. 

Choosing the right instance type, number of instances, and settings for each instance involves numerous interdependencies, the cognitive equivalent of playing a thousand chess games at once. And because these changes happen so quickly, any deep understanding of your infrastructure would be outdated by the time you had acquired it.

This is why cloud and mobile apps chronically run with lower performance and higher cost than is ideal and possible for their workload: manual optimization is impossible to do on every layer of your stack.

True Cloud Optimization is Built on AI

Real cloud optimization is beyond the reach of human cognition. 

The solution? Leverage artificial intelligence (AI).

Achieving maximum efficiency for apps operating in the cloud means making judgements and decisions that are too numerous and fast-moving for human minds. But these judgements and decisions aren’t too numerous or too fast-moving for an AI. 

This is the basic Cloud Optimization (CO) model:
  • After the app code has passed through the CI/CD pipeline, the Cloud Optimization tool begins to measure the performance of that code. 
  • The CO tool formulates predictions about which set of configurations can further improve the performance of the application or reduce the cost incurred.
  • Then, it tweaks the settings and configuration parameters, implements the changes, and runs tests.
  • While this is going on, the CO tool measures data from the testing process and analyzes the data to learn how the changes affected the performance and/or cost.
  • The CO tool takes these learnings, compares them to previous data, and makes another set of predictions, which lead to a new set of configurations.
  • This cycle runs repeatedly and non-stop. The system keeps on finding new ways to achieve the highest possible performance with the lowest possible cost.
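The cycle above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not how any particular CO tool works: the measure function is a simulated stand-in for real monitoring and billing data, the unit prices and the 400 requests-per-second target are made up, and a plain hill-climbing search stands in for the prediction step.

```python
import random

REQUIRED_RPS = 400  # hypothetical performance target (requests per second)


def measure(config: dict) -> tuple[float, float]:
    """Simulated measurement: returns (throughput in rps, hourly cost in dollars).

    A real tool would read these from monitoring and billing data instead.
    """
    cpu, memory_gb = config["cpu"], config["memory_gb"]
    throughput = 100 * min(cpu, memory_gb / 2)  # limited by the scarcer resource
    cost = 0.04 * cpu + 0.01 * memory_gb        # made-up unit prices
    return throughput, cost


def propose(config: dict, rng: random.Random) -> dict:
    """Tweak one setting by one step to produce the next configuration to try."""
    candidate = dict(config)
    key = rng.choice(sorted(candidate))
    candidate[key] = max(1, candidate[key] + rng.choice([-1, 1]))
    return candidate


def optimize(start: dict, rounds: int = 500, seed: int = 0) -> dict:
    """Repeat the measure, tweak, test, learn cycle and keep the best result."""
    rng = random.Random(seed)
    best = start
    _, best_cost = measure(best)
    for _ in range(rounds):
        candidate = propose(best, rng)
        throughput, cost = measure(candidate)
        # Learn from the test: keep the change only if the target is still
        # met and the configuration is cheaper than the best one so far.
        if throughput >= REQUIRED_RPS and cost < best_cost:
            best, best_cost = candidate, cost
    return best


start = {"cpu": 16, "memory_gb": 32}  # a deliberately overprovisioned start
best = optimize(start)
print("start:", start, "cost per hour:", measure(start)[1])
print("best: ", best, "cost per hour:", measure(best)[1])
```

Starting from an overprovisioned configuration, the loop keeps shaving off resources as long as the performance target is still met, which mirrors the goal of reducing cost while maintaining performance.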

What Real Cloud Optimization Tools Should Do

True cloud optimization services and tools should utilize deep reinforcement learning (Deep RL) to optimize cloud infrastructure. 

Deep RL uses neural networks that are inspired by the connectivity and activation of neurons in the brain. When properly trained, these neural networks can represent your hidden data, allowing cloud optimization tools to build a knowledge base of optimal and sub-optimal configurations, similar to how the brain develops effective patterns of behaviour.

Following an implementation process that should be as straightforward as a simple Docker run command, effective cloud optimization tools should integrate with your existing CI/CD automated deployment pipeline and go right to work. Right away, your entire system should be monitored, and the tool should pay close and granular attention to how the shifts in every setting and parameter affect performance. This information is fed back into the neural network, which processes and learns everything it sees, so that its insights compound. 

This compounding means that the cloud optimization services engine becomes progressively better at tuning performance and improving efficiency.

Deep reinforcement learning should enable such cloud optimization tools to continuously examine millions of combinations of configurations to identify the optimal combination of resources and parameter settings. The tools should take in metadata about an application, gradually make tweaks to resource assignments and configuration settings to enhance performance and reduce cost, and then continuously remeasure the results.
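The learn-from-feedback idea can be shown with a much simpler technique than deep RL. The sketch below is an epsilon-greedy bandit, one of the most basic reinforcement learning methods, and it keeps a small lookup table where a deep RL system would use a neural network. The three configurations, their hidden reward values, and the noise level are all hypothetical.

```python
import random

# Hypothetical candidate configurations and their true (hidden) mean reward,
# for example performance per dollar. A real system would only ever see
# noisy live measurements, never these numbers.
TRUE_REWARD = {"small": 0.50, "medium": 0.80, "large": 0.65}


def observe_reward(config: str, rng: random.Random) -> float:
    """Simulated noisy measurement for one trial of a configuration."""
    return TRUE_REWARD[config] + rng.gauss(0, 0.05)


def epsilon_greedy(steps: int = 2000, epsilon: float = 0.1, seed: int = 0):
    """Learn which configuration pays off best from repeated trials."""
    rng = random.Random(seed)
    estimates = {name: 0.0 for name in TRUE_REWARD}  # learned value per config
    pulls = {name: 0 for name in TRUE_REWARD}        # trials per config
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.choice(list(estimates))        # explore something else
        else:
            choice = max(estimates, key=estimates.get)  # exploit the best so far
        reward = observe_reward(choice, rng)
        pulls[choice] += 1
        # Incremental update of the running average reward for this config.
        estimates[choice] += (reward - estimates[choice]) / pulls[choice]
    return estimates, pulls


estimates, pulls = epsilon_greedy()
print("learned best configuration:", max(estimates, key=estimates.get))
```

The occasional exploration step is what lets the learner discover that a configuration it has rarely tried is in fact better than its current favorite.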

Furthermore, effective cloud optimization tools should be able to perfect those settings that are usually judged too complex to touch, such as:

  • Resources
    • CPU
    • Memory
       
  • Middleware configuration variables
    • JVM GC type 
    • Pool sizes
       
  • Kernel parameters
    • Page sizes 
    • Jumbo packet sizes
       
  • Application parameters
    • Thread pools
    • Cache timeouts
    • Write delays
  • And many, many more. 
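One hypothetical way to hand settings like these to an optimizer is to describe each one as a named list of candidate values, grouped by layer. The setting names and value ranges below are assumptions made for illustration; real ones depend on the application and the platform.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunable:
    layer: str      # which part of the stack the setting belongs to
    name: str       # illustrative name, not a real flag
    choices: tuple  # candidate values the optimizer may try


# Hypothetical ranges, loosely following the layers listed above.
SEARCH_SPACE = [
    Tunable("resources", "cpu_cores", (1, 2, 4, 8)),
    Tunable("resources", "memory_gb", (2, 4, 8, 16)),
    Tunable("middleware", "jvm_gc_type", ("G1", "Parallel", "ZGC")),
    Tunable("middleware", "pool_size", (10, 20, 50)),
    Tunable("kernel", "huge_pages", (False, True)),
    Tunable("application", "thread_pool", (8, 16, 32, 64)),
    Tunable("application", "cache_timeout_s", (30, 60, 300)),
]


def sample(space: list, rng: random.Random) -> dict:
    """Pick one value per setting to form a candidate configuration."""
    return {tunable.name: rng.choice(tunable.choices) for tunable in space}


candidate = sample(SEARCH_SPACE, random.Random(0))
for name, value in candidate.items():
    print(f"{name} = {value}")
```

Keeping the search space as data rather than code means new settings can be added without touching the optimizer itself.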

And because they are constantly gathering new and more powerful data, AI-powered cloud optimization services will be able to keep uncovering new solutions. These often include counterintuitive solutions that are not apparent to a human user. Such solutions should constantly react to new traffic patterns, new code, new instance types, and all other relevant factors. With each new iteration, their predictions home in on the optimal solution, and as improvements are found they can be automatically promoted.

With cloud optimization, infrastructure is tuned precisely to the workload and goals of the application – whether those goals relate to cost, performance, or some balance of the two.
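The balance between cost and performance can be expressed as a single weighted score. The sketch below is one simple way to do that, assuming a baseline to compare against; the two candidate configurations, their numbers, and the cost_weight parameter are hypothetical.

```python
BASELINE = {"throughput_rps": 1000, "hourly_cost": 1.00}  # current setup

CANDIDATES = {
    "fast": {"throughput_rps": 1500, "hourly_cost": 1.20},
    "lean": {"throughput_rps": 900, "hourly_cost": 0.50},
}


def score(config: dict, cost_weight: float) -> float:
    """Blend performance gain and cost saving relative to the baseline.

    cost_weight=0.0 cares only about performance, 1.0 only about cost.
    """
    performance_gain = config["throughput_rps"] / BASELINE["throughput_rps"]
    cost_saving = BASELINE["hourly_cost"] / config["hourly_cost"]
    return (1 - cost_weight) * performance_gain + cost_weight * cost_saving


def pick_best(cost_weight: float) -> str:
    """Name of the candidate with the highest score for the given weight."""
    return max(CANDIDATES, key=lambda name: score(CANDIDATES[name], cost_weight))


for weight in (0.0, 0.5, 1.0):
    print(f"cost_weight={weight}: {pick_best(weight)}")
```

A performance-only goal picks the faster, more expensive configuration, while an even balance or a cost-only goal picks the leaner one, so the same measurements can lead to different answers depending on what the application owner cares about.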

As part of reducing costs and boosting performance, cloud optimization tools often lead to:

  • Reduction of application risk and modernization of cloud infrastructure through instance rightsizing 
  • Optimization of various areas of interest (containers, VMs, software licenses, storage, resources, etc.) 
  • Automation through self-aware instances and/or self-optimizing applications
  • Workload reservation and/or routing
  • Efficient cloud migration to achieve digital transformation
  • Real-time response for virtual infrastructures and “bare metal cloud”

Conclusion

In the DevOps era, cloud optimization is a must for any enterprise with cloud-based, medium-to-large applications that need to reduce cost while retaining reliability and performance. If you have a substantial annual cloud spend (or internal chargeback) and frequent rollouts or updates, then you need cloud cost optimization. Finding the right cloud optimization solution should be one of your top priorities.