Understanding Amazon Web Services EC2 Pricing Options



Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instances provide AWS's core compute service. As with all things AWS, there are many options for how to pay for this service. Let's look at the options and when each might provide the best value.

Free Tier

For completeness, I will mention the AWS Free Tier. This gives you free access to up to 750 hours a month of Linux and Windows instances for up to one year. These are the smallest EC2 instances available – t2.micro in most cases, though you can get t3.micro instances in regions that do not offer t2.micro. If you use any other instance type, you will start getting charged for all EC2 instances you are using, including the .micro instances. The Free Tier is a great way to try out AWS, but it is unlikely to satisfy the needs of a production system, so let's consider what you get when you start paying for your EC2 services.


On-Demand Instances

On-Demand instances provide the greatest flexibility and control over spending, though, as we will see, they may be more expensive than other payment options, depending on your use case.

Depending on the specific instance type, AWS will bill you for your EC2 instances by the hour or the second. You only pay for the time per instance that you actually use, and you can freely change the type of instance(s) you are running.
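
As a rough sketch of how the two billing granularities differ (the hourly rate below is hypothetical, not a current AWS price; AWS does apply a 60-second minimum when billing per second):

```python
import math

# Hypothetical on-demand rate for illustration; real prices vary by
# instance type, region, and OS.
HOURLY_RATE = 0.0116  # USD per instance-hour

def on_demand_cost(seconds_used: float, per_second: bool = True) -> float:
    """Estimate the on-demand cost of a single instance.

    Per-second billing charges only for what you use (with a 60-second
    minimum); hourly billing rounds usage up to whole hours.
    """
    if per_second:
        billable_seconds = max(seconds_used, 60)
        return billable_seconds * HOURLY_RATE / 3600
    return math.ceil(seconds_used / 3600) * HOURLY_RATE

# 90 minutes of use: billed as exactly 1.5 hours per second,
# but rounded up to 2 full hours under hourly billing.
print(on_demand_cost(5400))
print(on_demand_cost(5400, per_second=False))
```

Either way, you pay nothing once the instance is stopped, which is the core of the on-demand model.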

On-Demand is the right choice when an upfront payment commitment for a certain amount of compute resources does not make sense. This payment model is useful in application development and testing, where you are not yet certain which EC2 instance type best suits an application. If your workloads cannot be interrupted (see Spot Instance pricing for comparison) and are short-term, highly variable, or unpredictable, On-Demand instances are the way to go.

Spot instances

Spot instances use “spare” Amazon EC2 computing capacity at discounts of up to 90% compared to the same On-Demand EC2 instance. This allows Amazon to put to work resources that are currently idle but reserved for another service. The tradeoff: if the service that has reserved the capacity needs the instance your application is using, your use is terminated.

If you are running an application that is tolerant of potential interruption/failure, using spot instances has the potential for providing substantial cost savings. They can also be a way to scale up an application that suddenly needs a great deal more compute capacity. If your budget is currently limited and you wish to try some of the more expensive EC2 instance types or use more capacity than your budget would allow using on-demand instances, spot instances can also provide a solution. 

Savings Plans

Savings Plans can provide up to 72% discount on EC2 instances compared to equivalent On-Demand instances. The requirement is a commitment to a consistent amount of hourly use for a period of 1 or 3 years.  This can provide great cost savings if you know your applications’ workload profiles and are willing to commit to a longer-term contract.  Using a Savings Plan to cover your baseline workload along with On-Demand or Spot Instances for periods of increased compute demand can provide a balance between cost savings and workload responsiveness.
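
To make the baseline-plus-burst idea concrete, here is a small sketch. All figures are hypothetical; actual rates and the exact discount depend on instance family, region, term length, and payment option:

```python
# Hypothetical figures for illustration only.
ON_DEMAND_RATE = 0.10         # USD per instance-hour
SAVINGS_PLAN_DISCOUNT = 0.72  # the "up to 72%" case

def blended_hourly_cost(baseline_instances: int, peak_instances: int) -> float:
    """Cover a steady baseline with a Savings Plan commitment and burst
    to peak demand with On-Demand instances."""
    savings_plan_rate = ON_DEMAND_RATE * (1 - SAVINGS_PLAN_DISCOUNT)
    burst = max(peak_instances - baseline_instances, 0)
    return baseline_instances * savings_plan_rate + burst * ON_DEMAND_RATE

# 10 steady instances, bursting to 14 at peak:
print(blended_hourly_cost(10, 14))   # committed baseline + 4 on-demand
print(14 * ON_DEMAND_RATE)           # the same peak, all on-demand
```

Even with the burst paid at the full On-Demand rate, the blended cost is well under half of running everything On-Demand, which is why covering only the steady baseline with a commitment is such a common pattern.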

AWS has a tool, the AWS Cost Explorer, that will help you figure out your actual resource usage and understand where you might benefit from a Savings Plan to purchase your EC2 resources. In the example below, from the AWS Cost Explorer overview page, it is possible to see that most resources for this example system have a fairly constant workload and could see substantial cost savings if a 1 or 3-year commitment makes sense.

Reserved Instances

Reserved Instances are similar to a Savings Plan but are assigned to a specific Availability Zone. Reserved instances guarantee you required resource access in the AZ you have chosen. They can cost up to 75% less than the equivalent On-Demand instance. Much like the Savings Plan, if you have fairly steady workloads and can commit to a 1 or 3-year contract, this model may save you money.  

Dedicated Hosts

A Dedicated Host gives you dedicated access to a physical EC2 server. This means that you are the only “tenant” with access to that particular machine. If you are using software whose licenses must be tied to a single machine (e.g., Windows Server, SQL Server), a Dedicated Host can provide the necessary single-server compliance and save on cost. The service comes integrated with AWS License Manager, which ensures that Dedicated Host instances comply with license terms and launch on appropriate instance types. Dedicated Hosts can be purchased using the hourly On-Demand model or as a Reserved Instance for up to 70% less than the On-Demand cost.


While On-Demand instances are typically what people start out using, substantial cost savings can be realized with the other available EC2 purchase models. Spot instances can provide On-Demand-like flexibility at substantial savings, as long as your application can tolerate being preempted. Both Savings Plans and Reserved Instances can save money while providing the necessary resource coverage for consistent workloads over time. Dedicated Hosts provide a way to launch EC2 instances running software with licensing restrictions, in either an On-Demand or Reserved pricing model. Amazon's integrated Cost Explorer can provide the necessary insight into your system's resource use to decide which purchase model, or combination of models, will help you optimize your AWS cloud spend.

Here's Why Manual Workload Tuning is Obsolete


Here's Why Manual Tuning is Obsolete

Manual workload tuning is reactive and can take weeks per application. Enterprises over-provision their applications because they cannot tune at the scale that is needed. This results in massive waste, sub-optimal performance, and lower application availability. To learn more, watch our last webinar, Using Machine Learning to Optimize All Applications Across the Delivery Platform.


Adaptive Tuning for Load Profile Optimization



Initial application settings are generally derived from experience with similar systems' performance, or overprovisioned to head off anticipated performance bottlenecks. Once an application is running and actual performance metrics are available, it becomes possible to tune parameters to assign resources more appropriately, balancing performance requirements and cost. In simple, stable systems, this cycle of measurement, evaluation, and improvement is relatively straightforward to apply.

To break these basic tuning steps out more explicitly:

  1. Establish values that provide minimum acceptable performance (a Service Level Objective (SLO))
  2. Collect metrics on system performance
  3. Identify the part of the system that is limiting performance (e.g., CPU, memory)
  4. Appropriately adjust the part of the system causing the bottleneck.
  5. Collect metrics on system performance again.
  6. If system performance improves, keep the modification; if it degrades, revert or try a different adjustment.
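
The six steps above can be sketched as a simple control loop. The metric function here is a toy stand-in (hypothetical, not a real monitoring API) in which adding memory reduces latency:

```python
import random

SLO_LATENCY_MS = 200  # step 1: the minimum acceptable performance

def measure_latency(config: dict) -> float:
    """Steps 2 and 5: stand-in for real metric collection. In this toy
    model, more memory lowers latency, plus some measurement noise."""
    return 400 / config["memory_gb"] + random.uniform(-5, 5)

def tune(config: dict, max_iterations: int = 10) -> dict:
    latency = measure_latency(config)
    for _ in range(max_iterations):
        if latency <= SLO_LATENCY_MS:
            break  # SLO met, stop tuning
        # Step 4: adjust the suspected bottleneck (here, memory).
        candidate = dict(config, memory_gb=config["memory_gb"] * 2)
        new_latency = measure_latency(candidate)  # step 5: re-measure
        if new_latency < latency:                 # step 6: keep or revert
            config, latency = candidate, new_latency
    return config

print(tune({"memory_gb": 1}))
```

In a real system, `measure_latency` would query a monitoring service and the adjustment would be a configuration or deployment change, but the keep-or-revert structure is the same.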

While relatively simple to apply in simple and stable systems, as system complexity increases, the number of potential performance-impairing bottlenecks grows because overall performance depends on inter-service interactions. Process automation becomes important, as relying on human intervention to maintain SLOs becomes overwhelming and may not adjust system performance quickly enough to meet SLOs in a dynamic environment.

Cloud computing systems and the common microservice architectures of cloud-native applications have the ability to automatically scale to maintain performance SLOs in the face of variable loads.  Increasing loads can trigger the system to scale up resources to maintain performance. Decreasing loads can trigger a scale-down of resources to levels that still maintain performance and remove the cost burden on idle resources.

Database and big data applications have been at the forefront of understanding and automating the process of adaptive tuning. Herodotou and collaborators identify six approaches to performance optimization: 

  • Rule-based –  An approach that relies on the system behaving as expected, based on prior experience with similar systems. It does not rely on metrics/logs or a performance model; it provides initial settings to get started but is unlikely to yield optimal performance settings.
  • Cost modeling – An analytical (white-box) approach based on known cost functions and understanding of the system’s internal functions.  Some form of the performance metric is required to develop the predictive model.
  • Simulation-based – A model to predict performance is generated from a set of experimental runs that simulate load scenarios (e.g., using a load generator) and evaluating optimal parameter settings. 
  • Experiment-based – Search algorithm-led experiments with varying parameter settings are used to identify optimal settings.  
  • Machine learning-based – A black-box approach that generates predictive performance models that do not rely on internal system functionality knowledge.  
  • Adaptive – Configuration parameters are tuned on a running application using any number of the methods listed above.

While any one of the above approaches can be used to tune a system's performance, doing so effectively will likely leverage several approaches, as the “Adaptive” category suggests. Rule-based methods can be a quick-and-dirty way to provide initial conditions, and if those rules include the ability to adjust application resources in response to workload changes (e.g., autoscaling thresholds), the result is an adaptive system. Combining AI methods with rule-based methods can improve adaptability by adding a level of predictive ability (e.g., AWS Predictive Autoscaling for their EC2 service). Indeed, combining rule-based with ML-based approaches best addresses the need to adapt to both changing workloads and system changes.

While rules-based auto-scaling can adapt to workload changes, the next question you may wish to ask is whether the application profile is configured optimally. As the application and possibly the supporting infrastructure scale in response to load, are resource settings such as CPU, memory, and network configured to perform optimally as load requirements change? The challenge is that adaptively tuning your system configurations becomes exponentially more complex as you add tunable parameters.

Increasingly, the ML approach to adaptive tuning is becoming not only possible to apply but almost a prerequisite for achieving true optimization. Peter Nikolov, Opsani's CTO and co-founder, gave a presentation, “Optimizing at Scale: Using ML to Optimize All Applications Across the Service Delivery Platform (Las Vegas 2020),” in which he pointed out that one application with eight settings per tunable parameter across twenty-two component services would have 8^22 (roughly 74 quintillion) possible tuning permutations. This is far beyond human ability to search for a true optimum, but, in this case, the Opsani machine learning algorithm was able to rapidly search and identify the settings that provided both optimal performance and lowest cost.
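
The scale of that search space is easy to verify, assuming eight possible values per knob across twenty-two services:

```python
settings_per_knob = 8
services = 22
permutations = settings_per_knob ** services
print(f"{permutations:,}")  # 73,786,976,294,838,206,464
```

That is about 74 quintillion combinations, which is why exhaustive or manual search is hopeless and a guided search algorithm is required.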

If we now add considerations of variations in the workload itself, effective adaptive tuning with the machine learning approach starts to need not just an adaptive but also an autonomic approach.  Oracle’s Autonomous Database and Opsani’s Continuous Optimization service are examples of continuous adaptive tuning in action.  The ability to appropriately respond to changes in the system without human intervention removes the drudgery or toil from searching and implementing (hopefully) optimal configuration settings; it also greatly reduces response time in applying the appropriate optimum.

The six categories of performance optimization can be viewed as an evolutionary path toward adaptive tuning. Rules-based approaches will get you up and running and can be applied without any actual performance data. With the increasing ability to collect performance metrics and apply appropriate modeling techniques, discerning and applying performance-improving changes becomes more rigorous. Eventually, applying machine learning to evaluate the data and automate tuning allows rapid discovery of optimal settings, even in the face of changing workload, application, and system requirements.

If you would like to experience AI-driven application optimization for yourself, Opsani offers a free trial.

Using ML to Optimize All Applications Across the Delivery Platform



Learn how Continuous Optimization as a Service (COaaS) allows users to continuously and autonomously tune workloads with sophisticated AI algorithms. We will take a deep dive into how AI engines allocate resources dynamically to tune applications for the best performance at the lowest possible cost, freeing your engineers to focus on critical feature delivery while the infrastructure automatically delivers that performance. For more Opsani knowledge, check out our last webinar, Artificial Intelligence and the Enterprise Stack.


Getting Ahead of the Curve with Proactive Autoscaling



Autoscaling is a way to automate away the manual toil involved in adding or removing resources to support application performance. There are three primary types of auto-scaling: “regular” or reactive auto-scaling, proactive auto-scaling, and predictive auto-scaling. In this article, we'll look at why auto-scaling is useful and where each type fits best.

Traditionally, changes in load demanding the provisioning of additional application instances or infrastructure were manual and time-consuming. The general approach to avoiding performance issues and even crashes from not having enough resources to meet demand was overprovisioning the system. While overprovisioning gave IT more runway to respond to growing loads, it also resulted in idle resources that, in some cases, might be provisioned, paid for, and never even used. 


On-demand access to resources in cloud systems has provided the ability to scale more rapidly and reliably in response to changing demand, yet some IT departments continued to scale systems manually. Automation – auto-scaling – has since become a common and reliable way to address scaling at any time of day, and more quickly than relying on engineers to do so manually. Currently, the two primary approaches are “regular” auto-scaling and proactive auto-scaling, with predictive auto-scaling just starting to come online.

Reactive Scaling

When we talk about auto-scaling, the typical approach is to set a trigger based on a use limit on memory, CPU, or some other performance metric of importance. This is known as reactive auto-scaling. When a limit is exceeded, say 85% CPU use for more than one minute, automation kicks in, and additional application replicas or infrastructure are created. This process generally continues automatically up to an upper resource limit defined in the auto-scaling policy. Lower thresholds can similarly be used to scale resources down. Automating this response to demand is especially useful in environments where there is no clear pattern to changes in demand. With auto-scaling automation in place, the system can appropriately address changes in load, ensuring that performance SLOs are maintained, costs from idle infrastructure are avoided, and on-call engineers get some well-deserved sleep. One caution with reactive scaling: depending on how quickly infrastructure or application replicas come online, the lag between the trigger and resource availability can cause performance issues if load increases very quickly.
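
The core decision logic of a reactive policy can be sketched as follows; the thresholds, step size, and replica limits are illustrative, not defaults from any particular cloud provider:

```python
def reactive_scale(current_replicas: int, cpu_percent: float,
                   high: float = 85.0, low: float = 30.0,
                   min_replicas: int = 2, max_replicas: int = 20) -> int:
    """Return the new replica count for one evaluation interval.

    Scale up when the CPU metric breaches the high threshold, scale down
    when it falls below the low threshold, and always stay within the
    limits defined by the scaling policy.
    """
    if cpu_percent > high:
        return min(current_replicas + 1, max_replicas)
    if cpu_percent < low:
        return max(current_replicas - 1, min_replicas)
    return current_replicas

print(reactive_scale(4, 92.0))  # over the high threshold: scale up
print(reactive_scale(4, 20.0))  # under the low threshold: scale down
print(reactive_scale(4, 50.0))  # within the band: no change
```

A real autoscaler evaluates this kind of rule on every metric interval, which is exactly why the trigger-to-availability lag mentioned above matters for fast load spikes.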

Proactive Scaling (Scheduled Scaling)

Proactive autoscaling does not wait for a trigger; rather, it scales on a cycle (each weekend or after business hours) or in anticipation of an upcoming event (e.g., a Cyber Monday sale or a new product release). Proactive scaling is appropriate when you have data on typical and predictable load patterns that warrant scaling up or down. It is especially appropriate when there is a startup time lag in creating an instance: if demand for a service spikes quickly, it may be more efficient and performant to have the needed number of instances in place before the expected demand event.
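
A minimal sketch of the scheduled approach, with an entirely hypothetical weekly schedule standing in for whatever your load data suggests:

```python
from datetime import datetime

# Hypothetical schedule: desired capacity by (days, start_hour, end_hour).
SCHEDULE = [
    ({0, 1, 2, 3, 4}, 9, 18, 12),   # weekday business hours: full capacity
    ({0, 1, 2, 3, 4}, 18, 24, 4),   # weekday evenings: reduced
]
DEFAULT_CAPACITY = 2                # nights and weekends: minimal

def scheduled_capacity(now: datetime) -> int:
    """Return the desired instance count for the current time window."""
    for days, start, end, capacity in SCHEDULE:
        if now.weekday() in days and start <= now.hour < end:
            return capacity
    return DEFAULT_CAPACITY

print(scheduled_capacity(datetime(2021, 3, 1, 10)))  # Monday morning
print(scheduled_capacity(datetime(2021, 3, 6, 10)))  # Saturday morning
```

The key difference from the reactive case is that no metric is consulted at all: capacity is set ahead of the load, so instances are already warm when demand arrives.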

AWS Cloudformation: An Introduction to the Automation Service



AWS CloudFormation (CF) is an automation service that can be used to model and deploy an Amazon Web Services resource stack based on a text-based template. A CloudFormation template describes all the AWS resources that you want (such as Amazon EC2 instances, AWS Elastic Beanstalk environments, and Amazon RDS DB instances). The CloudFormation service then uses the CF template to correctly provision and configure the requested resources and manage resource dependencies. Key benefits of this template-driven model of managing infrastructure include simplified management processes, version control of infrastructure changes, and the ability to replicate infrastructure from existing templates.

Automate and simplify management

Amazon Web Services provides many microservices that are commonly used together to support an application.  The graphic below shows the AWS resources for a LAMP-stack scalable web app. You can see a back-end database, load balancer, and autoscaling group, among other pieces that all work together to make this app work.

While you could manually interact with each AWS service to provision them in turn and then configure them to work together, CF provides a simpler way to do this. In the case of this particular example stack, you could use the existing CF LAMP stack framework template as is or modify it to match your requirements. CloudFormation automatically creates the Auto Scaling Group, load balancer, database, and security groups for you with the CF template. 

To emphasize all that CloudFormation is doing for you, consider the process of creating an Amazon Route 53 (DNS) record and associating it with an EC2 instance. The EC2 instance needs to exist before the DNS record can be created. You could sit and wait for instance creation to complete and then go to Route 53 to create the DNS record. Or you might script something up with AWS API calls, some wait loops, and retry logic. Or you could create a CF template and let CloudFormation's built-in intelligence make sure that the resources are created in the correct order.
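
One way to think about what CloudFormation is doing here is a topological sort of the resource dependency graph. This toy sketch uses Python's standard `graphlib` rather than anything AWS-specific, and the resource names are illustrative:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each resource lists the resources it depends on, as "Ref" links imply.
resources = {
    "SecurityGroup": set(),
    "EC2Instance": {"SecurityGroup"},
    "DNSRecord": {"EC2Instance"},   # the Route 53 record needs the instance
}

creation_order = list(TopologicalSorter(resources).static_order())
print(creation_order)  # dependencies always come before their dependents
```

CloudFormation performs this kind of ordering (plus waiting, retries, and rollback on failure) across every resource in the stack, which is the wait-loop-and-retry logic you would otherwise have to script yourself.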

CloudFormation also helps with cleanup. If you want to delete a running stack, CloudFormation will automatically delete all the stack resources. In addition to simplifying management of the entire collection of resources as a single unit, CF automation also ensures that resources are not orphaned when cleaning up.

Version Control and Infrastructure-as-code

If you manage your AWS resources manually, interacting individually with each service to build your application environment, it becomes important to keep track of configuration changes in case a new config causes a problem and you wish to revert to the original state. Because the CloudFormation template is text-based (written in either JSON or YAML), it is easy to put your CF templates into version control. (See a partial example of an EC2 CF template below, or check out the full template.)

With your infrastructure defined as a CloudFormation template, if you change the EC2 instance types and suddenly experience performance issues with the new configuration, you can easily revert: just rerun CloudFormation with the prior CF template to return to the last acceptable state. All of the other benefits of version control in software development apply to this infrastructure-as-code management model – you can review what changed, when, and who made the change.

Automate Infrastructure Replication

You’ve probably already figured this out, but it becomes simple to recreate your infrastructure in different zones or regions with an infrastructure template. This simplifies the dev-to-prod process as an approved dev CF template can be promoted and used to create your new production environment without worrying that manually recreating each resource introduces errors.  It also becomes trivial to set up additional environments in different regions for disaster recovery (DR) or high availability (HA) purposes. Automation for the win.

Understanding CloudFormation – a Simple Template Example

We took a look at this sample Amazon EC2 instance with a security group earlier, but let's look a little closer. A CloudFormation template has six top-level properties. Here's a short explanation of what each means:

  • AWSTemplateFormatVersion: Identifies the specific AWS CloudFormation template version.
  • Description: Text string description of the template.
  • Mappings: Key:value mappings that can be used to specify conditional parameter values.
  • Outputs: Stack’s properties visible in the AWS CloudFormation Console.
  • Parameters: The values that you can pass into your template at runtime.
  • Resources: Specifies the stack resources and their properties.

Parameters and Resources are critical as they are what make the template work. Let’s look at the EC2Instance resource:

On the surface, this seems like a fairly trivial description of an EC2 server, but the “Ref” notation references the specifics that this EC2Instance description aggregates. The “InstanceType” definition referenced in the “Properties” section specifies a default of “t2.small”, along with a list of alternative allowed values (shortened for this example) that you can choose when launching the stack. As an exercise, look at the full template and see if you can find what “InstanceSecurityGroup” references. You can dig deeper into the specifics in the CloudFormation EC2 instance resource definition.
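
Reconstructed from that description (abbreviated, and a hedged sketch rather than a verbatim copy of the sample template), the relevant Parameters and Resources sections look roughly like this:

```yaml
Parameters:
  InstanceType:
    Description: EC2 instance type
    Type: String
    Default: t2.small
    AllowedValues: [t2.small, t2.medium]  # the full template lists many more

Resources:
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref InstanceType
      SecurityGroups:
        - !Ref InstanceSecurityGroup      # defined elsewhere in the template
```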


If you are an SRE working in an AWS environment, CloudFormation can help you manage your infrastructure-as-code and automate away many manual processes that will not work at scale. CloudFormation provides a way to templatize, manage, and automate your infrastructure in an easily repeatable manner. 

This article tries to show that the basic concepts behind CloudFormation are not overly complex. The complexity that CloudFormation has a reputation for comes from its ability (and power) to automate the massive catalog of cloud resources available from AWS. If you work in an AWS environment, the effort of learning CloudFormation will pay you back by eliminating toil – the low-value, repetitive manual actions that you would otherwise need to manage.

Principles and Practices of Google Cloud Platform Cost Optimization



With great complexity come many options for optimization. This article will first examine some of the overarching principles of cloud cost optimization for Google Cloud Platform (GCP), then consider some practical actions you can take now.

While the following will call out some GCP specifics, I find that these principles can be broadly transferred to any cloud system.

Cost optimization – Know your target.

One challenge with cost optimization is that there are myriad ways to achieve it, but not all of them will result in positive outcomes for your business. It is worth keeping in mind that the real benefit of cloud systems is not that they are cheaper than physical infrastructure (though they may be), but that they provide a faster time to value. In other words, if you keep your cloud costs reasonable, the value comes from being able to deliver more to your customers, which translates to increased revenue.

Cost-cutting thus requires an understanding of not just cost but also performance and reliability targets. This then brings into play the developers and site reliability engineers (SREs) and the company’s business side – executives and finance.

If you can define service level objectives for costs, performance, and reliability, you now have a target to aim for. 

Implement and leverage cloud cost metering and monitoring

Traditional IT infrastructures had fairly dependable budgets, doled out to business units for capital expenditures after approval, with forecasting based on historical data to derive future budget needs. This model's static nature provided budget certainty. One of its pain points was that purchases were not instantaneous: they often required approval, ordering, delivery, and installation, and that took time.

With cloud environments, costs are now operational rather than capital.  Budgets can be spent in an on-demand fashion and, as needed, in a timely fashion. Only the resources being used are paid for. But because purchase decisions are being made subject to less review, having a way to understand actual operational costs becomes critical. 

As it turns out, one of the key features of a cloud system is having measurable services, and GCP provides ways to track and measure costs for each of your cloud services.  Still, this can become a challenge with larger systems as the sheer number of measurements can become difficult to parse if standards are not put in place.

Defining standards for resource labeling, setting unit budgets and spending alerts, and developing a model that ensures engineering and finance communicate cost requirements effectively all become critical. One holdover from the CapEx world is developers and engineers treating servers as ‘theirs’ and keeping them reserved even when idle. Another common CapEx holdover is over-provisioning resources to ensure performance and then paying for the overpowered system during non-peak times. Standards help define boundaries and expectations around resource use in an otherwise easy-to-overspend environment.

The GCP Cloud Billing reports are an incredibly powerful tool for understanding service costs, especially when services are appropriately labeled so that costs are attributable to specific teams or departments. The ability to create customized dashboards with Cloud Billing also lets you move beyond raw service costs: appropriate labeling allows business-relevant evaluation of GCP resource costs against the revenue generated by specific customers. This can then feed back into the discussion of budget and resource allocation between finance and engineering.

Do you know your cloud’s value versus its cost? 

The CapEx world was very much about controlling and reducing costs.  In part, this was because the server you bought for that project was not something that would be returned.  Things are very different in an OpEx world, and it is no longer just about cutting costs. The cost optimization focus is better thought of as eliminating waste and maximizing the value derived from your spending.  Here again, finance and engineering can work together to define standards and metrics to understand a service’s operational cost/value proposition. 

  • What is the value that Service X provides to our customers?
  • What does it cost to provide Service X?
  • How can I optimize the cost of Service X without degrading performance?

Depending on where you are in a digital transformation journey, this may be a more or less difficult task to implement. Still, understanding the actual cost-benefit of a service provides valuable insight when trying to keep costs down and customers happy at the same time.

Start with standards and automate processes early on.

If you can define standards early on in a project, you will be better off than trying to enforce them retroactively. Defining how resources are labeled and setting limits on maximum resource deployment are good to do upfront. Coupling standards with automation tools such as GCP's Cloud Deployment Manager or Terraform will help ensure application consistency.

Setting up a sensible cost management hierarchy to create logical resource groupings is also best done upfront. You can, and should, define a simple structure that meets your initial needs for cost management and attribution, and then add complexity if needed. With GCP, you can leverage the setup wizard to get recommendations about setting up your environment.

I've mentioned labeling resources as key to effective cost management, and it is worth reiterating. Well-labeled resources tie costs to a specific business unit or project, making it possible to tie service costs to business value. By default, it is easy to see that your company is spending $35,000 on Google Kubernetes Engine (GKE). If you label the two services you run on GKE, you can see that you spend $10,000 on GKE for a web photo catalog that generated $11,000 in revenue, and $25,000 on GKE for an AI image search engine that generated $150,000.
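
Once labels exist, the attribution arithmetic using the figures from this example is simple:

```python
# The figures from the example above, keyed by resource label.
services = {
    "photo-catalog":   {"gke_cost": 10_000, "revenue": 11_000},
    "ai-image-search": {"gke_cost": 25_000, "revenue": 150_000},
}

# Without labels, all you see is the total GKE bill.
total_spend = sum(s["gke_cost"] for s in services.values())
print(f"Total GKE spend: ${total_spend:,}")

# With labels, cost becomes attributable to business value.
for name, s in services.items():
    print(f"{name}: ${s['gke_cost']:,} cost, ${s['revenue']:,} revenue, "
          f"{s['revenue'] / s['gke_cost']:.1f}x return")
```

The same $35,000 bill now tells two very different stories: a 1.1x return on the photo catalog versus a 6x return on the image search engine, which is exactly the information finance and engineering need.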

Review cloud costs and optimization practices regularly.

While your specific review cadence will depend on your development and customer environment, having a regular review process is important to avoid spending surprises. Again, teams with diverse responsibilities (engineering, dev, finance, executive) should meet to review usage data and potentially adjust use/cost forecasts. The default GCP Cloud Billing console makes it simple and quick to review and audit your cloud costs regularly, and putting effort into a custom dashboard can be well worthwhile to surface the specific metrics important to your company.

It is worth considering that if your customer base is fairly stable and your system is relatively small, reviews may not need to be as frequent as dealing with a dynamic or huge system.  Large systems with multiple applications and cloud spend in the seven-figure range per month can rapidly and unnecessarily lose revenue from inefficiency and, conversely, rapidly save significant revenue from optimization efforts. 

Setting actionable priorities

As previously mentioned, there is often, but not always,  a tradeoff when looking at optimizing cost, performance, velocity, and reliability. When looking across a company, it can be difficult to make decisions across competing requirements and goals. A multi-disciplinary team review of optimization objectives and goals helps find a realistic balance between cost savings and customer value impacts. If there is a clear understanding of the required effort (dev or engineering), the potential cost savings, and the potential business value, it becomes easier to make informed decisions. 

The Practice of Cost Optimization

Cloud optimization is not something that can be solved with a default procedure or checklist. Everyone's cloud environment is in some way unique, as are the specifics of the applications you are running. Once the structure for communicating, monitoring, evaluating, and tuning cloud cost optimization is set, it is time to apply some specific tools and practices. Google Cloud optimization tools generally fall into three categories: cost visibility, efficient resource allocation, and pricing optimization.

Cost visibility

The variable nature of cloud systems and the on-demand capabilities that allow DevOps and SRE to address variation can result in unintended and unexpected costs. Understanding and controlling spending is the first step in optimizing your costs. With Google Cloud, you immediately have access to several no-cost billing and cost management tools that provide the visibility needed to start making spending adjustment decisions. We’ve already touched on the value of using GCP’s default billing management tools and the ability to customize your billing dashboard for greater insight. Additional cost management and visibility features include quotas, budgets, and alerts, which give you real-time notifications and greater control over costs, reducing the probability of unintentional overspending.
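To make the budget-and-alert idea concrete, the underlying logic reduces to comparing actual spend against threshold fractions of a budget. Here is a minimal Python sketch; the budget amount and the threshold percentages are illustrative, not GCP defaults:

```python
def triggered_alerts(actual_spend, budget, thresholds=(0.5, 0.9, 1.0)):
    """Return the budget-threshold fractions the current spend has crossed."""
    return [t for t in thresholds if actual_spend >= t * budget]

# Hypothetical example: a $10,000 monthly budget with $9,200 already spent
print(triggered_alerts(9200, 10000))  # the 50% and 90% alerts have fired
```

In GCP, the equivalent is creating a budget in the Cloud Billing console and attaching threshold rules that send notifications as spend crosses each one.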

Resource Usage Optimization

Overprovisioning is a pervasive opportunity for cost optimization efforts. While overprovisioning has its roots in traditional IT principles, it remains an issue in environments where automation to monitor and right-scale resources is not fully developed. The GCP “Recommender” can help identify over-provisioned, under-provisioned, or idle resources. Looking at ways to automate such management should be a key SRE team goal and is where Opsani’s continuous optimization is ideally applied.

Some specific resource use optimization actions to take include:

Delete Idle VMs: Getting rid of resources you are paying for but not using is a clear way to cut costs. There is even a GCP Idle VM Recommender that will find inactive VMs and persistent disks. Deletion should be done with care and ideally by the person or team that created the resource.

Schedule non-prod VMs: Most cloud systems only charge for the resources you have running. If you have dev/test environments that are not active during non-business hours, turning them off can provide substantial savings. 
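The savings from scheduling are easy to estimate: a VM billed only while running costs in proportion to its weekly on-hours. A quick back-of-the-envelope sketch, using a hypothetical business-hours schedule:

```python
def scheduled_savings_fraction(hours_on_per_week):
    """Fraction of a VM's weekly cost saved by running it only part-time."""
    return 1 - hours_on_per_week / (24 * 7)

# Hypothetical dev/test schedule: 10 hours a day, 5 days a week
weekly_hours = 10 * 5
print(f"{scheduled_savings_fraction(weekly_hours):.0%}")  # prints "70%"
```

Roughly 70% of the always-on cost disappears just by powering the environment down outside working hours.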

Rightsize VMs: Overprovisioned VMs will have you paying for resources that you are not getting value from. Having machines that are too small (e.g., worker nodes in a GKE cluster) may result in inefficient bin packing, and again, you are paying for unused resources. GCP rightsizing recommendations can show you how to effectively downsize your machine type based on observed CPU and RAM usage. Simply finding a better default machine type can help resolve this. If you are really into tuning your environment, it is possible to create custom GCP machine types with the correct amount of CPU and RAM for your specific use case.
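The rightsizing decision itself is straightforward once you have usage data: pick the cheapest machine type whose CPU and RAM cover observed peaks plus a safety margin. A sketch of that selection logic follows; the machine names echo GCP's e2-standard family, but the hourly prices are illustrative stand-ins, not current list prices:

```python
# Illustrative catalogue: name -> (vCPUs, GB RAM, $/hour). Prices are made up.
MACHINE_TYPES = {
    "e2-standard-2": (2, 8, 0.067),
    "e2-standard-4": (4, 16, 0.134),
    "e2-standard-8": (8, 32, 0.268),
}

def smallest_fit(peak_vcpus, peak_ram_gb, headroom=1.2):
    """Pick the cheapest type covering peak usage plus a 20% safety margin."""
    need_cpu, need_ram = peak_vcpus * headroom, peak_ram_gb * headroom
    candidates = [(price, name)
                  for name, (cpu, ram, price) in MACHINE_TYPES.items()
                  if cpu >= need_cpu and ram >= need_ram]
    return min(candidates)[1] if candidates else None

# Monitoring shows peaks of 2.5 vCPUs and 10 GB RAM on an e2-standard-8
print(smallest_fit(2.5, 10))  # prints "e2-standard-4": half the machine suffices
```

This is essentially what the rightsizing recommender automates, driven by real utilization history instead of a single peak number.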

Pricing efficiency

GCP, like most other cloud vendors, has a wide range of pricing options. For VMs, these include volume, sustained-use, and committed-use discounts, preemptible instances, per-second billing, and others. GCP also offers several storage classes where the price is generally correlated with the frequency of data access and the rate at which data can be retrieved. The specific pricing options you can use will depend on your use case and should be backed by performance data. A couple of examples:

Preemptible VMs: These virtual servers are highly affordable and appropriate for fault-tolerant and generally short-lived workloads. These VMs live for a maximum of 24 hours and can cost up to 80% less than equivalent on-demand VMs.
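The arithmetic behind the headline number is simple. This sketch assumes the full 80% discount, which is a best case rather than a guarantee, and a hypothetical on-demand rate:

```python
def preemptible_cost(on_demand_hourly, hours, discount=0.80):
    """Cost of a workload on preemptible VMs at a given discount rate."""
    return on_demand_hourly * hours * (1 - discount)

# Hypothetical 10-hour fault-tolerant batch job on a $0.50/hour instance
print(f"${preemptible_cost(0.50, 10):.2f} instead of ${0.50 * 10:.2f}")
# prints "$1.00 instead of $5.00"
```

The catch, as noted above, is that the job must tolerate being terminated and restarted at any time.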

VM Sustained-use Discounts: This uses resource-based pricing that applies sustained use discounts to all predefined machine types in a region collectively rather than to individual machine types. The greater the aggregate use, the lower the overall costs.
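Sustained-use discounts are tiered: each additional slice of the month is billed at a deeper discount. This sketch models that with an illustrative tier schedule patterned on GCP's published N1 rates; check the current documentation for the schedule that applies to your machine family:

```python
# Illustrative tiers: (fraction of month, discount). Modeled on the N1
# schedule (25% increments at 0%, 20%, 40%, 60% off); verify before relying on it.
TIERS = [(0.25, 0.0), (0.25, 0.2), (0.25, 0.4), (0.25, 0.6)]

def sustained_use_cost(base_monthly_cost, usage_fraction):
    """Cost when a VM runs for a given fraction of the month."""
    cost, remaining = 0.0, usage_fraction
    for width, discount in TIERS:
        slice_ = min(width, remaining)
        cost += base_monthly_cost * slice_ * (1 - discount)
        remaining -= slice_
        if remaining <= 0:
            break
    return cost

# A VM running the full month pays 70% of the undiscounted rate overall
print(sustained_use_cost(100.0, 1.0))  # prints 70.0
```

Because the discount applies automatically to aggregate regional usage, there is nothing to configure; the point of modeling it is to make cost forecasts for long-running VMs more accurate.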

Cloud Storage Classes: The default GCP storage class is Standard, but if you rarely access data (e.g., data archives), the Nearline or Coldline storage classes may provide savings. If you are maintaining data that you are unlikely to ever access (e.g., for legal discovery requirements), the Archive class may provide even further savings.
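A quick comparison shows why class choice matters for rarely accessed data. The per-GB prices below are illustrative placeholders, not current GCP list prices, and colder classes add retrieval fees and minimum storage durations that this sketch deliberately ignores:

```python
# Illustrative per-GB monthly storage prices (placeholders, not list prices)
STORAGE_PRICES = {"standard": 0.020, "nearline": 0.010,
                  "coldline": 0.004, "archive": 0.0012}

def monthly_storage_cost(gb, storage_class):
    """At-rest storage cost only; retrieval and early-deletion fees excluded."""
    return gb * STORAGE_PRICES[storage_class]

# 10 TB of rarely accessed archives in two different classes
for cls in ("standard", "coldline"):
    print(f"{cls}: ${monthly_storage_cost(10_000, cls):,.2f}/month")
```

For data that is genuinely cold, the at-rest savings dominate; for data you might read back frequently, the retrieval fees can erase them, which is why access patterns should drive the choice.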

From principles to practice

While I wanted to provide some practical examples in this article, there are truly myriad options for reducing your GCP cloud costs. Look for a deeper dive into hands-on recommendations in an upcoming post.

Much like the DevOps journey that many of us are on today, cost optimization is a journey that parallels the growth, tuning, and automation of your cloud environment. I hope you appreciate that visibility into your actual costs and the sources of those costs is primary. Equally important is the inclusion of an appropriately diverse set of views about business, operational, and development goals to understand the true effects of cost on business value, application reliability, and performance. Opsani can be one piece of the puzzle, providing an automated optimization solution that stays up to date regardless of how your GCP environment changes.

Should You Use Kubernetes with Kubeflow or a Managed Service?


Should You Use Kubernetes with Kubeflow or a Managed Service?

At the MeetUp Artificial Intelligence and the Enterprise Stack, Jocelyn Golfein answers whether she believes it is best to use Kubernetes with Kubeflow or a managed service. Jocelyn then passed the question on to Peter, Opsani’s CTO and VP of Engineering, to give his expertise. Check out our last video, Digital Transformation: Should A Company Use AI?  for more knowledge from Jocelyn regarding AI and how you can apply it to your enterprise.

Request A Demo

Optimize Your Azure Costs To Meet Your Financial Objectives


Optimize Your Azure Costs To Meet Your Financial Objectives

One of the key reasons companies start new ventures or “transform” legacy IT models with cloud computing is to meet financial objectives. Without proper planning, execution, and monitoring, what should be cost savings can actually become a financial drain. Furthermore, without proper controls in place, this can happen with surprising rapidity.

Microsoft Azure provides several tools that help with efficiently managing your cloud to keep costs down. Azure has features to provide current cost overviews and cost forecasts, spending controls, and workload cost optimizations. Let’s take a look at some of Azure’s features. Then we will consider some actions that can help you optimize your cloud spending to start saving on costs right away.

Understanding current and future costs

To optimize your Azure costs, you first need to have insight into what you’re spending now. Then, forecast what your bill is likely to be in the future for your current and planned projects. Cost Management for Azure provides free cost management and billing functionality. It provides up-to-date information on current costs and can forecast future costs. Costs can be allocated to specific teams and projects, and it is possible to set budgets and spending alerts to avoid cost overruns.

In addition to being able to understand current cloud costs, Azure has tools that help you estimate future spend as you look to add new services or optimize existing ones. Cost Management for Azure provides analytics to understand the status quo, and two other tools can be used in conjunction to plan future costs. The Azure pricing calculator provides a simple interface to add the products you use and understand Azure options that might provide additional savings.

Azure’s Total Cost of Ownership (TCO) calculator is intended to provide a TCO comparison between an equivalent on-premises system and an Azure cloud system. The generated report provides a way to include datacenter costs such as electricity and even personnel costs.

Optimizing Workload Costs

While the monitoring and management tools just mentioned can identify broad areas for resource optimization, Azure provides a couple of key tools to optimize your current system: Azure Advisor and the Microsoft Azure Well-Architected Framework. Further, taking advantage of special rates available for spot or reserved instances or Dev/Test pricing can increase your savings.

The Azure Advisor draws on a comprehensive list of Azure best-practice recommendations and provides a personalized set of them based on your system’s specifics. The Advisor can identify underutilized resources such as idle VMs, unprovisioned ExpressRoute circuits, and idle virtual network gateways, and provide recommendations for deleting them or reconfiguring your settings to reduce costs.

The Azure Well-Architected Framework considers the high-level architecture of your system and will provide best practices for cost optimization. Along with cost optimization specifically, the framework includes consideration of operational excellence, performance efficiency, reliability, and security, all of which have potential cost implications. The Azure Well-Architected Review allows you to evaluate your current system or explore possible future workload scenarios and receive personalized recommendations for improvement.

Keeping costs under control

One of the key benefits of cloud computing is the ease of access to resources that enable functionality such as autoscaling. Without a clear understanding of cost implications, such ease of access can also result in unexpected charges and budget overruns. Companies undergoing a “digital transformation” are especially prone to cost overruns, which result from engineers and developers treating cloud resources like traditional IT resources. Putting a clear governance structure in place, with clear management policies and spending guardrails, can help keep costs under control.

Microsoft has developed the Microsoft Cloud Adoption Framework for Azure, an overall cloud management framework with five key “disciplines”: cost management, security baseline, identity baseline, resource consistency, and deployment acceleration. The cost management discipline specifically seeks to reduce cloud cost risks while supporting the successful implementation of the other four disciplines. Once cost management policies are defined, cost controls and guardrails are easily implemented at cloud scale with Azure Policy.

Getting started with cost optimization

The Microsoft Azure Well-Architected Framework recommends a four-step process to comprehensively address your system’s cost optimization.

If you are not ready to do a complete overhaul on your Azure system and are looking for some easy cost optimizing wins, here are some actionable steps:

  1. Use Azure Advisor to identify and shut down unused resources.
  2. Use Azure Advisor to right-size underused resources.
  3. Consider reserved instances for baseline workloads and spot instances for preemptible workloads.
  4. Configure autoscaling to let Azure rightsize your infrastructure to match your actual workloads.
  5. Consider taking advantage of Azure Dev/Test pricing for your dev environments.
  6. Use Azure Cost Management to set up team and project budgets and cost allocations to define and monitor spending.

Azure provides a wide range of native options when it comes to optimizing your cloud costs. You can find additional resources at the Azure cost optimization overview page. Opsani is dedicated to helping companies continuously achieve optimal cloud spend while also maximizing performance. Learn more about how Opsani leverages machine learning algorithms to provide automated and continuous cost optimization on Azure, and try it for yourself with our free trial.

Kubernetes Cost Optimization Best Practices


Kubernetes Cost Optimization Best Practices

Cost savings is one of the biggest drivers behind the adoption of cloud systems. Yet as companies increasingly adopt cloud-based technologies such as Kubernetes to support efficient and agile operations, some are unexpectedly seeing costs grow. This can occur for a number of reasons. A key problem is the tendency to pad application and infrastructure resource allocations in an effort to avoid instability in dynamic cloud environments. Over-provisioning made sense with traditional, non-elastic infrastructure. However, cloud-native application orchestrators like Kubernetes have built-in features that allow systems to automatically respond to environmental changes. Over-provisioned CPU and memory incur costs that may not provide actual value for most of the day.

Kubernetes scaling models

Kubernetes provides several scaling models. The Kubernetes Cluster Autoscaler scales the number of worker nodes in a cluster in response to workload. This allows the cluster to automatically grow when demand increases to maintain performance, and to scale back down as demand drops to avoid paying for unnecessary nodes.

Kubernetes autoscaling

Kubernetes also provides two pod autoscaling options: the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA). The HPA simply adds or deletes pod replicas in response to CPU load. The VPA scales pod size in response to load, specifically adjusting CPU and memory requests and limits. The ability to “right-size” resources, up or down, again provides for efficient resource use.

While these tools will keep your app running smoothly, they are not particularly cost-focused. The container/Pod requests and limits feature is one place where optimization can pay serious dividends for both performance improvement and cost reduction. Pod request settings (CPU and memory) define the minimum resources that must be available on a node for Kubernetes to schedule a Pod on that node. Limits define resource maxima; if they are exceeded, Kubernetes may throttle the Pod (CPU) or terminate it (memory).
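For illustration, here is a container resource stanza expressed as the plain Python dict a Kubernetes client library would serialize, with the scheduling and enforcement roles noted inline; the specific numbers are arbitrary examples:

```python
# Container resource stanza as a plain dict (the values are arbitrary examples)
resources = {
    "requests": {"cpu": "250m", "memory": "256Mi"},  # scheduler needs this free on a node
    "limits":   {"cpu": "500m", "memory": "512Mi"},  # ceiling: CPU throttled, memory OOM-killed
}

def is_burstable(resources):
    """Requests set but not equal to limits puts a Pod in the Burstable QoS class."""
    return resources["requests"] != resources["limits"]

print(is_burstable(resources))  # True: this Pod can use more than it requested
```

The gap between requests and limits is exactly where over-provisioning hides: the scheduler reserves capacity based on requests, whether or not the application ever uses it.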

As you can probably guess, over-provisioning requests and limits is common, because developers do not want to deal with the pain of guessing too low and seeing performance issues. The unfortunate corollary to bloated but performant Pods is that more resources are required to deploy these applications, and thus overall cost increases. It becomes harder for Kubernetes to bin pack nodes with large Pods, and if you have autoscaling enabled, you may end up with clusters typically running at the large end of the scaling range. Again, paying for nodes that are not being used efficiently is an unnecessary cost.
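The bin-packing effect is easy to demonstrate with a toy first-fit-decreasing scheduler. The pod sizes and node capacity below are made up, but the pattern of oversized requests inflating node counts is the general one:

```python
def nodes_needed(pod_requests, node_capacity):
    """First-fit-decreasing: count nodes needed for a set of CPU requests."""
    free = []  # remaining free capacity on each node
    for req in sorted(pod_requests, reverse=True):
        for i, f in enumerate(free):
            if f >= req:        # pod fits on an existing node
                free[i] -= req
                break
        else:                   # no node has room; start a new one
            free.append(node_capacity - req)
    return len(free)

# Eight over-provisioned pods requesting 3 vCPUs each on 8-vCPU nodes...
print(nodes_needed([3] * 8, 8))  # 4 nodes: only two pods fit per node
# ...versus the same pods rightsized to their measured 1-vCPU need
print(nodes_needed([1] * 8, 8))  # 1 node holds all eight
```

The Kubernetes scheduler is far more sophisticated than this, but the budget consequence is the same: request sizes, not actual usage, determine how many nodes you pay for.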

How can you solve this?

Opsani solves this challenge by using machine learning algorithms to discover an application’s actual resource requirements and then passing them to Kubernetes to apply to the environment. By rightsizing the Pods that are deploying your applications, Kubernetes can more efficiently bin pack those Pods onto fewer nodes. Performance is maintained while costs are dramatically reduced.

With an improved understanding of actual resource needs, it then becomes possible to adjust the types of nodes being provisioned in the cluster to better fit the known application requirements. This also opens up the opportunity to take advantage of reserved instances, which typically come with substantial cost savings.

Cost savings can be achieved by enabling Kubernetes scaling functions and taking advantage of the right node types. Appropriately tuning resource use is also foundational to both effective and efficient scaling and to being able to target the ideal node type. It is possible to tune application performance manually, although this is only realistic for very simple applications. Today’s cloud-native microservices applications have multiple services that each have individual requirements.

Opsani automates this tuning process and quickly finds the optimal configurations, which Kubernetes then implements. The cost savings can be substantial. Our case study of Google’s Online Boutique (formerly the Hipster Shop) achieved an 80% reduction in cloud costs. The Opsani optimization also doubled performance and provided an 800% improvement in efficiency. If you’d like to see what Opsani can do for yourself, check out the free trial.