What is Continuous Deployment and Continuous Delivery?


What is CD? Continuous Deployment vs. Continuous Delivery

Continuous delivery or deployment (CD) is typically the latter stage of a CI/CD workflow. Overall, CI/CD is a set of practices that accelerates the pace at which DevOps teams can deliver value to their users. The principal tenet of CI/CD is keeping your codebase in a functional state so that it can be shipped to production at any time. This is typically achieved by moving toward frequent but small and safe changes to the codebase rather than the big, high-stakes deployments commonly seen in the past.

When you see CI/CD, the continuous integration part produces a code artifact – the software that will run in production. The CD part, broadly, is the process that vets the artifact so your DevOps team can quickly and reliably push bug fixes, features, and configuration changes into production. The entire CI/CD process is built on pipelines – series of automated process steps – that provide confidence that the code being vetted will perform as expected in the production environment.

Before getting into the CD pipeline process in greater detail, it is important to clarify that CD can refer to either continuous deployment or continuous delivery. The distinction is that delivery requires a pause for manual approval before the code is released into production, whereas deployment automates this step. This means that if an artifact successfully passes through a continuous deployment pipeline, it is put into production without pause. Continuous deployment is the end goal, and continuous delivery is a milestone along the way. As most of the CD process is the same whether or not there is a pause at the end of the pipeline, we will use CD in this article to encompass both modes.

The benefits of CD pipelines are:

  • Increased code velocity – New features and bug fixes can be deployed more rapidly. Although pipelines can include manual steps, as testing becomes increasingly automated, steps such as acceptance and integration testing proceed faster.
  • Reduced costs, faster time to value – CD (and CI) pipelines make it possible to validate and deploy smaller pieces of code to production, so small improvements, bug fixes, and new features can be rolled out quickly to improve the overall user experience. Automating pipelines also frees developers from repetitive manual processes (aka toil) to focus on higher-value business goals.
  • Reduced risk – Automation reduces the errors and uncertainty of manual processes. Validated pipelines that include appropriate tests engender confidence that new code will behave reliably in production. Blue/green deployments and canary releases add a final safety measure that ensures customers are not impacted by unexpected breaks caused by new code.

CD Pipelines – Release vs. Build

Creating release branches – a branch cut from the master branch to support a release – has been a traditional practice. This candidate code is isolated from any changes to master that might cause instability, allowing the team managing the release to fix problems in the code to ensure a smooth deployment. This means that the code being deployed may diverge from master as bugs are fixed or configurations are updated on the release branch.

The modern CI/CD process works differently: the code artifact fed into a CD pipeline is a static snapshot produced at the end of the CI process. The CD pipeline is defined by a series of stages that conduct appropriate tests to ensure confidence in the artifact's functionality and then deploy it to production. Automating the process increases confidence by eliminating sources of human error common in manual processes, such as inconsistent changes to configurations or lost bug fixes.


CD Pipeline Stages


When a CD process is triggered (often by the successful completion of a CI pipeline), the code artifact passes through a series of stages in the CD pipeline. The artifact is first deployed to a test environment and then run through tests to validate the code. Test environments should be identical to production environments, though they may be smaller versions. A variety of tests for basic functionality and performance under load can be used to verify the code's behavior. At the successful completion of the testing stage, the code is versioned. It may then either be deployed automatically, or, in continuous delivery, pause at a manual gate.
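The stage flow just described can be sketched in a few lines of Python. Everything here is illustrative rather than a real pipeline API: the stage names, the version string, and the `require_approval` flag that distinguishes continuous delivery from continuous deployment are all assumptions for the sake of the example.

```python
# Illustrative sketch of the CD stages above: deploy to a test environment,
# validate, version, then either auto-deploy (continuous deployment) or
# pause at a manual gate (continuous delivery). Names are hypothetical.

def run_cd_pipeline(artifact, tests, require_approval, approved=False):
    log = [f"deployed {artifact} to test environment"]
    if not all(test(artifact) for test in tests):       # validation stage
        log.append("tests failed: pipeline stopped")
        return None, log
    version = f"{artifact}-1.0.0"                       # versioning stage
    if require_approval and not approved:               # manual gate
        log.append(f"{version} awaiting manual approval")
        return None, log
    log.append(f"deployed {version} to production")
    return version, log

# Continuous deployment: no gate, straight to production once tests pass.
version, steps = run_cd_pipeline("webapp", [lambda a: True],
                                 require_approval=False)
```

With `require_approval=True` the same artifact stops at the gate until a human approves it, which is the only structural difference between the two CD modes.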


Once the testing stages have been passed, the versioned code is deployed. This could be a full deployment across the entire system, where all the old versions are replaced in production at the same time. For those who are more risk-averse, especially if production and test environments differ, blue/green deployments or canary releases make zero-downtime deployments possible and simplify version rollbacks should unpredicted issues arise.
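A canary release can be sketched as a loop that shifts traffic to the new version in steps while watching an error-rate metric. The step sizes, the threshold, and the `error_rate_at` callback below are illustrative assumptions, not a real deployment API.

```python
# Hypothetical canary sketch: route an increasing share of traffic to the
# new version; roll back if the observed error rate exceeds the threshold.

def canary_rollout(error_rate_at, steps=(5, 25, 50, 100), max_error_rate=0.01):
    """error_rate_at(percent) returns the error rate observed with
    `percent` of traffic on the new version."""
    for percent in steps:
        if error_rate_at(percent) > max_error_rate:
            return "rolled back"   # restore all traffic to the old version
    return "released"              # new version now serves 100% of traffic

healthy = canary_rollout(lambda p: 0.001)   # survives every step
broken = canary_rollout(lambda p: 0.05)     # caught at the first 5% step
```

The point of the pattern is that a bad build is caught while it serves only a small slice of traffic, which is the "final safety measure" mentioned above.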


Continuous deployment pipelines provide a means of testing code updates in a manner that allows immediate deployment into production environments. Progressive deployment models allow the gradual release of deployable code into production to further reduce the risk of customer impact. Automated CD pipelines reduce the chance of introducing human error and encourage more frequent, smaller, and faster deployments of new code, leading to increased value for both customers and the company.

After your code is successfully deployed, the production environment has changed, and performance characteristics may have changed with it. Even when testing performance in the dev environment, real-world traffic and performance are difficult to adequately test and tune for. Opsani's continuous optimization solution works to safely tune the performance and cost of your application in production. It is also possible to pretune a deployment candidate in the dev environment. To see Opsani in action for yourself, sign up for a free trial and see how your application performance can benefit from continuous optimization.

SRE Service Level Agreement Terms Explained: SLA, SLO, SLI



When you sign a technology-related service contract, you probably want to be sure you know what you are getting for your money. If you are responsible for delivering on that contract, you likely want to know whether you are meeting the contract obligations. The acronyms SLA, SLO, and SLI are interrelated terms that help define exactly that. You've probably heard the term SLA (service level agreement), as this is important to both service users and providers. The terms SLO (service level objective) and SLI (service level indicator) are more operational IT terms that matter to site reliability engineers (SREs) in helping make sure that SLAs are not violated. Before we venture into the details, it is helpful to have an initial understanding of what the related terms mean:

Service Level Terminology

  • Metric: something that is measurable and related to a service level objective/agreement
  • Metric value: the value of a metric at a single point in time
  • Service Level Indicator (SLI): a metric and its target values (range) over a period of time
  • Service Level Objective (SLO): the target for service performance, composed of one or more SLIs, that represents the SLA objective
  • Service Level Agreement (SLA): the legal agreement around an SLO (e.g., how it is measured, notifications, service credits, etc.)

You can think of an SLA as a legal promise to users, an SLO as an objective that helps you ensure you keep that promise, and the SLI as the metric or metrics that let you and the customer know how you are performing in keeping the promise.
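That relationship can be made concrete with a small, purely illustrative calculation: the SLI is what you measure, and the SLO and SLA are the targets the measurement is compared against. The health-check data and percentages below are made up.

```python
# Toy illustration of SLI vs. SLO/SLA: measure an uptime SLI from
# health-check samples and compare it to the targets.

def uptime_sli(checks):
    """SLI: percentage of successful health checks (1 = up, 0 = down)."""
    return 100 * sum(checks) / len(checks)

def meets(target_percent, sli_percent):
    """True if the measured SLI keeps the given SLO/SLA target."""
    return sli_percent >= target_percent

checks = [1] * 9995 + [0] * 5      # 99.95% of checks succeeded
sli = uptime_sli(checks)
promise_kept = meets(99.9, sli)    # the 99.9% promise to the customer holds
```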

Service-Level Agreement (SLA)

What is a Service Level Agreement?

An SLA (service level agreement) is generally a contractual agreement between a service provider and a service user about performance metrics, e.g., uptime, latency, or capacity. Because these agreements tend to be contracts between a company and a client, they are generally written up by a company's business and legal teams; however, the underlying objectives are initially defined by the SRE team. An SLA typically includes both the SLA metrics and the business consequences of failing to meet them. These might include refunds, service credits, or similar penalties to the service provider.

The SLA is the overall service agreement related to a system's reliability or availability. While singular, the SLA should derive from a rationally defined SLO (which may differ from the SLA), and that SLO is typically an aggregate of multiple individual metrics (the SLIs).

Challenges when using SLAs

SLAs can be a big challenge to create and implement correctly. SLAs are at times written by people who are not involved in building and running the technology that the SLA is meant to define. Failing to create SLAs in a way that clearly defines service expectations, defines the associated metrics, and is clear on consequences can create a promise that is difficult or impossible to keep.

Ensuring that the legal and business development teams include the IT and DevOps teams will greatly increase the chance that you create a functional SLA. An SLA should not be a wish coming from either the business or the client. An SLA should be grounded in the real world, with expectations set on what a system can realistically support and provide.

It is also important to consider the effects of client-side delays when defining SLAs. If a client inadvertently causes a situation that impacts performance, you don't want that to put you in breach of the SLA.

Do you need an SLA?

If you are providing a free service, an SLA is generally not needed. On the other hand, paying customers generally expect SLAs, as they provide a guarantee of the level of service and the consequences, such as compensation, if the guarantee is not met.

Service Level Objectives (SLO)

What is an SLO?

An SLO (service level objective) is the aggregate of a set of metrics, like uptime or response time, that are used to evaluate system performance. So, if the SLA is the formal agreement between you and your customer, the SLO is what sets a customer's expectations and helps IT and DevOps teams understand what goals they need to hit and measure themselves against.

Challenges when using SLOs

While SLOs are similar to SLAs, they aren't typically written by the legal team and benefit from being written simply and clearly. The metrics that define SLOs should be limited to those that truly define performance. Also, consider the potential for client-side impacts on service when writing these, as this helps translate your SLO requirements over to the SLA.

When defining an SLO, the SLI value(s) chosen should define the lowest acceptable level of reliability. While this may seem counterintuitive at first, greater reliability incurs greater cost, so acceptable service should keep the customer happy without requiring additional work to provide increased performance that may not even be noticed. In SRE, the tradeoff for increased reliability is not only increased cost but also slowed development, as changes from the internal development process can impact SLOs as well.

It is also common to have two SLOs per service: one that is 'customer-facing' and used to derive the SLA, and a stricter internal SLO. The internal SLO may include more metrics or a higher availability target than the one used for the SLA. The difference between, say, a customer-facing 99.9% SLO and an internal 99.95% SLO is an error budget in SRE terms. The value in doing this is that if the internal SLO is violated, there is still room to take action to avoid violating the customer-facing SLO.
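The error budget in that example is easy to quantify. This minimal sketch converts an availability SLO into allowed downtime over a 30-day month; the 99.9%/99.95% figures follow the example above.

```python
# Error budget sketch: a 99.9% monthly availability SLO allows about 43
# minutes of downtime; a stricter 99.95% internal SLO allows about 22.
# The gap between the two is the room left to react before the SLA is at risk.

def error_budget_minutes(slo_percent, days=30):
    total_minutes = days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)

external = error_budget_minutes(99.9)    # ~43.2 minutes per month
internal = error_budget_minutes(99.95)   # ~21.6 minutes per month
```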

Do you need an SLO?

Unlike SLAs, which provide value for paying customers, SLOs can also be used for free accounts, and if you manage software systems for your own company, they can be used internally. Creating SLOs for internal databases, networks, and the like helps set expectations and measures for internal systems' performance, just as for those that are customer-facing.

Service Level Indicator (SLI)

What is an SLI?

An SLI (service level indicator) is the metric that allows both a provider and a customer to measure compliance with an SLO (service level objective). Rather than a single metric value, these are typically an aggregate of values over time. In any case, the measured SLI must meet or exceed the SLO and SLA cutoffs. If your SLA requires 99.9% uptime, your customer-facing SLO is likely also 99.9%, an internal SLO might be set slightly stricter at perhaps 99.95%, and your measured SLI must be greater still (e.g., 99.96% uptime).

While SLIs can include any metric deemed relevant, typical SLIs tend to focus on the RED metrics:

  • Rate (e.g., request rate, throughput, transaction rate)
  • Error rate
  • Duration (e.g., response time, latency)
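As a sketch, the RED metrics can be derived from raw request records. The record format (a duration in milliseconds plus an error flag), the window length, and the sample data below are assumptions for illustration.

```python
# Compute the RED metrics from (duration_ms, is_error) request records
# observed over a time window. A simplified, illustrative calculation.

def red_metrics(requests, window_seconds):
    durations = sorted(d for d, _ in requests)
    n = len(requests)
    return {
        "rate_per_s": n / window_seconds,                  # Rate
        "error_rate": sum(e for _, e in requests) / n,     # Errors
        "p95_ms": durations[int(0.95 * (n - 1))],          # Duration
    }

sample = [(120, 0)] * 95 + [(900, 1)] * 5   # 5% slow, failing requests
metrics = red_metrics(sample, window_seconds=60)
```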

Challenges when using SLIs

In a world where creating metrics can be as simple as a few mouse clicks, it is critical to lean toward simplicity when defining SLIs. Think of them as capturing key performance indicators (KPIs) rather than all possible performance indicators. First, decide whether the metric under consideration matters to the client; if not, don't use it to create an SLO/SLA. Second, decide whether it will help with your internal SLO, if you have one. It may, but if it is not needed, it is better to exclude the metric from your SLOs.

Do you need SLIs?

Good SLIs let you measure how reliable your service is. Having appropriate SLIs provides value whether you are using an SLA or just an SLO.  By now, you should understand that the SLA/SLO defines an acceptable level of performance.  The SLIs are how you can evaluate that performance against the SLO/SLA standard to inform your operations team and your customers. 


The use of SLAs, SLOs, and SLIs clearly defines expectations for system reliability for both your customers and SRE teams. Well-written SLAs and SLOs are derived from customer needs, with appropriate SLIs used to verify that those needs are being met. Defining an error budget with a stricter internal SLO can help focus SREs on addressing reliability issues while improving overall system performance.

Opsani uses SLOs to guide the automated and continuous improvement of application performance and cost reduction. If you have SLOs and SLIs defined, you can load them into Opsani directly.  For systems that don’t have SLOs defined, Opsani can recommend an appropriate SLO and then update the SLO as the system is optimized.  If you’d like to see Opsani in action for yourself, our free trial allows you to run continuous optimization on the application of your choice. Sign up for a free trial here.

How to Monitor Kubernetes with Prometheus


Monitoring Kubernetes with Prometheus

Kubernetes and Prometheus were the first and second open-source projects brought onboard by the then newly minted Cloud Native Computing Foundation (CNCF). Both systems were designed as cloud-native tools from the start. While Kubernetes was designed to manage running microservices-based applications at scale, Prometheus was specifically designed to monitor and provide alerts for such systems.

The Kubernetes monitoring challenge

While Kubernetes is an incredibly powerful and performant tool for reliably managing container-based applications, it is also a complex system with multiple components. A Kubernetes cluster involves multiple servers that can span private and public cloud services. Kubernetes is also frequently deployed with additional services that provide performance enhancements. Unlike troubleshooting a single application on a single server, there is a good chance that multiple logs and services need to be examined when troubleshooting Kubernetes.

What does it take to monitor a Kubernetes Cluster?

Before we consider why Prometheus meets the monitoring tool requirements for K8s, let's consider what needs to be monitored. The master or controller node is the command center for the cluster and maintains the configuration database (etcd), an API server, and a scheduler. Each worker node has a node agent (the kubelet) that communicates with the master node via the API server, a proxy service, and a container runtime. There are also many K8s addons that extend Kubernetes functionality, with networking function and policy being the most popular category by far. A monitoring tool must be capable of collecting data from all of these Kubernetes-specific components in a typical cluster.

Why use Prometheus to monitor Kubernetes

Let’s first consider some of the features you get when you install Prometheus.

  • a multi-dimensional data model, with time series data identified by metric name and key/value pairs
  • a pull model for time series collection over HTTP
  • support for pushing time series where pulling is impractical
  • monitoring targets that can be statically configured or automatically discovered (service discovery)
  • PromQL, the powerful and flexible Prometheus query language for exploring your monitoring data

These core functions should convince you that Prometheus is a powerful and high-performing monitoring tool.
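As a taste of PromQL in practice, Prometheus serves query results over a simple HTTP API (`GET /api/v1/query`). The sketch below just builds such a request URL; the server address and the metric in the example query are illustrative, not taken from any particular cluster.

```python
from urllib.parse import urlencode

# Build a URL for Prometheus's instant-query HTTP API. Fetching it (e.g.,
# with urllib.request.urlopen) against a live server returns JSON results
# for the given PromQL expression.

def prometheus_query_url(server, promql):
    return f"{server}/api/v1/query?{urlencode({'query': promql})}"

# Per-second request rate over the last 5 minutes for a hypothetical job:
url = prometheus_query_url(
    "http://prometheus.example:9090",
    'rate(http_requests_total{job="api"}[5m])',
)
```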

Service Discovery and Pull-based monitoring

One of the strengths of Kubernetes, and a challenge for any monitoring system, is that K8s can automatically spin up new Pods to meet demand or replace failed Pods. At any given moment, it is difficult (and also unnecessary) to know exactly where the set of Pods comprising a Service is running. Prometheus provides service discovery that automatically finds new Pods and starts to pull metrics from them. This pull-based model of metric collection and service discovery matches the demands of dynamic cloud environments very well.


The use of Labels, which are simply key-value pair designations, is a concept shared by both Kubernetes and Prometheus.  In Kubernetes, labels can designate services, releases, customers, environments (prod vs. dev), and much more.  While you can create labels in Prometheus using PromQL, the query language can natively use the labels defined in your Kubernetes environment as well. Labels can then be used to select the time series of interest and match labels to further aggregate metrics. 

Exporters and Kubernetes Pods

Prometheus natively instruments Kubernetes components, but there are times when you need to monitor a system that is not natively integrated with Prometheus (e.g., Postgres). In this case, you can co-deploy an exporter in a Pod that runs alongside your service. The role of the exporter is to translate the service's metrics into a format consumable by Prometheus.
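What an exporter ultimately produces is just Prometheus's plain-text exposition format, served on an HTTP endpoint (conventionally /metrics). The sketch below formats a single sample in that format; the metric and label names are invented for illustration.

```python
# Format one sample in the Prometheus text exposition format -- the output
# an exporter serves for Prometheus to scrape. Names here are hypothetical.

def format_sample(name, labels, value):
    if not labels:
        return f"{name} {value}"
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

line = format_sample("pg_connections_active", {"datname": "orders"}, 42)
# → pg_connections_active{datname="orders"} 42
```

A real exporter would collect these values from the service (e.g., by querying Postgres) and serve many such lines per scrape.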

Deciding which metrics to monitor

You could decide to instrument everything, and Prometheus could handle it. However, metrics storage can become a limiting factor. The Kubernetes community generally agrees on four principal types of metrics that should be monitored: running Pods and their Deployments, node resources (e.g., disk I/O, CPU, memory), container-native metrics, and application metrics. Several frameworks (e.g., USE and RED) can be used to decide which additional metrics to include.


From an operator’s or SRE’s perspective, a monitoring tool needs to be able to collect metrics from a complex and changing system and should not be difficult to manage. Prometheus addresses both challenges.  Because of the deep and native integration between Kubernetes and Prometheus, it is remarkably easy to get up and running. It is also easy to get metrics on high-level constructs such as Services and node resources, and it is also easy to zoom in close to look at Pod, Container, and application metrics. Together, Kubernetes and Prometheus give you the data needed to ensure that overall system function is acceptable, e.g., tracking SLOs.  The combination also allows operators to identify resource bottlenecks and use that information to improve overall application performance. For many looking for a Kubernetes monitoring solution, Prometheus is an easy first step, and for many, thanks to its simplicity and power, it is the last step.

Continuous Optimization In Your Continuous Integration Pipeline


The Value of Continuous Optimization In Your Continuous Integration Pipeline

What is Continuous Integration (CI)?

Continuous integration (CI) is the process of taking the software code that you are developing and validating that any new code works as expected and works together with existing code. Potentially, this could be part of an end-to-end multicomponent system. As a developer or team leader, you usually own only a piece of the puzzle and need to fit it into a larger continuous integration environment. 

In the continuous integration space, you would normally want to validate that your application performs at a certain level after the addition of new code. Performance optimization is usually considered part of that, although most people don't implement it. Among the customers we've spoken with, we've seen that while they may talk about performance optimization and may even have a performance test in their CI process, performance only really becomes a concern when they go into production. The problem is that the typical dev environment is not a true reflection of the production environment. As a result, most customers' performance tests are not an accurate representation of their system or how it works; this makes it challenging to do performance testing anywhere other than in production!

Opsani’s continuous optimization platform is designed to safely run continuous optimization in a production environment with live production traffic.

Optimizing against your actual application environment is ideal, as you can optimize not only for peak traffic but continuously across time and varying traffic levels. In the CI arena, as long as there is some prototype of a performance test application, the same concepts can be applied, even though they won't necessarily indicate "proper" optimization as one would get from real traffic. Current standard CI proves that your application works, but it doesn't prove how the application will behave when your customers use it. In that case, the continuous optimization process can at least ensure that the system behaves in a similar fashion for a given workload. By applying both processes, the production optimization can support scaling of the application environment for the most efficient use of resources, while the CI optimization validates that a change hasn't accidentally introduced working but dramatically slower code.

The optimization cycle that most people think about is optimizing only when they start having customer problems.

We find it best to optimize continually: in production at a minimum, and in continuous integration cycles as well. Combined, Opsani can support your need to find the most cost-effective, most performant solution that meets your business service level objective (SLO). We want to bring the benefit of application optimization as early into the development cycle as possible: in production, against real-world application conditions, to guide your future development decisions, and in the development cycle, as a gate to the integration release process. The nice thing about the gated integration optimization approach is that you can put these two things together as part of the integration release cycle. With a performance optimization phase added to the development cycle, we can optimize the application before deployment, determining the most performant result and configuration given a known load profile for the application. This is usually the most significant gap in any continuous integration. It's just like having enough tests to cover your code: you have to build new tests that accurately cover your end-to-end process across the enterprise.

Step 1: For CI, make sure we have tests for our codebase for correct behavior and performant behavior.

Step 2: For optimization, make sure we have an application performance optimization test and an understanding of our customer SLO.

The performance optimization in continuous integration does not have to reflect the production optimization with 100% accuracy. We are using it first to get into the right scope for how your application should run, and more importantly, as a way to make sure your application development hasn't gone off the rails.

In standard continuous integration, we have tests to validate that our application does what it says on the tin; for example, given an input, you got the right output. This is the principal approach to continuous integration. The secondary benefit from continuous optimization is ensuring that we haven't reduced our ability to meet our customer's SLO. These can be simple additional tests, such as making sure a webpage renders in no more than 200 milliseconds. This might already be a test that the end user has. We can now make sure that you not only hit that SLO but exceed it.
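Such an SLO check is easy to add to a CI test suite. Here is a minimal sketch built around the 200-millisecond render-time example; the sample timing data and the choice of the 95th percentile are illustrative assumptions.

```python
# CI gate sketch: fail the build if the 95th-percentile render time from a
# basic performance test exceeds the SLO. Timing data here is made up.

def passes_slo_gate(render_times_ms, slo_ms=200, percentile=0.95):
    ordered = sorted(render_times_ms)
    p = ordered[int(percentile * (len(ordered) - 1))]
    return p <= slo_ms

# In a test runner this would be: assert passes_slo_gate(measured_times)
fast_build = passes_slo_gate([120] * 98 + [900] * 2)   # rare outliers pass
slow_build = passes_slo_gate([250] * 100)              # consistently over SLO
```

Using a percentile rather than the worst case keeps the gate from failing on a single noisy measurement while still catching systematic regressions.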

This is all done in the integration phase; this isn't even when the app gets to production. Implementing this only requires a basic performance test as part of the testing structure. The test doesn't have to be complete, but it should be reasonably similar to the actual application load; the closer it is, the more benefit you get from the performance optimization. But even if you only exercise a few metrics in the test system – some basic log-in/log-out capabilities, API calls, things along those lines that are consistent and repeatable and can serve as a sample of how your application runs – we can optimize sufficiently against that.

The goal here is not necessarily to optimize for performance; the goal is to ensure that the minimum service level objective is achieved as a part of your integration process.

We have seen this firsthand with customers, where a poorly configured piece of code doubled the company's response time. Their performance tests were not particularly accurate reflections of their actual workload, yet they were still able to find this level of degradation in a release candidate. This code wasn't some work in progress; this was code that the engineering team felt was ready to be released to customers. So, integrating the same toolset that you would use for production optimization can be powerful at the continuous integration stage. Essentially, you can do this just as well in pre-production as in production.

Once you release your tuned and tested application into the wild, the real benefit of optimization is when you are doing it against live traffic.

You are looking at how the application behaves on a second-by-second basis and tuning as quickly as possible. Many services look at CPU metrics but don't look at the application itself. No one fully understands everything the application is doing, so you want to look at the system – the real-time metrics of your application. You need to consider response time on a per-transaction basis, or how many errors are being generated that are not appropriate for your application. You need to make sure you're meeting your service level objectives. Tuning those same settings at the release stage first ensures you don't violate your service level objectives when you release into production.

So, while it is best to do live optimization on your production system, adding an optimization component to your continuous integration pipelines can provide substantial benefits. Running optimization against a realistic load in dev can verify that your new code will not violate your SLOs and will keep your customers happy. Your tested and optimized application should run smoothly upon deployment, and it is then possible to reap additional performance and cost benefits by further optimizing your application in production against actual load conditions.

Migrating Your Cloud Application from Docker Compose to Kubernetes


Why Migrate to Kubernetes from Docker Compose?

Is migrating your orchestration tool from Docker Compose to Kubernetes a must? It is not, but if you have been running applications with Compose for a while, you may be starting to discover limitations that the simpler Compose model can’t handle. 

If you have never used Docker Compose, it’s a framework that provides an application and environmental definition for your containerized application in a single YAML file. This file will define the container images required, dependencies, networking specifications, and so on.

Docker Compose still does have some advantages over Kubernetes, but these are primarily due to its simplicity. The learning curve is not as steep. It makes deploying microservice applications easy. A single YAML file and one command can correctly configure your cloud environment and deploy your containers. 

This simplicity is also what limits Docker Compose. Compose runs on a single host or cluster. Unlike Kubernetes, multi-cluster or even multi-cloud deployments are not an option. This should also tell you that scaling Compose has limitations. And if you use Kubernetes on a public cloud service like AWS, Microsoft Azure, or GCP, you can take advantage of a wide range of Kubernetes integrations that are not available to Compose users.

Another issue with Docker Compose applications is that the server running your application is a single point of failure. In a rather un-cloudlike manner, it must be kept running to keep the application running. Kubernetes is typically run with multiple nodes (servers) and can distribute and maintain multiple instances of a microservice across nodes. This means that if one node fails, Kubernetes can ensure continuity by deploying additional instances to any of the remaining nodes.

Still, I should say that if you are happy running Compose, then carry on; there is nothing inherently wrong with the tool. Just a couple of years ago, having your application development and management firmly embedded in the Docker ecosystem looked like a safe and sensible decision.  The company had truly revolutionized how applications were being run, and Docker containers were the new kid on the block that everyone wanted to get to know. Unfortunately, Docker’s domination of all things container started to fade with the arrival of Kubernetes, and as Docker’s developmental velocity slowed, Kubernetes became the leader in orchestration tools.

Migrating from Docker Compose to Kubernetes

The ways that Compose and Kubernetes work are understandably different, and it might seem that this would make migrating from one to the other an ordeal. Surprisingly, the oh-so-cleverly named Kompose tool makes the process remarkably easy. If, for example, you had a three-tier microservice application defined in a single docker-compose.yaml, Kompose would split that into .yaml files for each service, providing the Deployment and Service definitions that Kubernetes requires.

One of the most common concerns when migrating to Kubernetes from Docker Compose is the very different approach to networking.  Because Compose can create a single local network on the single host machine running your application, any container within the local network could connect to, for example, the local mariaDB instance with a hostname mariadb and port assignment of 8080 at http://mariadb:8080. Anything external to the local network would need to know the MariaDB container’s IP address.

Because Kubernetes typically runs on multiple nodes, networking functions differently. In Kubernetes, the Service allows communication between containers on the same or different nodes in a cluster.  Continuing with the example above, the MariaDB container definition would be converted into a Deployment and a Service. The Deployment defines the environmental considerations of deploying MariaDB to the cluster, and the Service is what Kubernetes uses to enable inter-container communication across the cluster. Even though the structure of the .yaml file differs, Kompose can define the service as mariadb and assign port 8080 so that a MariaDB container can still be reached at http://mariadb:8080. As a result, no networking changes should be needed when migrating from Docker Compose to Kubernetes.
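If you want to confirm this behavior after converting, a quick in-cluster DNS check is possible with a throwaway pod. This is a sketch: the mariadb service name and port come from the example above, and the busybox pod is temporary.

```shell
# Launch a temporary busybox pod and resolve the Service by name.
# Assumes a Service named mariadb exposing port 8080, as in the example above.
kubectl run dns-test --rm -it --image=busybox --restart=Never -- \
  wget -qO- http://mariadb:8080
```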

Using Kompose to Migrate Your Docker Compose File

Once you’ve downloaded Kompose and navigated to your docker-compose.yml directory, running kompose convert will convert the single docker-compose.yml into several Kubernetes configuration files. Running kubectl apply -f {config} will then launch your newly transformed application on your Kubernetes cluster.
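As a sketch, the whole migration can look like the following. The output file names are illustrative; Kompose names the generated files after the services in your Compose file.

```shell
cd my-compose-app/                 # directory containing docker-compose.yml
kompose convert                    # emits e.g. web-deployment.yaml, web-service.yaml
kubectl apply -f web-deployment.yaml -f web-service.yaml
kubectl get pods                   # confirm the converted workloads are starting
```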

While the few simple steps just described should get your Docker Compose application running on a Kubernetes cluster, life is rarely simple. Depending on the configuration and complexity of your application, additional troubleshooting may be necessary.  

Now that you are running on Kubernetes, it is good to check that access management and secrets continue to function correctly. And even though Kompose will have enabled network communications, more performant networking solutions are available in Kubernetes, so this may be a good time to consider implementing them.

In some cases, developers will actually continue to use the single docker-compose.yaml to provide their application specification and use Kompose to translate the configuration over to a production environment. If your Kompose transformation was a one-way trip, it might be time to incorporate the new Kubernetes .yaml configuration files into a CI/CD process.


While the Docker ecosystem integration or the simplicity of orchestrating your container app may have brought you to Docker Compose, transitioning to Kubernetes will provide you with a much more capable orchestration engine. Your application will benefit from being easier to manage, more scalable, and more resilient.

Ancestry Accelerates Innovation with Opsani and Harness



Watch our webinar where our VP of Product and Marketing Amir Sharif sits down with Ravi Lachhman, an Evangelist from Harness, and Russ Barnett, the Chief Architect of Ancestry. They discuss the agile enterprise when it comes to software development, specifically CI/CD, why this differentiates companies, and what we can expect in the future in terms of innovation. Opsani is the leading Continuous Optimization as a Service (COaaS) platform. Harness is a CD-as-a-service company, and Ancestry is the largest provider of family history and DNA-related software. We discuss how Opsani accelerated Ancestry’s software development with CI/CD/CO, how Opsani helped build a pipeline Ancestry trusts, what sets Ancestry apart from companies that don’t adopt CI/CD, and what benefits Ancestry has seen with Opsani being a part of the Harness software factory.


What is a CI/CD pipeline?



In simplest terms, a Continuous Integration / Continuous Delivery (CI/CD) pipeline is a predefined series of steps that must be performed in order to deliver a new version of software for final deployment to production. By explicitly defining the steps that software development must proceed through, software delivery speed and reliability can be greatly improved. It should not be surprising that, as the CI/CD process bridges the Dev and Ops sides of software delivery, it is foundational to standard DevOps (and SRE) approaches.

While automation is not explicitly assumed by a CI/CD process, the greatest value of CI/CD pipelines is in accelerating development processes. Automating systems monitoring and testing allows rapid feedback on software state and supports rapid intervention to correct issues in the development and deployment stages. At the extreme of CI/CD automation, the “CD” actually is continuous deployment (sometimes called continuous release) where software updates that pass all required steps in a pipeline are pushed directly to production without human intervention. 

What makes up a CI/CD pipeline?

A CI/CD pipeline is typically divided into a set of discrete stages. These are generally a set of tasks that have a related function. Here are a few common pipeline stages:

  • Build – The application is compiled.
  • Test – The code is tested.  This can (and really should) include unit and integration tests. (Note that this stage can happen at the CI and CD parts of the overall pipeline.)
  • Release – The stage where the application or artifact is delivered to the repository.
  • Deploy – The artifact is deployed to production. In continuous delivery this stage requires manual approval; in continuous deployment it is automated.
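To make the flow concrete, here is a minimal sketch of the four stages as a shell script. Each stage is a placeholder for the real commands your CI/CD tooling would run; the echo statements stand in for compilers, test runners, and deployers.

```shell
#!/bin/sh
set -e  # a failing stage stops the pipeline

build()     { echo "compiling application"; }
run_tests() { echo "running unit and integration tests"; }
release()   { echo "pushing artifact to the repository"; }
deploy()    { echo "deploying artifact to production"; }

for stage in build run_tests release deploy; do
  echo "stage: $stage"
  $stage
done
```

In continuous delivery, the deploy step would first pause for a manual approval; in continuous deployment it runs unattended.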

These pipeline stages are common, but what your pipeline will look like will be determined by the tooling and requirements of your team. There are some applications, like Tekton, Jenkins, and Travis CI, that can serve both the CI and CD aspects of the overall pipeline. There are also applications, like Spinnaker and Harness, that focus on the CD side of things, and it is not unusual to see different CI and CD solutions used together.

Why CI/CD pipelines matter

If you are considering implementing CI/CD, we can probably agree your end goal is to successfully deliver ever improving software to your customers. Conversely, you are probably not hoping to spend time on the repetitive infrastructure that supports that goal.  The CI/CD pipeline, whether supported by a single piece of software or a collection, is intended to define a reliable and repeatable development and deployment process that keeps the focus on improving your application.  

This is not to say that your pipeline will not change once it has been developed. As noted at the beginning of this article, what your CI/CD pipeline looks like will depend on the needs of your team. If you are just starting out, you may only have a CI pipeline in place. As you grow, you may add continuous delivery and then continuous deployment to the overall pipeline. You may add further functions along the way. For example, the Jenkins ecosystem includes over 1500 plugins that integrate into the CI/CD process. Much like the software that is the focus of your work, your CI/CD pipeline can continue to improve, eliminating toil and accelerating your time to value.

Five Ways to Run Kubernetes on AWS



If you have decided that Amazon Web Services (AWS) is the place you want to host your Kubernetes deployments, you have two primary AWS-native options – push the easy button and let AWS create and manage your clusters with Elastic Kubernetes Service (EKS), or roll up your sleeves and sweat the details with self-hosted Kubernetes on EC2. In between these two levels of complexity are a number of install tools that abstract away some of the complexity of getting a Kubernetes cluster running on AWS. In this article we will look at the most popular AWS-compatible tools: Kubernetes Operations (kOps), kubeadm, and Kubespray.

In this article we’ll cover the options for running Kubernetes on AWS in greater detail, provide some insight into prerequisites and provide resources to help you get up and running:

  • [easy] Creating a Kubernetes cluster with Elastic Kubernetes Service (EKS)
  • [less easy] Creating a Kubernetes cluster on AWS with kOps
  • [less easy, more control ] Creating a Kubernetes cluster on AWS with kubeadm
  • [less easy, more control, Ansible-centric ] Creating a Kubernetes cluster on AWS with kubespray
  • [hard, all the control] Manually creating a Kubernetes cluster on AWS with EC2 instances

Creating a Kubernetes Cluster on AWS with Elastic Kubernetes Service (EKS)

This is really the easy button when it comes to the options for running Kubernetes on AWS.  With this option, AWS simplifies cluster setup, creation, patches and upgrades. With EKS you get an HA system with three master nodes for each cluster across three AWS availability zones.

Although this is the simplest way to get Kubernetes up and running on AWS, there are still some prerequisites:

  • An AWS account
  • An IAM role with appropriate permissions to allow Kubernetes to create new AWS resources
  • A VPC and security group for your cluster (one for each cluster is recommended)
  • kubectl installed (you may want the Amazon EKS-vended version)
  • AWS CLI installed

If you have your prerequisites in place, the following resources will guide you through getting your first EKS cluster up and running:
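As a sketch, the control plane can be created directly with the AWS CLI. The cluster name, role ARN, subnet IDs, and security group ID below are placeholders standing in for the IAM role and VPC resources from the prerequisites list.

```shell
# Create the EKS control plane (takes several minutes to become ACTIVE).
aws eks create-cluster \
  --name my-cluster \
  --role-arn arn:aws:iam::111122223333:role/my-eks-cluster-role \
  --resources-vpc-config subnetIds=subnet-aaaa1111,subnet-bbbb2222,securityGroupIds=sg-cccc3333

# Once the cluster is ACTIVE, point kubectl at it:
aws eks update-kubeconfig --name my-cluster
kubectl get svc   # sanity check: lists the default kubernetes service
```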

Creating a Kubernetes Cluster on AWS with kOps

Using Kubernetes Operations (kOps) abstracts away some of the complexity of managing Kubernetes clusters on AWS. It was specifically designed to work with AWS, and integrations with other public cloud providers are available. In addition to fully automating the installation of your k8s cluster, kOps runs everything in Auto Scaling groups and can support HA deployments. It can also generate a Terraform manifest, which can be kept in version control or used to have Terraform actually create the cluster.

If you wish to use kOps, there are a number of prerequisites before creating and managing your first cluster:

  • Have kubectl installed.
  • Install kOps on a 64-bit (AMD64 and Intel 64) device architecture.
  • Set up your AWS prerequisites.
  • Set up DNS for the cluster, e.g. on Route53 (or, for a quickstart trial, a simpler alternative is to create a gossip-based cluster).

Once you’ve checked off the prerequisites above, you are ready to follow the instructions in one of the resources below:
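For a quickstart trial, a gossip-based cluster (a name ending in .k8s.local) avoids the DNS setup. The following is a sketch; the S3 bucket holding kOps state is a placeholder that you must create first, and the zone and node count are illustrative.

```shell
export KOPS_STATE_STORE=s3://my-kops-state-store   # placeholder bucket you created

kops create cluster --name=trial.k8s.local --zones=us-west-2a --node-count=2
kops update cluster --name=trial.k8s.local --yes   # actually provision AWS resources
kops validate cluster --name=trial.k8s.local       # wait until nodes report Ready
```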

Creating a Kubernetes Cluster on AWS with kubeadm

Kubeadm is a tool that is part of the official Kubernetes project.  While kubeadm is powerful enough to use with a production system, it is also an easy way to simply try getting a K8s cluster up and running. It is specifically designed to install Kubernetes on existing machines. Even though it will get your cluster up and running, you will likely still want to integrate provisioning tools like Terraform or Ansible to finish building out your infrastructure. 


To use kubeadm, you will need:

  • kubeadm installed
  • one or more EC2 machines running a deb/rpm-compatible Linux OS (e.g. Ubuntu or CentOS), with 2GB+ of RAM per machine and at least 2 CPUs on the master node machine
  • full network connectivity (public or private) among all machines in the cluster
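The broad strokes of a kubeadm bootstrap look like the following sketch, run as root on the EC2 machines. The CIDR shown is the one commonly used with the Flannel network plugin, and the join command placeholders are printed for you by kubeadm init.

```shell
# On the master node:
kubeadm init --pod-network-cidr=10.244.0.0/16

# Configure kubectl for your user and install a CNI plugin, then on each worker
# run the join command that `kubeadm init` printed, e.g.:
# kubeadm join <master-ip>:6443 --token <token> \
#   --discovery-token-ca-cert-hash sha256:<hash>
```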

The following resources will help you get started with building a K8s cluster with kubeadm:

Creating a Kubernetes Cluster on AWS with Kubespray

Kubespray is another installer tool, one that leverages Ansible playbooks to configure and manage the Kubernetes environment. One benefit of Kubespray is its support for multi-cloud deployments, so if you are looking to run your cluster across multiple providers or on bare metal, it may be of interest. Kubespray actually builds on some kubeadm functionality and may be worth adding to your toolkit if you are already using kubeadm.


To deploy on AWS, Kubespray requires that you:

  • uncomment the cloud_provider option in group_vars/all.yml and set it to ‘aws’
  • create IAM roles and policies for both “kubernetes-master” and “kubernetes-node”
  • tag the resources in your VPC appropriately for the aws provider
  • ensure your VPC has both DNS Hostnames support and Private DNS enabled
  • make the hostnames in your inventory file identical to the internal hostnames in AWS
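At a high level, a Kubespray run is an Ansible playbook applied to your inventory. The sketch below follows the repository's sample layout; exact paths and inventory file names may differ between Kubespray versions.

```shell
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
pip install -r requirements.txt              # installs Ansible and dependencies
cp -r inventory/sample inventory/mycluster   # then edit group_vars/all.yml (cloud_provider: aws)
ansible-playbook -i inventory/mycluster/inventory.ini cluster.yml -b
```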

The following resources will help you get your Kubernetes cluster up and running on AWS with Kubespray:

Manually Creating a Kubernetes Cluster on EC2 (aka, Kubernetes the Hard Way)

If EKS is the “easy button,” installing on EC2 instances is the opposite. If you need full flexibility and control over your Kubernetes deployment, this may be for you. If you’ve spent any time with Kubernetes, you’ve almost certainly heard of “Kubernetes the Hard Way.” While KTHW originally targeted Google Cloud Platform, AWS instructions are included in the AWS and Kubernetes section. Running through the instructions provides a detailed, step-by-step process of manually setting up a cluster on EC2 servers that you have provisioned. The title, by the way, is not a misnomer, and if you do run through this manual process, you will reap the rewards of a deep understanding of how Kubernetes internals work.

If you are actually planning to use your Kubernetes-on-EC2 system in production, you will likely still want some level of automation, and a functional approach would be to use Terraform with Ansible. While Terraform is much more than a K8s install tool, it allows you to manage your infrastructure as code by scripting tasks and managing them in version control. There is a Kubernetes-specific Terraform module that helps to facilitate this. Ansible complements Terraform’s infrastructure management prowess with software management functionality for scripting Kubernetes resource management tasks via the Kubernetes API server.

The following resources will help you get started with creating a self-managed Kubernetes cluster on EC2 instances:


In this article, we considered five common ways to get a Kubernetes cluster running on Amazon Web Services. Which one you choose will depend on how much control you need over the infrastructure you are running the cluster on and on your use case. If you are just trying out Kubernetes or setting up a dev environment, a quick and repeatable solution is likely preferable. In a production system, you’ll want tools that simplify administrative tasks, like rolling upgrades, without needing to tear down the entire system.

The tools we covered are the most popular solutions for deploying on AWS. You may have noticed that there is a degree of overlap and integration among several of the approaches, so using kOps with Terraform to then install on self-hosted EC2 instances is a possibility. Kubernetes is known for being a challenge to manage manually, and the tools we covered are under active development to simplify that process. More tools are constantly being created to address specific use cases. For example, Kubicorn is an unofficial, golang-centric K8s infrastructure management solution.  While not all of the tools listed are AWS specific, you can explore the CNCF installer list from the CNCF K8s Conformance Working Group to get a better sense of the diversity of options available.

Instrumenting Kubernetes with Envoy for Application Performance Metrics



Opsani COaaS (Continuous Optimization as a Service) optimizes runtime settings such as CPU, memory, and autoscaling as well as in application settings such as Java garbage collection time and database commit times.  Opsani performs this optimization by learning from application performance metrics (APM).

Envoy (https://www.envoyproxy.io/) is a self-contained layer 7 proxy process that is designed to run alongside an application server.  One of its proxy functions is to provide performance metrics.  In Kubernetes, Envoy allows you to instrument applications to obtain performance metrics without changing application code or disrupting your application in production.

While there are a variety of methods and tools for application performance metrics, in this step-by-step guide, we’ll walk through instrumenting your Kubernetes application for performance metrics with Envoy. For this exercise, we’ll assume you have access to a fresh Kubernetes cluster (e.g. AWS EKS), and for simplicity we’ll be working in the default Kubernetes namespace. Note: Opsani has packaged Envoy to include configurations that support Opsani application optimization. The source code is publicly available and documented on GitHub at https://github.com/opsani/envoy-proxy.

Deploy an Application to Monitor and Optimize

While Opsani can optimize applications across a variety of operating systems, clouds, programming languages, and continuous deployment platforms, we’ll use a very simple Kubernetes example. Any server application will generally suffice, but for learning about Opsani application optimization, it’s helpful to be able to control the resources that matter, including CPU, memory, and response times.

fiber-http (https://github.com/opsani/fiber-http) is an open source Opsani tool that lets you do just that.  fiber-http is a webserver with endpoints to control CPU and memory consumption as well as server response times.  With these controls we can simulate a loaded server in a simple, controlled manner.

For this exercise, since fiber-http is already in DockerHub, let’s create a minimal Kubernetes yaml manifest file to stand up a Kubernetes Deployment of a fiber-http container and a load balancer service for ingress traffic.  Note: If you are not 100% comfortable with editing yaml files, we suggest using an editor that will help with lining up columns of text.  VSCode is a good editor for that (https://code.visualstudio.com/).


apiVersion: apps/v1
kind: Deployment
metadata:
  name: fiber-http
  labels:
    app.kubernetes.io/name: fiber-http
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: fiber-http
  template:
    metadata:
      labels:
        app.kubernetes.io/name: fiber-http
    spec:
      containers:
      - name: fiber-http
        image: opsani/fiber-http:latest
        env:
        - name: HTTP_PORT
          value: ""
        ports:
        - containerPort: 8480
        resources:
          requests:
            cpu: "1"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "1Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: fiber-http
  labels:
    app.kubernetes.io/name: fiber-http
  # annotations:
  #   service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  #externalTrafficPolicy: Cluster
  #sessionAffinity: None
  selector:
    app.kubernetes.io/name: fiber-http
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8480

Run this manifest in Kubernetes via:

% kubectl apply -f fiber-http-deployment.yaml

This results in a pod with a single fiber-http container, with inbound traffic brought in by the LoadBalancer Service.

fiber-http (without Envoy) inbound traffic flow:

You can start HTTP communications through the service to the pod via a web browser, but for testing and automation purposes, let’s use the curl command line tool.

First, obtain the address of the service.

% kubectl get service
NAME         TYPE           CLUSTER-IP   EXTERNAL-IP                                                                PORT(S)        AGE
fiber-http   LoadBalancer   ...          a0564378f112548d5b11cbc806d5f34e-1268639300.us-west-2.elb.amazonaws.com   80:31961/TCP   25h
kubernetes   ClusterIP      ...          <none>                                                                     443/TCP        6d

Use curl to start an HTTP connection to the application.

% curl a0564378f112548d5b11cbc806d5f34e-1268639300.us-west-2.elb.amazonaws.com

move along, nothing to see here% 

Refer to the fiber-http GitHub repository for instructions on how to communicate with fiber-http to control CPU load, memory consumption, and HTTP response times.
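For example, the /time endpoint used later in this tutorial holds a response open for a specified duration. This is a sketch; the hostname is the LoadBalancer address reported by kubectl get service, and other control endpoints are documented in the fiber-http repository.

```shell
SVC=a0564378f112548d5b11cbc806d5f34e-1268639300.us-west-2.elb.amazonaws.com
curl "http://$SVC/"                      # plain request
curl "http://$SVC/time?duration=800ms"   # response delayed by roughly 800ms
```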

Instrumenting a Kubernetes Deployment with Envoy

Now it’s time to re-deploy the application with metrics instrumentation, towards the goal of autonomous optimization! 

We’ll insert Envoy as a proxy in front of the fiber-http application pod.

We’ll need to insert Envoy between the Service and the fiber-http application container.  Let’s copy the original yaml into a new file so that it’s easy to compare “before” and “after” configurations.


apiVersion: apps/v1
kind: Deployment
metadata:
  name: fiber-http
  labels:
    app.kubernetes.io/name: fiber-http
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: fiber-http
  template:
    metadata:
      labels:
        app.kubernetes.io/name: fiber-http
        # *** ADD FOR OPSANI ***
        # Attach a label for identifying Pods that have been augmented with
        # an Opsani Envoy sidecar.
        sidecar.opsani.com/type: "envoy"
      annotations:
        # *** ADD FOR OPSANI ***
        # These annotations are scraped by the Prometheus sidecar
        # running alongside the servo Pod. The port must match the
        # `METRICS_PORT` defined in the Envoy container definition
        # below. The metrics are provided by the Envoy administration
        # module. It should not be necessary to change the path or port
        # unless the proxied service happens to have a namespace collision.
        # Any divergence from the defaults will require corresponding
        # changes to the container ports, service definition, and/or the
        # Envoy proxy configuration file.
        prometheus.opsani.com/scrape: "true"
        prometheus.opsani.com/scheme: http
        prometheus.opsani.com/path: /stats/prometheus
        prometheus.opsani.com/port: "9901"
    spec:
      containers:
      - name: fiber-http
        image: opsani/fiber-http:latest
        env:
        - name: HTTP_PORT
          value: ""
        ports:
        - containerPort: 8480
        resources:
          requests:
            cpu: "1"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "1Gi"
      # *** ADD FOR OPSANI ***
      # Opsani Envoy Sidecar
      # Provides metrics for consumption by the Opsani Servo
      - name: envoy
        image: opsani/envoy-proxy:latest
        resources:
          requests:
            cpu: 125m
            memory: 128Mi
          limits:
            cpu: 250m
            memory: 256Mi
        env:
        # The container port of Pods in the target Deployment responsible for
        # handling requests. This port is equal to the original port value of
        # the Kubernetes Service prior to injection of the Envoy sidecar. This
        # port is the destination for inbound traffic that Envoy will proxy from
        # the `OPSANI_ENVOY_PROXY_SERVICE_PORT` value configured below.
        - name: OPSANI_ENVOY_PROXIED_CONTAINER_PORT
          value: "8480"

        # Uncomment if the upstream is serving TLS traffic
        #   value: "true"

        # The ingress port accepting traffic from the Kubernetes Service destined
        # for Pods that are part of the target Deployment (Default: 9980).
        # The Envoy proxy listens on this port and reverse proxies traffic back
        # to `OPSANI_ENVOY_PROXIED_CONTAINER_PORT` for handling. This port must
        # be equal to the newly assigned port in the updated Kubernetes Service
        # and must be configured in the `ports` section below.
        - name: OPSANI_ENVOY_PROXY_SERVICE_PORT
          value: "9980"

        # The port that exposes the metrics produced by Envoy while it proxies
        # traffic (Default: 9901). The corresponding entry in the `ports` stanza
        # below must match the value configured here.
        - name: OPSANI_ENVOY_PROXY_METRICS_PORT
          value: "9901"

        ports:
        # Traffic ingress from the Service endpoint. Must match the
        # `OPSANI_ENVOY_PROXY_SERVICE_PORT` env above and the `targetPort` of
        # the Service routing traffic into the Pod.
        - containerPort: 9980
          name: service

        # Metrics port exposed by the Envoy proxy that will be scraped by the
        # Prometheus sidecar running alongside the Servo. Must match the
        # `OPSANI_ENVOY_PROXY_METRICS_PORT` env and `prometheus.opsani.com/port`
        # annotation entries above.
        - containerPort: 9901
          name: metrics
---
apiVersion: v1
kind: Service
metadata:
  name: fiber-http
  labels:
    app.kubernetes.io/name: fiber-http
  # annotations:
  #   service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  #externalTrafficPolicy: Cluster
  #sessionAffinity: None
  selector:
    app.kubernetes.io/name: fiber-http
  ports:
  # Send ingress traffic from the service to Envoy listening on port 9980.
  # Envoy will reverse proxy back to localhost:8480 for the real service
  # to handle the request. Must match `OPSANI_ENVOY_PROXY_SERVICE_PORT` above
  # and be exposed as a `containerPort`.
  - name: http
    protocol: TCP
    port: 80
    targetPort: 9980

You can use kubectl to apply these changes – even to a live running application.


% kubectl apply -f fiber-http-envoy-deployment.yaml
deployment.apps/fiber-http configured
service/fiber-http configured

We’ve “shimmed in” the Envoy proxy just in front of our application.  

fiber-http (WITH Envoy) inbound traffic flow:


Verifying Envoy is gathering metrics from your container

You can scrape Envoy by instantiating and accessing a shell in a Linux/busybox pod in the same namespace and performing an HTTP client command to pull metrics from Envoy. But there’s a better way (see below).


(outside a Linux/busybox container in the same namespace):

% kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
fiber-http-6ccc567bf8-4psqz   2/2     Running   0          26h
% kubectl describe pod fiber-http-6ccc567bf8-4psqz

(obtain the IP address of the pod via “kubectl describe pod <fiber-http pod name>”, then shell into a Linux/busybox container in the same k8s namespace):

% kubectl run -i --tty --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # wget -qO- http://<fiber-http pod IP>:9901/stats
cluster.opsani_proxied_container.Enable upstream TLS with SNI validation.total_match_count: 0
cluster.opsani_proxied_container.Enable upstream TLS with validation.total_match_count: 0
cluster.opsani_proxied_container.Enable upstream TLS.total_match_count: 0
[more metrics follow] 

Kubernetes port-forward is a powerful test tool for communications debugging

Instead of creating a Linux container to access Envoy, a less system-intrusive method is the Kubernetes port-forward functionality. Let’s forward TCP port 9901 on the machine running kubectl to port 9901 in the pod, which is the listening port for the Envoy administration interface.


kubectl port-forward pod/{pod-name-of-an-injected-pod} local-port:destination-port


% kubectl port-forward pod/fiber-http-6ccc567bf8-4psqz 9901:9901
Forwarding from -> 9901
Forwarding from [::1]:9901 -> 9901

(This will continue to run until you exit via “control C”)

Now, instead of running a local container to access Envoy, we can access it from our kubectl machine.

% curl http://localhost:9901/stats/prometheus
# TYPE envoy_listener_manager_listener_modified counter
envoy_listener_manager_listener_modified{} 0
# TYPE envoy_listener_manager_listener_removed counter
envoy_listener_manager_listener_removed{} 0
# TYPE envoy_listener_manager_listener_stopped counter

Look for Metrics that Matter to your Application Performance

Envoy gathers many metrics about web server and application performance.  You can use either of the above methods to dump metrics while running load against the test application (fiber-http in this tutorial).  Here are some notable sample metrics from Envoy for application performance:

  • http.ingress_http.downstream_cx_total: 722  – This is the total number of client to server connections observed by Envoy since the last flush.
  • http.ingress_http.downstream_cx_length_ms: P0(1.0,1.0) P25(1.025,1.02601) P50(1.05,1.05203) P75(1.075,1.07804) P90(1.09,1.09365) P95(1.095,1.09886) P99(1.099,3.09743) P99.5(1.0995,5.041) P99.9(1.0999,432.82) P100(1.1,440.0)
    • Each P quantile entry shows the (interval, amount) of the length in ms of the connection.
    • This sample output was obtained with a simple shell while loop to fiber-http with no parameters
      • while true; do curl <k8s service>; sleep 1; done
    • fiber-http can simulate CPU load, memory, and response times by specifying URL parameters
      • while true; do curl <k8s service>/time?duration=800ms; sleep 1; done
      • Sample output with 800ms duration: http.ingress_http.downstream_cx_length_ms: P0(800.0,1.0) P25(802.5,1.02646) P50(805.0,1.05292) P75(807.5,1.07938) P90(809.0,1.09525) P95(809.5,2.02464) P99(809.9,805.493) P99.5(809.95,807.817) P99.9(809.99,809.676) P100(810.0,1100.0)

Visit https://www.envoyproxy.io/ for a detailed description of the various Envoy metrics and processes. 

Congratulations!  You’ve instrumented a simple Kubernetes application for Envoy metrics.  With these metrics, we can understand how our application is performing under load.  In our next exercise, you’ll utilize Envoy metrics with Opsani to optimize CPU and memory limits for the best application performance, at the lowest cost.

Once you’ve become familiar with Envoy, it’s time to start considering using another tool, Prometheus, to help manage and aggregate Envoy data across multiple services and multiple instances of your services.  For an introduction to Prometheus, check out our post on What is Prometheus and Why Should You Use it?

Cloud Elasticity vs. Cloud Scalability: A Simple Explanation



Cloud elasticity and cloud scalability seem like terms that should be possible to use interchangeably. Indeed, ten years after the US NIST provided a clear and concise definition of the term cloud computing, it is still common to hear cloud elasticity and cloud scalability treated as equivalent. While both are important and fundamental aspects of cloud computing systems, their actual functionality is related but not the same. In the 2011 NIST cloud computing definition, cloud elasticity is listed as a fundamental characteristic of cloud computing, while scalability is not. Yet, elasticity is not possible without scalability. The following quote is from the NIST definition’s clarification of the essential characteristic of rapid elasticity:

“Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.”

This means that the ability to scale a system, which is the ability to increase or decrease resources, is required before a system can be elastic. Elasticity is the system’s ability to take advantage of that scaling capability appropriately and rapidly in response to demand. So, a system can be scalable without being elastic. However, if you are running a system that is scalable but not elastic, then you are, by definition, not running a cloud. Note that the system need not make use of its elastic capability; it just needs to have it.

Scaling up and Scaling out

In the figure above, we can see the difference between scaling up and scaling out to increase a system’s resources, in this case, CPU capacity. The converse would be scaling down or scaling in when shrinking resources. The scaling up/down terminology (aka vertical scaling) refers to scaling a single resource by increasing or decreasing its capacity to perform. As in our example, this could be the number of CPU cores – real or virtual – available in a single server. The scaling out/in concept (aka horizontal scaling), again illustrated here with CPUs, is a matter of adding or removing replicas of resources to address demand. This could just as easily be envisioned as spinning up additional containers or VMs on a single server as the CPU example we are using.

Again, from a cloud definition standpoint, what matters is not which resource is being scaled, but understanding how resource capacity is being increased or decreased. Confusion has crept in through the insistence of some that cloud scaling and cloud elasticity each refer to a characteristic specific to either infrastructure or applications; elasticity is often artificially tied to infrastructure and scalability to applications. If we return to the NIST definition of elasticity, it does not explicitly call out infrastructure or applications and instead refers to capabilities. The specific capabilities matter less than the overall system’s ability to adjust rapidly to changing needs.

In truth, what is important to the end-user is not the means but the end. Depending on how much change in demand a system experiences, it is quite possible that adding or deleting application instances alone can provide the rapid elasticity needed. The explosion in popularity of Linux containers such as Docker and of Serverless/Function-as-a-Service (FaaS) solutions means that applications can be rapidly elastic without an absolute need to provision additional hardware, real or virtual. Continued improvement and automation in how hardware is provisioned and de-provisioned – even physical hardware – make it increasingly practical and common to integrate hardware and software for even better elasticity.

Moving from “Cloud Scaling” to “Cloud Elasticity”

Scaling to a fixed capacity tends to over- or under-compensate for the realities of production load. As an example, let’s assume we’ve joined a company that just moved a significant legacy application to the cloud. While the engineering team has done some work to make the app cloud-friendly, such as breaking it into containerized microservices, we’ve been tasked with optimizing its performance. We’ve received some performance data, but not much, and based on that limited data, let’s assume we’ve estimated our necessary capacity at two servers, each costing $0.50/hour, or $12/day and $4,380/year. We’ve also implemented a more robust monitoring system to provide feedback on parameters such as application performance and server utilization. Unfortunately, we find that our static capacity estimate leaves one server sitting idle for roughly half of each day, costing us $6.00 per day, or $2,190 per year, in excess resource costs. Furthermore, we did not estimate our daily load well enough, and we consistently see outages twice a day. Through failed transactions and lost customers, this could cost us even more in lost revenue than the net cost of the infrastructure. What has happened here is that, relative to actual demand, we are overprovisioned for part of the day and underprovisioned for the rest.
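The arithmetic behind those figures is easy to check. Assuming a per-server rate of $0.50/hour (the rate consistent with a $6.00/day cost for one server idle half the day), a few lines of Python reproduce the numbers:

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
RATE = 0.50  # assumed $/hour per server, consistent with the scenario's figures

# Cost of running one server around the clock.
per_server_day = RATE * HOURS_PER_DAY
per_server_year = per_server_day * DAYS_PER_YEAR
print(per_server_day, per_server_year)  # 12.0 4380.0

# One of our two servers sits idle for roughly half of each day.
idle_hours_per_day = 12  # assumption from the scenario
waste_day = RATE * idle_hours_per_day
waste_year = waste_day * DAYS_PER_YEAR
print(waste_day, waste_year)  # 6.0 2190.0
```

So even a modest static over-estimate quietly burns a couple of thousand dollars a year – and the twice-daily outages are the cost of the matching under-estimate.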

In a traditional IT infrastructure, the logical step would be to increase capacity. And since our CEO and head of engineering initially see performance as more critical than cost, we look at scaling the system. Whether we choose to scale up or out, the result is that we increase our capacity to three servers available to our system at all times. Now that we have scaled our system, we’ve eliminated our daily outages – and, unfortunately, increased our overall system cost, pushing our wasted spending on idle servers up to $20.40/day, or $7,446/year.
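It is worth translating that waste figure back into idle capacity. Assuming the same $0.50/hour per-server rate used above, $20.40/day of waste corresponds to roughly 41 idle server-hours out of the 72 server-hours three machines provide each day:

```python
RATE_PER_HOUR = 0.50  # assumed per-server rate, consistent with the example
DAYS_PER_YEAR = 365

waste_per_day = 20.40  # idle spend after scaling to three always-on servers
idle_server_hours = waste_per_day / RATE_PER_HOUR
print(idle_server_hours)                        # ~40.8 of 72 available server-hours/day
print(round(waste_per_day * DAYS_PER_YEAR, 2))  # 7446.0 per year
```

In other words, more than half of the capacity we are paying for now sits unused on an average day.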

It just so happens that our company has hired a CFO who is really into FinOps, and she realizes that we are treating our infrastructure like a traditional IT resource, not a cloud. So we try scaling in response to changing load. Knowing that most of our system’s load was covered by two servers, we scale back down (or in) to that level and set an alarm to page an engineer to scale our infrastructure up to meet demand. Unfortunately, demand spikes and drops rapidly. By the time our very competent engineer has the additional servers online, there have been outages, and it also takes a while to scale back down.

While the CFO is pleased that we’ve cut our idle infrastructure cost in half, she still sees cost savings that should be attainable. On top of that, our head of engineering and CEO are not pleased that we are again in a state where we are having outages, and manually scaling up and down in response to system changes is tedious work. We have achieved cloud scaling but are not yet at true cloud elasticity.

Cloud Elasticity to the Rescue

Finally, our team points out that our cloud provider has several automation tools that can tie into our monitoring system and automate the rapid scaling responses needed to truly achieve cloud elasticity rather than mere cloud scaling. The outcome makes the CEO, CFO, and head of engineering happy with the entire team and, further, eliminates the toil of manually responding to load changes.
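Conceptually, such automation just ties a monitoring signal to a scaling action. The sketch below is our own simplification – the function name, thresholds, and bounds are invented for illustration and do not correspond to any particular cloud provider’s API – but it captures the control loop an autoscaler runs on our behalf:

```python
def desired_replicas(current: int, cpu_util: float,
                     low: float = 0.30, high: float = 0.75,
                     min_r: int = 2, max_r: int = 10) -> int:
    """Return the replica count a simple autoscaler would target.

    Scale out when average CPU utilization is high, scale in when it is
    low, and always stay within the configured bounds.
    """
    if cpu_util > high:
        target = current + 1  # scale out
    elif cpu_util < low:
        target = current - 1  # scale in
    else:
        target = current      # within the comfort band: do nothing
    return max(min_r, min(max_r, target))

print(desired_replicas(2, 0.90))  # demand spike       -> 3
print(desired_replicas(3, 0.20))  # load drops         -> 2
print(desired_replicas(2, 0.10))  # already at floor   -> 2
```

Because this decision runs every few seconds instead of waiting on a paged engineer, capacity tracks demand closely in both directions – which is exactly the difference between scaling on paper and elasticity in practice.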

Because the process is automated, the response to changing loads is appropriate and rapid, eliminating both the outages and the idle servers. Now that things are automated and stable, the CFO points out that there are still times when server capacity is not optimal and it might be time to look at that – but that will need to wait for another post.

Is Cloud Elasticity Required?

Early in this article, I noted that not just elasticity but “rapid elasticity” is required, by definition, for a cloud to actually be a cloud. Does this mean that your system MUST be elastic? In truth, no – it just needs the ability to be elastic to be a cloud system.

If you are running a service tied to retail sales, seasonal events such as Valentine’s Day, Christmas, or Black Friday/Cyber Monday will spike the demands on your systems. This alone might warrant making sure your system’s cloud elasticity functionality is ready to go. If, on the other hand, you are serving business software to small companies with predictable growth and usage rates throughout the year, elasticity may be less of a concern. Indeed, the question might then become: do you need to run your system on a cloud at all?

Still, the point of cloud computing can be distilled down to another of the NIST “essential characteristics” of cloud computing: self-service, on-demand access to resources. The uncertainty inherent in the on-demand requirement makes cloud elasticity – and rapid elasticity at that – necessary. If your service has an outage because of insufficient resources, you’ve failed your end-users, so having elasticity working in your system is the prudent choice.

Conclusions: Cloud Scalability AND Cloud Elasticity

Hopefully, you are now clear on how your system’s ability to scale is fundamental to, but different from, its ability to respond quickly – to be elastic – to the demand on resources. Being able to scale says nothing about how fast your system responds to changing demands. Being elastic, especially in the context of cloud computing, requires that the scaling occur rapidly in response to changing demands. A system that exhibits true cloud elasticity needs scalability and will likely be automated, both to avoid the toil of manual action and to take advantage of the responsiveness of computer-aided processes.