Autonomous Optimization: The Future of AIOps

[VIDEO]

Autonomous Optimization: The Future of AIOps

Watch our webinar from the AWS Startup Showcase where our Chief Commercial Officer Patrick Conte explains why manual tuning is no longer efficient in the world of cloud computing. Applications have trillions of possible configurations, and finding the optimal one by hand is no longer practical; only Opsani’s AI can do this. Opsani is the leading Continuous Optimization as a Service (COaaS) platform. Opsani’s customers typically achieve a 235% performance increase, 70% lower costs, and up to 400% better application reliability.

Free Trial

Improving Operational Efficiency with Opsani Continuous Optimization

[VIDEO]

Improving Operational Efficiency with Opsani Continuous Optimization

Experience Opsani

Application Optimization Solutions from A to Z

[VIDEO]

Application Optimization Solutions from A to Z

In this speaking session, noted product industry expert and startup founder Amir Sharif will explain everything you need to know about the app optimization market. Amir will give an in-depth analysis of the evolution of optimization and how it has led to autonomous and continuous optimization.

Experience Opsani

Metrics for Optimization: Are Network Metrics Enough?

[VIDEO]

Metrics for Optimization: Are Network Metrics Enough?

APM is the most common tool for optimizing applications but often requires deep application integration to get the best optimization results. However, application optimization can also use “black box” models and external characteristics from network transactions to capture a set of metrics that can likewise optimize your applications. To understand a metric’s value, we’ll first investigate the business goals of application optimization. We’ll follow up with a review of the advantages and disadvantages of using network metrics, and the system changes required for both APM and network-derived metrics collection. Check out our blog series about metrics for optimization, starting with the first post, What is Optimization?

Experience Opsani

Opsani Provides Reliability and Performance to Enhance Healthcare Cloud Applications

[VIDEO]

Opsani Provides Reliability and Performance to Enhance Healthcare Cloud Applications

Watch our webinar where our CCO Patrick Conte and Director of Engineering Lorne Boden discuss how healthcare cloud applications are holding up under pandemic demand. It seems like the country is getting back on its feet, with millions of COVID-19 vaccinations being distributed every day, but unfortunately, we are not in the clear yet. 2020 was an unprecedented, challenging year, and it is crucial to remember that in 2021 we are in the pandemic’s fourth wave. The healthcare industry has provided a massive dose of hope for ending the pandemic, and healthcare cloud applications are currently receiving more traffic than ever before. With the need for scheduling treatments, patient information, healthcare research, and vaccine distribution, healthcare cloud applications need to be scalable, responsive, and reliable. We will provide a free one-year subscription license to healthcare organizations involved in pandemic treatment, response, research, or distribution of medications. We already have significant interest from several healthcare organizations, and we want to help your application too. For even more info, check out our blog post 15 Minutes to Beat the Pandemic and Get Your Patients Back to Work.

Free Software Request

What is Continuous Deployment and Continuous Delivery?

[ARTICLE]

What is Continuous Deployment and Continuous Delivery?

What is CD? Continuous Deployment vs. Continuous Delivery

The process of continuous delivery or deployment (CD) is typically the latter stage of a CI/CD workflow. Overall, CI/CD is a set of practices that accelerates the pace at which DevOps teams can deliver value to their users. The principal tenet of CI/CD is keeping your codebase in a functional state so that it can be shipped to production at any time. This is typically achieved by moving toward frequent but small and safe changes to the codebase rather than the big, high-stakes deployments commonly seen in the past.

When you see CI/CD, you probably understand that the continuous integration part produces a code artifact: the software that will run in production. The CD part, broadly, is the process that vets that artifact so your DevOps team can quickly and reliably push bug fixes, features, and configuration changes into production. The entire CI/CD process is built on pipelines – a series of automated process steps – that reliably provide confidence that the code being vetted will perform as expected in the production environment.

Before getting into the CD pipeline process in greater detail, it is important to clarify that CD can refer to either continuous deployment or continuous delivery. The distinction is that delivery requires a pause for manual approval before the code is released into production, whereas deployment automates this step: if an artifact successfully passes through the CD pipeline, it is put into production without pause. Continuous deployment is the end goal, and continuous delivery is a milepost along the way. As most of the CD process is the same whether or not there is a pause at the end of the pipeline, we will use CD in this article to encompass both modes.

The benefits of CD pipelines are:

  • Increased code velocity – New features and bug fixes can be deployed more rapidly. Although pipelines can include manual steps, as testing becomes increasingly automated, steps such as acceptance and integration testing proceed faster.
  • Reduced costs, faster time to value – CD (and CI) pipelines make it possible to validate and deploy smaller pieces of code to production. This means that small improvements, bug fixes, and new features can be rolled out quickly to improve the overall user experience. Automating pipelines also frees developers from repetitive manual processes (aka toil) to focus on higher-value business goals.
  • Reduced risk – Automation reduces the errors and uncertainty of manual processes. Validated pipelines that include appropriate tests build confidence that new code will behave reliably in production. Blue/green deployments and canary releases add a final safety measure to the deployment process, ensuring customers are not impacted by unexpected breaks caused by new code.

CD Pipelines – Release vs. Build

Creating release branches, in which a branch is cut from the master branch to support a release, has been the traditional practice. This candidate code is isolated from any changes to master that might cause instability, allowing the team managing the release to fix problems in the code and ensure a smooth deployment. This means that the code being deployed may diverge from master as bugs are fixed or configurations are updated on the release branch.

The modern CI/CD process works differently: the code artifact fed into a CD pipeline is a static snapshot of the codebase. The CD pipeline is defined by a series of stages that conduct appropriate tests to build confidence in the artifact’s functionality and then deploy it to production. Automating the process increases confidence by eliminating sources of human error common in manual processes, such as inconsistent configuration changes or lost bug fixes.

[IMAGE: continuous delivery pipeline]

CD Pipeline Stages

Testing

When a CD process is triggered (quite possibly by the successful completion of a CI pipeline), the code artifact passes through a series of stages in the CD pipeline. The artifact is first deployed to a test environment and then run through tests to validate the code. Test environments should be identical to production environments, though they may be smaller versions. A variety of tests for basic functionality and performance under load can be used to verify the code’s behavior. At the successful completion of the testing stage, the code is versioned. It may then be deployed automatically, or, in continuous delivery, pass through a manual gate.
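To make the testing stage concrete, here is a minimal sketch of the kind of automated gate a pipeline might run against a test environment. The endpoint URL, latency threshold, and sample count are assumptions chosen for illustration, not part of any particular CI/CD product.

```python
import time
import requests

# Hypothetical settings for a test-environment deployment of the artifact.
TEST_ENV_URL = "https://test.example.com/health"
MAX_AVG_LATENCY_SECONDS = 0.5
SAMPLES = 20

def smoke_test() -> bool:
    """Basic functionality check: the service responds successfully."""
    response = requests.get(TEST_ENV_URL, timeout=5)
    return response.status_code == 200

def latency_check() -> bool:
    """Simple performance check: average response time stays under the threshold."""
    durations = []
    for _ in range(SAMPLES):
        start = time.monotonic()
        requests.get(TEST_ENV_URL, timeout=5)
        durations.append(time.monotonic() - start)
    return sum(durations) / len(durations) < MAX_AVG_LATENCY_SECONDS

if __name__ == "__main__":
    if smoke_test() and latency_check():
        print("Tests passed: artifact can be versioned and promoted.")
    else:
        raise SystemExit("Tests failed: stop the pipeline.")
```

In a real pipeline, these checks would simply be one more automated stage; a non-zero exit code is what stops the artifact from being promoted.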

Deployment

Once the testing stages have passed, the versioned code is deployed. This could be a full deployment in which all old instances across the entire system are replaced with the new version at the same time. For those who are a little more risk-averse, especially if your production and test environments differ, blue/green deployments or canary releases provide zero-downtime deployments and simplify version rollbacks should unpredicted issues arise.
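A canary release, at its core, is a small control loop: send a slice of traffic to the new version, watch an error metric, and either promote or roll back. The sketch below illustrates that loop only; set_canary_weight, observed_error_rate, and rollback are hypothetical hooks you would wire up to your own load balancer and monitoring stack.

```python
import time

ERROR_RATE_THRESHOLD = 0.01          # assumed acceptable error rate (1%)
TRAFFIC_STEPS = [5, 25, 50, 100]     # percentage of traffic sent to the canary

def canary_rollout(set_canary_weight, observed_error_rate, rollback):
    """Gradually promote a canary, rolling back if errors exceed the threshold.

    The three callables are hypothetical hooks into your own
    load balancer and monitoring system.
    """
    for weight in TRAFFIC_STEPS:
        set_canary_weight(weight)
        time.sleep(300)  # let the canary serve real traffic before judging it
        if observed_error_rate() > ERROR_RATE_THRESHOLD:
            rollback()
            raise SystemExit(f"Canary failed at {weight}% traffic; rolled back.")
    print("Canary promoted to 100% of traffic.")
```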

Conclusion

Continuous deployment pipelines provide a means of testing code updates in a manner that allows immediate deployment into production environments. Progressive deployment models allow the gradual release of deployable code into production to further reduce the risk of customer impact. Automated CD pipelines reduce the chance of introducing human error and encourage more frequent, smaller, and faster deployments of new code, leading to increased value for both customers and the company.

After your code is successfully deployed, the production environment has changed, and performance criteria may have changed with it. Even when testing performance in the dev environment, real-world traffic and performance are still difficult to adequately test and tune. Opsani’s continuous optimization solution safely tunes the performance and cost of your application in production. It is also possible to pretune a deployment candidate in the dev environment. To see Opsani in action for yourself, sign up for a free trial and see how your application performance can benefit from continuous optimization.


SRE Service Level Agreement Terms Explained: SLA, SLO, SLI

[ARTICLE]

SRE Service Level Agreement Terms Explained: SLA, SLO, SLI

When you sign a technology-related service contract, you probably want to be sure you know what you are getting for your money. If you are responsible for delivering on that contract, you likely want to know if you are meeting the contract obligations. The acronyms SLA, SLO, and SLI are interrelated terms that help define exactly that. You’ve probably heard the term SLA (Service Level Agreement), as this is important to both service users and providers. The terms SLO (service level objective) and SLI (service level indicator) are more IT-operational terms that matter to site reliability engineers (SREs) in making sure that SLAs are not violated. Before we venture into the details, it is helpful to have an initial understanding of what the related terms mean:

Service Level Terminology

  • Metric: something that is measurable and related to a service level objective/agreement
  • Metric value: the value of a metric at a single point in time
  • Service Level Indicator (SLI): a metric and its target values (range) over a period of time
  • Service Level Objective (SLO): all SLIs representing the SLA objective
  • Service Level Agreement (SLA): legal agreement about SLO (e.g., how it is measured, notifications, service credits, etc.)

You can think of an SLA as a legal promise to users, an SLO as an objective that helps you ensure you keep that promise, and the SLI as the metric or metrics that let you and the customer know how you are performing in keeping the promise.
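As a simplified illustration of how the three relate, the sketch below treats individual uptime checks as the metric values, their aggregate over a window as the SLI, and fixed targets as the internal SLO and the SLA. The numbers are assumptions chosen for the example.

```python
# Hypothetical uptime samples: True means the service was up during that check.
uptime_checks = [True] * 9995 + [False] * 5    # 10,000 checks over the window

sli = sum(uptime_checks) / len(uptime_checks)  # measured availability (the SLI)
internal_slo = 0.9995                          # stricter internal objective
sla = 0.999                                    # contractual commitment

print(f"SLI (measured availability): {sli:.4%}")
print(f"Meets internal SLO ({internal_slo:.2%})? {sli >= internal_slo}")
print(f"Meets SLA ({sla:.2%})? {sli >= sla}")
```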

Service-Level Agreement (SLA)

What is a Service Level Agreement?

An SLA (service level agreement) is generally a contractual agreement between a service provider and a service user about performance metrics, e.g., uptime, latency, capacity. Because these agreements tend to be contracts between a company and a client, they are generally written up by a company’s business and legal teams. However, the SRE team initially defines the underlying targets. An SLA typically includes both the SLA metrics and the business consequences of failing to meet them. These might include refunds, service credits, or similar penalties to the service provider.

The SLA is considered the overall service agreement related to a system’s reliability or availability. While singular, the SLA should derive from a rationally defined SLO (which may differ from the SLA), which in turn is typically an aggregate of multiple individual metrics (the SLIs).

Challenges when using SLAs

SLAs can be a big challenge to create and implement correctly. SLAs are at times written by people who are not involved in building and running the technology that the SLA is meant to cover. Failing to craft SLAs in a way that clearly defines service expectations, defines the associated metrics, and is clear about consequences can create a promise that is difficult or impossible to keep.

Ensuring that the legal and business development teams include the IT and DevOps teams will greatly increase the chance that you create a functional SLA. An SLA should not be a wish list coming from either the business or the client. An SLA should be grounded in the real world, with expectations set by what a system can realistically support and provide.

It is important to consider the effects of client-side delays when defining SLAs. If a client inadvertently causes a situation that impacts performance, you don’t want that to count as a breach of the SLA.

Do you need an SLA?

If you are providing a free service, an SLA is generally not provided. On the other hand, paying customers generally expect SLAs, as they provide a guarantee of the level of service and the consequences, such as compensation, if the guarantee is not met.

Service Level Objectives (SLO)

What is an SLO?

An SLO (service level objective) is the aggregate of a set of metrics, like uptime or response time, that are used to evaluate system performance. So, if the SLA is the formal agreement between you and your customer, the SLO is what sets a customer’s expectations and helps IT and DevOps teams understand what goals they need to hit and measure themselves against.

Challenges when using SLOs

While SLOs are similar to SLAs, they aren’t typically written by the legal team and benefit from being written simply and clearly. The metrics that define SLOs should be limited to those that truly define performance. Also, consider the potential for client-side impacts on service when writing them, as this helps translate your SLO requirements over to the SLA.

When defining an SLO, the SLI target(s) chosen should define the lowest acceptable level of reliability. While this may seem counterintuitive at first, greater reliability incurs greater cost, so acceptable service should keep the customer happy without requiring the additional work of providing increased performance that may not even be noticed. In SRE, the tradeoff for increased reliability is not only increased cost but also slowed development, as the internal development process can itself impact SLOs.

It is also worth considering that it is common to have two SLOs per service: one that is customer-facing and used to derive the SLA, and a stricter internal SLO. The internal SLO may include more metrics or have a higher availability target than the one used for the SLA. The difference between the customer-facing 99.9% target and the internal 99.95% target is, in SRE terms, an error budget. The value of doing this is that if the internal SLO is violated, there is still room to take action before the customer-facing SLO is violated.
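To make the error budget concrete, this short calculation shows how much downtime each target allows over a 30-day month and how much buffer the stricter internal SLO leaves before the customer-facing target is at risk.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month

def allowed_downtime(target: float) -> float:
    """Minutes of downtime permitted per month at a given availability target."""
    return MINUTES_PER_MONTH * (1 - target)

customer_facing = allowed_downtime(0.999)   # about 43.2 minutes
internal = allowed_downtime(0.9995)         # about 21.6 minutes

print(f"99.9%  SLO allows {customer_facing:.1f} min/month of downtime")
print(f"99.95% SLO allows {internal:.1f} min/month of downtime")
print(f"Buffer between the two: {customer_facing - internal:.1f} minutes")
```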

Do you need an SLO?

Unlike SLAs, which provide value for paying customers, SLOs can be used for free accounts, and if you manage software systems for your own company, they can be used for internal management programs. Creating SLOs for internal databases, networks, and the like helps set expectations and measures for internal systems’ performance, just as for those that are customer-facing.

Service Level Indicator (SLI)

What is an SLI?

An SLI (service level indicator) is the metric that allows both a provider and a customer to measure compliance with an SLO (service level objective). Rather than a single metric value, an SLI is typically an aggregate of values over time. In any case, the measured SLI must meet or exceed the SLO and SLA cutoffs. If your SLA requires 99.9% uptime, your SLO is likely also 99.9%, and your measured SLI must be greater (e.g., 99.96% uptime). If you have an internal SLO, you might set it slightly higher, perhaps 99.95%.

While SLIs can include any metric deemed relevant, typical SLIs tend to focus on the RED metrics (a short sketch of computing these from raw request data follows the list):

  • Rate (e.g., request rate, throughput, transaction rate)
  • Error rate
  • Duration (e.g., response time, latency)
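Here is a minimal sketch of turning raw request records into RED-style values; the sample data, field names, and window length are hypothetical.

```python
import math

# Hypothetical request records collected over a 60-second window.
requests_window = [
    {"duration_ms": 120, "error": False},
    {"duration_ms": 250, "error": False},
    {"duration_ms": 900, "error": True},
    {"duration_ms": 180, "error": False},
]
window_seconds = 60

# Rate: requests per second over the window.
rate = len(requests_window) / window_seconds
# Error rate: fraction of requests that failed.
error_rate = sum(r["error"] for r in requests_window) / len(requests_window)
# Duration: a rough 95th-percentile latency.
durations = sorted(r["duration_ms"] for r in requests_window)
p95_latency = durations[math.ceil(0.95 * len(durations)) - 1]

print(f"Rate: {rate:.2f} req/s, errors: {error_rate:.1%}, p95 latency: {p95_latency} ms")
```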

Challenges when using SLIs

In a world where creating metrics can be as simple as a few clicks of a mouse, it is critical to lean toward simplicity when defining SLIs. Think of them as capturing key performance indicators (KPIs) rather than all possible performance indicators. First, decide if the metric under consideration matters to the client. If not, don’t use it to create the SLO/SLA. Second, decide if it will help with improving your internal SLO, if you have one. It may, but if it is not needed, it is better to leave the metric out of your SLOs.

Do you need SLIs?

Good SLIs let you measure how reliable your service is. Having appropriate SLIs provides value whether you are using an SLA or just an SLO.  By now, you should understand that the SLA/SLO defines an acceptable level of performance.  The SLIs are how you can evaluate that performance against the SLO/SLA standard to inform your operations team and your customers. 

Conclusion

The use of SLAs, SLOs, and SLIs clearly defines expectations for system reliability for both your customers and SRE teams. Well-written SLAs and SLOs are derived from customer needs and from the appropriate SLIs used to verify that those needs are being met. Defining an error budget with a stricter internal SLO can help focus SREs on improving overall system performance and addressing reliability issues.

Opsani uses SLOs to guide the automated and continuous improvement of application performance and cost reduction. If you have SLOs and SLIs defined, you can load them into Opsani directly.  For systems that don’t have SLOs defined, Opsani can recommend an appropriate SLO and then update the SLO as the system is optimized.  If you’d like to see Opsani in action for yourself, our free trial allows you to run continuous optimization on the application of your choice. Sign up for a free trial here.


How to Monitor Kubernetes with Prometheus

[ARTICLE]

Monitoring Kubernetes with Prometheus

Kubernetes and Prometheus were the first and second open-source projects brought on board to the then newly minted Cloud Native Computing Foundation (CNCF). Both systems were designed as cloud-native tools from the start. While Kubernetes was designed to manage running microservices-based applications at scale, Prometheus was specifically designed to monitor and provide alerts for such systems.

The Kubernetes monitoring challenge

While it is an incredibly powerful and performant tool for reliably managing container-based applications, Kubernetes is also a complex tool with multiple components. A Kubernetes cluster involves multiple servers that can span private and public cloud services. Kubernetes is also frequently deployed with additional services that provide performance enhancements. Unlike troubleshooting a single application on a single server, there is a good chance that multiple logs and services need to be examined when troubleshooting Kubernetes.

What does it take to monitor a Kubernetes Cluster?

Before we consider why Prometheus meets the monitoring tool requirements for K8s, let’s consider what needs to be monitored. The master or controller node is the command center for the cluster and maintains the configuration database (etcd), an API server, and a scheduler. Each worker node has a node agent that communicates with the master node via the API server, a proxy service, and a container runtime. There are also many K8s add-ons that extend Kubernetes functionality, with networking function and policy being by far the most popular category. The Kubernetes-specific components involved in a typical cluster highlight some of the complexity a monitoring tool needs to be capable of collecting data from.

Why use Prometheus to monitor Kubernetes

Let’s first consider some of the features you get when you install Prometheus.

  • time series data identified by metric name and key/value pairs
  • a multi-dimensional data model
  • a pull model for time series collection over HTTP
  • support for pushing time series as well
  • monitoring targets that can be statically configured or automatically discovered (service discovery)
  • PromQL, the powerful and flexible Prometheus query language for exploring your monitoring data

These core functions should convince you that Prometheus is a powerful and high-performing monitoring tool.

Service Discovery and Pull-based monitoring

One of the strengths of Kubernetes, and a challenge for any monitoring system, is that K8s can automatically spin up new Pods to meet demand or replace failed Pods. At any given moment, it is difficult (and also unnecessary) to know exactly where the set of Pods comprising a Service is running. Prometheus provides service discovery that automatically finds new Pods and starts pulling metrics from them. This pull-based model of metric collection and service discovery matches the demands of dynamic cloud environments very well.
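The pull model also shapes how applications are instrumented: rather than pushing metrics somewhere, a service exposes a /metrics endpoint and lets Prometheus scrape it. Below is a minimal sketch using the official prometheus_client Python library; the metric names, labels, and port are arbitrary choices for the example.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Arbitrary example metrics; Prometheus discovers and scrapes this endpoint.
REQUESTS = Counter("app_requests_total", "Total requests handled", ["path"])
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

def handle_request(path: str) -> None:
    """Stand-in for real request handling; records the two example metrics."""
    with LATENCY.time():
        time.sleep(random.uniform(0.01, 0.1))  # simulated work
    REQUESTS.labels(path=path).inc()

if __name__ == "__main__":
    start_http_server(8000)  # exposes metrics at http://localhost:8000/metrics
    while True:
        handle_request("/demo")
```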

Labels

The use of labels, which are simply key/value pair designations, is a concept shared by both Kubernetes and Prometheus. In Kubernetes, labels can designate services, releases, customers, environments (prod vs. dev), and much more. While labels can also be attached on the Prometheus side, PromQL can natively use the labels defined in your Kubernetes environment as well. Labels can then be used to select the time series of interest and to match labels when further aggregating metrics.
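Label matching carries over directly into queries. As a rough illustration, the snippet below sends a PromQL query to Prometheus’s standard HTTP API (/api/v1/query); the Prometheus address, the namespace label, and the metric name are assumptions for the example.

```python
import requests

PROMETHEUS_URL = "http://prometheus.example.com:9090"  # assumed address

# Sum the per-second request rate over 5 minutes, grouped by the `service` label,
# for time series carrying the Kubernetes label namespace="prod".
query = 'sum(rate(http_requests_total{namespace="prod"}[5m])) by (service)'

response = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
for result in response.json()["data"]["result"]:
    print(result["metric"].get("service", "unknown"), result["value"][1])
```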

Exporters and Kubernetes Pods

Kubernetes components natively expose metrics in the Prometheus format, but there are times when you need to monitor a system that is not natively integrated with Prometheus (e.g., Postgres). In this case, you can co-deploy an exporter in a Pod that runs alongside your service. The role of the exporter is to translate the service’s metrics into a format consumable by Prometheus.

Deciding which metrics to monitor

You could decide to instrument everything, and Prometheus could handle it. However, metrics storage can become a limiting factor. The Kubernetes community generally agrees that there are four principal types of metrics that should be monitored: running Pods and their Deployments, node resources (e.g., disk I/O), container-native metrics, and application metrics. Several frameworks (e.g., USE and RED) can be used to decide which additional metrics to include.

Conclusion

From an operator’s or SRE’s perspective, a monitoring tool needs to collect metrics from a complex and changing system without being difficult to manage. Prometheus addresses both challenges. Because of the deep and native integration between Kubernetes and Prometheus, it is remarkably easy to get up and running. It is easy to get metrics on high-level constructs such as Services and node resources, and it is just as easy to zoom in on Pod, container, and application metrics. Together, Kubernetes and Prometheus give you the data needed to ensure that overall system function is acceptable, e.g., by tracking SLOs. The combination also allows operators to identify resource bottlenecks and use that information to improve overall application performance. For many looking for a Kubernetes monitoring solution, Prometheus is an easy first step, and for many, thanks to its simplicity and power, it is also the last.


Continuous Optimization In Your Continuous Integration Pipeline

[ARTICLE]

The Value of Continuous Optimization In Your Continuous Integration Pipeline

What is Continuous Integration (CI)?

Continuous integration (CI) is the process of taking the software code that you are developing and validating that any new code works as expected and works together with existing code. Potentially, this could be part of an end-to-end multicomponent system. As a developer or team leader, you usually own only a piece of the puzzle and need to fit it into a larger continuous integration environment. 

In the continuous integration space, you would normally want to validate that your application performs at a certain level after the addition of new code. Performance optimization is usually considered part of that, although most people don’t implement it. Among the customers we’ve spoken with, we’ve seen that while they may talk about performance optimization and may even have a performance test in their CI process, performance really only becomes a concern when they go into production. The problem is that the typical dev environment is not a true reflection of the production environment. As a result, most customers’ performance tests are not an accurate representation of their system or how it behaves, which makes it challenging to do performance testing anywhere other than in production!

Opsani’s continuous optimization platform is designed to safely run continuous optimization in a production environment with live production traffic.

Optimizing against your actual application environment is ideal, as you can optimize not only for peak traffic but continuously across time and varying traffic levels. In the CI arena, as long as there is some prototype of a performance test application, the same concepts can be applied, even though they won’t necessarily indicate “proper” optimization as one would get from real traffic. Standard CI proves that your application works, but it doesn’t prove how your customers will respond when you release it. In that case, the continuous optimization process can at least ensure that the system behaves in a similar fashion for a given workload. And by applying both processes, the production optimization can support scaling of the application environment for the most efficient use of resources, while the CI optimization validates that the application hasn’t accidentally introduced working but dramatically slower code.

The optimization cycle that most people think about is optimizing only when they start having customer problems.

We find it best to optimize continually, in production at a minimum, and in continuous integration cycles as well. Combined, Opsani can support your need to find the best, most cost-effective, and most performant solution that meets your business service level objective (SLO). We want to bring the benefit of application optimization as early into the development cycle as possible: in production, against real-world application conditions, to guide your future development decisions, and in the development cycle, as a gate to the integration release process. The nice thing about the gated integration optimization approach is that you can put these two things together as part of the integration release cycle. Now that we are adding a performance optimization phase to the development cycle, we can do performance optimization for the application before deployment. We determine the most performant result and the most performant configuration given a known load profile for the application. This is usually the most significant gap for any continuous integration effort. It’s just like having enough tests to cover your code: you have to build new tests that accurately cover your end-to-end process across the enterprise.

Step 1: For CI, make sure we have tests for our codebase for correct behavior and performant behavior.

Step 2: For optimization, make sure we have an application performance optimization test and an understanding of our customer SLO.

This performance optimization step in CI does not have to reflect production optimization with 100% accuracy. We use it first to get into the right scope for how your application should run and, more importantly, as a way to make sure your application development hasn’t gone off the rails.

In standard continuous integration, we have tests to validate that our application does what it says on the tin; for example, given an input, you get the right output. This is the principal approach to continuous integration. The secondary benefit of continuous optimization is that we ensure we haven’t reduced the ability to meet our customer’s SLO. These can be simple additional tests, such as making sure a webpage doesn’t take more than 200 milliseconds to render. This might already be a test that the end user has. We can now make sure that you not only hit that SLO but exceed it.
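In practice, this kind of SLO gate can be a single pytest-style test in the existing suite. The sketch below uses the 200-millisecond page-render budget from the example above against a hypothetical staging endpoint; the URL and timing method are assumptions for illustration.

```python
import time
import requests

STAGING_URL = "https://staging.example.com/"  # hypothetical test endpoint
RENDER_BUDGET_SECONDS = 0.200                 # SLO from the example above

def test_page_renders_within_slo():
    """Fail the integration build if the page takes longer than the SLO budget."""
    start = time.monotonic()
    response = requests.get(STAGING_URL, timeout=5)
    elapsed = time.monotonic() - start

    assert response.status_code == 200
    assert elapsed <= RENDER_BUDGET_SECONDS, (
        f"Page render took {elapsed * 1000:.0f} ms, over the 200 ms SLO"
    )
```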

This is all done in the integration phase; this isn’t even when the app gets to production. Implementing it only requires a basic performance test as part of the testing structure. The test doesn’t have to be complete, but it should be somewhat similar to the actual application load; the closer it is, the more real benefit you get out of the performance optimization. But even if you only exercise a few things in the test system, some basic log-in and log-out capabilities, API calls, things along those lines that are consistent and repeatable can serve as a sample of how your application runs. We can optimize sufficiently against that.

The goal here is not necessarily to optimize for performance; the goal is to ensure that the minimum service level objective is achieved as a part of your integration process.

We have seen this with customers, where a poorly configured piece of code doubled the company’s response time. Their performance tests were not particularly accurate reflections of their actual workload, yet they were still able to find this level of degradation in a release candidate. This wasn’t work-in-progress code; it was code the engineering team felt was ready to be released to customers. So, integrating the same toolset that you would use in production optimization can be powerful at the continuous integration stage. Essentially, you can do this just as well in pre-production as you can in production.

Once you release your tuned and tested application into the wild, the real benefit of optimization is when you are doing it against live traffic.

You are looking at how the application behaves on a second-by-second basis and tuning as quickly as possible. Many services look at CPU metrics but don’t look at the application itself. No tool fully understands what the application is doing, so you want to look at the system as a whole: the real-time metrics of your application. You need to consider response time on a per-transaction basis and how many errors are being generated that are not appropriate for your application. You need to make sure you’re meeting your service level objectives. Tuning those same settings in the release stage first ensures you don’t violate your service level objectives when you release into production.

So, while it is best to do live optimization on your production system, adding an optimization component to your continuous integration pipelines can provide substantial benefits. Running optimization against a realistic load in dev can verify that your new code will not violate your SLOs and will keep your customers happy. Your tested and optimized application should run smoothly upon deployment, and you can then reap additional performance and cost benefits by further optimizing your application in production against actual load conditions.


Migrating Your Cloud Application from Docker Compose to Kubernetes

[ARTICLE]

Why Migrate to Kubernetes from Docker Compose?

Is migrating your orchestration tool from Docker Compose to Kubernetes a must? It is not, but if you have been running applications with Compose for a while, you may be starting to discover limitations that the simpler Compose model can’t handle. 

If you have never used Docker Compose, it’s a framework that provides an application and environment definition for your containerized application in a single YAML file. This file defines the required container images, dependencies, networking specifications, and so on.

Docker Compose still does have some advantages over Kubernetes, but these are primarily due to its simplicity. The learning curve is not as steep. It makes deploying microservice applications easy. A single YAML file and one command can correctly configure your cloud environment and deploy your containers. 

This simplicity is also the thing that limits Docker Compose. Compose runs on a single host or cluster. Unlike Kubernetes, multi-cluster or even multi-cloud deployments are not an option. This should also tell you that scaling Compose has limitations. And if you are using Kubernetes on a public cloud service like AWS, Microsoft Azure, or GCP, you can take advantage of a wide range of Kubernetes integrations that are not available to Compose users.

Another issue with a Docker Compose application is that the server running your application is a single point of failure. In a rather un-cloudlike manner, it must be kept running to keep the application running. Kubernetes is typically run with multiple nodes (servers) and can distribute and maintain multiple instances of a microservice across nodes. This means that if one node fails, Kubernetes can ensure continuity by deploying additional instances to any of the remaining nodes.

Still, I should say that if you are happy running Compose, then carry on; there is nothing inherently wrong with the tool. Just a couple of years ago, having your application development and management firmly embedded in the Docker ecosystem looked like a safe and sensible decision.  The company had truly revolutionized how applications were being run, and Docker containers were the new kid on the block that everyone wanted to get to know. Unfortunately, Docker’s domination of all things container started to fade with the arrival of Kubernetes, and as Docker’s developmental velocity slowed, Kubernetes became the leader in orchestration tools.

Migrating from Docker Compose to Kubernetes

The way Compose and Kubernetes work is understandably different, and it might seem that this would make migrating from one to the other an ordeal. Surprisingly, the oh-so-cleverly named Kompose tool makes this process remarkably easy. If, for example, you had a three-tier microservice application defined in a single docker-compose.yaml, Kompose would split it into .yaml files for each service, providing the Deployment and Service definitions that Kubernetes requires.

One of the most common concerns when migrating to Kubernetes from Docker Compose is the very different approach to networking. Because Compose creates a single local network on the single host machine running your application, any container within that network can connect to, for example, the local MariaDB instance with hostname mariadb and port 8080 at http://mariadb:8080. Anything external to the local network would need to know the MariaDB container’s IP address.

Because Kubernetes typically runs on multiple nodes, networking functions differently. In Kubernetes, a Service allows communication between containers on the same or different nodes in a cluster. Continuing with the example above, the MariaDB container definition would be converted into a Deployment and a Service. The Deployment defines the environmental considerations of deploying MariaDB to the cluster, and the Service is what Kubernetes uses to enable inter-container communication across the cluster. Even though the structure of the .yaml files differs, Kompose can define the Service as mariadb and assign port 8080 so that the MariaDB container can still be reached at http://mariadb:8080. As a result, no networking changes should be needed when migrating from Docker Compose to Kubernetes.
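For reference, the Service that Kompose generates for the MariaDB example could also be created programmatically. The sketch below uses the official Kubernetes Python client and assumes Pods labeled app: mariadb; it illustrates the concept rather than reproducing the exact manifest Kompose emits.

```python
from kubernetes import client, config

config.load_kube_config()  # use the current kubeconfig context

# A Service named "mariadb" exposing port 8080 and selecting Pods labeled
# app=mariadb, so other containers can still reach it at http://mariadb:8080.
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="mariadb"),
    spec=client.V1ServiceSpec(
        selector={"app": "mariadb"},
        ports=[client.V1ServicePort(port=8080, target_port=8080)],
    ),
)

client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```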

Using Kompose to Migrate Your Docker Compose File

Once you’ve downloaded Kompose and navigated to your docker-compose.yml directory, running kompose convert converts the single docker-compose.yml into several Kubernetes configuration files. Running kubectl apply -f {output_file} will then launch your newly transformed application on your Kubernetes cluster.

While the few simple steps just described should get your Docker Compose application running on a Kubernetes cluster, life is rarely simple. Depending on the configuration and complexity of your application, additional troubleshooting may be necessary.  

Now that you are running on Kubernetes, it is good to check that access management and secrets continue to function correctly. And even though Kompose will have enabled network communications, more performant networking solutions are available in Kubernetes, so this may be a good time to consider implementing them.

In some cases, developers will actually continue to use the single docker-compose.yaml to provide their application specification and use Kompose to translate the configuration over to a production environment. If your Kompose transformation was a one-way trip, it might be time to incorporate the new Kubernetes .yaml configuration files into a CI/CD process.

Conclusion

While the Docker ecosystem integration or the simplicity of orchestrating your container app may have brought you to Docker Compose, transitioning to Kubernetes will provide you with a much more performant orchestration engine. Your application will be easier to manage, more scalable, more resilient, and much more.