[Installation Guide]

Introduction

Opsani is your optimization copilot. Opsani will show you how to reduce your infrastructure spending and amplify your application performance. This resource will guide you through connecting your application to an Opsani Optimizer backend and getting started with cloud native workload optimization on Kubernetes.

Opsani optimizes your application by performing experiments that apply configurations to your environment and evaluating the impact that they have on cost and performance.

Experiments are run by creating a tuning Pod based on a Kubernetes Deployment object of your choice, joining it to the Kubernetes Service that distributes traffic to the Pods in your Deployment, and evaluating its cost and performance against its sibling Pods.

Optimization is orchestrated by a single Pod called the Servo, which handles all interaction between your application and the Opsani Optimizer. Metrics data such as replica counts, resource sizing, latencies, and request throughput are gathered by the Servo to inform the Optimizer. Data is always handled in the aggregate and is only used to drive optimization of your app.

Sound good? Let's ride.

Prerequisites

Before we roll up our sleeves, let's make sure that we have all the pieces on the board. Completing this guide requires the following:

  1. A Kubernetes cluster running v1.16.0 or higher.
  2. Local access to kubectl for applying manifests to the cluster.
  3. Access rights to list, create, read, update, and delete Deployments, Pods, Config Maps, Services, and Service Accounts through an in-cluster Service Account or a configured kubeconfig file.
  4. An application to be optimized, running as a Deployment in the Kubernetes cluster.
  5. An understanding of how ingress traffic is directed into a Kubernetes Service and flows into the target Deployment through a TCP port, and the ability to make changes to that port.
  6. Representative traffic flowing to the Deployment through a Kubernetes Service (e.g., Load Balancer, ClusterIP) from a live source or synthetic load generator.
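
As a quick sanity check before starting, you can confirm that the target Deployment and Service exist and are visible from your workstation. The [DEPLOYMENT] and [SERVICE] placeholders stand for your own resource names:

kubectl get deployment/[DEPLOYMENT] service/[SERVICE]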

This document provides numerous command examples for reference. For clarity and brevity, it is assumed that kubectl is configured to interact with the target namespace. To copy and paste the reference commands without modification, set the active kubeconfig context to the target cluster and update it to interact with the target namespace by default. See the Setting kubectl context namespace appendix for details.
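
For reference, the namespace on the current context can be set as follows (see the appendix for details):

kubectl config set-context --current --namespace=[NAMESPACE]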

All the bases covered? Cool, let's get down to business.

Deploying the Servo

Before optimization can be started, there are a few things that need to be done. First off, we need to get the Servo up and running.

  1. Update the opsani-manifests.yaml file accompanying this document to reflect your environment. The following variables must be replaced:
    • {{ NAMESPACE }} - Namespace in which the target Deployment is running (e.g., apps).
    • {{ DEPLOYMENT }} - Name of the target Deployment to optimize (e.g., webapp).
    • {{ CONTAINER }} - Name of the container targeted for optimization (e.g., main). Must be the name of a container in the Pod spec of the target Deployment. When omitted, the first container in the Pod is targeted. Container names can be displayed via:
    kubectl get deployment [DEPLOYMENT] \
      -o jsonpath="{.spec.template.spec.containers[*].name}"
  2. Apply the Servo manifests:

    kubectl apply -f opsani-manifests.yaml
  3. Wait for the Servo Deployment to become available:

    kubectl wait --for=condition=available --timeout=5m deployment/servo
  4. If the wait command fails, refer to Debugging Servo Deployment in the Troubleshooting section below.

  5. Once the Servo becomes available, it will execute a set of health checks to verify readiness to run. The Servo logs provide feedback about the status of these checks. When the Deployment is found healthy, the logs will report that the Servo is waiting for the target Deployment to begin providing metrics. Keep tailing the logs while completing the proxy sidecar configuration for ongoing feedback.

kubectl logs -f -c servo \
  $(kubectl get pod \
    -l app.kubernetes.io/name=servo \
    -ojsonpath="{.items[0].metadata.name}")

Proxying Service traffic

Once the Servo is up and running, the next step is to update the target Service and Deployment so that metrics the Servo can consume are made available. Metrics are generated by adding an Envoy sidecar container to the target Deployment and updating the Service so that ingress traffic into Pods described by the Deployment passes through Envoy before being processed by the primary application container. Envoy emits the metrics required for optimization; they are aggregated by the Prometheus sidecar running alongside the Servo and then consumed and reported to the Optimizer.

Completing this portion of the configuration requires making edits to the target Service and Deployment. The Service can be directly edited and applied via kubectl edit service/[SERVICE], and the Deployment can likewise be directly edited and applied via kubectl edit deployment/[DEPLOYMENT]. Alternatively, the YAML manifests can be edited in a local file and explicitly applied via kubectl apply -f [MANIFEST_FILE]. If you do not have a copy of the manifests on hand, they can be exported by running kubectl get -o yaml service/[SERVICE] > service.yaml and kubectl get -o yaml deployment/[DEPLOYMENT] > deployment.yaml, respectively.
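
For example, to export local copies of both manifests for editing:

kubectl get -o yaml service/[SERVICE] > service.yaml
kubectl get -o yaml deployment/[DEPLOYMENT] > deployment.yaml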

Ensure that you have an editing environment suited to working with Kubernetes manifests in general and the target manifests specifically. Making significant edits to YAML documents can be error prone and frustrating when attempted through basic editors without format awareness. See the YAML Editing Recommendations appendix for help.

Make the following changes to the manifests:

  1. Update the Service targetPort to listen on port 9980 (known as the OPSANI_ENVOY_PROXY_SERVICE_PORT). Take note of the original value as OPSANI_ENVOY_PROXIED_CONTAINER_PORT for use in subsequent steps.

These changes update the Service such that inbound traffic sent to the Service is transparently passed through Envoy on port 9980 before being handled by the application container. For example, given the following LoadBalancer Service definition originally targeting port 8080 (HTTP) on the container, the updated Service might look like this (pay attention to the NOTE comments):

apiVersion: v1
kind: Service

metadata:
  name: opsani-example-service
  labels:
    app.kubernetes.io/name: example-service

spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: example-service
  ports:
  - name: http
    protocol: TCP

    # Before editing
    # port: 80
    # targetPort: 8080 # NOTE: Becomes `OPSANI_ENVOY_PROXIED_CONTAINER_PORT`

    # After editing
    port: 80
    # NOTE: The port Envoy listens on for proxying traffic (known as
    # `OPSANI_ENVOY_PROXY_SERVICE_PORT`)
    targetPort: 9980

This completes changes to the Service object. Save and apply the changes.
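
If you exported the Service manifest to a local file as described above, the edited copy can be applied with:

kubectl apply -f service.yaml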

  2. Edit the target Deployment to opt in to metrics aggregation by including Opsani metadata in the Pod spec template. Only Pods that are annotated are discovered and scraped by the Servo Prometheus sidecar. To do so, add the following entries to the spec.template.metadata stanza of the manifest, creating the annotations and labels keys if they do not already exist and merging them if they do:

    annotations:
      # These annotations are scraped by the Prometheus sidecar
      # running alongside the Servo Pod. The port must match the
      # `METRICS_PORT` defined in the Envoy container definition
      # below. The metrics are provided by the Envoy administration
      # module. It should not be necessary to change the path or port
      # unless the proxied service happens to have a namespace collision.
      # Any divergence from the defaults will require corresponding
      # changes to the container ports, service definition, and/or the
      # Envoy proxy configuration file.
      prometheus.opsani.com/path: /stats/prometheus
      prometheus.opsani.com/port: "9901"
      prometheus.opsani.com/scrape: "true"
    labels:
      # Attach a label for identifying Pods that have been augmented with
      # an Opsani Envoy sidecar.
      sidecar.opsani.com/type: "envoy"
  3. Add the Envoy sidecar container to the spec.template.spec.containers stanza of the Deployment manifest:

    # Opsani Envoy Sidecar
    # Provides metrics for consumption by the Opsani Servo
    - name: envoy
      image: opsani/envoy-proxy:latest
      resources:
          requests:
              cpu: 125m
              memory: 128Mi
          limits:
              cpu: 250m
              memory: 256Mi
      env:
      # The container port of Pods in the target Deployment responsible for
      # handling requests. This port is equal to the original port value of
      # the Kubernetes Service prior to injection of the Envoy sidecar. This
      # port is the destination for inbound traffic that Envoy will proxy from
      # the `OPSANI_ENVOY_PROXY_SERVICE_PORT` value configured above.
      - name: OPSANI_ENVOY_PROXIED_CONTAINER_PORT
        value: "{{ OPSANI_ENVOY_PROXIED_CONTAINER_PORT }}"
    
      # Uncomment if the upstream is serving TLS traffic
      # - name: OPSANI_ENVOY_PROXIED_CONTAINER_TLS_ENABLED
      #   value: "true"
    
      # The ingress port accepting traffic from the Kubernetes Service destined
      # for Pods that are part of the target Deployment (Default: 9980).
      # The Envoy proxy listens on this port and reverse proxies traffic back
      # to `OPSANI_ENVOY_PROXIED_CONTAINER_PORT` for handling. This port must
      # be equal to the newly assigned port in the updated Kubernetes Service
      # and must be configured in the `ports` section below.
      - name: OPSANI_ENVOY_PROXY_SERVICE_PORT
        value: "9980"
    
      # The port that exposes the metrics produced by Envoy while it proxies
      # traffic (Default: 9901). The corresponding entry in the `ports` stanza
      # below must match the value configured here.
      - name: OPSANI_ENVOY_PROXY_METRICS_PORT
        value: "9901"
    
      ports:
      # Traffic ingress from the Service endpoint. Must match the
      # `OPSANI_ENVOY_PROXY_SERVICE_PORT` env above and the `targetPort` of
      # the Service routing traffic into the Pod.
      - containerPort: 9980
        name: service
    
      # Metrics port exposed by the Envoy proxy that will be scraped by the
      # Prometheus sidecar running alongside the Servo. Must match the
      # `OPSANI_ENVOY_PROXY_METRICS_PORT` env and `prometheus.opsani.com/port`
      # annotation entries above.
      - containerPort: 9901
        name: metrics
  4. Update the {{ OPSANI_ENVOY_PROXIED_CONTAINER_PORT }} value to match the original value of the Service targetPort noted in step 1. The OPSANI_ENVOY_PROXIED_CONTAINER_PORT is the TCP destination port that the proxy will use to pass requests into the application for handling.

Keep in mind that in a sidecar configuration the containers share a network namespace, so the proxy must listen on a port that the application is not already using. By default, the Opsani Envoy sidecar is set up to use port 9980, but if your application already utilizes port 9980 it may be necessary to select another port.
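
If you do need to move off of the default, the same port value must appear in three places: the Service targetPort, the OPSANI_ENVOY_PROXY_SERVICE_PORT environment variable, and the service containerPort of the Envoy sidecar. A minimal sketch, using 9981 purely as an arbitrary example value:

# Service
ports:
- name: http
  port: 80
  targetPort: 9981   # was 9980

# Envoy sidecar container in the Deployment
env:
- name: OPSANI_ENVOY_PROXY_SERVICE_PORT
  value: "9981"
ports:
- containerPort: 9981
  name: service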

Once configured and applied, inbound traffic from the upstream Service will pass through Envoy and generate the metrics necessary for optimization.

For reference, an updated Deployment manifest might look something like:

# NOTE: This is an example manifest provided for reference
apiVersion: apps/v1
kind: Deployment

metadata:
  name: example-deployment-with-sidecars

spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: example-app
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-app
        # Attach a label for identifying Pods that have been augmented with
        # an Opsani Envoy sidecar.
        sidecar.opsani.com/type: "envoy"
      annotations:
        # These annotations are scraped by the Prometheus sidecar
        # running alongside the servo Pod. The port must match the
        # `METRICS_PORT` defined in the Envoy container definition
        # below. The metrics are provided by the Envoy administration
        # module. It should not be necessary to change the path or port
        # unless the proxied service happens to have a namespace collision.
        # Any divergence from the defaults will require corresponding
        # changes to the container ports, service definition, and/or the
        # Envoy proxy configuration file.
        prometheus.opsani.com/path: /stats/prometheus
        prometheus.opsani.com/port: "9901"
        prometheus.opsani.com/scrape: "true"
    spec:
      containers:
      # Primary container providing the application logic
      - name: example-app-container
        image: example/app:v1.0.0
        resources:
          requests:
            cpu: "1"
            memory: 1G
          limits:
            cpu: "1"
            memory: 1G
        ports:
          # The ingress port that Envoy will reverse proxy requests
          # to for handling. Before Envoy sidecar injection this port
          # would be equal to the `targetPort` of the Service previously
          # edited.
          - containerPort: 8080

      # Opsani Envoy Sidecar
      # Provides metrics for consumption by the Opsani Servo
      - name: envoy
        image: opsani/envoy-proxy:latest
        resources:
            requests:
                cpu: 125m
                memory: 128Mi
            limits:
                cpu: 250m
                memory: 256Mi
        env:
        # The container port of Pods in the target Deployment responsible for
        # handling requests. This port is equal to the original port value of
        # the Kubernetes Service prior to injection of the Envoy sidecar. This
        # port is the destination for inbound traffic that Envoy will proxy from
        # the `OPSANI_ENVOY_PROXY_SERVICE_PORT` value configured above.
        - name: OPSANI_ENVOY_PROXIED_CONTAINER_PORT
          value: "8080"

        # Uncomment if the upstream is serving TLS traffic
        # - name: OPSANI_ENVOY_PROXIED_CONTAINER_TLS_ENABLED
        #   value: "true"

        # The ingress port accepting traffic from the Kubernetes Service destined
        # for Pods that are part of the target Deployment (Default: 9980).
        # The Envoy proxy listens on this port and reverse proxies traffic back
        # to `OPSANI_ENVOY_PROXIED_CONTAINER_PORT` for handling. This port must
        # be equal to the newly assigned port in the updated Kubernetes Service
        # and must be configured in the `ports` section below.
        - name: OPSANI_ENVOY_PROXY_SERVICE_PORT
          value: "9980"

        # The port that exposes the metrics produced by Envoy while it proxies
        # traffic (Default: 9901). The corresponding entry in the `ports` stanza
        # below must match the value configured here.
        - name: OPSANI_ENVOY_PROXY_METRICS_PORT
          value: "9901"

        ports:
        # Traffic ingress from the Service endpoint. Must match the
        # `OPSANI_ENVOY_PROXY_SERVICE_PORT` env above and the Service routing
        # traffic.
        - containerPort: 9980
          name: service

        # Metrics port exposed by the Envoy proxy that will be scraped by the
        # Prometheus sidecar running alongside the Servo. Must match the
        # `OPSANI_ENVOY_PROXY_METRICS_PORT` env and `prometheus.opsani.com/port`
        # annotation entries above.
        - containerPort: 9901
          name: metrics
  5. Apply the revised Deployment manifest to the cluster (as shown below) and ensure that the Deployment is rolled out and all Pods are healthy. The Servo logs will continue reporting on the status of the configuration.
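
Assuming the edited manifest was saved locally as deployment.yaml, one way to apply it and watch the rollout is:

kubectl apply -f deployment.yaml
kubectl rollout status deployment/[DEPLOYMENT]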

Once all checks are passing, the Servo will enter its standard run mode and begin interacting with the Opsani API and the Optimizer. A tuning Pod will be created and data points will begin reporting into the Opsani Console. Opsani launches in measure-only mode, which refrains from making changes to your infrastructure until you are ready.

Next Steps

Once the installation is completed, optimization can be configured to match your specific goals.

Troubleshooting

If you run into issues completing setup, there are a number of tools at your disposal for triaging and resolving issues.

Debugging Servo Deployment

If errors are encountered during servo deployment, there are a few things to evaluate:

  1. Is there a servo pod?

    kubectl get pod -l app.kubernetes.io/name=servo

If no servo pod is identified, then the deployment was not successful or the wrong namespace has been targeted. Retrace your steps and verify that all assumptions such as the namespace are correct.

  2. Has there been a scheduling error?

When a servo Pod is created but remains in the pending status, it often indicates a scheduling error due to a lack of available resources on the cluster. Describing the Pod and looking at the event status is essential to understanding the error and how to resolve it.

kubectl describe pod -l app.kubernetes.io/name=servo

  3. Are both containers healthy?

Once the sidecar is deployed, all Pods that are part of the target Deployment will report two containers. Ensure that each of these containers is reporting as ready.
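
For example, Pods carrying the sidecar label added earlier should show both containers ready (READY 2/2):

kubectl get pods -l sidecar.opsani.com/type=envoy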

  4. Is the servo connecting to the API?

Check that the servo is connecting to the Opsani API by reviewing the servo logs. Connectivity errors will be reported in the logs and will trigger exponential backoff and retry behaviors.

The HTTP status code of the error report is indicative of the problem. A 404 (Not Found) error indicates a problem in the optimizer configuration (e.g., an invalid OPSANI_OPTIMIZER, a bad base URL, or an outbound firewall/proxy).

A 401 (Unauthorized) status code indicates that an incorrect API token has been provided and could not be verified.

Any 5xx error indicates an upstream failure in the runtime system or infrastructure hosting the Opsani platform.

Running Servo checks

The Servo exposes a rich set of preflight checks that verify the correctness of the configuration. These checks are run automatically during normal startup. When troubleshooting specific issues, it can be helpful to run one or more of these checks directly. This can be done via kubectl exec (when a running Servo Pod is available) or via kubectl run (as an ad hoc task):

kubectl exec -c servo \
  $(kubectl get pod \
    -l app.kubernetes.io/name=servo \
    -ojsonpath="{.items[0].metadata.name}") \
  -- servo check

Tailing Servo logs

The Servo provides extensive logging output that is invaluable in debugging. Logs can be tailed by executing:

kubectl logs -f -c servo \
  $(kubectl get pod \
    -l app.kubernetes.io/name=servo \
    -ojsonpath="{.items[0].metadata.name}")

Debugging traffic metrics

Optimization requires data about the performance of the service under optimization. Opsani Dev relies on Envoy proxy sidecar containers that receive traffic from a Kubernetes Service and proxy it back to the service under optimization. If throughput or latency metrics are flatlined, it can point either to an issue with metrics generation (at the Envoy level) or metrics aggregation (at the Prometheus level).

A good way to differentiate between the two possible root causes is to tail the Envoy proxy logs. If HTTP requests are being logged, then they are being instrumented and exposed as scrapable metrics, and the issue is at the Prometheus aggregation level. If requests are not being logged but are succeeding, then the issue is related to the sidecar proxy configuration -- traffic is not being intercepted by the proxy. If requests are failing, then the proxy configuration is misaligned and ingress traffic from the Kubernetes Service is being misrouted. This can be caused either by the proxy not listening on the port that the Service is sending traffic to, in which case Envoy will show nothing in the logs, or by the Envoy proxy passing traffic back to the wrong port on the service under optimization, in which case the Envoy logs will show inbound requests with 4xx/5xx status codes from the upstream service.

The key to debugging these cases is to examine the Envoy proxy logs and analyze what you see as described.

To tail these logs, run:

kubectl logs -f -c envoy \
  $(kubectl get pod -l sidecar.opsani.com/type=envoy \
    -ojsonpath="{.items[0].metadata.name}")

Port-forwarding to Prometheus

The Servo runs with a Prometheus sidecar container. If metrics are not available for whatever reason, it can be helpful to connect directly to Prometheus in order to inspect the targets and run ad hoc queries. To do so:

kubectl port-forward \
  $(kubectl get pod -l app.kubernetes.io/name=servo \
    -ojsonpath="{.items[0].metadata.name}") \
  9090:9090

Then open http://localhost:9090 in your browser.
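
From there, the Targets page (http://localhost:9090/targets) should list the Envoy sidecar endpoint, and ad hoc queries can be run against the scraped Envoy metrics. As one example, a standard Envoy request counter such as the following can confirm that traffic metrics are arriving (exact metric names depend on the Envoy configuration in use):

envoy_cluster_upstream_rq_total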

Elevating log levels

The Servo emits extensive logging during operation. The default log level of INFO is designed to provide consistent feedback that is easy to follow in real time without becoming overwhelming. When troubleshooting an issue, it may become desirable to run the Servo at a logging level of DEBUG or TRACE. To do so:

kubectl set env deployment/servo SERVO_LOG_LEVEL=DEBUG
kubectl rollout restart deployment/servo

Restarting the Servo

The Servo is built with extensive health checks, timeouts, and an asynchronous architecture that will deliver consistent feedback during normal operations. But things can go wrong and when they do sometimes a hard restart is the path of least resistance. To do so:

kubectl rollout restart deployment/servo

Getting Help

The Opsani support team is standing by to assist you with any issues encountered during deployment. Reach out and we will lend you a hand.

Appendices

Setting kubectl context namespace

kubectl config set-context --current --namespace=[NAMESPACE]

YAML Editing Recommendations

When editing YAML documents, use a programming text editor that is aware of YAML syntax and capable of providing affordances such as alignment guides and code folding. These can be very helpful for seeing the relevant parts of the document and maintaining context as you edit. Visual Studio Code, Atom, Sublime Text, vim, and emacs can all support YAML editing with low configuration effort.

If you are working in a constrained environment such as a corporate network that requires edits to be made over an SSH connection rather than from a workstation, strongly consider using SCP to move manifests on and off the server so that they can be edited in a friendlier environment. Much of the time, cluster admin bastion hosts are equipped with archaic builds of vi and nano that offer limited support. Copying and pasting content across terminal emulators and remote connections can introduce subtle, hard-to-debug issues such as invisible characters caused by character set misalignment or terminal emulator profiles.

Finally, it can be very helpful to leverage a linter that is aware of Kubernetes YAML syntax and can provide clear, actionable errors. Many modern editors support Kubernetes syntax through an extension, and a variety of standalone linters are available.
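
As a lightweight validation step that requires no additional tooling, kubectl itself can check a manifest without applying it (the --dry-run=client flag requires a reasonably recent kubectl):

kubectl apply --dry-run=client -f deployment.yaml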

Using an alternate API base URL

kubectl set env deployment/servo OPSANI_BASE_URL=https://custom-api.opsani.com
kubectl rollout restart deployment/servo

Building Docker images

If you are unable to pull Docker images from the Opsani Docker Hub or would prefer to build the images yourself, instructions are available in the ServoX README.