Optimizing cloud apps is a no-brainer. If you can pull it off, you will simultaneously save money and boost performance. At Opsani, we have leveraged neural network and deep reinforcement learning technologies to construct an AI that continually optimizes cloud app performance. It is the perfect tool for any enterprise that runs medium-to-large cloud-based applications, spends more than $5M/yr on cloud, and is moving toward DevOps.
However, performance optimization for cloud apps isn’t a cookie-cutter process. How you treat an app should depend on its maturity, lifecycle stage, and scale. Until they reach a certain stage of maturity, apps aren’t actually ready for full-blown performance optimization. Let’s call that final point of maturity, where an app is ready for continuous optimization (CO), stage five. Shepherding all of your apps toward stage five should be your standard trajectory.
Here are the four preceding stages, and how to move through them to a point of continuous optimization readiness.
Stage One: Don’t Optimize, Scale Out
You are early in development. At this point, especially with new products, optimizing cloud apps is a bad use of engineering time. Be agile, and design your services to scale out. You can throw resources at the problem if you need to; resources are cheap compared to engineers. Invest in automatic scale-out solutions, and keep headroom.
Continue this way until it starts to get expensive, and/or the response time of a single request gets too high.
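To make "scale out and keep headroom" concrete, here is a minimal sketch of the arithmetic an automatic scale-out rule might apply. The function name, the 30% default headroom, and the minimum of two replicas are all assumptions for illustration, not values from any particular autoscaler.

```python
import math


def replicas_needed(request_rate: float, capacity_per_replica: float,
                    headroom: float = 0.3, min_replicas: int = 2) -> int:
    """Return how many replicas keep each one below (1 - headroom) of its capacity.

    All parameters are hypothetical: request_rate and capacity_per_replica are
    in the same unit (for example requests per second), and headroom is the
    fraction of each replica's capacity deliberately left unused.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = capacity_per_replica * (1 - headroom)
    # Round up so that the fleet never runs above the usable capacity.
    return max(min_replicas, math.ceil(request_rate / usable))
```

For example, at 1,000 requests per second, with replicas that each handle 100 requests per second and 50% headroom, `replicas_needed(1000, 100, headroom=0.5)` returns 20. The point of the sketch is that headroom is simply paid for in extra replicas, which is acceptable at this stage.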
Stage Two: Monitor Production Performance
Your apps are a little more mature now. Here, you need to identify your production performance metric(s), whether throughput, response time, error rate, or something else. You need to pinpoint how these affect your business as accurately as you can (involve customers in this if it feels right).
At this point, you should be monitoring production environments using easy and affordable SaaS-based monitoring services. Don’t overdo the monitoring: start small, and monitor the big things. Pay attention around deployments, and spot when bottlenecks appear. You can identify and fix problems yourself at first, but have a plan for adding triggers and notifications; eventually, there will just be too many services to monitor manually.
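As a rough illustration of what "triggers and notifications" can boil down to, the sketch below checks a handful of metric samples against fixed thresholds. The `Trigger` class, the metric names, and the threshold values are hypothetical; a real monitoring service would provide its own alerting rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    metric: str       # hypothetical metric name, e.g. "p95_latency_ms"
    threshold: float  # fire when the sampled value exceeds this


def fired_triggers(samples: dict, triggers: list) -> list:
    """Return one message per trigger whose metric exceeds its threshold."""
    alerts = []
    for trigger in triggers:
        value = samples.get(trigger.metric)
        # Metrics missing from this sample are skipped rather than alerted on.
        if value is not None and value > trigger.threshold:
            alerts.append(f"{trigger.metric}={value} exceeds {trigger.threshold}")
    return alerts
```

Starting with two or three triggers on the metrics you picked above keeps with the "start small, monitor the big things" advice; the list can grow as the number of services does.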
Stage Three: Add Performance Testing to Your CI/CD Pipeline
Your scale and maturity are now such that you should define and implement a performance test suite. You have three options here: use an existing load generator; capture and replay production traffic; or build a custom load generator.
At this point, make performance regression tests part of your CI/CD pipeline. Report results, and overlay them with other development process metrics. Pay attention to measurement precision and repeatability. When a performance regression is caught, roll back the deployment and return the change to a developer to fix, or allow for an executive override.
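One possible shape for such a pipeline gate is sketched below. It compares the median of repeated runs (to dampen measurement noise) against a baseline, and supports the override mentioned above. The function names, the 5% tolerance, and the "rollback"/"promote" labels are assumptions chosen for illustration.

```python
import statistics


def is_regression(baseline_ms, candidate_ms, tolerance=0.05):
    """True if the candidate's median latency is more than `tolerance` slower.

    Both arguments are lists of latency samples in milliseconds from repeated
    test runs; the 5% default tolerance is an arbitrary illustrative value.
    """
    if not baseline_ms or not candidate_ms:
        raise ValueError("need at least one sample per run")
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    return cand > base * (1 + tolerance)


def gate(baseline_ms, candidate_ms, override=False):
    """Return "rollback" on an un-overridden regression, otherwise "promote"."""
    if is_regression(baseline_ms, candidate_ms) and not override:
        return "rollback"
    return "promote"
```

The tolerance is where measurement precision matters: if repeated runs of the same build vary by more than the tolerance, the gate will produce false rollbacks, which is one source of the developer grumbling described next.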
You can press on this way until runtime resources become too expensive, and/or until your developers begin to seriously grumble about frequent rollbacks.
Stage Four: Application Performance Management
You are nearly ready for continuous optimization of your cloud apps. At this stage, you need to instrument your code. (This process is usually language- and framework-specific.) Alongside this, you need to dedicate a team to application performance management (APM). The team can be small at first. They should learn the tools of APM, and look for consistent improvements over a six-month window.
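Instrumentation varies by language and framework, as noted, and commercial APM tools usually handle it for you. Purely as a sketch of the underlying idea, here is a hand-rolled Python decorator that records how long each call takes. The `TIMINGS` store and the `handle_request` function are hypothetical stand-ins.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store: function name -> list of durations in seconds.
TIMINGS = defaultdict(list)


def timed(func):
    """Record the wall-clock duration of every call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are measured too.
            TIMINGS[func.__name__].append(time.perf_counter() - start)
    return wrapper


@timed
def handle_request(n):
    """Stand-in for real application work."""
    return sum(i * i for i in range(n))
```

After a few calls to `handle_request`, `TIMINGS["handle_request"]` holds one duration per call, which is the kind of raw data an APM team would aggregate to find where targeted code improvements pay off.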
Your aim here should be targeted code improvements, and suggestions for where improvements can be made when nothing else is working. Evangelize your early successes in order to engage other developers.
Keep your APM contained, but continue until the codebase becomes too volatile to work with, i.e., until there are major architectural changes, a disruptive migration to microservices, etc.
Stage Five: Continuous Optimization
Once you’ve passed through these prior four stages, you are ready for continuous optimization.
To prepare for the integration of a CO tool like Opsani, you should identify, at a top level, the key performance tuning dials that you want to turn. These could be resources: CPU and memory reservations or limits, VM instance type, I/O throughput. They could be middleware configuration settings: JVM GC type and parameters, worker threads, pool sizes, write delays. They could be kernel parameters: page sizes, jumbo packet sizes, even scheduler tweaks. Or they could be an individual app’s parameters: thread pools, cache timeouts, memory-vs-CPU tradeoffs.
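One way to write such a list of dials down is as a small search space. The sketch below uses made-up dial names and values, and enumerates every combination by brute force; it is not a description of how Opsani or any other CO tool explores the space.

```python
from itertools import product

# Hypothetical dials and candidate values, one entry per tuning dial.
DIALS = {
    "cpu_cores": [1, 2, 4],            # resource dial
    "memory_gib": [2, 4],              # resource dial
    "jvm_gc": ["G1", "Parallel"],      # middleware dial
    "worker_threads": [8, 16],         # application dial
}


def candidate_configs(dials):
    """Yield every combination of dial settings as a dict."""
    names = list(dials)
    for values in product(*(dials[name] for name in names)):
        yield dict(zip(names, values))
```

Even these four small dials produce 3 × 2 × 2 × 2 = 24 configurations, and each extra dial multiplies the total. That growth is why exhaustive manual tuning stops being practical, and why it helps to decide up front which dials actually matter.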
Once you have an idea of your key performance dials, it’s time to choose where in the CI/CD pipeline you want to add your tuning. You have two primary options. You can tune in staging only, using a performance test suite, and only propagate results to production programmatically. Or you can tune in production (perhaps in a canary environment), which feels riskier and requires zero-downtime deployments.
Many apps aren’t ready for full performance tuning just yet. But if you mature them in the correct way, and follow these best practices, you can make sure that they are always evolving toward a state where they can easily be optimized. As you scale, this will pay major dividends in the long run.