Sources of Metrics

Thanks for returning to our series Metrics for Optimization. Once again, please be sure to read our first two blog posts, What is Optimization? and Which Sources of Metrics Should You Use for Optimization?

In my first post, I explained what optimization means, how to figure out what you are trying to optimize, and whether you are focused on business goals or technology goals when performing this optimization. In the second post, we discussed how you can collect metrics from the technology side to derive and drive your optimization process. Now, I will show you which metrics you should use and how to implement them.

As a refresher, the two main sources of metrics we see in optimization are application performance monitoring (APM) metrics and rate, errors, and duration (RED) metrics, but how do you know which one to use for your application?

Are APM metrics right for your application? Let’s find out

If you have already instrumented your application with APM, you absolutely want to leverage the resulting metrics because they give you finely detailed results. If your development team uses a single language, APM can often be incorporated into the process for future development, which simplifies continued adoption.

Even if you are building multiple components, a single development language, Java for instance, gives you a repeatable model. APM usually requires software elements to be added to the application itself, and with a single language, that process can be shared across the organization. By integrating deeply into the application code, you can get very detailed resource metrics that you can't necessarily derive from external, RED-style metrics.

If your application does not span many components written in disparate languages, or in languages that are not supported by your APM vendor, then APM is probably an excellent choice for you.

Here’s when you should use RED metrics 

In contrast, the other route to take would be to use RED metrics. RED metrics are effectively black-box metrics that look at the network and how the network interacts with the target resource (or resources). In this case, you no longer have a language requirement because you are observing from outside the box, which is a plus in many environments.

If you don't already have a network resource that can gather the required metrics, you may still need to add another resource, potentially in line with your application. Some people see adding this resource as a negative. However, in all of the testing we have done over the years, we've seen little impact on application performance from adding these network components specifically to gather metrics. By gathering metrics at a network resource, it becomes a little easier in the RED case to gather metrics across every component, even external components, and use that same information to drive your optimization.
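To make the RED idea concrete, here is a minimal sketch in Python of how the three signals, rate, errors, and duration, could be computed from request records observed at a network resource. The request data and the 60-second window are invented for illustration; a real collector would stream these observations from a proxy or load balancer rather than hard-code them.

```python
# Hypothetical request records observed at a network resource:
# (duration in milliseconds, HTTP status code)
requests = [
    (120, 200), (95, 200), (310, 500), (88, 200),
    (140, 200), (205, 200), (99, 503), (76, 200),
]
window_seconds = 60  # length of the observation window

# R: request rate over the window, in requests per second
rate = len(requests) / window_seconds

# E: fraction of requests that failed (5xx responses)
errors = sum(1 for _, status in requests if status >= 500) / len(requests)

# D: duration, summarized here as the 95th-percentile latency
durations = sorted(d for d, _ in requests)
p95 = durations[int(0.95 * (len(durations) - 1))]

print(f"rate={rate:.2f} req/s, errors={errors:.1%}, p95={p95} ms")
```

Because these signals are derived purely from observed traffic, the same computation works for any component regardless of implementation language, which is the core appeal of the RED approach.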

Does the source really matter? 

Regardless of which source we get metrics from, the really interesting shift in capabilities is that machine learning (ML) driven optimization is becoming the norm. As long as I can get the metrics from either of these sources, I can apply machine learning algorithms to derive my optimization results. You still have to take your business-to-technology objective correlation into account and translate that objective into some sort of latency and/or throughput service level objective (SLO). Once defined, it is simple enough to derive the optimization result that determines how to improve or maintain that service level objective against any of the metrics passing through the system.
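Once the business objective has been translated into a latency or throughput target, checking the system against it is straightforward. The sketch below assumes a hypothetical p95 latency SLO; both numbers are invented, and in practice the observed value would come from your APM or RED metrics.

```python
# Hypothetical latency SLO: 95% of requests must complete under 250 ms
SLO_P95_MS = 250

# Observed 95th-percentile latency, e.g. from APM or RED metrics
observed_p95_ms = 205

meets_slo = observed_p95_ms <= SLO_P95_MS
headroom_ms = SLO_P95_MS - observed_p95_ms

print(f"SLO met: {meets_slo}, headroom: {headroom_ms} ms")
```

The headroom value is what an optimization process can trade away: positive headroom suggests room to reduce resources and cost, while negative headroom signals that resources (or code) need attention.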

On top of the initial optimization result, it is now simple to apply this optimization process continuously in order to maintain our application's optimal state against code and customer-interaction changes over time. That optimization can be tuned to drive down infrastructure cost or improve resilience while maintaining the overall business objectives.

Often, just limiting or adding resources like CPU and memory, at the simplest level, can help optimize the system's overall runtime, especially when we have multiple components. Optimizing a single service is something we often feel we can accomplish, although once you start looking at the actual number of parameters, it becomes clear that this is a monumental task. Add machine learning into the system, and suddenly we are talking about optimizing across all of our services simultaneously, and that's when it really gets powerful.
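As a hedged illustration of the resource-tuning idea, here is a toy search over candidate CPU and memory settings for a single service, picking the cheapest configuration that still meets a latency objective. Every number here is invented: in a real system, the predicted latencies would come from an ML model or live experiments, not a hard-coded table, and the search would span many services and parameters at once.

```python
# Invented candidate resource configurations for one service.
# "predicted_p95_ms" stands in for what an ML model or experiment would supply.
candidates = [
    {"cpu": 0.5, "mem_gb": 1, "cost": 10, "predicted_p95_ms": 380},
    {"cpu": 1.0, "mem_gb": 2, "cost": 20, "predicted_p95_ms": 220},
    {"cpu": 2.0, "mem_gb": 4, "cost": 40, "predicted_p95_ms": 180},
]

SLO_P95_MS = 250  # hypothetical latency objective

# Keep only configurations predicted to meet the SLO,
# then choose the cheapest of those.
feasible = [c for c in candidates if c["predicted_p95_ms"] <= SLO_P95_MS]
best = min(feasible, key=lambda c: c["cost"])

print(best)
```

Even this toy version shows why the parameter space explodes: three configurations for one service become 3^N combinations across N services, which is exactly where ML-driven search earns its keep.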

You now know how to determine which metrics are best for your application. Tune in to our next blog to learn how to implement them for optimization.