
Leading FinTech Company Boosts SaaS User Experience and Slashes Cloud Bills With Opsani

Challenge

The company’s performance team was trying to tune the Java environment of the application QBO-UI, which generates the SaaS UI from the backend infrastructure (without exposing databases, authentication, and other elements that aren’t relevant to a user). The team’s primary goal was to improve user experience. A key part of achieving this was reducing latency, which sat stubbornly at 150 milliseconds across dozens of shards. The company also wanted to reduce the frequency of “GC full events” occurring in Java Virtual Machines (JVMs). Whenever they occur, GC full events can slow down an application, consume excessive CPU, and degrade the user experience.

If the company’s team could reduce latency and minimize GC full events, they would increase availability, make the UX snappier, and produce more of a real-time interaction for users. As well as implementing these user-friendly improvements, they would also trim their AWS bill – which they knew was higher than it needed to be – without any negative impact upon performance.

However, effective human tuning of the Java environment was proving impossible. Despite the efforts of the SaaS tuning team, the performance was inconsistent between releases. This was having an ongoing impact on user experience, which was intermittently less than optimal. And their cloud bills were staying the same, despite the tuning team knowing that application parameter optimizations were possible.

The company turned to Opsani to implement AI-driven cloud optimization and achieve their goals for their SaaS application.

Executive Brief

The company is the industry leader in financial, accounting, and tax preparation software, with annual revenue of almost seven billion dollars. Their SaaS service is one of their key product offerings and is utilized by millions of businesses worldwide.

  • Industry: Financial Services
  • App Resources: 1000s of Virtual Machines
  • Time to Optimize: Less than one quarter

Optimization Challenge

The optimization challenge was significant. Cloud applications are by nature extremely complicated, offering trillions of resource and parameter permutations, and this application runs over 10,000 transactions per second. To reduce latency, the Opsani cloud optimization engine targeted a variety of changes:

  • Number of instances per shard (on the horizontal scale).
  • EC2 instance type/family (on the vertical scale).
  • Five different Java config parameters affecting garbage collection strategies, intervals, and heap sizes. And this barely scratches the surface of what can be tweaked to improve the performance of a cloud application.

These parameters are all interrelated, offering trillions of possible combinations. The Opsani engine optimized for work done – the number of transactions completed – while maintaining the service-level objectives (SLOs) – response time and error rate – and minimizing cost. The cost was computed as the price of the selected instance type multiplied by the number of instances.
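To make the trade-off concrete, here is a minimal sketch of how such an objective might be scored. The instance prices, SLO thresholds, and scoring rule are illustrative assumptions, not Opsani’s actual logic:

```python
# Hypothetical scoring of one candidate configuration.
# Instance prices, SLO thresholds, and the scoring rule are assumed values.

INSTANCE_PRICE_PER_HOUR = {"m5.large": 0.096, "m5.xlarge": 0.192, "r5.large": 0.126}

SLO_P90_LATENCY_MS = 150   # assumed response-time objective
SLO_MAX_ERROR_RATE = 0.01  # assumed error-rate objective

def hourly_cost(instance_type: str, instance_count: int) -> float:
    """Cost = price of the selected instance type x number of instances."""
    return INSTANCE_PRICE_PER_HOUR[instance_type] * instance_count

def score(throughput_tps: float, p90_latency_ms: float, error_rate: float,
          instance_type: str, instance_count: int) -> float:
    """Reward work done per dollar, but reject configurations that break the SLO."""
    if p90_latency_ms > SLO_P90_LATENCY_MS or error_rate > SLO_MAX_ERROR_RATE:
        return float("-inf")  # SLO violation: configuration is not acceptable
    return throughput_tps / hourly_cost(instance_type, instance_count)

# Compare two candidate configurations observed under identical load.
print(score(10_000, 140, 0.002, "m5.xlarge", 12))
print(score(10_000, 148, 0.004, "m5.large", 20))
```

The engine explores many such candidates across the horizontal scale, the vertical scale, and the JVM settings, and keeps the ones that score best.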

The Opsani engine examined the trade-off between having many cheap nodes as compared to fewer, larger nodes. Different instance types affected the amount of memory available for the Java heap, so different heap sizes had to be explored as a dependent variable. Moreover, for the shards serving the US market, the company could reduce its total number of compute instances by choosing VMs with more memory.

Results

After the optimization period, the company’s SaaS application experienced a host of performance benefits:

  • Faster application response time: 90th percentile (TP90) latency improved by 10%
  • A total of 5,000 minutes of uptime were recovered within 2 weeks
  • Operations experienced a 10x reduction in pager notifications
  • “GC full events” decreased by 91%
  • Release cycles made quicker by an entire week


On top of this, Opsani’s cloud optimization enabled the company to unlock major cost savings. Prior to starting cloud optimization, the company’s team was expecting cost savings of at best 20%. However:

With Opsani, the cost came down by 72%, equating to hundreds of thousands of dollars cut from the monthly AWS bill.

The company was so impressed with the benefits of cloud optimization that over the coming months they are expanding the technology across more of their applications.




[ARTICLE]

FinOps 201: The Best Stage To Optimize Your Apps

How To Unify Cost Optimization Strategy & Business Profitability


Executive Summary:

FinOps is an operational framework that combines efficiency and best practices to deliver financial and operational control of a company’s cloud spend. FinOps has its roots in the ways the rise of the cloud has complicated the relationship between DevOps teams and finance departments. FinOps approaches aim to reduce cloud overspend and unite teams in sensible financial conduct.

The main motivation of the FinOps framework is cost optimization, and the main motivation of the well-known BCG Growth-Share Matrix is business profitability. Combining the two can bring maximum efficiency to a company’s financials. Similarly, combining a FinOps framework with Cloud Optimization approaches can bring AI-powered technologies to bear on the key goals of FinOps, and further empower company goals.


What is FinOps?

FinOps – short for Financial Operations – is a financial cloud management approach. At its core, FinOps combines best practices, a culture of efficiency, and effective systems, to produce absolute financial and operational control for cloud spending. FinOps increases a company’s ability to comprehend cloud computing costs and make necessary tradeoffs.

As agile DevOps teams break down silos, increase agility, and transform digital product development, FinOps brings together business and finance professionals with new technologies and procedures to elevate the cloud’s business value.

FinOps, if implemented correctly, enables companies and organizations to make better decisions regarding the growth and scale of their cloud utilization. This leads to better control over spending, optimum resource utilization, and significantly lower cloud costs.


How the FinOps Movement Started

The origins of the FinOps movement lie in the early 2000s, when DevOps took center stage and blew decades of established software development culture out of the water.

With the rise of DevOps, two previously siloed departments (Development and Operations) came together to function as one unit. They began developing new philosophies, uncovering best practices, utilizing new tools, and finding new ways to collaborate cohesively. Engineers and ops teams could no longer blame one another for slow servers or flawed code. They had to function together to solve issues, even if it meant retraining people in this new system of work.

Once the cloud and IaaS models came to prominence, the lines between technology, finance and procurement also started to become a problem. Infrastructure providers had to be on-demand, scalable, self-serviced, and measurable (OSSM). This meant that an engineer could essentially spend company resources to immediately scale up programs and fix performance issues without requiring approval from the finance and procurement departments. For the DevOps teams, this was wonderful. For the CFO and finance teams, it was not so wonderful.

Engineers, who had always worried about performance and had once been constrained by limited hardware, now had the freedom to throw money at a problem. But the CFO and finance teams were left with a financial mess.

Eventually, the realization dawned that something had to change. A balance needed to be struck, to ensure that organizations didn’t spend too many company resources but were still able to guarantee performance. Different departments needed to integrate and shift into a shared accountability system.

This is when FinOps emerged. Like DevOps, it consisted of a new operating model with new frameworks and silo breakdowns. But unlike DevOps, it was an approach years in the making.

“It’s a cultural shift that I’ve watched happening in organizations around the world over the last 6-7 years,” wrote J. R. Storment of the FinOps Foundation in 2019.

“I first saw FinOps pop up in San Francisco circa 2013 at companies like Adobe and Uber, who were early scalers in AWS. Then I saw it when working in Australia in 2015 in forward-looking enterprises like Qantas and Australia Post. Then, during a two-year tour of duty in London starting in 2017, I’ve watched giant companies like BP and Sainsbury’s work diligently to develop this new culture.”

The Three Phases of the FinOps Journey

Transitioning to a FinOps culture consists of three main phases. These phases can happen simultaneously and iteratively in one company depending on the application, business unit, or team.

Inform

The first phase of the FinOps journey involves the use of people, tools, and processes to empower teams and organizations and provide the following benefits:

  • Proper allocation of cloud resources to enable accurate chargeback and showback (see the sketch after this list);
  • Benchmarking as a cohort to provide organizations with key metrics for a high-performing team;
  • Effective budget plans to drive ROI while simultaneously avoiding overspend;
  • Accurate forecasting to prevent financial “surprises”, and;
  • Real-time visibility of the cloud’s on-demand and elastic nature to assist in making informed decisions.
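As a concrete illustration of chargeback and showback, here is a minimal sketch that rolls a cloud bill up by team tag. The tag names, line items, and allocation rule are assumptions made for illustration, not a prescribed FinOps report format:

```python
from collections import defaultdict

# Hypothetical billing line items: (resource_id, owning-team tag, monthly cost in USD).
# In practice these would come from a cloud provider's cost and usage report.
line_items = [
    ("i-0a12", "payments", 412.50),
    ("i-0b34", "payments", 389.20),
    ("i-0c56", "search", 1210.00),
    ("vol-9d78", "", 75.00),  # untagged resource
]

def showback(items):
    """Roll monthly cost up by team tag; untagged spend is surfaced separately."""
    totals = defaultdict(float)
    for _, team, cost in items:
        totals[team or "UNALLOCATED"] += cost
    return dict(totals)

for team, cost in sorted(showback(line_items).items()):
    print(f"{team:>12}: ${cost:,.2f}")
```

Untagged spend surfaces as its own line, which is usually the first gap a FinOps team closes.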

The Inform phase is critical, because this is where your organization educates everyone involved and establishes the understanding that what you can measure, you can control.

Optimize

The next FinOps phase is to bring into service all that information and empowerment to optimize your cloud usage and footprint. Some of the steps your organization can take are:

  • Transitioning from on-demand capacity (the most expensive way to consume the cloud) to Reserved Instances (RIs) where possible;
  • Taking advantage of Committed Use Discounts (CUDs, from Google Cloud) through longer-term commitments to enforce cost controls;
  • Rightsizing and reducing waste such as orphaned resources and unused instances and storage (a simple rightsizing sketch follows this list), and;
  • Utilizing AI-powered cloud optimization tools that improve your application’s efficiency, improving app performance while reducing cloud spend.
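To make rightsizing more concrete, here is a minimal sketch that flags instances whose observed CPU utilization suggests they are oversized. The utilization threshold, instance data, and downsizing map are illustrative assumptions, not a vendor recommendation:

```python
# Hypothetical utilization data: instance id -> (instance type, peak CPU % over 30 days).
observed = {
    "i-web-1": ("m5.2xlarge", 22.0),
    "i-web-2": ("m5.2xlarge", 71.0),
    "i-batch-1": ("c5.4xlarge", 14.5),
}

# Assumed "one size down" map; a real tool would also weigh memory, network, and burst needs.
ONE_SIZE_DOWN = {"m5.2xlarge": "m5.xlarge", "c5.4xlarge": "c5.2xlarge"}
CPU_THRESHOLD = 30.0  # assumed: below this peak utilization, consider downsizing

def rightsizing_candidates(metrics):
    """Yield instances that look oversized along with a suggested smaller type."""
    for instance_id, (itype, peak_cpu) in metrics.items():
        if peak_cpu < CPU_THRESHOLD and itype in ONE_SIZE_DOWN:
            yield instance_id, itype, ONE_SIZE_DOWN[itype], peak_cpu

for iid, current, suggested, cpu in rightsizing_candidates(observed):
    print(f"{iid}: {current} -> {suggested} (peak CPU {cpu:.0f}%)")
```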

Operate

This third phase isn’t technically the last one. Rather, it’s a reminder that this journey isn’t a one-off activity. This cultural initiative should be integrated, baked in, and automated as part of daily operations if the goal is to achieve ongoing success. Organizations that aim to build a Cloud Cost Center of Excellence should also continuously evaluate the metrics they’re tracking against their business objectives and the current trends in their industry. This rinse-and-repeat process needs business, financial, and operational stakeholders who embrace the culture of FinOps.

The Structure of a FinOps Team

A FinOps team is composed mainly of executives, FinOps practitioners, the DevOps team, and Finance and Procurement. Each of these individuals/teams has a different role in the FinOps framework:

  • Executives
    • Includes a VP/Head of Infrastructure, Head of Cloud Center of Excellence, and a CTO or CIO.
    • Their focus is to drive accountability, build transparency, and to ensure budget efficiency.
  • FinOps Practitioners
    • Includes a FinOps Analyst, Director of Cloud Optimization, Manager of Cloud Operations, or a Cost Optimization Data Analyst.
    • These individuals will be focused on the teams’ budget allocation, and on forecasting cloud spend.
  • DevOps Team
    • Mainly composed of engineers and Ops team members (Lead Software Engineer, Principal Systems Engineer, Cloud Architect, Service Delivery Manager, Engineering Manager, and/or Director of Platform Engineering)
    • DevOps will focus on building and supporting services for the organization.
    • At this point, cost is introduced to them as a metric that should be tracked and monitored like other performance metrics. DevOps teams have to consider efficient design and use of resources, as well as identify and predict spending anomalies.
  • Finance and Procurement
    • Often includes roles such as Technology Procurement Manager, Global Technology Procurement, Financial Planning and Analysis Manager, and Financial Business Advisor.
    • They will use reports provided by the FinOps team for accounting and forecasting, working closely with them to understand historic billing data and build out more accurate cost models.
    • These forecasts and cost models will then be used to engage in rate negotiations with cloud companies and service providers.

FinOps and the BCG Growth-Share Matrix

One powerful way to leverage a FinOps framework is to combine it with the BCG Growth-Share Matrix – a model that many veterans of the business world will be familiar with.

What is the Growth-Share Matrix?

The Growth-Share Matrix was invented by the Boston Consulting Group’s (BCG) founder Bruce Henderson in 1968. It is a portfolio management framework that aids organizations in determining which product, service, or business to prioritize and focus on.

The BCG Growth-Share Matrix is a table comprising four quadrants that represent the degree of profitability of a product, service, or business:

  • the Stars;
  • the Question Marks;
  • the Cash Cows, and;
  • the Dogs (also known as Pets).

Each product/service/business is assigned to one of these categories, based on certain factors, but most especially on their capability for growth and their market share size. Executives can then decide which ones to focus on to drive more value and generate more profit.
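To illustrate how such a classification can be made systematic, here is a minimal sketch that assigns products to quadrants from two inputs. The cut-offs (10% market growth, relative share of 1.0) are common textbook conventions used here as assumptions, and the portfolio data is invented:

```python
def bcg_quadrant(market_growth_rate: float, relative_market_share: float) -> str:
    """Classify a product by market growth (%) and market share relative to the largest rival."""
    high_growth = market_growth_rate >= 10.0   # assumed cut-off
    high_share = relative_market_share >= 1.0  # >= 1.0 means the product is the share leader
    if high_growth and high_share:
        return "Star"
    if high_growth:
        return "Question Mark"
    if high_share:
        return "Cash Cow"
    return "Dog"

# Hypothetical portfolio: product -> (market growth %, relative market share).
portfolio = {"Payroll SaaS": (4.0, 2.3), "AI Assistant": (25.0, 0.4), "Legacy Desktop": (1.0, 0.6)}
for name, (growth, share) in portfolio.items():
    print(f"{name}: {bcg_quadrant(growth, share)}")
```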

How does the BCG Growth-Share Matrix help us refine a FinOps framework?

The main motivation of the FinOps framework is cost optimization, and the main motivation of the Growth-Share Matrix is business profitability. Combining the two can bring maximum efficiency to a company’s financials.

How the BCG Growth-Share Matrix Works

This business framework was built on the principle that market leadership leads to sustainable, superior returns. It highlights two fundamental elements that organizations need to consider before investing in a business:

  • Market attractiveness, which is driven by the market’s growth rate, and;
  • Company competitiveness, which is driven by the company’s relative market share.

The market leader eventually obtains a self-reinforcing cost advantage, which competitors find difficult to emulate. Growth rates, meanwhile, tell organizations which markets have the highest growth potential, as well as the ones that don’t. Each of the four symbols of the Matrix represents a particular combination of growth and relative market share:

The Stars

These are high-growth, high-share businesses that have considerable future potential and are very lucrative to invest in. These are the market leaders – businesses that make up a great portion of their industry and generate the most income. They need high funding to maintain their growth rate. But if the business manages to maintain its status as a market leader when market growth slows, it becomes a Cash Cow.

The worst-case scenario is when new innovations and technological advancements outplay your business, and instead of your Star becoming a Cash Cow, it becomes a Dog. This often happens in rapidly changing markets and industries, and it can catch companies off guard.

The Question Marks

Question Marks are high growth, low share businesses that pose a strategic challenge. Depending on their chances of becoming Stars, companies either invest in or discard them. Startups and ventures often possess this designation.

With the right circumstances and the right management, a Question Mark can turn into a Star and, eventually, a Cash Cow. But sometimes, even after substantial investment, Question Marks still don’t develop into market leaders, and they end up as Dogs. This is why companies need to weigh decisions about Question Marks carefully.

The Cash Cows

Cash Cows are low growth, high share businesses. They are marketplace leaders, generating more cash than they consume. The growth of Cash Cows isn’t high, but they often already have a large user base. They act as the backbone of the company, providing revenue on almost all fronts.

Because they operate in mature, low-growth markets, these businesses are often considered “plain and boring”. But their cash generation is constant, and corporations value them highly for it. The cash they generate also helps Stars and Question Marks transform into Cash Cows.

However, mismanagement and other negative circumstances can degrade a Cash Cow into a Dog, so companies should continue investing in Cash Cows to “milk” them passively and maintain their level of productivity.

The Dogs

The Dogs are the worst form a business can take. Dogs are low share, low growth businesses, meaning they’re in a mature and slow-growing industry and have low market shares. Often, they are cash traps that tie up company funds over long periods, drain resources due to low-to-negative cash returns, and depress a company’s return-on-assets ratio.

Dogs can still sometimes play a role in a company – for instance, one Dog may complement the products of other business units. But common marketing advice to deal with Dogs is to remove them from the company’s portfolio altogether, through divestiture or liquidation. Unless the organization finds a new strategy to reposition Dogs and lift them up from their status, they will most likely hurt the company in the long run.

Utilizing the Growth-Share Matrix

When adopting a FinOps culture, companies should simultaneously evaluate their product offering(s) using the Growth-Share Matrix alongside their optimization strategies. While the FinOps practitioners establish the new procedures for FinOps to take hold, executives should take the time to examine their product lines and product features thoroughly. The expertise of the FinOps practitioners in benchmarking and forecasting should help them determine which product features and product offerings are on the way to becoming Cash Cows, and which ones are not.

Eventually, all products will either turn into Cash Cows (which is the best kind of product/business to apply optimization to) or Dogs. Careful evaluation on both the business side and the operational side can help companies and organizations make the right decisions.

Where Optimization Should Be Focused

Question Marks are not ideal as focus points of optimization. Yes, these kinds of products are growing. But the market isn’t really adopting Question Marks, due to their low share. With less adoption, costs are not skyrocketing. This also means that companies won’t be able to see many returns from them (yet).

Dogs are likely to be discontinued anyway – so you might as well cut losses as early as possible. Stars should definitely be optimized, but they are growing so fast that they are the hardest ones to control in a really granular way.

During the Optimize phase, it’s the Cash Cows that FinOps companies should focus on. Lots of people are already using them, and the user base will most likely be pretty steady, so there won’t be much “growth” to unsettle things. Organizations should prioritize reducing their Cash Cow’s cloud footprint and usage. This allows the Cash Cows to function more efficiently and generate more revenue for the company.

As the Operate phase rolls in, companies can start working on their Stars and Question Marks.

FinOps and Cloud Optimization


Cash Cows are where optimization should be focused. The best way to optimize and squeeze more revenue out of these is to bring costs down. Reducing Cash Cows’ cloud footprint and usage will help companies to successfully integrate both the FinOps and the BCG Growth-Share Matrix frameworks. However, you also need to consider the performance of your Cash Cow. It’s no good cutting costs if performance suffers. You don’t want to jeopardize user experience. This is where cloud optimization comes in.

Cloud optimization can play a significant role in an effective FinOps culture. Cloud optimization is all about minimizing cloud spend, and preventing unnecessary wastage in DevOps budgets – the same motivations that gave birth to the FinOps movement in the first place.

Cloud apps are complicated. The right tweaks and changes to resource allocation and parameters can have a big impact on cost. But even a basic application can have trillions of different permutations of resource and parameter settings. And these settings change fast. With daily code releases, infrastructure features and updates, and traffic changes and user growth, no-one short of a superhuman can keep up.

Leveraging the Capabilities of Artificial Intelligence

An AI-driven cloud optimization technology will perform automation better than any human ever will. AI systems, unlike humans, do not grow tired, do not easily forget, and can take in and calculate variables at hyperspeed. Such technology is what is needed to help organizations optimize entire systems, providing improved performance with minimal costs. This allows them to stick to the overarching principles of the FinOps culture, and leaves the engineers and operations teams with more time to focus their efforts on development and innovation.

Conclusion

Adopting a FinOps framework is a lifetime commitment for a company. Sometimes, the process doesn’t go as smoothly as planned, especially for new companies who are still trying to figure out the business side of the cloud.

However, with the right guiding principles, the right mindset, the right people, and the right cloud optimization tools, any company can successfully pull off a FinOps transition, saving them millions of dollars in resources and ensuring the company’s longevity in the industry. Combining the FinOps framework with a proven and effective model like the BCG Growth-Share Matrix gives companies a way to sharpen their thinking around FinOps goals, and better position themselves for success.


For more reads, check out these other articles:


  • Why you should use a predictive load balancer with the Kubernetes Horizontal Pod Autoscaler (HPA)
  • Kubernetes: Everything You Need to Know

[ARTICLE]

Kubernetes: Everything You Need to Know

In this article, Kubernetes: Everything You Need To Know, we will shed light on this container orchestration system. Kubernetes is a tool that facilitates the automation of all aspects of application deployment and management, and it plays a key role in the world of cloud applications. The platform accelerates time to market, offers enhanced scalability and availability, combines neatly with cloud optimization tools, works flexibly across multiple and hybrid cloud environments, and makes all aspects of cloud migration smoother.

Containers: A Brief Overview

To understand Kubernetes, you first need to understand containers.

Back in 2013, Docker changed everything. Building on existing virtualization technologies, Docker introduced containers. A container is an abstraction, implemented at the kernel level, that consists of an entire runtime environment. This means that a container contains an application, but it also contains all of its dependencies, libraries and configuration files.

Containers allow you to quickly and smoothly move software from one computing environment into another. (For example, from staging to production, or from a physical server to a virtual machine (VM).) By “containerizing” an application and its dependencies, differences in infrastructure or OS distributions are abstracted away. An app will run the same on your laptop as it does in the cloud.

Unlike with older virtualization and VM frameworks, containers are able to share an operating system kernel with one another thanks to their relaxed isolation properties. As a result, a container is considerably more lightweight than a VM (virtual machine), which typically contains its own dedicated OS. This means that a server can host many more containers than it can VMs.

Containers are integral to modern DevOps frameworks. Their modular nature is what allows for a microservices approach, where different parts of an app are split up across different containers. Containers allow for quick and easy rollbacks, due to their image immutability. Containers accelerate the time-to-value for code, allowing releases to arrive daily, rather than quarterly. In modern cloud computing, containers are fast becoming the new norm. The 2019 Container Adoption Survey found that 87% of respondents used container technologies, compared to just 55% back in 2017. 451 Research predicts that containers will be a $4.3-billion industry by the end of 2022.

Container Organization: Kubernetes to the Rescue

What is Kubernetes exactly?

But containers need to be managed. They are complex entities, and many DevOps teams are managing thousands of containers.

Enter Kubernetes. Originally designed by Google, Kubernetes (pronounced “koo-ber-NET-eez”) is open-source container orchestration software designed to simplify the management, deployment, and scaling of containerized applications. Also referred to as “K8s”, “Kube”, or “k-eights”, Kubernetes is the Greek word for helmsman – the person in charge of steering a ship. Kubernetes integrates with a range of container tools, but the majority of people pair it with Docker.

In short, Kubernetes is what people use to manage their containers. Kubernetes automates and simplifies the various processes involved in the deployment and scaling of containers, as well as in directing traffic into and between containerized applications. In a production environment, enterprises need full control over the containers that run their applications, to ensure that there is no downtime. Kubernetes gives them this level of control. 

Kubernetes also facilitates the efficient management of clusters of containers. Kubernetes clusters can be distributed across multiple environments, including on-premise, public, private, or hybrid clouds. This makes Kubernetes an ideal hosting platform for cloud-native applications that rely on rapid, real-time scaling (such as data streaming through Apache Kafka).

How Kubernetes Works: Some Technical Details

From a 30,000-foot level, Kubernetes provides DevOps teams with a framework to run distributed networks of containers resiliently. Kubernetes enables flexible and reliable scaling of large clusters of containers, provides failover for applications, provides deployment patterns, and supplies everything else teams need.

The base of the Kubernetes architecture consists of the containers that communicate with each other via a shared network. These containers form into clusters, where several components, workloads, and capabilities of the Kubernetes environment are configured.

Every node in every cluster is given a specific role within the Kubernetes infrastructure. One particular node type is deployed as the master node. The master server is the cluster’s main point of contact and is in charge of the majority of the centralized logic that Kubernetes supplies. It is basically the gateway and brain for the cluster. The master server exposes an API for both clients and users, monitors the health of other servers, identifies the best ways to split up and delegate work, and facilitates and organizes communication between other components. In highly available Kubernetes clusters, there are multiple master nodes (typically three) to ensure that the cluster can be contacted and continues to operate even if a master node fails. Kubernetes arranges for automatic failover between master nodes.

Worker nodes are tasked with accepting and running workloads. To ensure isolation, efficient management, and flexibility, Kubernetes places and runs applications and services in containers. A container runtime (like Docker or rkt) needs to be installed on each node for this setup to work.

Once everything is up and running, the master node sends work instructions to the worker nodes. Fulfilling these instructions, worker nodes stand up or tear down containers accordingly and adjust networking rules to route and direct traffic appropriately.

Quick Definitions of Kubernetes Key Elements

Master node. Functions as the main control and contact point for administrators and users. It also distributes (schedules) workloads to worker nodes and handles failover for master and worker nodes.

Worker nodes. Act on assigned tasks and perform requested actions. Worker nodes take their instruction from the master node.

Pods. A group of containers that share network and storage and are always placed on a single node. Containers within the same pod typically collaborate to provide the application’s functionality and are relatively tightly coupled.

Replication controller. This provides users with total control over the number of identical copies of a pod operating on the cluster.

Service. A named abstraction that exposes an application running on a set of pods as a network service and load balances traffic among the pods.

Kubelet. Running on nodes, Kubelet takes the container manifests, reads them, and ensures the defined containers are activated and functioning.

kubectl. The command line tool for controlling Kubernetes clusters.
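For a feel of how these pieces are addressed programmatically, here is a minimal sketch using the official Kubernetes Python client to list the nodes and pods in a cluster. It assumes the client library is installed (pip install kubernetes) and that a valid kubeconfig is available; the snippet is an illustration, not part of Kubernetes itself:

```python
from kubernetes import client, config

# Load credentials the same way kubectl does (from ~/.kube/config).
config.load_kube_config()
v1 = client.CoreV1Api()

# Nodes: the master/worker machines that make up the cluster.
for node in v1.list_node().items:
    print("node:", node.metadata.name)

# Pods: groups of co-located containers scheduled onto those nodes.
for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    print(f"pod: {pod.metadata.namespace}/{pod.metadata.name} on {pod.spec.node_name}")
```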

The Business Advantages of Kubernetes

Kubernetes brings obvious benefits when viewed from an IT or DevOps perspective. But how does Kubernetes positively impact the business goals of an enterprise? In five key ways:

1. Accelerated time to market.

Kubernetes allows enterprises to utilize a microservices approach to creating and developing apps. This approach enables companies to split their development teams into smaller groups, and achieve more granular focus on different elements of a given app. Because of their focused and targeted function, smaller teams are more nimble, and more efficient.

Additionally, APIs between microservices reduce the volume of cross-team communication needed to build and deploy apps. Teams can do more, while spending less time in discussions. Businesses can scale different small teams composed of experts whose individual functions help support thousands of machines.

Because of the streamlining effect of the microservices model that Kubernetes empowers, IT teams are able to handle huge applications across many containers with increased efficiency. Maintenance, management, and organization can all be largely automated, leaving human teams to focus on higher value add tasks.

2. Enhanced scalability and availability.

Today’s applications rely on more than their features to be successful. They need to be scalable. Scalability is not just about meeting SLA requirements and expectations; it’s about the applications being available when needed, able to perform at an optimum level when activated and deployed, and not swallowing up resources that they don’t need when they are inactive.

Kubernetes provides enterprises with an orchestration system that automatically scales, calibrates, and improves the app’s performance. Whenever an app’s load requirements change – due to an increasing volume of traffic or low usage – Kubernetes, with the help of an autoscaler, automatically changes the number of pods in a service in order for the application to remain available and meet the service level objectives at the lowest cost.

For instance, when concert ticket prices drop, a ticketing app experiences a sudden, large spike in traffic. Kubernetes immediately spawns new pods to handle the incoming load above the defined threshold. Once the traffic subsides, Kubernetes scales the app back down to a configuration that optimizes infrastructure utilization. In addition, Kubernetes’ autoscaling capability does not rely solely on infrastructure metrics to trigger scaling; it can also scale automatically using custom metrics as triggers.
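The scaling decision made by the Horizontal Pod Autoscaler follows a simple proportional rule: desired replicas = ceil(current replicas × current metric / target metric). Here is a minimal sketch of that calculation; the metric values and replica bounds are made-up example numbers:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 50) -> int:
    """HPA-style proportional scaling: adjust replicas so the per-pod metric approaches the target."""
    desired = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, desired))

# Example: 10 pods running at 90% CPU against a 50% target scale out to 18;
# once load drops to 20% per pod, the deployment scales back in to 8.
print(desired_replicas(10, current_metric=90.0, target_metric=50.0))  # 18
print(desired_replicas(18, current_metric=20.0, target_metric=50.0))  # 8
```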

3. Optimization of IT infrastructure-related costs.

Kubernetes can drastically reduce expenses pertaining to IT infrastructure, even when users are operating at a large scale. The platform groups apps together on shared hardware and cloud investments, and runs them on a container-based architecture.

Prior to Kubernetes, administrators addressed the instances of unexpected spikes by ordering tons of resources and putting them on reserve. While this helps them handle unforeseen and unpredicted increases in load, ordering too many resources quickly becomes extremely costly.

Kubernetes schedules and densely packs containers, taking into consideration the available resources. Because it automatically scales applications up or down based on prevailing business requirements, Kubernetes helps enterprises free up their manpower, which they can then assign to other pressing tasks and priorities.

4. Flexibility of multiple cloud and hybrid cloud environments.

Kubernetes helps enterprises fully realize the potential of multiple and hybrid-cloud environments. That’s a big plus, considering the number of modern companies running multiple clouds is increasing every day.

With Kubernetes, users find it much easier to run their app on any public cloud service or in a hybrid cloud environment. What this means for enterprises is they can run their apps on the most ideal cloud space, with the right-sized workloads.

This helps them avoid vendor lock-in agreements, which typically come with specific requirements around cloud configurations and KPIs. Getting out of lock-in agreements can be expensive, especially when there are other options that are much more flexible, more cost-effective, and have a bigger ROI, over both the short and long term.

5. Effective cloud migration.

Whether a Kubernetes user requires a simple lift and shift of the app, adjustments to how the app runs, or a total overhaul of the entire app and its services, migrating to the cloud can be tricky, even for experienced IT professionals. But Kubernetes is designed to make such cloud migrations much easier.

How? Thanks to the nature of containers, Kubernetes can run across all environments consistently, whether on-premise, cloud, or hybrid. The platform supplies users with a frictionless and prescriptive path to transfer their on-premise applications to the cloud. By migrating via a prescribed path, users don’t have to face the complexities and variations that usually come with the cloud environment.

The Future of Kubernetes

For the moment, enterprise organizations are chiefly using Kubernetes because it is the best way to manage containers, and containers supercharge the possibilities of app creation and deployment. Kubernetes automates processes that are critical to the management of IT infrastructure and app performance, and helps organizations optimize their cloud spend.

The future of Kubernetes could get even more interesting. Chris Wright, VP and CTO of Red Hat, summarizes the new ecosystem that is emerging around Kubernetes: “Just as Linux emerged as the focal point for open source development in the 2000s, Kubernetes is emerging as a focal point for building technologies and solutions.”

Kubernetes is currently the leading container orchestration platform. But increasingly, it is doing more than simply enabling organizations to manage their containers, optimize their cloud apps, and reduce spend. Kubernetes is actually driving new forms of “Kubernetes-native” app and software development. As the shift toward microservices continues to pick up pace, organizations will create and deploy apps with Kubernetes in mind from the jump – as a formative influence, not merely a tool.



[ARTICLE]

What is AIOps?

What It Is, What It’s Going to Be, and Everything in Between

Introduction

Executive Summary: AIOps, like DevOps before it, is a growing technological framework that’s bringing great change to a range of industries. Centered on the application of machine learning (ML) and big data science to IT operations problems, it impacts a range of technologies and fields – especially those that do a lot of their work in the cloud. Various users – DevOps teams, infrastructure experts, and digitally transformed companies – enjoy the perks of utilizing the technology across a range of AIOps use cases. The implementation is still evolving, but AIOps is rapidly developing in line with modern technology.


What is AIOps in Layman's Terms?

Here is a straightforward definition of AIOps from Gartner:

“AIOps is the application of machine learning (ML) and data science to IT operations’ problems.”

AIOps technologies bring together big data and ML tools to support a range of IT operations functions. These functions can include, but are not limited to:

  • Availability and performance monitoring;
  • Event correlation and analysis;
  • IT service management and automation.

An AIOps platform can ingest and analyze massive amounts of data, and through ML and forms of statistical inference, produce useful insights and/or interventions.

Applied to the world of applications, AIOps solutions facilitate the rapid and automated scanning of performance patterns; the detection of anomalies in time-series event data; and the pinpointing of the root cause of application performance issues.
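As an illustration of the anomaly-detection piece, here is a minimal sketch that flags outliers in a series of response times using a rolling z-score. The window size, threshold, and data are assumptions chosen for readability, not any particular AIOps product’s algorithm:

```python
import statistics

def anomalies(series, window=20, threshold=3.0):
    """Yield (index, value) pairs that deviate strongly from the trailing window."""
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history) or 1e-9  # avoid division by zero on flat data
        if abs((series[i] - mean) / stdev) > threshold:
            yield i, series[i]

# Example: steady ~100 ms latencies with one sudden spike.
latency_ms = [100 + (i % 5) for i in range(60)]
latency_ms[45] = 480
print(list(anomalies(latency_ms)))  # [(45, 480)]
```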


What makes an AIOps platform?

All AIOps systems share certain features:

Machine Learning and AI
The core feature of Artificial Intelligence for IT Operations Systems, machine learning (ML) uses predictive and intelligent analysis to supplement and enhance a system’s decision-making ability.

Real-Time Processing
AIOps systems need to be able to analyze and process large amounts of data at speed. Real-time processing allows enterprise IT organizations to respond immediately to issues like anomalies and security breaches.

Deep Reinforcement Learning

The best AIOps systems leverage deep reinforcement learning (DRL), which converts observed patterns and learned responses into ever more refined algorithmic behavior. With DRL, algorithmic output is used as a new or additional input to alter existing input values.

Pattern Recognition
A true AIOps system is able to recognize and follow complex rules and patterns, in order to accurately detect and assess events, and respond appropriately.

Domain Algorithms
Domain algorithms define the precise operations and decision-making processes that the AI will prioritize. These are specific to an IT organization’s goals and data in a certain industry or environment.

Automation
This is one of the key reasons why AIOps is receiving such enthusiasm from the industry. Effective AIOps solutions and systems reduce IT operators’ workloads by automating menial or repetitive tasks, increasing efficiency on the human side of the enterprise. 

Data Aggregation

Many Artificial Intelligence for IT Operations platforms carry out the collection and statistical synthesis of varying types of data from an eclectic range of sources.


What Happens Inside an AIOps System?

Inside any AIOps system, data from varying sources gets processed by a number of layers of machine learning algorithms. The output from those algorithms can then either be presented as insightful data at the end point, or be used as new input in a continuous cycle until the desired output is achieved.

A good example of AIOps use cases is the cloud optimization of applications, and some AIOps solutions that deal with this use case follow this flow:

  1. After the app code has passed through the CI/CD pipeline, the AIOps-based Cloud Optimization tool begins to measure the performance of that code.
  2. The AIOps tool formulates predictions about which set of configurations can further improve the performance of the application, reduce the cost incurred, or both.
  3. The AIOps tool then tweaks the settings and configuration parameters, implements the changes, and runs tests.
  4. While this is going on, the AIOps tool measures data from the testing process and analyzes it to learn how the changes affected performance and/or cost.
  5. The AIOps tool takes these learnings, compares them to previous data, and makes another set of predictions, which lead to a new set of configurations.
  6. This cycle runs repeatedly, non-stop, so the system can keep finding new ways to achieve the highest possible performance at the lowest possible cost (a code sketch of this loop follows).
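Here is a minimal sketch of that closed loop in code. The measurement, proposal, and scoring functions are hypothetical placeholders standing in for whatever metrics pipeline and optimizer a real AIOps tool would use:

```python
import random

# Hypothetical search space: the configurations the optimizer may try.
SEARCH_SPACE = {"replicas": [2, 4, 8], "cpu": [0.5, 1.0, 2.0], "jvm_heap_mb": [512, 1024, 2048]}

def measure(config):
    """Placeholder for real telemetry: return (throughput, hourly cost) for a configuration."""
    cost = config["replicas"] * (0.02 * config["cpu"] + config["jvm_heap_mb"] / 40960)
    throughput = config["replicas"] * config["cpu"] * 900 * random.uniform(0.9, 1.1)
    return throughput, cost

def propose(best_config):
    """Placeholder optimizer: perturb one parameter of the best-known configuration."""
    candidate = dict(best_config)
    key = random.choice(list(SEARCH_SPACE))
    candidate[key] = random.choice(SEARCH_SPACE[key])
    return candidate

best = {"replicas": 4, "cpu": 1.0, "jvm_heap_mb": 1024}
best_score = 0.0
for _ in range(100):                       # the cycle runs continuously in practice
    candidate = propose(best)
    throughput, cost = measure(candidate)  # deploy to a test/canary environment and observe
    score = throughput / cost              # reward work done per dollar
    if score > best_score:                 # promote configurations that improve the objective
        best, best_score = candidate, score

print("best configuration found:", best)
```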


The Rewards of AIOps


So why AIOps? Why are so many companies rushing to implement various AIOps tools? The benefits to a business are numerous:

Speed. AIOps tools operate with unbeatable speed and agility. Because of their real-time processing capabilities, AIOps solutions can offer the insights that a company’s IT operators need with little to no delay. 

Automation. By automating redundant processes and activity, AIOps can free up more time (and mental capacity) for IT operators to focus on more relevant concerns, like solving issues and developing systems.

Holistic view. AIOps tools can provide your business with an all-encompassing view into your IT environment, allowing you to see data from everywhere: compute, network, storage; physical, virtual, and cloud.

Increased business responsiveness. AIOps solutions can capture useful information and make it available in context, allowing your company to make data-driven decisions and refine their response to various scenarios.

Performance and cost optimization. An AIOps platform can also help your company optimize your application’s performance and bring down costs incurred. Considering only 43% of companies across the industry are confident of their application’s performance, this is much-needed.

Increased reliability and decreased downtime. Optimized applications mean fewer issues to fix, less friction between specialists and service providers, and minimal disruption to end users.


Who Uses AIOps?

AIOps is mostly used by companies with complex, cloud-reliant IT operations. These companies face many issues around the complexity and scale of their environments. The industries in question vary widely, but the common denominators that link companies using AIOps are:

  1. Their large scale;
  2. Their rapid growth or change, and;
  3. Their need for business and IT agility.

AIOps users can vary by role and department. They include: 

  • DevOps teams. Companies who’ve adopted (or are adopting) a DevOps model face a struggle juggling the different roles involved. AIOps can integrate with DevOps systems and bring new efficiencies. AIOps tools give a holistic, bird’s-eye view of systems. This can be a big help in ensuring increased agility and responsiveness, leading to project success.
  • Cloud or hybrid infrastructure users. Moving to the cloud presents a problem that people often only see as a solution: it’s a much larger environment. Though you can do infinitely more within the cloud, it is also easy to get lost in it. On-prem servers had baked-in limits; in the cloud it’s easy to overprovision, and then struggle to control costs. But AIOps solutions can help companies make sense of the functions and the changes that happen in the cloud. An AIOps tool can even help a company optimize application performance and save millions of dollars in cost.
  • Digitally-transforming companies. Many companies are rushing to implement various forms of digital transformation. In this landscape, there is a growing demand for AIOps systems that can help ease the transition into the digital sphere. AIOps solutions can help IT operations get on par with the speed required to operate in a digitally-transformed company and deliver the kind of support that the organization requires.

Some companies are enjoying the perks of AIOps right now. A major FinTech leader, for example, managed to shave 61% off their costs with the help of our handy AIOps tool. The same tool has also helped Ancestry, the global leader in family history and consumer genomics, achieve an average of 50% reduction in their operation costs, with no performance degradation. And it’s not just these: a smattering of other AIOps use cases can be found in journals and technology articles online.


The Future Is Here


“Our Age of Anxiety is, in great part, the result of trying to do today’s job with yesterday’s tools and yesterday’s concepts.”

Marshall McLuhan

The IT industry is forever evolving. DevOps has largely replaced the traditional IT department. Microservices have replaced the monolithic model. The cloud has partly replaced on-prem, and so on.

AIOps is simply another turn of this wheel. With the revolutionary emergence of AI and machine learning, it is a framework aimed at maximizing the potential of the current IT landscape, so that technologies can flourish. The scale of cloud computing has changed everything. To avoid falling behind, users need systems designed for this new normal. AIOps is one of those systems.

The transition to Artificial Intelligence for IT Operations is still in its early phase, but the battle is heating up and there are already success stories. VCs are placing bets, and vendors small and large are bringing new solutions to the market. These solutions solve a plethora of problems, and still more AIOps use cases are popping up every day. It is only logical, then, that AIOps will evolve over the coming years and continue to enable DevOps to embrace the scale and speed of modern development.


On average, when they implement our AIOps Cloud Optimization tool, Opsani users experience a 2.5x increase in efficiency, or a 40-70% decrease in cost. Get in touch today to see for yourself.


What is Cloud Optimization?

Introduction

A brief cloud optimization definition: 

(Noun.) A process or set of processes that enable a company to reduce cloud costs while maintaining or improving application performance.


The rise of DevOps has ushered in an era of high-velocity delivery and daily releases of new code. But despite the ever-growing complexity of cloud applications, the post-delivery portion of the Continuous Integration & Continuous Deployment pipeline has been woefully neglected. 

Most enterprises leave their apps running hot and only attempt to tune their apps for performance when they’ve failed to meet an SLO or SLA, or as a response to downtime. Most application parameters are left untouched altogether, and enterprises massively overprovision in order to buy peace of mind.

This oversight is costing large enterprises tens of millions of dollars, and hampering their performance. The goal of cloud optimization is to reverse this trend. Cloud optimization frameworks seek to achieve absolute cost efficiency and optimal performance (while adhering to compliance policies).


Organizational Challenge of Pushing for Cloud Optimization

Despite the promise of cloud optimization, often, IT Operations (particularly CloudOps and/or DevOps teams) arrive at an impasse when they try to implement such strategies within their companies.

The reason for such a dilemma stems from CloudOps or DevOps teams typically being stuck between two departments of the company. 

The CFO and the Finance team would love nothing more than to save as much money and as many company resources as possible, and so they set strict standards. These standards are often based on previous (and outdated) on-prem infrastructure resource allocations.

The application owners, on the other hand, are afraid that too little in the way of resources will affect the performance of the application. And so they throw everything they’ve got at it just to make sure, and take it as an affront to have their application’s resources be reduced.

IT Operations personnel frequently end up getting caught between a rock and a hard place. This is one reason why, as research reveals, enterprises are having serious problems when it comes to cloud cost optimization and application performance:

  • 80% of finance and IT leaders report that poor cloud financial management has had a negative impact on their business.
  • 69% regularly overspend their cloud budget by 25%.
  • 57% worry daily about cloud cost management.

This is why cloud optimization is so crucial, especially when turbulent times arrive. Cloud optimization allows CloudOps and DevOps teams to properly control resource allocation and application performance. It maximizes cloud value-per-spend while delivering optimal app performance. When this happens, the CFO and Finance department get what they want, and the application owners can be confident that they don’t need to overspend to get the job done. Best of all, end users are made happier.

Four Best Practices When Implementing Cloud Optimization

Here are some things IT Operations teams should consider when pushing for cloud optimization.

  1. Teams should consider using reserved instances where appropriate, along with scheduling or autoscaling underutilized resources, as these approaches can greatly reduce costs.
  2. Strategically employing new cloud infrastructure technologies is a great way to save money. Aside from reducing operating costs, this also provides a workaround to issues involving environment configuration.
  3. As a company grows, it will need more robust, more clearly defined governance processes. This governance structure will also ensure that the cloud gets used more efficiently.
  4. IT Operations should make it their business to be aware of live cloud costs and how these costs translate into ROI for the company. If they don’t, the Finance department will keep on giving them a hard time, as the application owners continue to overprovision and overspend.

Why DevOps Demands Cloud Cost Optimization

But why do companies overspend, to begin with? That’s because IT has changed. Before DevOps, this was how it went:


Your developers wrote the code, which went through a build phase. After that, there were manual tests on the app and, upon the discovery of bugs or areas of improvement, manual tuning. When the team was sure the app was 100% ready, then they deployed it manually. This cycle was repeated for every new release, which came every month or two.

But with DevOps, it now looks like this:


Once the code is written, the CI/CD pipeline picks it up and carries out automatic builds, tests, and deployments. The processes happen rapidly, in short cycles, again and again. Within this agile setting, Continuous Integration and Continuous Deployment are the norm.

Most enterprises are already using a robust and effective CI/CD toolchain. They are operating a delivery pipeline where developers blend their work within a single repository, and new code reaches users quickly and safely, generating maximum value. 

But did you notice something missing from the DevOps paradigm? That’s right: there is no effective post-release optimization and tuning. The post-delivery portion of the CI/CD toolchain is totally neglected.

This explains why companies find themselves overspending. The lack of optimization approaches causes application owners to use overprovisioning as a Hail Mary pass to avoid downtime and service errors.

Why it’s Easy to be In Denial about Optimization

Lots of organizations hear about cloud optimization and they think: “No, that can’t be right. We can’t be non-optimized. We tune our applications!” Well, yes, and no.

In the CloudOps/DevOps paradigm, performance tuning and optimization does happen. But unfortunately, it happens only when things are running hot, or in response to downtime or failure to meet an SLO or SLA. When something in the app breaks, when a team starts to notice over-provisioning in the infrastructure, or when the application performs poorly and customers start complaining – these are the only times that people think of utilizing their cloud optimization tools to address the issues. 

And on the rare occasion that they do tune their apps, teams bring in a set of siloed and only partially effective tools. Why don’t these tools get the job done properly? Because they focus on the code and the app layer (UI, database schema, etc.). APM systems monitor basic app transactions and only trigger alarms if something goes really wrong. At most, they might offer some broad recommendations about how to reduce bottlenecks. 

This isn’t true optimization at all.

Effective cloud optimization is not achieved through troubleshooting only when problems arise, or through relying heavily on APM systems. None of those approaches will affect your cost significantly in the long run. Moreover, your DevOps teams run a great risk of being eventually overwhelmed by recurring issues, and your end user experience worsens.

Cloud Optimization is a Fine Art

True cloud optimization needs to go deeper than surface level recommendations that only focus on the resources. But why are most current cloud optimization services and tools not equipped to do this?

Look at it this way:

Even a simple five-container application can have more than 255 trillion permutations of resources and basic parameters.
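That figure is easy to sanity-check with back-of-the-envelope arithmetic. The knob counts below are assumptions chosen purely for illustration; even these modest numbers push a five-container app past the 255-trillion mark:

```python
# Assumed tunable knobs per container (illustrative counts, not a real application's settings):
cpu_settings = 10      # e.g. vCPU allocations from 0.5 to 5.0 in 0.5 steps
memory_settings = 10   # e.g. memory sizes from 512 MiB to 5 GiB in 512 MiB steps
one_basic_param = 8    # e.g. a single middleware parameter with 8 candidate values

per_container = cpu_settings * memory_settings * one_basic_param  # 800 combinations
containers = 5

total = per_container ** containers  # choices multiply across independent containers
print(f"{total:,}")                  # 327,680,000,000,000 -> already over 255 trillion
```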

That is a vast number of configuration tweaks available at any given moment. To really engage with this system, your DevOps teams would need comprehensive and flawless knowledge of the entire infrastructure.


And you would need this knowledge to cover layers across the application, data, and cloud infrastructure stack. On top of this, you would need deep familiarization with the application workload itself. 

It is highly unlikely that any human staff member will possess this knowledge or visibility. The developer who wrote the code is unlikely to be savvy enough when it comes to infrastructure. Even the rare person who is comfortable with both kinds of knowledge – infrastructure, and application workload – is guaranteed to be something of a generalist, and lacks the deep knowledge needed to carry out real optimization. 

And even if they had the knowledge, they couldn’t move fast enough. Because the measuring and tweaking that is needed to continuously optimize your app needs to happen at lightning speed.

This is because modern app workloads are undergoing constant change. Round the clock, developers are releasing new features, middleware is getting updated, behaviour patterns are shifting, and cloud vendors are releasing new resource options. 

Attempting to optimize with the right instance-type, number of instances, and settings for each instance involves numerous interdependencies that are the cognitive equivalent of playing a thousand chess games at once. And due to the speed of these changes, even if you did take the time to understand your infrastructure deeply, by the time you did, that understanding would be outdated.

This is why cloud and mobile apps chronically run with less performance and more cost than is ideal and possible for their workload: manual optimization is impossible to do on every layer of your stack.

True Cloud Optimization is Built on AI

Real cloud optimization is beyond the reach of human cognition. 

The solution? Leverage artificial intelligence (AI).

Achieving maximum efficiency for apps operating in the cloud means making judgements and decisions that are too numerous and fast-moving for human minds. But these judgements and decisions aren’t too numerous or too fast-moving for an AI. 

This is the basic Cloud Optimization (CO) model:
  • After the app code has passed through the CI/CD pipeline, the Cloud Optimization tool begins to measure the performance of that code. 
  • The CO tool formulates predictions about which set of configurations can further improve the performance of the application or reduce the cost incurred.
  • Then, it tweaks the settings and configuration parameters, implements the changes, and runs tests.
  • While this is going on, the CO tool measures data from the testing process and analyzes the data to learn how the changes affected the performance and/or cost.
  • The CO tool takes these learnings, compares them to previous data, and makes another set of predictions, which lead to a new set of configurations.
  • This cycle runs repeatedly and non-stop. The system keeps on finding new ways to achieve the highest possible performance with the lowest possible cost.

What Real Cloud Optimization Tools Should Do

True cloud optimization services and tools should utilize deep reinforcement learning (Deep RL) to optimize cloud infrastructure. 

Deep RL uses neural networks that are inspired by the connectivity and activation of neurons in the brain. When properly trained, these neural networks can represent your hidden data, allowing cloud optimization tools to build a knowledge base of optimal and sub-optimal configurations, similar to how the brain develops effective patterns of behaviour.

Following an implementation process that should be as straightforward as a simple Docker run command, effective cloud optimization tools should integrate with your existing CI/CD automated deployment pipeline and go right to work. Right away, your entire system should be monitored, and the tool should pay close and granular attention to how the shifts in every setting and parameter affect performance. This information is fed back into the neural network, which processes and learns everything it sees, so that its insights compound. 

This compounding means that the cloud optimization services engine becomes exponentially better at tuning performance and improving efficiency. 

Deep reinforcement learning should enable such cloud optimization tools to continuously examine millions of combinations of configurations to identify the optimal combination of resources and parameter settings. The tools should take in metadata about an application, gradually making tweaks to resource assignments and configuration settings to enhance performance and reduce cost, then continuously remeasuring the data.

Furthermore, effective cloud optimization tools should be able to perfect those settings that are usually judged too complex to touch, such as:

  • Resources
    • CPU
    • Memory
  • Middleware configuration variables
    • JVM GC type
    • Pool sizes
  • Kernel parameters
    • Page sizes
    • Jumbo packet sizes
  • Application parameters
    • Thread pools
    • Cache timeouts
    • Write delays
  • And many, many more.

And because they are constantly gathering new and more powerful data, AI-powered cloud optimization services will be able to constantly uncover more and more new solutions. This often includes counterintuitive solutions that are not apparent to a human user. Such solutions should constantly react to new traffic patterns, new code, new instance types, and all other relevant factors. With each new iteration, their predictions home in on the optimal solution, and as improvements are found they can be automatically promoted.

With cloud optimization, infrastructure is tuned precisely to the workload and goals of the application – whether those goals relate to cost, performance, or some balance of the two.

As part of reducing costs and boosting performance, cloud optimization tools often lead to:

  • Reduction of application risk and modernization of cloud infrastructure through instance rightsizing 
  • Optimization of various areas of interest (containers, VMs, software licenses, storage, resources, etc.) 
  • Automation through self-aware instances and/or self-optimizing applications
  • Workload reservation and/or routing
  • Efficient cloud migration to achieve digital transformation
  • Real-time response for virtual infrastructures and “bare metal cloud”

Conclusion

In the DevOps era, cloud optimization is a must for any enterprise with cloud-based, medium-to-large applications that have a need to reduce cost while retaining reliability and performance. If you have a solid annual cloud spend (or internal chargeback), and frequent rollouts/updates, then you need cloud cost optimization. Finding the right cloud optimization solution should be one of your top priorities.