Google's Hipster Shop

[CASE STUDY]

How Opsani Boosted Application Efficiency by over 8X

Overview

The Opsani autonomous tuning service was able to nearly double the throughput on the Google Hipster Shop while simultaneously cutting cloud costs by 79%.

Opsani’s algorithm efficiently searched over 70 quintillion possible configurations, using fewer than 120 test configurations to derive settings that are 8x more efficient. The entire process was set up in less than 20 minutes with our straightforward onboarding tool.

Opsani Optimization at Work

Opsani’s optimization starts with a series of calibrating steps, ensuring that the performance of the application is stable and consistent. Then, it begins the tuning process.

Opsani builds a machine learning model of the application’s performance characteristics and runs a series of tests against the Hipster Shop to find the configuration with the highest efficiency. Using real data fed back after each test, the model corrects itself when it explores a bad configuration and explores more deeply the areas that produce improvements.

The Opsani SaaS automatically determines the best algorithm (and variant) to use, including values for its hyperparameters. In the case of the Hipster Shop, the optimizer chose a variant of Bayesian optimization, a sample-efficient, global black-box optimization method. Because optimizing the Hipster Shop is a relatively high-dimensional problem, with 22 tunable parameters whose relationships are mostly unknown a priori, black-box optimization is a sensible choice. The user does not need to choose an optimization algorithm, or even be aware of the different algorithms, because Opsani selects the technique autonomously.
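To make the idea concrete, here is a minimal sketch of Bayesian optimization over a small discrete grid, written in pure Python. It is not Opsani's algorithm: the Gaussian-process surrogate, the upper-confidence-bound selection rule, the two-setting grid, and the toy `efficiency` function are all assumptions chosen for illustration.

```python
import math
import random

def rbf(a, b, length=2.0):
    """Squared-exponential kernel: nearby configurations behave alike."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))

def cholesky(A):
    """Lower-triangular L with L * L^T == A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit(X, y, noise=1e-4):
    """Fit the Gaussian-process surrogate to the tested configurations."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    return L, backward_sub(L, forward_sub(L, y))

def predict(X, L, alpha, x_new):
    """Posterior mean and standard deviation at an untested configuration."""
    k = [rbf(a, x_new) for a in X]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = forward_sub(L, k)
    var = max(1.0 - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)

def efficiency(config):
    """Toy stand-in for one measured test; the peak sits at (2, 5)."""
    cpu, mem = config
    return 1.0 - (3 * (cpu - 2) ** 2 + 2 * (mem - 5) ** 2) / 125.0

def optimize(candidates, budget=20, kappa=2.0, seed=0):
    rng = random.Random(seed)
    tested = rng.sample(candidates, 3)            # a few random probes to start
    scores = [efficiency(c) for c in tested]
    while len(tested) < budget:
        base = sum(scores) / len(scores)          # center the observations
        L, alpha = fit(tested, [s - base for s in scores])

        def ucb(c):
            # Favor a high predicted mean or high uncertainty.
            mean, std = predict(tested, L, alpha, c)
            return mean + kappa * std

        untested = [c for c in candidates if c not in tested]
        nxt = max(untested, key=ucb)
        tested.append(nxt)
        scores.append(efficiency(nxt))            # feed the real result back
    best = max(range(len(tested)), key=lambda i: scores[i])
    return tested[best], scores[best], tested

grid = [(cpu, mem) for cpu in range(8) for mem in range(8)]  # 8 values per setting
best_config, best_score, history = optimize(grid)
print(best_config, round(best_score, 3), len(history))
```

The surrogate is refit after every test, so a disappointing result lowers the predicted value of neighboring configurations while a good result draws further tests toward that region.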

Executive Brief

Hipster Shop is a web-based e-commerce demo application from the Google Cloud Platform. We wanted to see how Opsani’s cloud optimization would improve performance and cut costs with the Hipster Shop application. After less than 20 minutes of setup and a day and a half of running, Opsani’s algorithm nearly doubled Hipster Shop’s throughput and improved the performance-to-cost ratio by a factor of eight. Scaling back resources on 20 out of 22 settings and components, Opsani reduced Google Hipster Shop cloud costs by 79%.

Industry: eCommerce
App Resources: 22 Parameters
Implementation: Less than two days

In less than two days of running:

79% reduction in cloud spend

nearly 2x the app throughput

8x better application efficiency


The Setup

All that needs to be done to produce similar results is to run the command ./install.sh. Opsani handles the rest. The bash script simply runs the following two commands, using the appropriate namespace and OPTUNE_AUTH_TOKEN:

kubectl create secret generic optune-auth-hipster-shop --from-literal='token=@@OPTUNE_AUTH_TOKEN@@' -n @@application_namespace@@

kubectl apply -f ./servo-base/ -n @@application_namespace@@

The first command constructs a Kubernetes secret which stores the authorization token. This token is necessary for the servo, which exists on the user’s Kubernetes cluster, to communicate with the Opsani API.

The second command creates the servo on the Kubernetes cluster. The servo communicates with the Opsani SaaS, sending measurements of the app’s performance and receiving adjustments to make.

The servo is created in five simple steps:

  1. First, a ConfigMap is defined. This ConfigMap informs the servo which services are meant to be optimized, which settings are available to be tuned, and the acceptable range of values for these settings.
  2. The ConfigMap then indicates which Prometheus queries must be made to measure Hipster Shop throughput. The Opsani ML SaaS makes decisions based on the results of these queries.
  3. Third, the ConfigMap defines how to generate load during the measure cycles.
  4. Next, the servo’s Deployment is created. The servo communicates with the Opsani API, sending it information about the cluster and the app’s performance.
  5. The Opsani SaaS then decides on a resource setting to try, and sends this information back to the servo.
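The measure-and-adjust cycle described in the steps above can be sketched as a small loop. Everything here is hypothetical: `FakeOptimizer`, `measure`, and the setting names and values stand in for the Opsani API, the Prometheus queries, and the real ConfigMap, none of whose actual interfaces appear in this case study. The random proposals are a placeholder for the optimizer's real decision logic.

```python
import random

# Hypothetical tunable settings and their allowed values (stand-in for the ConfigMap).
SETTINGS = {
    "frontend.cpu": [0.125, 0.25, 0.5, 1.0],
    "frontend.mem": [0.25, 0.5, 1.0, 2.0],
}

def measure(config):
    """Stand-in for load generation plus a Prometheus throughput query."""
    # Toy model: throughput saturates once the service has 0.5 CPU cores.
    return 1000.0 * min(config["frontend.cpu"], 0.5) / 0.5

class FakeOptimizer:
    """Stand-in for the remote optimizer: proposes configs and records results."""

    def __init__(self, settings, seed=0):
        self.settings = settings
        self.rng = random.Random(seed)
        self.results = []

    def next_adjustment(self):
        # Placeholder policy: pick each setting at random from its allowed values.
        return {name: self.rng.choice(values) for name, values in self.settings.items()}

    def report(self, config, throughput):
        self.results.append((config, throughput))

def run_servo(optimizer, cycles):
    for _ in range(cycles):
        config = optimizer.next_adjustment()   # receive an adjustment to try
        throughput = measure(config)           # apply it and measure under load
        optimizer.report(config, throughput)   # send the measurement back
    return optimizer.results

results = run_servo(FakeOptimizer(SETTINGS), cycles=5)
for config, throughput in results:
    print(config, throughput)
```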


The Results

Below is a table summarizing the performance and cost of the Hipster Shop before optimization and after running the Opsani optimization algorithm.

The performance-to-cost ratio rose from 147,000 to 1,190,000, an improvement of over 700%!
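The arithmetic behind that figure is short enough to check directly. The sketch below uses only the two ratios quoted above; the helper name `improvement_percent` is ours.

```python
def improvement_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

before_ratio = 147_000      # performance-to-cost ratio before optimization
after_ratio = 1_190_000     # performance-to-cost ratio after optimization

multiple = after_ratio / before_ratio
percent = improvement_percent(before_ratio, after_ratio)
print(f"{multiple:.2f}x the original ratio, a {percent:.0f}% improvement")
# prints: 8.10x the original ratio, a 710% improvement
```

An 8.1x ratio and a roughly 710% improvement are the same result stated two ways, which is why the headline reads "over 8X" while the percentage reads "over 700%".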

Consider the complexity of the problem. The Opsani SaaS attempted to optimize the CPU and memory of 11 microservices. The acceptable ranges are presented in the table below.

Each setting has 8 possible values, creating a 22-dimensional problem space with 8^22, or nearly 74 quintillion, possible configurations, roughly eight times more configurations than there are NCAA brackets. This is a significantly difficult problem to attack, and the Opsani algorithm is able to come up with its result in only a day and a half.
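The size of the search space can be verified with a few lines of integer arithmetic. The bracket count assumes a 63-game tournament with two possible winners per game, which is our assumption rather than something stated in the case study; the 120 is the upper bound on test configurations quoted in the overview.

```python
values_per_setting = 8
settings = 22                      # CPU and memory for each of 11 microservices
configurations = values_per_setting ** settings

ncaa_brackets = 2 ** 63            # assumed: 63 games, two possible winners each
tested = 120                       # upper bound on configurations actually tested

print(f"{configurations:,} configurations")
# prints: 73,786,976,294,838,206,464 configurations
print(f"{configurations // ncaa_brackets}x the number of NCAA brackets")
# prints: 8x the number of NCAA brackets
print(f"fraction tested: {tested / configurations:.2e}")
```

Exhaustive testing is out of the question at this scale, which is why a sample-efficient search matters.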

As is apparent from the above table, the model identifies the product catalog and front-end services as the most important for performance under the level and type of load being generated, since both require the most CPU cores.

Moreover, this model discovers that many components are over-provisioned by default and scales these back. While it is common to assume that each microservice is equally important and requires the same resources, these results show why that is very often bad practice. As a result, the Hipster Shop achieves significantly better performance at a significantly lower overall cost.


FinTech Leader Saves Over $3M/Year on SaaS Operations with Opsani

[CASE STUDY]

Opsani Cloud Optimization delivers 68% savings and a 12% performance boost

Customer

“The Company” provides SaaS financial management solutions. The App involved has over three million active users. After a successful transition to the cloud, the Company reduced idle servers and could elastically provision resources. During the shift, they also completed a fully automated DevOps toolchain and CI/CD pipeline.

Challenge

The Company undertook a successful transition to the cloud. However, post-transition, their cost and performance metrics did not meet expectations. Performance-tuning by the Company’s DevOps team failed to resolve the problem. New releases were frequently being delayed. The Company needed to find a way to reduce server idleness and elastically provision resources.

Executive Brief

The Company provides SaaS financial management solutions for over three million active users. Opsani became involved after they completed the lift-and-shift of their application to the cloud. Within the quarter, Opsani’s AI-driven cloud optimization system produced both increased performance and reduced cost. As a result, the customer saved millions per year and achieved a positive ROI in the same quarter.

Industry: Financial Services
App Resources: 1300+ Virtual Machines
Cloud Spend: $5 Million / Year AWS
Implementation: Less than one quarter

Engineers were looking to:

  • Improve performance predictability
  • Determine efficient resource settings
  • Optimize Java performance
  • Protect or improve user experience
  • Expedite new releases

12% Performance Increase

68% Cost Reduction

232% Better Application Efficiency

All within a one-quarter ROI period


Solution

The first step for the engineers was to define a measurable performance unit. They chose to optimize response time at a fixed load. Throughput and error counts were also monitored to bound the solution set. Opsani added plugins to integrate with the current CI/CD system so it could collect data from existing monitoring tools, detect changes and feed resource and configuration parameters into the SaaS service for optimization.

Next, Opsani calibrated and analyzed the data to condition its AI and ML models, which helps the engine produce better configurations.

Without this data conditioning, the AI and ML would produce suboptimal results. With the setup complete, optimization began. In the first run, Opsani adjusted only JVM parameters. In just days, the system was able to boost performance by 10%.

The team then began a more expansive optimization run. Using AI and ML, Opsani automatically selected, tested and tuned resource parameters including combinations of memory, CPU, and instance count to optimize for best efficiency.

Opsani AI produced results within a week of the optimization start.


Results

Within month one, the Company’s application experienced
a host of performance benefits:

  • With Opsani, 90th percentile (P90) latency came down from 150 milliseconds to 110 milliseconds
  • Availability improved by an average of ten seconds
  • Operations experienced a 10x reduction in pager notifications
  • “GC full events” decreased by 91%
  • Restarts decreased by 78%
  • A total of 5,000 minutes of uptime were recovered within the month
  • Release cycles were shortened by an entire week
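For readers unfamiliar with the P90 metric in the first bullet, the sketch below computes a 90th percentile with the nearest-rank method and then the relative drop from 150 ms to 110 ms. The sample latencies are made up for illustration, and nearest-rank is only one of several common percentile definitions; the case study does not say which one the monitoring tools used.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds for ten requests.
latencies_ms = [80, 85, 90, 95, 100, 105, 110, 120, 150, 400]
p90 = percentile(latencies_ms, 90)
print(p90)  # prints: 150

# Relative change in P90 using the before and after values quoted above.
reduction = (150 - 110) / 150 * 100
print(f"{reduction:.1f}% lower P90 latency")  # prints: 26.7% lower P90 latency
```

P90 ignores the slowest tenth of requests, so a single 400 ms outlier in the sample does not move it.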

On top of this, cloud optimization enabled the Company’s teams to unlock major cost savings. They were running 50 different application clusters with 30 machines per cluster. Once Opsani optimized the application, the Company was able to significantly reduce the number of machines by moving to a slightly larger machine type.

In less than a week, the system autonomously ran numerous tests and produced compelling results. By automatically adjusting resource parameters across the full application stack, Opsani AI identified configurations for Lowest Cost, Best Efficiency, and Best Performance. Ultimately, the Company chose the most efficient configuration, saving $3.5M annually on cloud infrastructure, improving performance by 12%, and boosting efficiency by 232%. Performance is now consistent, which improves the user experience. With Opsani’s AI Cloud Optimization integrated into their CI/CD toolchain, the Company has implemented automated continuous optimization for all future releases. This continuously finds optimal runtime configuration settings in a search space too complex for humans to tune by hand. As they prepare to update their middleware libraries, Opsani AI is already integrated into the pipeline as CI/CD/CO.

With Opsani, cost came down by 74%, equating to hundreds of thousands of dollars cut from the monthly AWS bill.
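The savings figures can be cross-checked against the roughly $5 million annual AWS spend listed in the executive brief. The sketch below is plain arithmetic; it assumes the spend is spread evenly across twelve months, which the case study does not state.

```python
annual_spend = 5_000_000          # AWS spend per year, from the executive brief

def savings(annual, reduction_percent):
    """Annual and monthly savings for a given percentage cost reduction."""
    yearly = annual * reduction_percent / 100
    return yearly, yearly / 12

for pct in (68, 74):              # the two reduction figures quoted in this case study
    yearly, monthly = savings(annual_spend, pct)
    print(f"{pct}% -> ${yearly:,.0f} per year, ${monthly:,.0f} per month")
```

Under either percentage the result is over $3M per year and several hundred thousand dollars per month, in line with the headline claim.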

After engaging with Opsani, the Company achieved positive ROI in the same quarter the project started and is now saving millions in cloud spend annually. The customer gained control over cloud cost and shortened the time to market for new features, which delivers both top-line and bottom-line growth to their business.


Leading FinTech Company Boosts SaaS User Experience

[CASE STUDY]

Leading FinTech Company Boosts SaaS User Experience and Slashes Cloud Bills With Opsani

Challenge

The company’s performance team was trying to tune the Java environment of the application UI, which generates the SaaS UI from the backend infrastructure (without displaying databases, authentication and other elements that aren’t relevant to a user). The team’s primary goal was to improve user experience. A key part of achieving this was reducing latency, which sat stubbornly at 150 milliseconds across dozens of shards. The company also wanted to reduce the frequency of “GC full events” occurring in Java Virtual Machines (JVMs). When they occur, GC full events can slow down an application, consume excessive CPU, and degrade the user experience.

If the company’s team could reduce latency and minimize GC full events, they would increase availability, make the UX snappier, and produce more of a real-time interaction for users. As well as implementing these user-friendly improvements, they would also trim their AWS bill – which they knew was higher than it needed to be – without any negative impact upon performance.

However, effective human tuning of the Java environment was proving impossible. Despite the efforts of the SaaS tuning team, the performance was inconsistent between releases. This was having an ongoing impact on user experience, which was intermittently less than optimal. And their cloud bills were staying the same, despite the tuning team knowing that application parameter optimizations were possible.

Executive Brief

The company is the industry leader in financial, accounting, and tax preparation software, with annual revenue of almost seven billion dollars. Their SaaS service is one of their key product offerings and is used by millions of businesses globally.

Industry: Financial Services
App Resources: 1000s of Virtual Machines
Time to Optimize: Less than one quarter

The company turned to Opsani to implement AI-driven cloud optimization and achieve their goals for their SaaS application.

Over one month, the Opsani cloud optimization tool:

Cut application latency by 10%

Recovered 5,000 minutes of uptime

Decreased restarts by 78%

Trimmed AWS EC2 bill by 72%

Optimization Challenge

The optimization challenge was significant. Cloud applications are by nature extremely complicated, offering trillions of resource and parameter permutations, and this application runs over 10,000 transactions per second. To reduce latency, the Opsani cloud optimization engine targeted a variety of changes:

  • Number of instances per shard (on the horizontal scale).
  • EC2 instance type/family (on the vertical scale).
  • Five different Java config parameters affecting garbage collection strategies, intervals, and heap sizes.

And this barely scratches the surface of what can be tweaked to improve the performance of a cloud application.

These parameters are all interrelated, offering trillions of possible combinations. The Opsani engine optimized for work done (the number of transactions completed) while maintaining the service-level objective (SLO) on response time and error rate and minimizing the cost. The cost was computed as the price of the selected instance type multiplied by the number of instances.

The Opsani engine examined the trade-off between having many cheap nodes and having fewer, larger nodes. Different instance types affected the amount of memory available for the Java heap, so different heap sizes had to be explored as a dependent variable. Moreover, for their shards serving the US market, the company could reduce their total number of compute instances by going with VMs with more memory.
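The cost model and the many-small versus few-large trade-off can be illustrated with a short sketch. The instance names, prices, and per-instance capacities below are hypothetical placeholders, not real AWS rates or measurements from this engagement, and the exhaustive loop stands in for the engine's actual search.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InstanceType:
    name: str
    hourly_price: float   # hypothetical price, not a real AWS rate
    capacity_tps: float   # hypothetical transactions/second one instance sustains within the SLO

def cheapest_plan(instance_types, required_tps, max_instances=64):
    """Return (cost, type, count) for the lowest-cost plan that carries the load."""
    best = None
    for itype in instance_types:
        for count in range(1, max_instances + 1):
            if count * itype.capacity_tps >= required_tps:
                cost = itype.hourly_price * count   # price of the type times instance count
                if best is None or cost < best[0]:
                    best = (cost, itype, count)
                break   # adding more instances of this type only raises cost
    return best

types = [
    InstanceType("small", hourly_price=0.125, capacity_tps=400),
    InstanceType("large", hourly_price=0.5, capacity_tps=1800),
]
cost, itype, count = cheapest_plan(types, required_tps=10_000)
print(itype.name, count, cost)  # prints: large 6 3.0
```

With these made-up numbers, six large instances (3.0 per hour) beat twenty-five small ones (3.125 per hour), mirroring the finding that fewer, larger VMs can lower the total bill.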


Results

After the optimization period, the company’s SaaS application experienced a host of performance benefits:

  • Faster application response time: 90th percentile (TP90) latency improved by 10%
  • A total of 5,000 minutes of uptime were recovered within 2 weeks
  • Operations experienced a 10x reduction in pager notifications
  • “GC full events” decreased by 91%
  • Release cycles were shortened by an entire week

Opsani’s cloud optimization enabled a major FinTech player to unlock major cost savings.

On top of this, Opsani’s cloud optimization enabled the company to unlock major cost savings. Prior to starting cloud optimization, the company’s team was expecting cost savings of 20% at best. However, with Opsani, the cost came down by 72%, equating to hundreds of thousands of dollars cut from the monthly AWS bill.

The company was so impressed with the benefits of cloud optimization that over the coming months they are expanding the technology across more of their applications.


Ancestry and Opsani Case Study

[CASE STUDY]

Opsani & Ancestry.com Case Study

Cut Cloud Costs and Boost Kubernetes Application Performance

Ancestry, the global leader in family history and consumer genomics, has gone enterprise-wide with Opsani’s AI-driven cloud optimization service for Ancestry cloud applications to achieve the lowest possible cost while maintaining their performance.

By integrating advanced machine learning tools for cloud optimization into its existing CI/CD pipeline, Ancestry is able to achieve lower cost, greater efficiency, stellar performance, and a better customer experience with every new version release.

The Challenge

Ancestry has more than three million paying subscribers and a collection of over 20 billion records. The company moved its operations to the cloud to enable it to scale with its customer base and implemented CI/CD processes to facilitate rapid feature rollout. However, with rapid growth in both its user base and the range of products it provides, Ancestry was hard-pressed to ensure that it was achieving optimum performance, efficiency, and customer experience with its cloud applications, while also spending its cloud budget efficiently.

The Opsani Solution

The integration of Opsani adds a new best practice to Ancestry’s CI/CD toolkit: automated continuous optimization (CO). By using Opsani’s AI-driven CO technology enterprise-wide, Ancestry’s DevOps teams have gained:

  • Assurance of the lowest possible cloud cost and highest application performance.
  • Superpowers to predict and deploy the most optimal cloud application runtime settings.
  • Loyalty from happy customers!

Early results demonstrate:

  • An average of 50%+ in cloud cost reduction.
  • Up to 230% performance efficiency gains.

Russ Barnett | Chief Architect

At Ancestry, our customer obsession drives us to continuously innovate and use cutting-edge technology to empower journeys of personal discovery for millions of our users. As the company continues to grow and invest in new products, our efficiency and performance are more important than ever. Opsani will allow us to manage costs, maintain optimal performance of our cloud resources and gain visibility in an increasingly complex environment.

Ancestry was looking to:

  • Successfully implement cloud cost optimization.
  • Determine efficient runtime settings.
  • Improve performance predictability.
  • Protect and improve user experience.
  • Expedite new code releases.


UP TO 230% PERFORMANCE EFFICIENCY GAINS

50%+ CLOUD COST REDUCTION

Ancestry, the global leader in family history and consumer genomics, harnesses the information found in family trees, historical records, and DNA to help people gain a new level of understanding about their lives. Ancestry has more than three million paying subscribers across its core Ancestry websites with an extensive collection of over 20 billion records and has more than 15 million people in the AncestryDNA network.

Since 1996, users have created over 100 million family trees and 11 billion ancestor profiles on the Ancestry flagship site and its affiliated international websites. Ancestry offers a suite of family history products and services including AncestryDNA®, Archives®, AncestryProGenealogists®, Newspapers.com™ and Fold3®. AncestryDNA is owned and operated by Ancestry.com DNA, LLC, a subsidiary of Ancestry.com Holdings LLC.