Introducing Multi-Armed Bandits in GrowthBook

Discover how Multi-Armed Bandits revolutionize A/B testing in GrowthBook. Learn to optimize experiments automatically, reduce opportunity costs, and make data-driven decisions faster.

Bandits have been released in beta as part of GrowthBook 3.3 for Pro and Enterprise customers. See our documentation on getting started with bandits.


Multi-armed bandits allow you to test many variations against one another, automatically driving more traffic towards better arms, and potentially discovering the best variation more quickly.

Bandits are built on the idea that we can simultaneously explore the performance of all variations while exploiting, or sending more traffic to, the variations that are performing best.

Graph showing the exploration and exploitation stages of bandits where traffic is allocated to better performing variations
Bandits start with equal allocations across variations, and then begin allocating more traffic to the variations that perform better.

In online experimentation, bandits can be particularly useful if you have more than 4 variations you want to test against one another. They work best when you have a single goal metric to optimize and when shipping the best variation quickly matters more than precisely measuring each variation's effect.

Bandit leaderboard from GrowthBook, showing 3 variations' performance over time. Variation 2 is the clear winner!
Bandits help identify the best performing arm among many variations.

Read more about when and how to use bandits in our documentation.

The following table summarizes the differences between bandits and standard experiments.

| Characteristic | Standard Experiments | Bandits |
| --- | --- | --- |
| Goal | Obtaining accurate effects and learning about customer behavior | Reducing cost of experimentation when learning is less important than just shipping the best version |
| Number of Variations | Best for 2-4 | Best for 5+ |
| Multiple goal metrics | Yes | No |
| Changing Variation Weights | No | Yes |
| Consistent User Assignment | Yes | Yes (with Sticky Bucketing) |

What makes GrowthBook’s bandits special?

GrowthBook's bandits rely on Thompson Sampling, a widely used algorithm that balances exploring the value of all variations with driving most traffic to the better performers. However, our bandits differ in a few ways that ensure they work well in the context of digital experimentation.
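
To make the idea concrete, here is a minimal sketch of Thompson Sampling for a conversion-rate metric, using only Python's standard library. The variation names and counts are invented for illustration, and this is a simplification rather than GrowthBook's actual implementation:

```python
# A minimal, illustrative sketch of Thompson Sampling for a
# Bernoulli (conversion-rate) bandit. Counts below are made up.
import random

# Observed data per variation: conversions out of exposures.
arms = {
    "A": {"conversions": 45, "exposures": 100},
    "B": {"conversions": 50, "exposures": 100},
    "C": {"conversions": 60, "exposures": 100},
}

def thompson_weights(arms, draws=10_000):
    """Estimate each arm's probability of being best by repeatedly
    sampling from each arm's Beta posterior and counting wins."""
    wins = {name: 0 for name in arms}
    for _ in range(draws):
        samples = {
            name: random.betavariate(
                a["conversions"] + 1,                    # successes + 1
                a["exposures"] - a["conversions"] + 1,   # failures + 1
            )
            for name, a in arms.items()
        }
        wins[max(samples, key=samples.get)] += 1
    # Normalized win counts become the next period's traffic split.
    return {name: count / draws for name, count in wins.items()}

print(thompson_weights(arms))
# e.g. {'A': 0.01, 'B': 0.08, 'C': 0.91} -- better arms get more traffic,
# but weaker arms keep a small share, preserving some exploration.
```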

Consistent user experience

Some bandit implementations do not preserve a user's experience across sessions, which makes them tricky to use when a stable user experience is important. Because bandits dynamically update the percentage of traffic going to each variation, a returning user whose assignment is not preserved may be exposed to several different variations over the course of a single bandit.

This can lead to a jarring user experience and muddied results, since a user's behavior can no longer be attributed to a single variation.

GrowthBook solves this with Sticky Bucketing, which stores each user's assigned variation in a persistent store, such as a cookie, so that returning users always get the same experience, even after the bandit has updated its variation weights.

Setting up Sticky Bucketing in our HTML SDK is as easy as adding an attribute to the script tag.

```html
<script async
  data-client-key="CLIENT_KEY"
  src="https://cdn.jsdelivr.net/npm/@growthbook/growthbook/dist/bundles/auto.min.js"
  data-use-sticky-bucket-service="cookie"
></script>
```
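
With this attribute set, the SDK persists each visitor's assigned variation in a cookie, so returning visitors keep the variation they first saw even as the bandit reweights traffic.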

Accommodates changing user behavior

GrowthBook’s bandits use a weighting method to prevent changing user behavior over time from biasing your results.

What is the issue? As your bandit runs, two things are changing: your bandit updates traffic allocations to your variations, and the kind of user entering your experiment changes (e.g., due to day-of-the-week effects). The correlation between these two can cause biased results if not addressed.

Imagine the following scenario:

You run a bandit that updates daily, starting on a Friday, with two variations split 50/50 (100 observations per arm). On Friday, you observe a 45% conversion rate in Variation A and a 50% conversion rate in Variation B. After the first bandit update, the weights become 10/90 (purely as an illustration; the actual values would differ). Saturday's total traffic is also 200 users, but this time Variation B gets 90% of it. Conversion rates tend to be higher on weekdays than on weekends, regardless of variation, and on Saturday you observe a 10% conversion rate in Variation A and a 15% conversion rate in Variation B. Variation B has the higher conversion rate on both days, yet if you naively combine the data across both days, Variation A looks like the winner:

| | Friday | Saturday | Combined |
| --- | --- | --- | --- |
| Variation A | 45 / 100 = 45% | 2 / 20 = 10% | 47 / 120 = 39% |
| Variation B | 50 / 100 = 50% | 27 / 180 = 15% | 77 / 280 = 27.5% |

The combined conversion rate for Variation B is 27.5%, while for Variation A it is 39%, despite Variation B outperforming Variation A on both days. Clearly something is wrong here. In fact, sharp-eyed readers may recognize this as a case of Simpson's Paradox.

How did we solve it? We use weights to estimate the mean conversion rate for each bandit arm under a counterfactual scenario in which equal experimental traffic was assigned to each arm throughout the experiment. In this scenario, we can recompute the observed conversion rates as if the arms had received equal traffic (e.g., on Saturday, Variation A would have had 10/100 conversions instead of 2/20). With these adjusted conversion rates, the combined numbers now make sense:

| | Friday | Saturday | Combined |
| --- | --- | --- | --- |
| Variation A | 45 / 100 = 45% | 10 / 100 = 10% | 55 / 200 = 27.5% |
| Variation B | 50 / 100 = 50% | 15 / 100 = 15% | 65 / 200 = 32.5% |

Variation B is now the winner. By adjusting for how traffic was allocated on each day, the combined conversion rates properly reflect the differences in conversion rates and variation traffic over time. Our bandit applies similar logic to ensure that day-of-the-week effects and other temporal differences do not bias your bandit results.
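
The same correction is easy to write out in code. Here is a small Python sketch that reproduces the numbers in the two tables above, assuming a fixed reference level of 100 users per arm per day; GrowthBook's actual estimator generalizes this reweighting idea:

```python
# Naive pooling vs. reweighting each day's data as if every arm
# had received equal traffic. Data taken from the tables above.
daily = {
    # day: {arm: (conversions, exposures)}
    "Fri": {"A": (45, 100), "B": (50, 100)},
    "Sat": {"A": (2, 20),   "B": (27, 180)},
}

def naive_rate(arm):
    """Pool raw conversions and exposures across days (biased)."""
    conversions = sum(daily[day][arm][0] for day in daily)
    exposures = sum(daily[day][arm][1] for day in daily)
    return conversions / exposures

def adjusted_rate(arm, per_day=100):
    """Rescale each day's conversions to a common traffic level,
    then pool, removing the allocation-over-time bias."""
    conversions = sum(c / n * per_day
                      for c, n in (daily[day][arm] for day in daily))
    return conversions / (per_day * len(daily))

for arm in ("A", "B"):
    print(arm,
          f"naive={naive_rate(arm):.1%}",
          f"adjusted={adjusted_rate(arm):.1%}")
# A naive=39.2% adjusted=27.5%
# B naive=27.5% adjusted=32.5%   <- B correctly wins after adjustment
```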

Built on a leading warehouse-native experimentation platform

GrowthBook is the leading open-core experimentation platform. It is designed to live on top of your existing data infrastructure, adding value to your tech stack without complicating it.

GrowthBook integrates with BigQuery (GA4), Snowflake, Databricks, and many other data warehouse solutions. If your events land in your data warehouse within minutes, GrowthBook's bandits can adaptively reallocate traffic within hours.

GrowthBook integrates with your existing data infrastructure (e.g., BigQuery, Databricks, Snowflake, Postgres, ClickHouse, and more).

Get started

To get started with Bandits in GrowthBook, check out our bandit setup guide, or, if you're new to GrowthBook, get started for free.

Want to give GrowthBook a try?

In under two minutes, GrowthBook can be set up and ready for feature flagging and A/B testing, whether you use our cloud or self-host.