
A/B Testing for AI Agents

Run a live experiment with two versions of your AI Agent at the same time. Configure a Variant B with different settings, split traffic between the two, and compare results in the Intelligence dashboard.

7 min read

Updated April 2026

Overview

A/B Testing lets you run two versions of your AI Agent in parallel without disrupting your visitors. One group of visitors sees your current live configuration (Control A). A separate group sees your test configuration (Variant B). Once you have enough data, you compare performance in the Intelligence dashboard and declare a winner.

Note
A/B Testing is available on the Pro plan and above.

What You Can Test

Variant B supports the following configuration fields. Any field you leave blank inherits its value from your base agent settings (see the sketch after this list).

  • Bot Name - Test different agent personas
  • Welcome Message - Test different conversation openers
  • Personality - Test different tones or communication styles
  • Primary Color - Test different widget color schemes
  • Suggested Prompts - Test different conversation starters shown to visitors

Per-Page Prompts take priority
If you have Per-Page Suggested Prompts configured, those take priority over both Control and Variant B suggested prompts for matching pages. This is intentional, because page-specific prompts are more relevant than general ones.
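
To make the inheritance rule concrete, here is a minimal sketch of how blank fields could fall back to Control A values. The AgentConfig shape, the field names, and the resolveVariantConfig helper are illustrative assumptions, not Chatspark's actual data model or API.

```typescript
// Illustrative sketch only: the shape and helper below are assumptions
// for explanation, not Chatspark's actual data model.
interface AgentConfig {
  botName: string;
  welcomeMessage: string;
  personality: string;
  primaryColor: string;
  suggestedPrompts: string[];
}

// Variant B only stores the fields you filled in.
type VariantOverrides = Partial<AgentConfig>;

// Any field left blank on the variant inherits the Control A value.
function resolveVariantConfig(
  base: AgentConfig,
  overrides: VariantOverrides,
): AgentConfig {
  const filled = Object.fromEntries(
    Object.entries(overrides).filter(
      ([, value]) =>
        value !== undefined &&
        value !== "" &&
        !(Array.isArray(value) && value.length === 0),
    ),
  );
  return { ...base, ...filled } as AgentConfig;
}

// Example: a test that only changes the welcome message. Every other
// field, including suggested prompts, falls back to Control A.
const controlA: AgentConfig = {
  botName: "Spark",
  welcomeMessage: "Hello! How can I help?",
  personality: "friendly and concise",
  primaryColor: "#4F46E5",
  suggestedPrompts: ["What do you offer?", "Talk to support"],
};
const variantBView = resolveVariantConfig(controlA, {
  welcomeMessage: "Hi there! Looking for pricing or a demo?",
});
```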

Setting Up a Test

  1. Go to your AI Agent settings and open the A/B Testing tab.
  2. Toggle Enable A/B Test to create Variant B.
  3. Fill in the fields you want to test in the Variant B card. Leave any field blank to inherit from Control A.
  4. Use the traffic split slider to set how much of your new visitor traffic should see Variant B. The default is 50%.
  5. Click Save Variant to apply your changes. The test starts immediately.

Draft agents
A/B Testing is not available for draft agents. Publish your agent first before enabling a test.

How Visitors Are Assigned

Each new visitor receives a stable random token that is stored in their browser. That token deterministically maps the visitor to either Control A or Variant B according to your traffic split.

A visitor who is assigned Variant B will always see Variant B on return visits, until the test ends. This prevents the jarring experience of seeing a different agent on each visit.
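
The exact bucketing scheme is not documented, but token-based assignment like this is typically implemented by hashing the stored token into a number and comparing it against the traffic split. The sketch below illustrates the idea; the storage key, the hash choice, and the function names are assumptions.

```typescript
// Illustrative sketch of deterministic token-based bucketing; Chatspark's
// actual scheme is not documented, so the details here are assumptions.

// Read the visitor token from localStorage, minting one on first visit.
function getVisitorToken(): string {
  const key = "ab_visitor_token"; // hypothetical storage key
  let token = localStorage.getItem(key);
  if (!token) {
    token = crypto.randomUUID(); // stable random token, created once
    localStorage.setItem(key, token);
  }
  return token;
}

// Map the token to a stable number in [0, 1) with a simple string hash
// (FNV-1a here; any stable hash works).
function tokenToUnit(token: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) / 0x100000000; // normalize to [0, 1)
}

// With a 50% split (trafficSplit = 0.5), tokens hashing below 0.5 see
// Variant B. The same token always lands in the same bucket, which is
// why return visitors keep seeing the same variant.
function assignVariant(trafficSplit: number): "control_a" | "variant_b" {
  return tokenToUnit(getVisitorToken()) < trafficSplit
    ? "variant_b"
    : "control_a";
}
```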

Comparing Results

Once your test has been running long enough, go to the Intelligence dashboard for your agent. You will see a Variant Comparison tile at the bottom of the dashboard.

The tile shows the following metrics for each variant:

  • Conversations - Number of unique visitor sessions
  • Messages - Total messages exchanged
  • Positive Rating % - Percentage of rated messages that received a positive rating

Minimum data requirement
The Intelligence dashboard requires at least 50 conversations per variant before showing comparison results. Until that threshold is met, you will see a “Not enough data” state. This prevents decisions based on samples too small to be statistically meaningful.
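
As a rough sketch of how the tile's numbers and the 50-conversation gate fit together (the stats shape and names below are hypothetical, not Chatspark's API):

```typescript
// Hypothetical per-variant stats; the shape is illustrative only.
interface VariantStats {
  conversations: number;   // unique visitor sessions
  messages: number;        // total messages exchanged
  positiveRatings: number; // rated messages with a positive rating
  totalRatings: number;    // all rated messages
}

const MIN_CONVERSATIONS = 50; // dashboard threshold per variant

// Positive Rating % is computed over rated messages only.
function positiveRatingPct(stats: VariantStats): number {
  return stats.totalRatings === 0
    ? 0
    : (100 * stats.positiveRatings) / stats.totalRatings;
}

// The comparison tile only renders once both variants clear the threshold.
function hasEnoughData(a: VariantStats, b: VariantStats): boolean {
  return (
    a.conversations >= MIN_CONVERSATIONS &&
    b.conversations >= MIN_CONVERSATIONS
  );
}
```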

Declaring a Winner

Once you have enough data, you can declare a winner from the Intelligence dashboard. Each option ends the test differently (see the sketch after this list).

  • Declare Variant B Winner - Copies the Variant B configuration fields onto your base agent and stops the test. All future visitors see the winning config.
  • Keep Control A - Stops the test and keeps your current base configuration. Variant B is deleted.
  • Reset Test - Deletes Variant B without changing your base agent. Use this to discard the test entirely and start over.

Tip
Past conversation history is never deleted. Chat records that were assigned to Variant B keep their variant reference for future analysis, even after the test ends.
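
The three outcomes can be summarized as a small state transition. The sketch below reuses the hypothetical AgentConfig, VariantOverrides, and resolveVariantConfig from the sketch in What You Can Test; the names are illustrative, not Chatspark's API.

```typescript
// Sketch of the three ways a test can end; names are illustrative.
type TestResolution = "declare_variant_b" | "keep_control_a" | "reset";

interface AgentState {
  base: AgentConfig;                 // the live Control A configuration
  variantB: VariantOverrides | null; // null when no test is running
}

function resolveTest(state: AgentState, resolution: TestResolution): AgentState {
  switch (resolution) {
    case "declare_variant_b":
      // Copy the filled-in Variant B fields onto the base agent, then stop.
      return {
        base: resolveVariantConfig(state.base, state.variantB ?? {}),
        variantB: null,
      };
    case "keep_control_a": // stop the test and keep the base agent as-is
    case "reset":          // discard Variant B without touching the base agent
      return { base: state.base, variantB: null };
  }
}
```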

Pausing a Test

Toggle Enable A/B Test off in the A/B Testing tab at any time to pause the test. While paused, all visitors see Control A. Your Variant B configuration is saved and the test can be resumed by toggling it back on.
