
Still Running Classic A/B Tests? Here’s Why That Might Be a Big Mistake

Customer Experience Best Practices Ecommerce Technology
Posted on: May 27, 2025
Author: Polina Egubova

Imagine this. You’re running an A/B test. You’ve followed every best practice: scoped the hypothesis, implemented the feature cleanly, launched to 50% of users, and hit your minimum sample size. You check your metrics: revenue is up, conversion is climbing, and bounce rate is stable. 

It looks like a win, so you ship it.

Then, a few weeks later, something breaks. Revenue mysteriously dips. Engagement softens. You trace everything back to that “successful” test even though the data clearly showed that it worked. 

What happened?

What if I told you that your A/B test lied to you?

Introducing AABB Testing: A Validity Layer You Didn't Know You Needed

Traditional A/B tests have a fatal flaw: they assume everything is working as expected behind the scenes. Users are properly bucketed, your experimentation platform is unbiased, your data pipelines are clean, and attribution is stable.

But what if those assumptions fail?

AABB testing is a simple extension of the traditional A/B framework designed to catch these silent failures before they cost you time, trust, and revenue.

What is AABB testing?

Instead of splitting users into just two groups — A (control) and B (variant) — you split them into four:

  • A1 and A2: Two identical control groups

  • B1 and B2: Two identical variant groups

Some teams run an A/A test before an A/B test to validate the split, but this can be misleading: the A/A test runs without the feature, so it can't show how the feature itself might affect assignment. An AABB setup, by contrast, tests both the split and the feature under real conditions, helping detect feature-specific bugs or integration issues that an A/A test would miss.

To be clear, this should not be confused with an ABCD test. An ABCD test compares four distinct variants to explore a broader range of ideas, while an AABB test repeats two variants across multiple groups to assess the consistency and reliability of results — prioritizing validation over exploration.

Since A1 and A2 are serving the same experience, they should behave identically. Same with B1 and B2. If they don’t, something’s broken, and now you know about it a lot sooner.

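There’s zero power loss for the main comparison. Once the twin groups pass the consistency check, A1 and A2 are merged back into A, and B1 and B2 into B, so you analyze the same sample sizes as a traditional A/B test, but now with much more confidence.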
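To make the four-way split concrete, here is a minimal sketch of deterministic bucketing. It assumes a stable string user ID is available and hashes it together with an experiment name. The function names `assign_group` and `experience_for` and the experiment label are hypothetical, chosen for illustration only, and do not describe any particular platform's API.

```python
import hashlib

GROUPS = ("A1", "A2", "B1", "B2")


def assign_group(user_id: str, experiment: str) -> str:
    """Map a user to one of four groups, always the same one for the same inputs."""
    # Hashing the experiment name together with the user ID keeps
    # assignments independent across experiments (an assumption of this sketch).
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(GROUPS)
    return GROUPS[bucket]


def experience_for(group: str) -> str:
    """A1 and A2 both see the control; B1 and B2 both see the variant."""
    return "control" if group.startswith("A") else "variant"
```

Because the assignment depends only on the user ID and the experiment name, a returning user lands in the same group every time, as long as the ID itself is stable.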

Why Validity Should Come Before Impact

An A/B test gives you an answer. But what guarantees that it’s the right answer?

You may trust your experimentation platform. It likely touts robust randomization, statistical rigor, and corrections like CUPED, CUPAC, or Bayesian smoothing. But no platform is immune to:

  • Cookie loss in Safari

  • Mismatched user IDs

  • Traffic routing bugs

  • Feature flag inconsistencies

  • Analytics attribution errors

And when those failures happen, your platform won’t raise a flag. But an AABB setup will.

Traditional Validity Checks Are Painful (And Incomplete)

Yes, you could try to catch these issues manually. You would need to:

  • Check for Sample Ratio Mismatch (SRM)

  • Compare pre-period metrics across groups

  • Monitor p-value stability over time

  • Review unrelated KPIs for odd spikes

  • Segment by geo, device, referrer, and browser

  • Check for jumpers, carryover, and bleed-through

But these checks are manual, brittle, and often skipped when time is short. Worse, many issues won’t appear until it’s too late.
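For a sense of what just one item on that list involves, here is a rough sketch of an SRM check for four groups that were meant to receive equal traffic. It uses a chi-square goodness-of-fit test with 3 degrees of freedom, written with the standard library only. The function names and the equal-split assumption are ours, for illustration.

```python
import math


def chi2_sf_3df(x: float) -> float:
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    # Closed form that holds for exactly 3 degrees of freedom.
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def srm_p_value(counts: dict) -> float:
    """P-value for the hypothesis that four groups received equal traffic."""
    if len(counts) != 4:
        raise ValueError("this sketch assumes exactly four groups")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no traffic recorded")
    expected = total / 4  # assumption: an intended 25/25/25/25 split
    chi2 = sum((observed - expected) ** 2 / expected for observed in counts.values())
    return chi2_sf_3df(chi2)
```

A very small p-value suggests the observed split is unlikely under the intended allocation. The cutoff you act on is a judgment call; many teams use a strict one for SRM, since a mismatch points to a broken setup rather than a real effect.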

The AABB Shortcut: One Built-In Sanity Check

Here’s the magic of AABB:

If A1 ≠ A2 (or B1 ≠ B2), your experiment is broken.

That’s it.

No 30-step checklist. No analysis rabbit holes. Just a simple test for test validity, embedded in the structure of your experiment itself.
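In practice, "A1 ≠ A2" means the two twin groups differ by more than chance would explain. One way to sketch that check for a conversion metric is a two-sided two-proportion z-test. The function names and the `alpha=0.01` threshold below are assumptions for illustration, not a prescribed setting.

```python
import math


def two_proportion_p_value(conv_1: int, n_1: int, conv_2: int, n_2: int) -> float:
    """Two-sided p-value for a difference in conversion rate between two groups."""
    pooled = (conv_1 + conv_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if se == 0:
        return 1.0  # no variation at all, so nothing to distinguish
    z = (conv_1 / n_1 - conv_2 / n_2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def twins_consistent(conv_1: int, n_1: int, conv_2: int, n_2: int,
                     alpha: float = 0.01) -> bool:
    """True when two identical groups show no significant difference."""
    return two_proportion_p_value(conv_1, n_1, conv_2, n_2) >= alpha
```

Keep in mind that even a healthy experiment fails this check at roughly the rate set by `alpha`, so a stricter threshold trades a few missed problems for fewer false alarms.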

How to Implement AABB Testing 

Running an AABB test is nearly identical to running a normal A/B test:

  1. Randomize into four groups: A1, A2, B1, B2

  2. Serve experiences: A1 and A2 get the control. B1 and B2 get the variant

  3. Check internal consistency:

    • Compare A1 vs A2: there should be no statistically significant difference

    • Compare B1 vs B2: likewise

  4. If consistent: Merge A1+A2 = A, B1+B2 = B and proceed with analysis

  5. If inconsistent: Stop and investigate because something is wrong

Best of all, there's no statistical power loss. You get all the rigor of traditional A/B testing (and a whole lot more peace of mind).
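The steps above can be sketched end to end as follows. This is an illustrative outline, assuming each group is summarized as a pair of (conversions, visitors) and that conversion rate is the metric of interest. The names `analyze_aabb` and `Counts` and the `alpha` default are hypothetical.

```python
import math
from typing import Dict, Tuple

Counts = Tuple[int, int]  # (conversions, visitors)


def p_value(a: Counts, b: Counts) -> float:
    """Two-sided two-proportion z-test p-value."""
    pooled = (a[0] + b[0]) / (a[1] + b[1])
    se = math.sqrt(pooled * (1 - pooled) * (1 / a[1] + 1 / b[1]))
    if se == 0:
        return 1.0
    z = (a[0] / a[1] - b[0] / b[1]) / se
    return math.erfc(abs(z) / math.sqrt(2))


def analyze_aabb(groups: Dict[str, Counts], alpha: float = 0.01) -> dict:
    """Check twin groups first; merge and compare only if both checks pass."""
    a1, a2, b1, b2 = (groups[name] for name in ("A1", "A2", "B1", "B2"))

    # Step 3: internal consistency of the twin groups.
    checks = {"A1 vs A2": p_value(a1, a2), "B1 vs B2": p_value(b1, b2)}
    failed = [name for name, p in checks.items() if p < alpha]
    if failed:
        # Step 5: stop and investigate instead of reporting a lift.
        return {"valid": False, "failed_checks": failed}

    # Step 4: merge A1+A2 into A and B1+B2 into B, then analyze as usual.
    a = (a1[0] + a2[0], a1[1] + a2[1])
    b = (b1[0] + b2[0], b1[1] + b2[1])
    rate_a, rate_b = a[0] / a[1], b[0] / b[1]
    return {
        "valid": True,
        "failed_checks": [],
        "control_rate": rate_a,
        "variant_rate": rate_b,
        # Relative lift is undefined when the control has no conversions.
        "relative_lift": (rate_b / rate_a - 1) if rate_a > 0 else None,
        "p_value": p_value(a, b),
    }
```

The merged groups A and B contain every user in the experiment, which is why the final comparison has the same sample size as a plain A/B test.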

Real Stories: When AABB Caught the Hidden Bugs  

Through thousands of live experiments with our customers, we’ve found that AABB testing has repeatedly surfaced issues that classic A/B tests would have missed — from bucketing bugs to attribution errors and more. 

Below are some real-life examples from our data science team:

Case 1: Broken User ID assignment 

  • What We Tested: New product card layout

  • The Issue: User IDs were only assigned post-login, and returning users were bucketed inconsistently

  • AABB Signal: A1 and A2 showed a significant revenue per visitor (RPV) difference

  • Conclusion: Identified bucketing error and relaunched with server-side fix

Case 2: Attribution bug in paid traffic

  • What We Tested: Updated filter UI

  • The Issue: Paid traffic was overrepresented in A1 due to a bug in Urchin Tracking Module (UTM) parsing

  • AABB Signal: A1 outperformed A2 by 15% (p < 0.01)

  • Conclusion: Corrected traffic handling before making a false-positive decision

Case 3: Cookie loss in Safari 

  • What We Tested: Product comparison feature

  • The Issue: Safari cleared cookies between sessions, causing users to jump groups

  • AABB Signal: A1 ≠ A2 only in Safari

  • Conclusion: Switched to server-side assignment for consistency

Case 4: Hidden promo banner 

  • What We Tested: Homepage redesign

  • The Issue: Promo banner accidentally enabled in A1 only

  • AABB Signal: Conversion rate (CVR) up 20% in A1 alone

  • Conclusion: Caught a false lift before shipping the redesign

Case 5: Auth state affected rendering 

  • What We Tested: Logged-in recommendation block

  • The Issue: JS behavior diverged based on auth state and group

  • AABB Signal: A1 vs A2 only diverged for logged-in users

  • Conclusion: Found and fixed rendering-layer platform bug
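Several of these cases only showed up inside one segment, such as Safari or logged-in users. A rough sketch of a per-segment twin check is below. It assumes per-segment (conversions, visitors) counts are available for A1 and A2, and it applies a Bonferroni adjustment so that checking many segments does not inflate false alarms. The adjustment is our addition for illustration, not something the cases above describe, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Counts = Tuple[int, int]  # (conversions, visitors)


def p_value(a: Counts, b: Counts) -> float:
    """Two-sided two-proportion z-test p-value."""
    pooled = (a[0] + b[0]) / (a[1] + b[1])
    se = math.sqrt(pooled * (1 - pooled) * (1 / a[1] + 1 / b[1]))
    if se == 0:
        return 1.0
    z = (a[0] / a[1] - b[0] / b[1]) / se
    return math.erfc(abs(z) / math.sqrt(2))


def divergent_segments(a1: Dict[str, Counts], a2: Dict[str, Counts],
                       alpha: float = 0.01) -> List[str]:
    """Return the segments in which the twin control groups disagree."""
    shared = sorted(set(a1) & set(a2))
    if not shared:
        return []
    # Bonferroni: split the overall alpha across the segments tested.
    threshold = alpha / len(shared)
    return [seg for seg in shared if p_value(a1[seg], a2[seg]) < threshold]
```

A result that names a single browser or auth state narrows the search considerably compared with a top-line mismatch alone.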

What If You Don't Use AABB? 

Skipping AABB might seem harmless — until you're dealing with confusing results, misinformed decisions, and hours of lost time chasing down issues that could’ve been flagged instantly.

Let’s imagine two scenarios with a traditional A/B test:

  • You see a lift and ship it. The results look positive. Metrics are up, the p-value checks out, and stakeholders are eager to move fast. You push the variant live, confident it’s a win. But the improvement wasn’t real. Maybe a bug skewed traffic allocation. Maybe returning users weren’t bucketed consistently. Maybe only one control group had a hidden feature enabled. A few weeks later, revenue drops, and no one knows why. You’re forced to reverse-engineer what went wrong after the fact, undermining trust in both your data and your team.

  • You see noise and dig deeper. The experiment doesn’t show a clear result. Metrics are jumpy. One group looks off, but you can’t tell why. Instead of moving on, your analysts spend days diving into logs, debugging pipelines, checking bucketing logic, and trying to piece together what broke. Eventually, they find the issue: a subtle misconfiguration, traffic skew, or platform-level bug. The test was invalid all along, and all that time and effort could’ve been saved.

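With AABB, both of these risks shrink dramatically, because a mismatch between A1 and A2 (or B1 and B2) flags the problem before you ship or start digging.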

AABB = Trustworthy Experiments, Less Drama 

A/B testing is already hard. Feature rollouts, metrics alignment, and stakeholder pressure are a lot. The last thing you want is to be blindsided by data you thought you could trust.

AABB testing adds a single safeguard step that helps you verify test integrity before you make decisions you can’t undo.

  • No extra effort

  • No power loss

  • Huge gains in confidence

So, the next time you're about to launch an experiment, ask yourself: Are you ready to bet your roadmap on results you haven’t verified? Or would you rather run an AABB test and be confident your results are real?

Start Running Better Experiments Today 

Download The Ultimate Guide to A/B Testing for Search & Discovery and take the guesswork out of search, merchandising, and product discovery.
