Researchers from Southern Methodist University and the University of Michigan have published a study that examines how platforms run A-B tests of online ads and uncovers significant limitations that can lead to misleading conclusions about ad performance.
The study, published in the Journal of Marketing, is titled “Where A-B Testing Goes Wrong: How Divergent Delivery Affects What Online Experiments Cannot (and Can) Tell You About How Customers Respond to Advertising” and is authored by Michael Braun and Eric M. Schwartz.
Consider a landscaping company whose designs focus on native plants and water conservation. The company creates two advertisements: one focused on sustainability (ad A) and another on aesthetics (ad B).
As platforms personalize the ads that different users receive, ads A and B will be delivered to divergent mixes of users. Users interested in outdoor activities may see the sustainability ad, whereas users interested in home decor may see the aesthetics ad. Targeting ads to specific consumers is a major part of the value that platforms offer to advertisers because it aims to place the “right” ads in front of the “right” users.
In this study, researchers Braun and Schwartz find that online A-B testing in digital advertising may not be delivering the reliable insights marketers expect. Their research uncovers significant limitations in the experimentation tools provided by online advertising platforms, potentially creating misleading conclusions about ad performance.
The issue with ‘divergent delivery’
The study highlights a phenomenon called “divergent delivery,” in which the algorithms used by online advertising platforms such as Meta and Google target different types of users with different ad content. The problem arises when the algorithm sends the two ads to distinct mixes of users during an A-B test, an experiment designed to compare the effectiveness of the two ads.
Braun explains, “The winning ad may have performed better simply because the algorithm showed it to users who were more prone to respond to the ad than the users who saw the other ad. The same ad could appear to perform better or worse depending on the mix of users who see it rather than on the creative content of the ad itself.”
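To make the confound concrete, here is a minimal simulation sketch in Python. It is not taken from the paper; the segment names, click rates, and delivery probabilities are assumptions chosen purely for illustration. Two ads with identical true effectiveness appear to perform very differently once the platform steers each ad toward a different user segment:

```python
# Illustrative sketch (not from the study): two user segments with different
# baseline response rates, and two ads that are equally effective. Under
# divergent delivery, the platform routes each ad toward a different segment,
# so the naive A-B comparison reflects the audience mix, not the ads.

import random

random.seseed = None  # placeholder removed below
random.seed(42)

# Hypothetical baseline click-through rates by user segment.
SEGMENT_CTR = {"outdoor_enthusiast": 0.06, "home_decor_fan": 0.02}

# Both ads have the SAME true effect in every segment (multiplier of 1.0),
# so any measured difference is an artifact of who saw which ad.
AD_EFFECT = {"A_sustainability": 1.0, "B_aesthetics": 1.0}


def simulate(assignment_probs, n_users=200_000):
    """Return observed CTR per ad given per-segment delivery probabilities."""
    clicks = {ad: 0 for ad in AD_EFFECT}
    impressions = {ad: 0 for ad in AD_EFFECT}
    for _ in range(n_users):
        segment = random.choice(list(SEGMENT_CTR))
        # Probability that this segment is shown ad A (everyone else sees ad B).
        p_a = assignment_probs[segment]
        ad = "A_sustainability" if random.random() < p_a else "B_aesthetics"
        impressions[ad] += 1
        if random.random() < SEGMENT_CTR[segment] * AD_EFFECT[ad]:
            clicks[ad] += 1
    return {ad: round(clicks[ad] / impressions[ad], 4) for ad in AD_EFFECT}


# True randomization: every segment is equally likely to see either ad.
randomized = simulate({"outdoor_enthusiast": 0.5, "home_decor_fan": 0.5})

# Divergent delivery: the algorithm steers ad A toward outdoor enthusiasts
# and ad B toward home-decor fans (probabilities chosen for illustration).
divergent = simulate({"outdoor_enthusiast": 0.9, "home_decor_fan": 0.1})

print("Randomized delivery CTRs:", randomized)  # roughly equal, as they should be
print("Divergent delivery CTRs:", divergent)    # ad A looks 'better' spuriously
```

Under randomization both ads land near a 4% click rate, while under the assumed divergent delivery ad A appears to outperform ad B by more than two to one, even though the ads are identical in effect.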
For an advertiser, especially one with a large audience to choose from and a limited budget, targeting provides plenty of value. That is why large platforms like Google and Meta use algorithms that allocate ads to specific users. On these platforms, advertisers bid for the right to show ads to users in an audience.
However, the winner of an auction for the right to place an ad on a particular user’s screen is determined not by the monetary value of the bids alone, but also by the ad content and user-ad relevance. The precise inputs and methods that determine the relevance of ads to users, how relevance influences auction results, and, thus, which users are targeted with each ad are proprietary to each platform and are not observable to advertisers.
It is not precisely known how the algorithms determine relevance for different types of users, and the process may not even be enumerable or reproducible by the platforms themselves.
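Because the real scoring is proprietary, the sketch below is only a hypothetical, heavily simplified version of relevance-weighted ranking: the winner is chosen by bid multiplied by an estimated relevance score rather than by bid alone. The scoring rule, the toy relevance model, and all numbers are assumptions for illustration, not any platform's actual formula.

```python
# Hypothetical, simplified ad auction: the winner is ranked by
# bid * estimated relevance, not by bid alone. Real platform scoring
# is proprietary and far richer than this; the sketch only shows why
# identical bids can win different users.

def run_auction(candidates, user_interests):
    """Pick the winning ad for one user from (advertiser, bid, ad_topic) tuples."""
    def score(candidate):
        advertiser, bid, ad_topic = candidate
        # Toy relevance model: 1.0 if the ad topic matches a user interest,
        # 0.2 otherwise. Platforms infer relevance from far richer signals.
        relevance = 1.0 if ad_topic in user_interests else 0.2
        return bid * relevance
    return max(candidates, key=score)


candidates = [
    ("landscaper", 1.00, "sustainability"),  # ad A
    ("landscaper", 1.00, "aesthetics"),      # ad B
    ("rival_firm", 3.00, "plumbing"),
]

# The same bids produce different winners for different users, which is
# how divergent delivery can arise inside an A-B test.
print(run_auction(candidates, user_interests={"outdoor", "sustainability"}))
print(run_auction(candidates, user_interests={"home_decor", "aesthetics"}))
```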
The study’s findings have profound implications for marketers who rely on A-B testing of their online ads to inform their marketing strategies.
“Because of low cost and seemingly scientific appeal, marketers use these online ad tests to develop strategies even beyond just deciding what ad to include in the next campaign. So, when platforms are not clear that these experiments are not truly randomized, it gives marketers a false sense of security about their data-driven decisions,” says Schwartz.
A fundamental problem with online advertising
The researchers argue that this issue is not just a technical flaw in a testing tool, but a fundamental characteristic of how the online advertising business operates. A platform’s primary goal is to maximize ad performance, not to provide clean experimental results for marketers.
Therefore, these platforms have little incentive to let advertisers untangle the effect of ad content from the effect of their proprietary targeting algorithms. Marketers are left in a difficult position: they must either accept the confounded results from these tests or invest in more complex and costly methods to truly understand the impact of the creative elements in their ads.
The study makes its case using simulation, statistical analysis, and a demonstration of divergent delivery from an actual A-B test run in the field. It challenges the common belief that results from A-B tests that compare multiple ads provide the same ability to draw causal conclusions as do randomized experiments.
Marketers should be aware that the differences in effects of ads A and B that are reported by these platforms may not fully capture the true impact of their ads. By recognizing these limitations, marketers can make more informed decisions and avoid the pitfalls of misinterpreting data from these tests.
More information:
Michael Braun and Eric M. Schwartz, “Where A-B Testing Goes Wrong: How Divergent Delivery Affects What Online Experiments Cannot (and Can) Tell You About How Customers Respond to Advertising,” Journal of Marketing (2024). DOI: 10.1177/00222429241275886
Citation: Are A-B tests leading market researchers and online advertisers astray? Study says they could be (2025, January 8), retrieved 8 January 2025.