Two Reasons Not to Trust Google Conversion Lift Studies

Incrementality is a hot topic in advertising right now. For those unfamiliar, it's essentially about answering one question: What is the impact of adding a specific tactic to my media mix?

Individual users are often exposed to messaging across multiple platforms, so the question becomes whether one channel is driving additional value, or whether users would have converted from exposure on another channel.

On Google and YouTube, one of the primary tools for measuring incremental impact is the Conversion Lift Study (CLS).

CLS splits users into two groups, a control group and an exposed group, then measures the lift that the ads produce in the exposed group relative to the control. Many advertisers treat the results of these tests as absolute truth, but in reality they're only reliable in specific scenarios. A few structural constraints can produce results that lead you to make suboptimal decisions about your media investment.

Below, I'll walk through the two most significant issues I've repeatedly encountered, and shed some light on when to trust these tests and when to take their results with a grain of salt.

Issue #1: Fixed testing windows that don't account for conversion lag

Conversion lift studies run for a fixed number of days. Once that window closes, you receive your results and can calculate the incremental lift of your media. What's frequently overlooked, however, is the time it takes for users to actually complete a purchase after seeing an ad (aka ‘the conversion lag’).

Let’s say you sell backpacks. Some people will see an ad and buy the backpack within a day or two. However, many people will want to compare brands, read Reddit reviews, ask friends, or simply wait until they feel like swiping their credit card. Depending on the product, the gap between an ad click and a completed sale can stretch to several months.

Standard ad attribution models handle this naturally. Once a conversion occurs, credit is attributed back to the originating ad regardless of when the click happened (within a number of days set by the advertiser). However, CLS doesn't work that way. Once the end date hits, the window closes, and any conversions that haven't yet occurred simply don't count.

This means that ads served toward the end of your CLS window will have generated clicks, but the corresponding conversions won't have had time to register, making your ads appear less impactful than they actually were. This effect is especially pronounced in industries with longer purchase cycles (e.g., B2B software or vehicles).
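The difference between the two counting rules can be sketched in a few lines. This is a simplified model of the behavior described above, not Google's actual implementation: the function names, the 30-day lookback, and the dates are all assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Simplified, assumed model of the two counting rules described in the text.
# Function names and the 30-day lookback are illustrative, not Google Ads settings.

def counted_by_attribution(click: date, conversion: date, lookback_days: int = 30) -> bool:
    """Click-based attribution: credit the ad if the conversion lands
    within the advertiser's lookback window after the click."""
    return click <= conversion <= click + timedelta(days=lookback_days)

def counted_by_fixed_window(click: date, conversion: date,
                            study_start: date, study_end: date) -> bool:
    """Fixed study window: the click and the conversion must both fall
    inside the study dates, so late conversions are dropped."""
    return study_start <= click <= conversion <= study_end

study_start, study_end = date(2024, 3, 1), date(2024, 4, 9)  # 40 days, inclusive
click = date(2024, 4, 5)        # click near the end of the study
conversion = date(2024, 4, 20)  # purchase 15 days later

print(counted_by_attribution(click, conversion))                           # True
print(counted_by_fixed_window(click, conversion, study_start, study_end))  # False
```

The same purchase is credited under the lookback rule and ignored under the fixed window, which is the gap the rest of this section is about.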

To illustrate, let's look at an example:

Suppose it takes 40 days for 100% of conversions to be attributed to a given ad click. In other words, if a user clicks your ad on March 1st, every person who is going to convert will have done so within 40 days. The distribution might look something like this:

What this chart shows is that within 1 day of an ad click, 25% of the users who are going to purchase have already done so. By day 5, that figure is 60%. Then it’s a slow trickle for the next 35 days.

This becomes a problem for CLS, because the test stops attributing conversion credit once it reaches its end date.

This means that the percent of potential conversions actually attributed might look something like this at the end of a 40-day CLS:

This chart shows what percent of potential conversions have been attributed for each day of the CLS. For example, day 1 has 100% of its attributed sales by the end of the CLS because users have had 40 days to complete the purchase, meaning everyone who is going to purchase from ad clicks on day 1 has done so by the end of the CLS. For day 40, however, only 25% of those people have purchased, because we haven’t given the remaining 75% enough time to complete the purchase process.

This becomes a problem if you are using the CLS as your source of truth. The end result is going to say you drove X% uplift in sales during that time horizon, but that number would actually be much larger if you were able to see the delayed impact.

In that 40-day CLS window you’re only capturing ~80% of the total impact, because you haven’t given users enough time to complete the purchase. In our example, the true conversion impact would therefore be ~25% higher than the test reports (1 / 0.8 = 1.25).
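To make the arithmetic concrete, here is a rough sketch of how the captured share can be estimated from a conversion-lag curve. Only the three anchor points (25% by day 1, 60% by day 5, 100% by day 40) come from the example. The straight-line interpolation between them and the assumption of equal click volume on every day of the study are mine. With those assumptions the captured share comes out at about 76%, a little below the ~80% quoted above, because the exact shape of the curve isn't reproduced here.

```python
# Hypothetical cumulative conversion curve: the share of eventual converters
# who have purchased within k days of the click. Only the anchor points come
# from the example; the straight lines between them are an assumption.
ANCHORS = [(1, 0.25), (5, 0.60), (40, 1.00)]

def cumulative_share(days_since_click: int) -> float:
    """Linearly interpolate the cumulative conversion share between anchors."""
    if days_since_click <= ANCHORS[0][0]:
        return ANCHORS[0][1]
    for (d0, s0), (d1, s1) in zip(ANCHORS, ANCHORS[1:]):
        if days_since_click <= d1:
            return s0 + (s1 - s0) * (days_since_click - d0) / (d1 - d0)
    return 1.0  # past the last anchor, everyone who will convert has converted

def captured_share(study_days: int) -> float:
    """Average share of eventual conversions visible when the study ends,
    assuming the same number of clicks on every day of the study."""
    # A click on study day d has (study_days - d + 1) days to convert
    # before the cutoff, matching the day 1 / day 40 example in the text.
    shares = [cumulative_share(study_days - d + 1) for d in range(1, study_days + 1)]
    return sum(shares) / study_days

captured = captured_share(40)
understatement = 1 / captured - 1
print(f"captured: {captured:.0%}, true impact higher by: {understatement:.0%}")
```

Swapping in your own lag curve (for example, from historical click-to-purchase data) gives a rough correction factor for a study of any length.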

So what do you do?

Use CLS to measure actions that happen more quickly

If your product has a quick sales cycle, you should still use the purchase itself as your conversion action in the CLS. If you have a long sales cycle, however, it might make sense to measure against an action earlier in the conversion funnel. For example, if you are a B2B SaaS company, you could use a ‘Contact Us’ form fill as the conversion instead of ‘Closed Won’ deals.

Issue #2: No universal holdout group across simultaneous studies

The other major issue I've consistently encountered with CLS is the lack of a universal holdout group. If you are running multiple conversion lift studies at the same time, users in the exposed group from Study 1 could end up in the control group for Study 2, and vice versa.

In theory, this shouldn't matter. Users from Study 1 are equally likely to land in either the control or exposed group for Study 2, so it should balance out. In practice, however, that's not always how it plays out.

Let’s say you want to see if ads in Study 1 have a larger impact than ads in Study 2.

If a disproportionate number of users from Study 1's exposed group end up in Study 2's control group, the ads in Study 2 will appear to be driving less incremental impact than they actually are. The control group isn't truly unexposed to advertising; they're just unexposed to Study 2's ads specifically, while still being served ads from Study 1. That's not a clean control, and it can meaningfully skew your results.
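A small simulation shows how much of Study 2's control group is still eligible for Study 1's ads. It assumes each study splits users 50/50 and assigns them independently of the other study. Neither detail is stated in this article, and real CLS splits may differ, so treat the numbers as illustrative only.

```python
import random

def simulate_overlap(n_users: int = 100_000, seed: int = 0) -> dict:
    """Randomize each user independently into two studies and report how
    Study 2's control group breaks down by Study 1 assignment."""
    rng = random.Random(seed)
    control_2 = 0
    control_2_exposed_1 = 0
    both_controls = 0
    for _ in range(n_users):
        in_exposed_1 = rng.random() < 0.5  # Study 1 assignment (assumed 50/50)
        in_exposed_2 = rng.random() < 0.5  # Study 2 assignment, independent of Study 1
        if not in_exposed_2:
            control_2 += 1
            if in_exposed_1:
                control_2_exposed_1 += 1  # "control" user who can still see Study 1 ads
            else:
                both_controls += 1        # user held out of both studies
    return {
        "study2_control_seeing_study1_ads": control_2_exposed_1 / control_2,
        "study2_control_fully_unexposed": both_controls / control_2,
    }

result = simulate_overlap()
for name, share in result.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions, roughly half of Study 2's control group remains eligible for Study 1's ads, and with smaller samples the split can drift further from 50/50 in either direction.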

In an ideal world, your study setup would look something like this:

This is a good, clean test. There is one control group, which lets you measure the lift of Exposed Group #1 and Exposed Group #2 against the same baseline and see which one has a larger impact.

However, this isn’t an ideal world, and this is closer to what the test setup actually looks like:

While it’s still true that Control Group #1 is not shown the ads served to Exposed Group #1, you have no idea how much of Control Group #1 is seeing the ads served to Exposed Group #2, or how many users are in both control groups. This can lead to messy results and unclear takeaways.

So what do you do?

Run one CLS at a time

Unfortunately, there is really only one foolproof solution for this: run only one study at a time if you want clean results. You can roll the dice and run simultaneous studies, but you should always keep a hint of skepticism that the results might not be valid due to murky control groups.
