The Beginner's Guide to A/B Testing During Black Friday Cyber Monday

Black Friday Cyber Monday is coming up fast and you want to “put your best store forward”. So, you might be thinking about a few A/B tests you can run before, during, or after.

Before you get too far down the rabbit hole, know that testing on a major shopping holiday like Black Friday Cyber Monday or Boxing Day is a whole different beast.

I’ll walk you through everything you need to know to earn valid test results and more cash this BFCM. It’s time we tame the beast, folks.

First, who should be testing during Black Friday Cyber Monday?

There are a lot of people out there who should not be testing.

It’s a fact that doesn’t get discussed enough in ecommerce. We all want to be data-informed, but some sites just aren’t ready to be testing yet.

Let’s say you go through the time-intensive (or costly) process of testing:

  1. You conduct conversion research (surveys, user testing, session replays, data analysis) to figure out what to test.
  2. You design a full variation.
  3. You flex your CSS and JavaScript muscles to create the test in your testing tool.
  4. You conduct quality assurance to make sure both variations work properly.

The results after 2 weeks? 310 conversions on variation A and 330 conversions on variation B. That’s a 6.5% uplift, but you don’t actually have enough data to know if B is really better than A. Why? Statistics.
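If you want to see the "why" in numbers, here's a minimal sketch of a two-proportion z-test on that result. The article doesn't say how many visitors each variation received, so the 6,200 per variation below is purely an assumption (it's what a 5% conversion rate on variation A would imply). The function name is mine, too.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the assumption that A and B truly perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumption: 6,200 visitors per variation (not stated in the article).
p = two_proportion_p_value(310, 6200, 330, 6200)
print(f"p-value: {p:.2f}")
```

Under that assumption, the p-value comes out far above the usual 0.05 threshold, so a gap of 20 conversions could easily be noise.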

Before you run a test, you should always calculate your sample size ahead of time. You can use this free calculator from Evan Miller to do that:

Sample size calculation

In this example, your current conversion rate for variation A (original) is 5%. So 5% of people who experience the site right now purchase something. “Minimum Detectable Effect” is “the smallest effect that will be detected 80% of the time”. Here, it’s set to a 15% relative uplift, which means variation B would need to convert at 5.75% or better.

So, you would need to generate at least a 15% uplift in the case above to reliably call variation B the winner. Anything smaller can’t be reliably detected.

Makes sense, right?

In this case, you would need 13,533 people to visit each variation for your results to be valid.

Now let’s assume your current conversion rate is still 5%, but you want to be able to detect that 6.5% uplift we talked about earlier:

Sample size calculation

Suddenly 13,533 jumps to 71,241. And remember, that’s per variation. So, the smaller the difference you want to detect, the larger the sample size you need.
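If you'd rather script this than click through a calculator, here's a rough sketch using a standard normal-approximation formula for comparing two proportions. I'm assuming a 5% significance level and 80% power, and treating the minimum detectable effect as relative to the baseline. The function name is hypothetical, and other calculators may use slightly different formulas or defaults.

```python
from math import sqrt
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation for a two-sided test."""
    delta = baseline * relative_mde               # absolute difference to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Variance terms when there is no difference and when the uplift is real.
    var_null = 2 * baseline * (1 - baseline)
    var_alt = baseline * (1 - baseline) + (baseline + delta) * (1 - baseline - delta)
    n = (z_alpha * sqrt(var_null) + z_beta * sqrt(var_alt)) ** 2 / delta ** 2
    return round(n)


print(sample_size_per_variation(0.05, 0.15))    # 15% relative uplift
print(sample_size_per_variation(0.05, 0.065))   # 6.5% relative uplift
```

With those assumptions, the two results land very close to the 13,533 and 71,241 figures above. Notice how the required sample grows with the inverse square of the difference you want to detect.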

Now you’re starting to see the problem.

Not everyone has that kind of traffic in two to four weeks to send to each variation. (Yes, there’s a limit to how long you can let tests run—more on that in a bit.)

The less traffic you have, the bigger the impact you need to generate a return on investment from testing. Given how time-consuming and/or expensive the four steps (see above) of testing can be, it just doesn’t make sense for low-traffic, low-conversion sites to be testing.

Does that mean you shouldn’t be optimizing if you have a low-traffic, low-conversion site? Of course not. You should always be optimizing. Conduct user testing, run surveys, interview your customers. Those optimization efforts will yield more results than testing for you right now.

Why is testing during Black Friday Cyber Monday different?

You might be wondering, at this point, why you can’t just read The Beginner's Guide to Simple A/B Testing and be on your way. Well, testing during BFCM (or any other exceptional time, like Father’s Day or Boxing Day) is a bit different.

There’s a sneaky something called sample pollution, which comes into play during BFCM.

I remember listening to Bart Schutz of Online Dialogue presenting at CTA Conference a few years ago. He said something that’s always stuck with me: “If you don’t know about sample pollution, stop testing.”

Sample pollution is essentially any external factor that impacts your tests and leads to invalid results. Unfortunately, sample pollution is often present without the experimenter’s knowledge.

Here’s how it creeps in:

  • Running your test for too long (or not long enough). There’s no universal answer to how long you should let a test run, but the rule of thumb is two business cycles, which usually works out to two to four weeks. If you stop a test before the pre-calculated sample size is reached, you’ve polluted the data. If you stop a test too late, you run the risk of anomalies (your co-founder started a PPC campaign, it’s BFCM, etc.).
  • Not testing on mobile and desktop separately. We all use multiple devices, right? Let’s say Daniel visits your store and sees variation A on his iPhone. A few days later, he visits your store from his laptop. There’s a 50% chance he’s going to see a different variation than he did the first time. To mitigate this, you should be testing mobile and desktop traffic separately.
  • Visitors deleting their cookies. How often do you delete your cookies? Other people delete their browsing history, too. Let’s say Susan visits your store and then deletes her cookies. When she returns to your store, there’s a 50% chance she’s going to see a different variation than she did the first time.

And that’s just to name a few.

The longer your test runs, the more likely it is that sample pollution will sneak its way in. So, as a general rule, you need to be able to reach your calculated sample size in four weeks or less. Though, you can never truly eliminate sample pollution—you can only limit it.
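Here's a quick back-of-the-envelope sketch of that four-week check. The daily traffic numbers are made up for illustration, and I'm assuming every visitor enters the test and traffic is split evenly between variations.

```python
from math import ceil

MAX_DAYS = 28  # the "four weeks or less" rule of thumb


def days_to_reach_sample(sample_per_variation, daily_visitors, variations=2):
    """Days needed for every variation to reach its sample size."""
    return ceil(sample_per_variation * variations / daily_visitors)


# Hypothetical stores with different daily traffic, using the 13,533 sample size.
for daily in (500, 2000, 6000):
    days = days_to_reach_sample(13_533, daily)
    verdict = "ok" if days <= MAX_DAYS else "too slow"
    print(f"{daily} visitors/day: {days} days ({verdict})")
```

If the number of days comes out above 28, that's a sign you either need a bolder change (a bigger detectable effect) or a different optimization method.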

On top of all this, you want to have a representative sample.

That means you want to make sure the people who are involved in your test are representative of your typical audience.

BFCM makes that difficult because you might: see a spike in traffic, see a spike in purchases, offer deep discounts for a limited time, see people spending more than they usually would, see more bargain shoppers abandoning their carts, etc.

What happens during BFCM is not typical, so it’s not representative of your usual audience. So how can you apply insights learned during an unusual time to your usual audience? You can’t, which makes BFCM testing different.

Pros and Cons of Testing During Black Friday Cyber Monday

So, should you test during BFCM at all? Well, it depends, really.

Matt Gershoff of Conductrics says there are no easy answers here:

Matt Gershoff, Conductrics

“Ultimately, there isn't a simple yes/no answer. Each company needs to step back and think about both the expected costs and benefits of running tests, which of course will depend on the particular use case.”

The best thing you can do is use the testing knowledge you just gathered above to weigh the pros and cons of testing during BFCM.

The biggest reason to test during BFCM is, of course, to take advantage of all that extra traffic, all those bigger carts, all those extra purchases.

Matt agrees that it can be a good idea to capitalize on the traffic boost:

Matt Gershoff, Conductrics

“It can be reasonable to run tests on these days. Perhaps you have specific questions you want to answer about customer behavior during these sales days that you could then apply for future sales events.

Or maybe you need to determine which version of a campaign or offer will have the highest yield during the sale. Here it might make sense to run a bandit style test to learn and apply the version that is performing best throughout the day.”

We’ll get to bandit style testing in a bit, don’t worry.

There are a few reasons not to test during BFCM as well, though:

  • You’ll likely see an increase in unqualified visitors, leading to more cart abandonment.
  • You’ll see a decrease in your profit margins if you’re running discounts and promotions.
  • By the time the tests are concluded, you won’t be able to implement the winner because BFCM will be over.

Matt explains a few other negatives:

Matt Gershoff, Conductrics

“I think there are a couple reasons why companies might opt to not test on these high sales events:

1. The opportunity costs for introducing a bug or technical error can be very high, which may make any potential benefit not worth the risk.

2. The expected value of the information gleaned from testing might be relatively low if either the users themselves, or their behavior, isn't similar to a standard day (which is probably the case).”

Ultimately, the decision is yours, but you have a few testing options to consider before you make that decision.

How to Test Properly During Black Friday Cyber Monday

Finally, we’re ready to start talking about how you can actually run meaningful, valid tests during BFCM.

1. Just Run the Test Again

If you opt to run a regular A/B test during BFCM, make sure you run the test again post-BFCM to confirm the results.

For example, let’s say you’re running a test from November 17th to December 1st. It’s a two-week test that includes both Black Friday (November 24th) and Cyber Monday (November 27th).

What performed best during that irregular time might not perform best during the rest of the year. So, what worked during BFCM 2017 might not work in February 2018. Or even BFCM 2018.

So, you’ll want to run the test again after BFCM (or any other shopping holiday) to ensure the results are actually valid.

2. Only Test Returning Visitors

Another option is to segment out your returning visitors and only include them in your test sample. Previous visitors (and customers) will likely be returning for your BFCM sales and promotions, so you should see more returning visitors than usual.

The bonus here is that you know they are akin to your “regular audience” because they visited before the deep discounts came along.

That said, there are still external forces at work here that you can’t control. While testing among returning visitors will give you more reliable data, I would still recommend running the test(s) again post-BFCM to confirm.
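How you segment returning visitors depends entirely on your testing tool, but the underlying rule is simple. Here's a minimal sketch, assuming you can look up each visitor's first visit date. The visitor IDs, dates, and the `SALE_START` value are all hypothetical.

```python
from datetime import date

# Assumption: the day your BFCM discounts go live.
SALE_START = date(2017, 11, 24)


def is_eligible(first_visit, sale_start=SALE_START):
    """Include only visitors who first showed up before the sale started."""
    return first_visit < sale_start


# Hypothetical visitors mapped to the date of their first visit.
visitors = {
    "v1": date(2017, 10, 3),
    "v2": date(2017, 11, 24),
    "v3": date(2017, 11, 10),
}

eligible = [vid for vid, first in visitors.items() if is_eligible(first)]
print(eligible)
```

Keep in mind that shrinking your sample this way also means it takes longer to reach your calculated sample size.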

3. Run Bandit Tests Instead

Remember I said we’d get to bandit testing? The time has come, my friend. Before diving into why bandit tests make sense for BFCM, let’s back up and talk about what bandit tests even are.

Chances are, when you think of testing, you think of A/B testing:

A/B test visualized

A and B go head-to-head, 50% of the traffic seeing variation A and 50% of the traffic seeing variation B.

You could even add a third variation, C, in there and continue splitting the traffic evenly: 33.3%, 33.3%, 33.3%.

There are other types of tests, though. Like bandit tests:

Bandit test visualized

With A/B testing, in this example, you don’t declare a winner or start reaping the full rewards until week six of the program when you select Option A as the winner. With bandit testing, Option A gets more and more “screen time” as the weeks go on and it proves itself to be the most valuable.

This is perfect for BFCM testing.

With A/B testing, BFCM will likely be over before you have valid results. And once it’s over, we know the results can’t be generalized to your regular traffic. With bandit testing, the highest value variation (i.e. Option A above) will gradually be shown more often, allowing you to cash in during BFCM.
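To make the "gradually shown more often" idea concrete, here's a small simulation sketch using Thompson sampling, which is one common bandit strategy (your testing tool may well use a different one). The conversion rates and visitor count are invented for the simulation, and the function name is mine.

```python
import random


def simulate_bandit(true_rates, visitors, seed=0):
    """Simulate Thompson sampling; returns how often each variation was shown."""
    rng = random.Random(seed)
    arms = range(len(true_rates))
    wins = [0] * len(true_rates)     # conversions per variation
    losses = [0] * len(true_rates)   # non-conversions per variation
    shows = [0] * len(true_rates)
    for _ in range(visitors):
        # Draw a plausible conversion rate for each variation from its
        # Beta posterior, then show the variation with the highest draw.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in arms]
        arm = draws.index(max(draws))
        shows[arm] += 1
        # Simulated visitor: converts with the variation's (made-up) true rate.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return shows


# Assumption: variation A converts at 5%, variation B at 8%.
shows = simulate_bandit([0.05, 0.08], visitors=20_000)
print(f"A shown {shows[0]} times, B shown {shows[1]} times")
```

Early on, both variations get shown a lot because the algorithm is still unsure. As evidence builds, the stronger variation tends to take over most of the traffic, which is exactly the behavior you want during a short sale.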

Alex Birkett of HubSpot explains:

Alex Birkett, HubSpot

"The holidays are usually a great time to use bandit testing instead of A/B tests.

There's a lot of reading you can do to learn more about bandit tests in-depth (start here), but just know that they adapt in real-time to exploit the winning variation sooner. So if variation B is performing better, the algorithm slowly adjusts to send more traffic to variation B.

This is super simplified, but bandits are great use cases for short-term campaigns for this reason. Think about it. Running an A/B test requires adhering to multiple time-based criteria, including running to a pre-calculated stopping point and sample size (and also on a representative sample, which isn't the case with holiday traffic).

By the time you've run your A/B test, the holiday period is over and you don't even get to exploit the winning variation for more profit. Bandits let you ‘exploit’ the winning experiences (put them in front of more traffic to get the reward of greater conversions) while you're ‘exploring’ (testing different variations to see which works best)."

Ton Wesseling, Chief Optimization Officer of Online Dialogue, agrees:

Ton Wesseling, Online Dialogue

"Don't use the holiday season for learning through running experiments. You won't have the time. Use it to make money now! Your experiments should act like a bandit during the holiday season."

In other words, Alex and Ton suggest using BFCM to make money with testing, not to gather insights with testing. Why? Those insights will come too late and won’t be applicable post-BFCM.

(You can learn more about bandit testing here.)

4. Build Your Email List Instead

Instead of launching major tests during BFCM, Brian Massey of Conversion Sciences suggests focusing on building your email list:

Brian Massey, Conversion Sciences

"The holidays will bring visitors to your website that you would never ever be able to reach, thanks to the orchestrated collusion of retailers across the country.

This time of year also changes people. A thoughtful Dr. Jekyll shopper is turned into the crazed Mr. Hyde by the potion of media frenzy and steep discounts.

Mr. Hyde is a “transactional” shopper, hell bent on saving every single dime. If methodical Dr. Jekyll thought shopping was a burden, Hyde sees the shopping experience as a sport.

So, after the holidays pass, and the potion has worn off, you would never expect to see the price-crazed Hyde again – at least not for another year.

If you optimize your site for purchases during the holidays, you will optimize for Hyde. This won’t work the rest of the year, when a restored Jekyll is roaming your marketplace. Instead, optimize to build your list. Get as many Hydes on your list as possible, and then spend the rest of the year getting them to buy, after they have returned to reality.

What would you use to get a Hyde on your list? This is what you test.

1. Offers in exit intent popups.
2. Remarketing to cart abandons.
3. Wish lists.
4. Year-round gift club membership.
5. Live chat concierge services.
6. Free purchase protection.

Then spend the rest of the year monetizing this list, this new asset you could never otherwise have."

So, in this case, you’re testing lead capture methods during BFCM. That way, you can turn that traffic and purchase surge into a year-long affair, collecting cash from Dr. Jekyll long after the holidays are over.

Plus, convincing these visitors to return after BFCM means more people to fuel your post-BFCM tests.

Conclusion

Running tests during BFCM is not as black and white as most people think. There are a lot of gray areas and contrary opinions, even among testing and statistics experts.

Fortunately, you’re now armed with the info you need to run meaningful tests that will generate revenue during BFCM (and beyond).

Let me know in the comments if you will be running tests during BFCM or if you have any questions.


About the Author

Shanelle Mullin is a content creator at Shopify, helping entrepreneurs grow their businesses faster.