11. How do you determine when your results are statistically significant?
Answer: This is a little bit tricky. It's more complicated than many people assume, because they've been exposed to overly simplified rules of thumb.
But stick with me, because I think I can help you get a good handle on it.
Let's take the single variable case first. How do you determine if a simple split test has significant results?
The old rule of thumb was that you wait for the leading option to have 30 actions, and then you choose the winner.
But here's the problem: Suppose headline A has a 30/1000 conversion ratio, and headline B has a 29/1000 conversion ratio. Can you really be sure that headline A is better? No. In fact, there's roughly a 45% chance that headline B will prove to be better in the long run.
On the other hand, if headline A has a 20/1000 conversion ratio, and headline B has a 5/1000 conversion ratio, this is extremely significant, and headline A is almost certainly better than B, even though you haven't had 30 conversions for headline A yet.
In spite of this counterexample, I believe the problem with the 30 sale rule is actually that it invites you to end tests too early on average.
That's why the Split Test Accelerator has confidence numbers built in. These numbers come from pairwise comparisons of two creatives, each tested using the normal approximation to the binomial distribution.
With these numbers you don't have to worry about rules of thumb. You just wait until one option is better than the other with 95% confidence, and then you can consider ending the test. (You should wait until all the options have at least one action, though.)
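To make the confidence idea concrete, here is a minimal Python sketch of one common way to run a normal-approximation comparison of two conversion rates. It assumes a pooled two-proportion z-test and a one-sided confidence; the Split Test Accelerator's exact formula may differ, and the function name is made up for illustration.

```python
import math

def confidence_a_beats_b(conv_a, n_a, conv_b, n_b):
    """One-sided confidence that A's true conversion rate exceeds B's.

    Assumption: pooled two-proportion z-test (normal approximation to
    the binomial). This is a sketch, not the tool's actual formula.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the hypothesis that A and B convert equally well
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.5  # no conversions (or all conversions): no information
    z = (p_a - p_b) / se
    # Standard normal CDF evaluated at z
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# The two headline examples from above
print(round(confidence_a_beats_b(30, 1000, 29, 1000), 3))
print(round(confidence_a_beats_b(20, 1000, 5, 1000), 4))
```

Under these assumptions, the 30/1000 vs. 29/1000 case comes out only a little above 50% confidence, while the 20/1000 vs. 5/1000 case comes out above 99%, even though the leader has fewer than 30 actions.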
The image gives results for a multivariate test, but focus on just one of the factors, and you can see how this works in the single variable case.
For instance, you can see that the "headline color" factor is significant, as is the "testimonials" factor. They both come in at over 95%, and this is indicated with yellow highlighting.
Now, the multivariate case is a little more complicated.
The main reason is that there are many variables, and they don't all reach significance at the same time.
One factor can show significant results after 1000 impressions, and another might not show significance until you have 100,000 impressions.
So, what do you do about this?
Here's the multivariate rule of thumb (I'm not against rules of thumb, but some are more useful than others):
DON'T WAIT FOR MORE THAN 1/4 OF YOUR FACTORS TO BECOME SIGNIFICANT.
Typically, if you can get 4-5 factors to be significant in a 20 factor test, you're doing great. In fact, just 2 significant factors, where an alternative version of the factor beats the control, is cause for celebration. You've found two needles in a haystack of ideas, and those two factors might well give you a 50% boost.
Yes, it's the Pareto principle in action again. 90% of your gains will come from 10% of your factors.
And the name of the game is to find those factors quickly, and move on to another test.
So here are the rules for ending a multivariate test:
- Make sure every ad (in the ad by ad section) has at least one action (this helps insure you against the possibility that you set the test up wrong).
- Make sure you run your test for at least a week, so you can average out any "day of week" effects.
- THEN, try to get at least 1/10 of the factors to be significant, but don't wait for more than 1/4 of them to be significant.
For instance, in a 20 factor test, you are looking for 2-5 significant factors. In a 10 factor test, you are looking for 1-3, and in a 7 factor test, you are looking for 1-2 significant factors.
- If you have more than 400 actions (not impressions) in the test, and no factors are significant, you should consider locking in the leaders and starting over. You probably won't see much improvement in this case, but you won't see much improvement waiting longer either.
Now, how long should this take?
It depends on how well you designed your test.
The more WILDLY DIFFERENT your options, the faster you should reach significance.
That's the weird thing. When you hit a home run with a Taguchi test, it happens very quickly. A test that doubles your conversion rate can be over with 100 actions. One that gives you a 20% boost might take 400 actions to determine winners in enough factors.
That's why I advise you to just start over if the test is taking too long. There's no point taking 3 months to find out which headline will give you a 1% boost, when you could try new factors that might give you a 50% boost in two weeks.
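For a rough feel of why big wins show up fast and small ones take forever, here is a back-of-the-envelope Python sketch. It assumes two options with equal impressions, rare conversions, a one-sided 95% threshold (z = 1.645), and an observed lift equal to the true lift. Under those assumptions the z-score is about (b - a) / sqrt(a + b) for action counts a and b, which rearranges to the formula in the code. The function name is hypothetical, and the results are optimistic lower bounds for a single comparison, not a guarantee for a whole multivariate test.

```python
import math

def rough_actions_needed(lift, z=1.645):
    """Approximate total actions for one two-way comparison to reach
    the given z-score, assuming equal impressions and rare conversions.

    With a actions for the control and b = (1 + lift) * a for the
    challenger, z ~= (b - a) / sqrt(a + b). Solving for the total
    T = a + b gives T = (z * (2 + lift) / lift) ** 2.
    """
    return math.ceil((z * (2 + lift) / lift) ** 2)

for lift in (1.0, 0.5, 0.2, 0.01):
    print(f"{lift:.0%} boost: about {rough_actions_needed(lift)} actions")
```

A doubling needs only a few dozen actions by this estimate, a 20% boost needs a few hundred, and a 1% boost needs over a hundred thousand, which is in line with the advice to start over rather than chase tiny gains.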