Direct marketing texts say very little about sample sizes, and even less about Type I and Type II errors. Admittedly, the latter is somewhat complicated, and business statistics books often recommend that readers go over the sections covering it carefully and repeatedly.
‘Complicated’ is, unfortunately, not synonymous with ‘of theoretical value only’. In fact, the opposite is true here: the Type II error is of fundamental importance, as becomes apparent if we step back and ask why we bother with a test and a sample size in the first place.
We do so, basically, for two reasons. First, we don’t want to throw the baby out with the bathwater. We don’t want to take a test result that falls somewhat short of our expectation at face value. We’d rather use the test to estimate the list characteristic behind the result, and if that turns out to be acceptable, we’d like to scale up. This is why we worry about α (the probability of rejecting a true null hypothesis).
At the same time, we don’t want to lose money by scaling up when we shouldn’t have. This means we should worry about the minimum response that would still be acceptable, and for that we must worry about β (the probability of accepting a false null hypothesis).
Yet the commonly used formula for sample size completely ignores β (not only in direct marketing books and online calculators but also in most of the business statistics texts I’ve come across)!
The formula goes like this:
N = \frac{z_{\alpha/2}^{2} \, p(1-p)}{E^{2}}
Where
z_{α/2} is the critical value for the specified confidence level
p is the estimated response (population proportion) and
E is the ± sampling error allowed.
Let’s say the estimated response is 2%, the allowed error is ±0.25%, and the confidence level is 90%.
Putting these into the template yields a sample size of 8,485.
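If you want to check the arithmetic without the Excel template, a quick Python sketch does the same calculation (a minimal sketch using the standard library’s NormalDist; the variable names are my own):

```python
from statistics import NormalDist

p = 0.02      # estimated response (population proportion)
E = 0.0025    # allowed sampling error, +/- 0.25%
conf = 0.90   # confidence level

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-tailed critical value, ~1.645
n = z ** 2 * p * (1 - p) / E ** 2
print(round(n))  # ~8,485
```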
Let’s see what this means in terms of β if the true response (which we’ll get to know only if we scale up to the entire list) is, say, 1.65%. β (the probability that the sample result still falls inside the acceptance band even though the true response is only 1.65%) turns out to be about 23%, that is, nearly 1 in 4!
Hmm, that’s bad. There’s nearly a 1 in 4 chance that, while accepting any figure between 1.75% and 2.25% as ‘as good as’ a 2% response rate, we’ll actually be accepting a list with only a 1.65% response.
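To see where the 23% comes from, here is a rough sketch of the calculation under a normal approximation (again in Python, with my own variable names): β is simply the chance that the sample proportion still lands inside the 1.75%–2.25% acceptance band when the list’s true response is 1.65%.

```python
from math import sqrt
from statistics import NormalDist

n = 8485                    # test sample size from the first formula
p_true = 0.0165             # hypothetical true response of the full list
low, high = 0.0175, 0.0225  # acceptance band: 2% +/- 0.25%

se = sqrt(p_true * (1 - p_true) / n)   # standard error of the sample proportion
dist = NormalDist(mu=p_true, sigma=se)
beta = dist.cdf(high) - dist.cdf(low)  # probability the test still "passes"
print(f"{beta:.0%}")                   # ~23%
```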
No wonder one of the well-accepted rules of thumb in direct marketing goes: “As a rule, the response rate from a rollout to the balance of the list after a successful test mailing will usually be lower than the response from the test.”
This may be due to a variety of differences between test and rollout conditions. But one thing needs to be kept in mind: if Type II errors are ignored, the apparent success of the test can be very suspect indeed.
A way out could be to use an alternative formula:
N = \left( \frac{|z_0| \sqrt{p_0(1-p_0)} + |z_1| \sqrt{p_1(1-p_1)}}{p_0 - p_1} \right)^{2}
Where
p0 is the estimated response
p1 is the value at which the Type II error will be monitored
z0 is z_α or z_{α/2}, depending on whether the test is one- or two-tailed, and
z1 is z_β, where β is the limit on the Type II error probability when p = p1.
Let’s see what happens if the estimated response is 2%, the allowed error is ±0.25%, and the confidence level is 90% (as before), while p1 is 1.65% and β is 10% (i.e., there is only a 1 in 10 chance – not 1 in 4 as earlier – that the test will lead us to accept a list with a real response rate of 1.65%).
We get a sample size of 12,643.
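Under the same assumptions (two-tailed test at 90% confidence, β capped at 10% when the true response is 1.65%), a small Python sketch reproduces that figure:

```python
from math import sqrt
from statistics import NormalDist

p0, p1 = 0.02, 0.0165     # estimated response, and the response at which beta is monitored
alpha, beta = 0.10, 0.10  # 90% confidence (two-tailed) and the limit on Type II error at p1

z0 = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.645
z1 = NormalDist().inv_cdf(1 - beta)       # ~1.282

n = ((z0 * sqrt(p0 * (1 - p0)) + z1 * sqrt(p1 * (1 - p1))) / (p0 - p1)) ** 2
print(round(n))                           # ~12,643
```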
Sure, that’s roughly a 50% increase in sample size. But it may be well worth it if the rollout numbers and costs are far higher than the test’s.
In any case, wouldn’t decisions be better if they were taken with a clearer idea of the risks?
PS: Please excuse the ungainly appearance of the formula. It's the best I could manage using a Word file. Both formulas can be found in the useful templates at http://highered.mcgraw-hill.com/sites/0070620164/student_view0/excel_templates.html