Statistical Significance and Effect Size
-the overused and the under-used
(last updated on 2011-02-19)
Statistical significance is one of the most widely used statistical terms and, likely,
the most overused or abused one. However small the significance level α that
a treatment attains, it means nothing more than that the treatment probably makes a difference.
Statistical significance does not tell us how big the difference is. In some sense,
"significance" is an unfortunate choice of words for this statistical
term because it conveys a different meaning from the word's use in everyday speech. For example,
suppose a cholesterol-lowering drug reduces the cholesterol level by 1 mg/dL, or less
than 1%. Assuming the standard deviations of the treatment and control arms are both
10 mg/dL, it can be shown with
Biyee's
sampling and t-test simulator that a significance level of α < 0.05 can
be achieved with only 500 samples in each arm. Clearly, this
drug would be essentially useless despite any statistical significance it may
show in lowering cholesterol. If it had any side effects, it might do more
harm than good.
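The simulator mentioned above is not reproduced here, but the same experiment can be sketched in a few lines of Python. The function names below are hypothetical, the 200 mg/dL baseline is an assumed value that cancels out of the comparison, and p-values use a normal approximation to the t distribution, which is adequate with hundreds of subjects per arm. The sketch estimates how often a true 1 mg/dL difference (standard deviation 10 mg/dL) yields p < 0.05, both by simulation and from the usual power formula.

```python
import math
import random
from statistics import NormalDist, fmean


def mean_and_var(xs):
    """Sample mean and unbiased (n - 1) sample variance."""
    m = fmean(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def welch_t(control, treated):
    """Welch's t statistic for mean(treated) - mean(control)."""
    m1, v1 = mean_and_var(control)
    m2, v2 = mean_and_var(treated)
    return (m2 - m1) / math.sqrt(v1 / len(control) + v2 / len(treated))


def two_sided_p(t):
    # Normal approximation to the t distribution: fine for hundreds of
    # subjects per arm, too optimistic for small samples.
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def significant_fraction(n, diff=1.0, sd=10.0, baseline=200.0,
                         alpha=0.05, trials=1000, seed=0):
    """Share of simulated trials whose two-sided p-value is below alpha.

    baseline (mg/dL) is an assumed mean; it cancels out of the comparison.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(baseline, sd) for _ in range(n)]
        treated = [rng.gauss(baseline - diff, sd) for _ in range(n)]
        if two_sided_p(welch_t(control, treated)) < alpha:
            hits += 1
    return hits / trials


def analytic_power(n, diff=1.0, sd=10.0, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test."""
    z = NormalDist()
    shift = diff / (sd * math.sqrt(2.0 / n))  # expected |t| statistic
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    return (1.0 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


if __name__ == "__main__":
    for n in (500, 1600):
        print(f"n={n}: power ~ {analytic_power(n):.2f}, "
              f"simulated ~ {significant_fraction(n, trials=500):.2f}")
```

Under these assumptions, 500 subjects per arm give an expected t statistic of only about 1.58, so p < 0.05 turns up in roughly 35% of simulated trials; about 1,600 per arm raise that rate to around 80%. Throughout, the underlying difference stays at a clinically trivial 1 mg/dL.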
The effect size is usually our ultimate concern. In the above example,
the effect size is 1 mg/dL (or, in standardized form, 1 mg/dL divided by the standard deviation of 10 mg/dL, i.e. 0.1).
Of course, an effect size is meaningful only if it is backed by an acceptable significance
level.
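The effect size and the significance level answer different questions, and a short Python sketch can make that concrete. The helper names are hypothetical. `cohens_d` computes the standardized effect size (the mean difference divided by the pooled standard deviation), and `expected_p` gives the two-sided p-value, under a normal approximation, that a fixed standardized difference d would produce with n subjects per arm.

```python
import math
from statistics import fmean, variance


def cohens_d(control, treated):
    """Standardized effect size: mean difference over the pooled SD."""
    n1, n2 = len(control), len(treated)
    pooled_var = ((n1 - 1) * variance(control)
                  + (n2 - 1) * variance(treated)) / (n1 + n2 - 2)
    return (fmean(treated) - fmean(control)) / math.sqrt(pooled_var)


def expected_p(d, n):
    """Two-sided p-value when the observed standardized difference is d
    with n subjects per arm (normal approximation to the t test)."""
    t = abs(d) * math.sqrt(n / 2.0)
    # 2 * (1 - Phi(t)) written with erfc to keep precision for large t.
    return math.erfc(t / math.sqrt(2.0))


if __name__ == "__main__":
    # Same effect size (d = 0.1), ever smaller p-values as n grows.
    for n in (100, 500, 2000, 10000):
        print(f"n={n:>5}: d = 0.1, p ~ {expected_p(0.1, n):.2g}")
```

For d = 0.1, the standardized effect in the cholesterol example, the p-value falls from roughly 0.5 at 100 subjects per arm to below 0.01 at 2,000, while the effect size itself does not change at all.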
The scenario in which statistical significance is most often used in lieu of the
effect size is when both the treatment and the effect are binary. Vaccines
are good examples in this category. The treatment variable of a vaccine is whether a subject
has received the vaccine;
the effect variable is whether the subject has contracted the corresponding disease
after the vaccination. In this case, the
best effect size is the odds ratio,
which shows how much difference the vaccine makes in terms of the odds of
contracting the disease.
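For a binary treatment and a binary outcome, the odds ratio comes straight from a 2×2 table of counts. The sketch below uses hypothetical counts (not data from any real vaccine trial) and a Wald confidence interval on log(OR), which is one common choice among several; a table with a zero cell would need a continuity correction or an exact method instead. The function name `odds_ratio` is our own.

```python
import math
from statistics import NormalDist


def odds_ratio(a, b, c, d, level=0.95):
    """Odds ratio with a Wald confidence interval on log(OR).

    2x2 table of counts:
                      disease   no disease
        vaccinated       a          b
        unvaccinated     c          d
    OR < 1 means lower odds of disease among the vaccinated.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = math.log(ratio)
    return ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical counts: 10/1000 vaccinated vs 50/1000 unvaccinated fell ill.
    ratio, lo, hi = odds_ratio(10, 990, 50, 950)
    print(f"OR = {ratio:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With these made-up counts the odds ratio is about 0.19, with a 95% interval of roughly 0.10 to 0.38. The interval excluding 1 speaks to significance, while the value of the ratio itself says how large the protective effect is.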