Sample Size and Statistical Power

Objective:

To introduce how animal welfare is served by basic statistical procedures and the selection of an appropriate sample size.


Introduction:

Dr. Eric Rexstad, Department of Biology and Wildlife, UAF, taught this section in the lecture course from 1993 through 1999. His lecture handout is a very nice brief overview of how to evaluate sample size during the design phase of your research project. From the viewpoint of an IACUC review process, it is important that you address the number of animals used. One of the 3 R's (remember them?) is reduction, so the IACUC wants to be assured that you are not using more animals than are necessary to achieve your stated objectives. However, the IACUC is equally concerned that you use enough animals to achieve your objectives. As painful as it might be, this invariably requires some thought about statistics, sample size, power, and Type I and Type II errors.

There is some flexibility in the review process. If you are conducting a pilot study or evaluating a technique, the IACUC can approve protocols using small numbers of animals. Be clear about your intent and do not overstate the objectives of your pilot study! For example, you cannot model the Prince William Sound ecosystem nor say anything about the effects of contaminants by sampling 5 fish.

The IACUC expects researchers to be competent enough in statistics to understand basic concepts and to know when to involve statisticians. For this reason, our graduate students are expected to enroll in appropriate courses in statistics and research design.

 

This section of the course should help you complete the crucial section of the UAF IACUC Assurance of Animal Care form that asks you to justify the number of animals requested.


Eric's Handout

What is the purpose of statistical procedures?

Aid in answering questions through the hypothetico-deductive method.

Few scientific questions can be answered with certainty.

Impractical to conduct censuses of all members of a population, so sampling is performed.

Type I and Type II errors

No issue is more fundamental to the proper evaluation of statistical procedures than knowing the possible ways of being wrong

The ways of being wrong have been given names, but not very descriptive names

Type I - rejecting H₀ when it is true - α

Type II - failing to reject H₀ when it is false - β

Eric's Truth Table

 

                          Investigator Decision
                     Fail to Reject H₀      Reject H₀
Truth   H₀ true      Correct inference      Type I error (α)
        H₀ false     Type II error (β)      Correct inference

 

Errors arise from the distribution theory associated with the test statistic being used

Avoiding errors is a matter of where cutoff levels are set on the distributions of these statistics

For Type I errors

Setting the α-level sets your tolerance for this error

By "convention" we are often willing to risk a 5% chance of rejecting H₀ when it is true

setting α = 0.05

By chance alone, our sampling of the population of interest may produce something nonrepresentative of the true state of the population (H₀)

If it is important not to make this mistake, then you should make α very small

 

What about Type II errors?

These are defined in the context that H₀ is false, so what is true?

The magnitude of β can't be determined without specification of Hₐ

With Hₐ specified, β can be seen as the region of overlap between the distributions of H₀ and Hₐ

Shrinking α expands 1 − α, thereby extending the overlap and increasing β -- as shown by changing α from 0.05 to 0.01
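
To make the overlap picture concrete, here is a minimal sketch (my illustration, not from Eric's handout; the values μₐ = 0.5, σ = 1, n = 25 are assumptions) computing β for a one-sided z-test of H₀: μ = 0, using Python with scipy:

    # Type II error for a one-sided z-test of H0: mu = 0 vs HA: mu = mu_a.
    # Illustrative values, not from the handout.
    from scipy.stats import norm

    alpha, mu_a, sigma, n = 0.05, 0.5, 1.0, 25
    se = sigma / n ** 0.5

    cutoff = norm.ppf(1 - alpha, loc=0, scale=se)  # reject H0 when xbar > cutoff
    beta = norm.cdf(cutoff, loc=mu_a, scale=se)    # mass of the HA curve left of the cutoff
    print(f"cutoff = {cutoff:.3f}, beta = {beta:.3f}, power = {1 - beta:.3f}")

    # Shrinking alpha from 0.05 to 0.01 pushes the cutoff right and enlarges beta:
    cutoff01 = norm.ppf(0.99, loc=0, scale=se)
    print(f"alpha = 0.01: beta = {norm.cdf(cutoff01, loc=mu_a, scale=se):.3f}")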

 

The complement of β is 1 − β, the power of the test (a more positive thing)

It is P[reject H₀ | Hₐ]

Remember that β is defined only for a single Hₐ

It is unlikely that we will know the true "state of nature" a priori -- if we knew, why are we conducting the study?

In most instances, it is necessary to specify power over a spectrum of alternatives

Hₐ₁: μ = 5, Hₐ₂: μ = 6, Hₐ₃: μ = 7, Hₐ₄: μ = 8, ...

Plotting 1 − β against values of the specified alternatives forms power curves
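
A power curve is just the β calculation above looped over a spectrum of alternatives; a sketch with the same assumed values (σ = 1, n = 25, α = 0.05):

    # Power curve: 1 - beta against a spectrum of alternatives.
    # Same illustrative values as above; not from the handout.
    import numpy as np
    from scipy.stats import norm

    alpha, sigma, n = 0.05, 1.0, 25
    se = sigma / np.sqrt(n)
    cutoff = norm.ppf(1 - alpha, loc=0, scale=se)

    mu_alternatives = np.linspace(0.0, 1.0, 11)  # HA1, HA2, ... spectrum
    power = 1 - norm.cdf(cutoff, loc=mu_alternatives, scale=se)
    for mu_a, p in zip(mu_alternatives, power):
        print(f"HA: mu = {mu_a:.1f}  power = {p:.3f}")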

Why care about power curves?

The shapes of test statistic distributions are influenced by adherence to test assumptions

The shapes of test statistic distributions differ between test statistics

For hypotheses where competing tests are available, some tests may be more powerful than others

One test's power curve may rise more rapidly against alternatives than another's

Some tests may achieve higher overall power than others

One desirable property of a test is to be uniformly most powerful (UMP)

A test with this property has the largest power for all alternative values
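
To see this concretely, here is a small simulation (my illustration, not part of the handout) estimating the power of two competing tests of H₀: μ = 0 on the same normal data; for normal data the t-test should reject the false H₀ more often than the sign test:

    # Simulated power comparison: one-sample t-test vs. sign test on the
    # same normally distributed data. Illustrative only; not from the handout.
    import numpy as np
    from scipy.stats import ttest_1samp, binomtest

    rng = np.random.default_rng(1)
    alpha, n, mu_a, reps = 0.05, 20, 0.5, 2000

    t_rejects = sign_rejects = 0
    for _ in range(reps):
        x = rng.normal(loc=mu_a, scale=1.0, size=n)  # data generated under HA
        if ttest_1samp(x, popmean=0.0).pvalue < alpha:
            t_rejects += 1
        k = int(np.sum(x > 0))                       # sign test: count positive signs
        if binomtest(k, n, p=0.5).pvalue < alpha:
            sign_rejects += 1

    print(f"t-test power ~ {t_rejects / reps:.2f}")
    print(f"sign-test power ~ {sign_rejects / reps:.2f}")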

 

There are two situations in which calculating test power is of interest

As Peterman (1990) points out, when failing to reject H₀, test power should always be presented or discussed

There is another instance in which power analyses should be performed - study design

Prior to conducting a study, power calculations may be performed to determine the likelihood of drawing a correct inference from the data gathered and the statistical approach employed

The power of a given test is determined by several characteristics

1 − β = f(α, Δ, n, σ²)

α is under your direct control in hypothesis testing, where you set your rejection region

n is also under your control, often subject to cost constraints in terms of time or money

Δ can be under your control in a manipulative study

σ² is unlikely to be under your control, as it is an intrinsic characteristic of the population
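
As a sketch of these dependencies (a two-sided z-test used for illustration; the numbers are my assumptions, not from the handout), each argument can be varied one at a time:

    # Power of a two-sided z-test as an explicit function of alpha, the
    # effect size Delta, the sample size n, and the variance sigma^2.
    from scipy.stats import norm

    def power(alpha, delta, n, sigma2):
        """1 - beta = f(alpha, Delta, n, sigma^2) for a two-sided z-test."""
        se = (sigma2 / n) ** 0.5
        z = norm.ppf(1 - alpha / 2)                 # two-sided rejection cutoff
        # P[reject H0 | HA]: mass of the HA distribution inside the rejection region
        return norm.sf(z - delta / se) + norm.cdf(-z - delta / se)

    print(f"baseline:              {power(0.05, 0.5, 25, 1.0):.3f}")
    print(f"smaller alpha (0.01):  {power(0.01, 0.5, 25, 1.0):.3f}")  # power falls
    print(f"larger Delta (0.8):    {power(0.05, 0.8, 25, 1.0):.3f}")  # power rises
    print(f"larger n (50):         {power(0.05, 0.5, 50, 1.0):.3f}")  # power rises
    print(f"larger sigma^2 (2.0):  {power(0.05, 0.5, 25, 2.0):.3f}")  # power falls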

 

This is all well and good in theory, but how can I ever perform such a mysterious task?

We wish to test a null hypothesis of no treatment effect on survival of a population

Treatment can be anything:
  • cut their legs off
  • infect them with a disease
  • put an oil well in their home range

 

We plan to use a 2 × 2 χ² analysis to derive our test statistic

If we wish to calculate power, we need 1 − β. Where does β come from? Specification of Hₐ

Again, specifying Hₐ may seem slightly circular, but an investigator must have some knowledge prior to initiating a study

For our purposes, specifying Hₐ means specifying Δ, namely the difference in survival (S) between groups

To keep it simple, we assume our control group has a survival rate of 50%

[Ask yourself whether there is a difference in power between a test of Sc = 0.5 vs. St = 0.7 and a test of Sc = 0.7 vs. St = 0.9; Δ is 0.2 in both cases]

 

The process is to:

Construct "observed" data in the table under HA

Compute the c2 value in the usual manner (equations do not translate well into html so look in an introductory statistics text for the formula)

Here is the trick

The computed value is a non-centrality parameter

Just as there are families of central χ² distributions, there are also families of non-central χ² distributions

[and also non-central t distributions]

These distributions are the secret to calculating power for test statistics that are χ² distributed

Peterman (p. 6) defines another method, but this is quicker

You need only know that SAS can calculate critical values of these non-central χ² distributions

Apply the process to a couple of 2 × 2 tables, manipulating Δ and n

Δ = 0.2, n = 100: δ = 4.17, 1 − β = 0.53

Δ = 0.1, n = 200: δ = 2.02, 1 − β = 0.30

Complete the picture of study design with a surface showing power as a function of Δ and n
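
The handout points to SAS; as an illustration, the same non-central χ² calculation can be sketched with Python's scipy (my tooling assumption, not Eric's original code). It reproduces the two examples above, and looping over a grid of Δ and n traces out the power surface:

    # Power of the 2 x 2 chi-square test via the non-central chi-square
    # distribution: the chi-square value computed from the "observed" table
    # constructed under HA serves as the non-centrality parameter delta.
    from scipy.stats import chi2, ncx2

    def power_2x2(s_control, s_treatment, n, alpha=0.05):
        """Power to detect a survival difference, n animals split evenly."""
        m = n / 2                                      # animals per group
        a, b = m * s_control, m * (1 - s_control)      # control: alive, dead
        c, d = m * s_treatment, m * (1 - s_treatment)  # treatment: alive, dead
        # standard 2 x 2 chi-square statistic, used here as delta
        delta = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
        crit = chi2.ppf(1 - alpha, df=1)               # central chi-square cutoff
        return delta, ncx2.sf(crit, df=1, nc=delta)    # P[reject H0 | HA]

    # The handout's two examples: delta = 4.17, power = 0.53 and delta = 2.02, power = 0.30
    for sc, st, n in [(0.5, 0.7, 100), (0.5, 0.6, 200)]:
        d, pwr = power_2x2(sc, st, n)
        print(f"Delta = {st - sc:.1f}, n = {n}: delta = {d:.2f}, power = {pwr:.2f}")

    # The power surface: the same function looped over a grid of Delta and n
    for st in (0.6, 0.7, 0.8):
        for n in (50, 100, 200):
            print(f"Delta = {st - 0.5:.1f}, n = {n}: power = {power_2x2(0.5, st, n)[1]:.2f}")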

 

Take home message

The significance level reported in every scientific publication is P[observed test statistic | H₀]

This is α, the probability of committing a Type I error

What I have described today is something different

P[reject H₀ | Hₐ], the probability of not committing a Type II error, 1 − β

Imagine the impact of your expert testimony when you say:

"My analysis was able to detect a treatment effect of 0.1 with a probability of 0.9."

as opposed to

"I was unable to detect a treatment effect."

That should also grab the attention of a funding agency reviewing your grant proposal.

References

Recommended reading

Erb, H.N. 1999. A non-statistical approach for calculating the optimum number of animals needed in research. Lab Animal March:45-49.

Engeman, R.M. and S.A. Shumake. 1993. Animal welfare and the statistical consultant. The American Statistician 47:229-233.

 Johnson, D.H. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63:763-772.

Peterman, R.M. 1990. Statistical power analysis can improve fisheries research and management. Canadian Journal of Fisheries and Aquatic Sciences 47:2-15.

Peterman, R.M. 1990. The importance of reporting statistical power: The forest decline and acidic deposition example. Ecology 71:2024-2027.

Rotenberry, J.T. and J.A. Wiens. 1985. Statistical power analysis and community-wide patterns. American Naturalist 125:164-168.

 Steidl, R.J., J.P. Hayes, and E. Schauber. 1997. Statistical power analysis in wildlife research. Journal of Wildlife Management 61:270-279.

Taylor, B.L. and T. Gerrodette. 1993. The uses of statistical power in conservation biology: The vaquita and northern spotted owl. Conservation Biology 7:489-500.

Toft, C.A. and P.J. Shea. 1983. Detecting community-wide patterns: Estimating power strengthens statistical inference. American Naturalist 122:618-625.

Underwood, A.J. 1997. Experiments in Ecology. Cambridge University Press, Cambridge. 504 pp. (see Chapter 5)

 

Web Sites

Some simple calculations to assist you in computing power

http://members.aol.com/johnp71/javastat.html#Power