To introduce the concept of how animal welfare is served by using basic statistical procedures and selecting an appropriate sample size.
Dr Eric Rexstad, Department of Biology and Wildlife, UAF, taught this section in the lecture course from 1993 through 1999. His lecture handout is a very nice, brief overview of how to evaluate sample size during the design phase of your research project. From the viewpoint of the IACUC review process, it is important that you address the number of animals used. One of the 3 R's (remember them?) is reduction, so the IACUC wants to be assured that you are not using more animals than are necessary to achieve your stated objectives. However, the IACUC is equally concerned that you use enough animals to achieve your objectives. As painful as it might be, this invariably requires some thought about statistics, sample size, power, and Type I and Type II errors.
There is some flexibility in the review process. If you are conducting a pilot study or evaluating a technique, the IACUC can approve protocols using small numbers of animals. Be clear about your intent and do not overstate the objectives of your pilot study! For example, by sampling 5 fish you can neither model the Prince William Sound ecosystem nor say anything about the effects of contaminants.
The IACUC expects researchers to be competent enough in statistics to understand basic concepts and to know when to involve statisticians. For this reason, our graduate students are expected to enroll in appropriate courses in statistics and research design.
This section of the course should help you complete the following crucial section of the UAF IACUC Assurance of Animal Care form:
This is the place on the IACUC form where you explain your research design to the committee. A poor explanation of your proposed sample size automatically tells the committee that your research design is also poor, and your protocol will be questioned or possibly rejected. Answering the following questions is the absolute minimum, but the answers must be framed in the context of how you plan to achieve the objectives outlined earlier in the form.
1) How many animals will be needed for this project?
2) How many experimental groups, replications, trials, etc. are required?
3) How did you determine that the sample size, number of groups, replications, trials, etc. are appropriate, as they relate to the numbers of animals requested?
Statistics aid in answering questions through the hypothetico-deductive method.
Few scientific questions can be answered with certainty.
It is impractical to census all members of a population, so sampling is performed
e.g., presidential preference polls
not everybody is asked their preference, so there is uncertainty involved
in the context of hypothesis testing, this uncertainty takes the form of a risk of being wrong
No issue is more fundamental to the proper evaluation of statistical procedures than knowing the possible ways of being wrong
The ways of being wrong have been given names, but not very descriptive names
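Type I: rejecting H0 when it is true (probability α)
Type II: failing to reject H0 when it is false (probability β)
Decision table:
Reject H0 when H0 is true: Type I error = α
Reject H0 when H0 is false: correct decision (probability 1 − β)
Fail to reject H0 when H0 is true: correct decision (probability 1 − α)
Fail to reject H0 when H0 is false: Type II error = β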
Errors arise from the sampling distribution of the test statistic used for testing
Avoidance of errors is linked to the establishment of cutoff levels on the distributions of these statistics
For Type I errors
Setting the α-level sets our tolerance for this error
By "convention" we are often willing to risk a 5% chance of rejecting HO when it is true
» setting α = 0.05
It is always possible that, by chance alone, our sample from the population of interest will misrepresent the true state of the population (H0)
If it is important not to make this mistake, then you should make α very small
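To see what "a 5% chance of rejecting H0 when it is true" means in practice, the sketch below simulates many studies in which H0 really is true and counts how often a test rejects it. It is a minimal illustration under assumptions not in the original: normally distributed data with known σ, a two-sided z-test, and hypothetical settings (n = 20, μ0 = 0, σ = 1, 20,000 simulated studies).

```python
import random
from statistics import NormalDist, fmean


def simulated_type_i_rate(alpha=0.05, n=20, mu0=0.0, sigma=1.0, reps=20000, seed=1):
    """Fraction of simulated studies, all drawn with H0 true (mean = mu0),
    in which a two-sided z-test (sigma treated as known) rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff
    se = sigma / n ** 0.5                          # standard error of the mean
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_crit:
            rejections += 1                        # a Type I error
    return rejections / reps


rate_05 = simulated_type_i_rate(alpha=0.05)
rate_01 = simulated_type_i_rate(alpha=0.01)
print(f"alpha = 0.05: Type I error rate ~ {rate_05:.3f}")
print(f"alpha = 0.01: Type I error rate ~ {rate_01:.3f}")
```

The simulated rejection rates land close to the chosen α, which is the sense in which α is "set" by the investigator.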
What about Type II errors?
These are defined in the context that H0 is false, so what is true?
The magnitude of β can't be determined without specification of HA
With HA specified, β can be seen as the region of overlap between the distributions of the test statistic under H0 and under HA
Shrinking α expands the non-rejection region (1 − α), thereby extending the overlap and increasing β, as seen when α is changed from 0.05 to 0.01
Power is 1 − β = P[reject H0 | HA]
Remember that β is defined only for a single HA
It is unlikely that we will know the true "state of nature" a priori; if we knew it, why would we be conducting the study?
In most instances, it is necessary to specify power over a spectrum of alternatives
HA1: μ = 5, HA2: μ = 6, HA3: μ = 7, HA4: μ = 8, ...
Plotting 1 − β against the values of the specified alternatives produces power curves
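A power curve can be tabulated directly once a test is chosen. The sketch below assumes a hypothetical setting not given in the original: an upper-tailed z-test of H0: μ = 4 with known σ = 2 and n = 10. It evaluates 1 − β at each alternative μ = 5, 6, 7, 8, at both α = 0.05 and α = 0.01, so it also shows that shrinking α lowers power.

```python
from statistics import NormalDist


def power_one_sided_z(mu, mu0=4.0, sigma=2.0, n=10, alpha=0.05):
    """Power (1 - beta) of an upper-tailed z-test of H0: mu = mu0 when the
    true mean is mu; sigma is treated as known (hypothetical values)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)     # rejection cutoff under H0
    shift = (mu - mu0) * n ** 0.5 / sigma        # where HA centres the z statistic
    return 1 - NormalDist().cdf(z_crit - shift)  # area of HA beyond the cutoff


alternatives = [5, 6, 7, 8]
curve_05 = [power_one_sided_z(mu, alpha=0.05) for mu in alternatives]
curve_01 = [power_one_sided_z(mu, alpha=0.01) for mu in alternatives]

for mu, p05, p01 in zip(alternatives, curve_05, curve_01):
    print(f"mu = {mu}: power = {p05:.3f} (alpha 0.05), {p01:.3f} (alpha 0.01)")
```

Plotting `curve_05` against `alternatives` gives the power curve; at μ = μ0 the "power" equals α, because there the only rejections are Type I errors.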
Why care about power curves?
The shapes of test-statistic distributions are influenced by how well the data meet the test's assumptions
The shapes of the distributions also differ among test statistics
» for hypotheses where competing tests are available, some tests may be more powerful than others
The power curve of one test may rise more rapidly against alternatives than those of other tests
Some tests may achieve overall higher power than other tests
One desirable property of a test is to be uniformly most powerful (UMP)
A test with this property has the largest power among all tests of the same α-level, for every alternative value
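To make "some tests are more powerful than others" concrete, the sketch below compares two tests of the same one-sided hypothesis on normally distributed data with known σ: the z-test and the sign test (which only counts observations above μ0). The setting, sample size (n = 20), and effect sizes are hypothetical; for this particular normal-mean problem the z-test is in fact uniformly most powerful. Part of the sign test's shortfall comes from its discreteness: with n = 20 its actual size is about 0.021 rather than 0.05.

```python
from math import comb
from statistics import NormalDist


def z_test_power(shift, n, alpha=0.05):
    """Upper-tailed z-test power for normal data; shift = (mu - mu0) / sigma."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - shift * n ** 0.5)


def binom_upper_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c, n + 1))


def sign_test_power(shift, n, alpha=0.05):
    """Exact power of the upper-tailed sign test on the same normal data.
    H0 is rejected when at least c observations exceed mu0, where c is the
    smallest cutoff whose probability under H0 (p = 0.5) does not exceed alpha."""
    c = next(k for k in range(n + 2) if binom_upper_tail(n, 0.5, k) <= alpha)
    p_above = NormalDist().cdf(shift)  # P(observation > mu0) under HA
    return binom_upper_tail(n, p_above, c)


n = 20
for shift in (0.25, 0.5, 1.0):
    print(f"shift = {shift} sigma: z-test {z_test_power(shift, n):.3f}, "
          f"sign test {sign_test_power(shift, n):.3f}")
```

With the same data and the same nominal α, the z-test reaches any given power with fewer observations, which is what "more powerful" means when choosing among competing tests.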
There are two situations in which calculating test power is of interest
As Peterman points out, when you fail to reject H0, test power should always be presented or discussed
The second situation in which power analyses should be performed is study design
Before conducting a study, power calculations can be performed to determine the likelihood of drawing correct inferences from the data gathered and the statistical approach employed
The power of a given test is determined by several characteristics
Power (1 − β) = f(α, Δ, n, σ²)
α is under your direct control in hypothesis testing, because you set the rejection region
n is also under your control, often subject to cost constraints in terms of either time or money
Δ, the effect size (the difference between H0 and HA), can be under your control in a manipulative study
σ², the variance, is unlikely to be under your control because it is an intrinsic characteristic of the population
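The dependence of power on α, Δ, n, and σ² can be written out for a simple case. The sketch below assumes a setting different from the survival example that follows: a two-sided, two-sample comparison of means with equal group sizes and a known common variance, using the normal (z) approximation. It computes power and then searches for the smallest number of animals per group that reaches a target power; the inputs (Δ = 0.5, σ² = 1, target power 0.8) are hypothetical. A t-test would need slightly more animals than this approximation suggests.

```python
from statistics import NormalDist


def power_two_sample(alpha, delta, n, sigma2):
    """Approximate power (1 - beta) of a two-sided two-sample z-test with
    n animals per group, true difference in means delta, and common
    variance sigma2 (treated as known)."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    ncp = abs(delta) / (2 * sigma2 / n) ** 0.5  # true difference in standard-error units
    # Probability that the statistic falls in either rejection tail under HA
    return std.cdf(ncp - z_crit) + std.cdf(-ncp - z_crit)


def animals_per_group(alpha, delta, sigma2, target_power=0.8, n_max=10_000):
    """Smallest n per group whose approximate power reaches target_power."""
    for n in range(2, n_max + 1):
        if power_two_sample(alpha, delta, n, sigma2) >= target_power:
            return n
    raise ValueError("target power not reached within n_max animals per group")


print(animals_per_group(alpha=0.05, delta=0.5, sigma2=1.0))  # 63
```

Rerunning with a smaller α or a larger σ² raises the required n, while a larger Δ lowers it, matching the list of controllable and uncontrollable factors above.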
This is all well and good in theory, but how can I ever perform such a mysterious task?
We wish to test a null hypothesis of no treatment effect on survival of a population
We plan to use a 2 × 2 χ² analysis to derive our test statistic
To calculate power, 1 − β, we need β. Where does β come from? From the specification of HA
Again, specifying HA may seem slightly circular, but an investigator must have some knowledge before initiating a study
For our purposes, specifying HA means specifying Δ, namely the difference in survival rate (S) between groups
To keep it simple, we assume our control group has a survival rate of 50%
[Ask yourself whether there is a difference in power between a test of Sc = 0.5 vs. St = 0.7 and a test of Sc = 0.7 vs. St = 0.9; Δ is 0.2 in both cases]
The process is to:
Construct "observed" data in the table under HA
Compute the χ² value in the usual manner: χ² = Σ (O − E)²/E summed over the four cells, where O are the "observed" counts constructed under HA and E are the counts expected under H0 (row total × column total / grand total)
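The two steps above (build the "observed" table under HA, then compute χ²) can be sketched as follows. Assumptions: rows are the control and treatment groups, columns are survived and died, both groups have the same number of animals, and no continuity (Yates) correction is applied; the function names are hypothetical.

```python
def expected_table_under_ha(s_control, s_treatment, n_per_group):
    """'Observed' counts we would see if HA were exactly true.
    Rows: control, treatment. Columns: survived, died."""
    return [
        [n_per_group * s_control, n_per_group * (1 - s_control)],
        [n_per_group * s_treatment, n_per_group * (1 - s_treatment)],
    ]


def chi_square_2x2(table):
    """Pearson chi-square for a 2 x 2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # count expected under H0
            stat += (observed - expected) ** 2 / expected
    return stat


print(round(chi_square_2x2(expected_table_under_ha(0.5, 0.7, 50)), 2))   # 4.17
print(round(chi_square_2x2(expected_table_under_ha(0.5, 0.6, 100)), 2))  # 2.02
print(round(chi_square_2x2(expected_table_under_ha(0.7, 0.9, 50)), 2))   # 6.25
```

With 50 animals per group (n = 100 in total) and with 100 per group (n = 200), this reproduces the δ values of 4.17 and 2.02 quoted later in this section. It also bears on the bracketed question above: the same Δ = 0.2 starting from Sc = 0.7 gives a larger value (6.25) than starting from Sc = 0.5, so power depends on where the survival rates lie, not on Δ alone.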
Here is the trick
The computed value is the non-centrality parameter, δ
Just as there are families of central χ² distributions, there are also families of non-central χ² distributions
[and also non-central t distributions]
These distributions are the secret to calculating power for test statistics that are χ² distributed
Peterman (p. 6) describes another method, but this one is quicker
You need only know that SAS can calculate critical values of these non-central χ² distributions
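If SAS is not at hand, the power for a 2 × 2 table (1 degree of freedom) can be obtained without any special library. This sketch relies on the fact that a 1-df non-central χ² variable with non-centrality δ has the same distribution as (Z + √δ)², where Z is standard normal; it does not apply to tables with more than 1 df. The function name is hypothetical.

```python
from statistics import NormalDist


def power_from_noncentrality(delta_nc, alpha=0.05):
    """Power of a 1-df chi-square test when, under HA, the statistic follows a
    non-central chi-square with non-centrality delta_nc.
    Uses the identity: 1-df non-central chi-square = (Z + sqrt(delta_nc))**2."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)  # square root of the central chi-square cutoff
    shift = delta_nc ** 0.5
    # P[(Z + shift)**2 > z_crit**2], i.e. P[|Z + shift| > z_crit]
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)


print(f"{power_from_noncentrality(4.17):.2f}")  # 0.53
print(f"{power_from_noncentrality(2.02):.2f}")  # 0.30
```

The central χ² critical value at α = 0.05 with 1 df is 1.96² ≈ 3.84; the function simply asks how much of the non-central distribution lies beyond it.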
Apply the process to a couple of 2 × 2 tables, manipulating Δ and n (n is the total number of animals, split equally between the two groups)
Δ = 0.2, n = 100 » δ = 4.17 » 1 − β = 0.53
Δ = 0.1, n = 200 » δ = 2.02 » 1 − β = 0.30
Complete the picture of study design with a surface showing power as a function of Δ and n
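A power surface is just the same calculation repeated over a grid. The self-contained sketch below keeps control survival at 0.5, as in the example, and tabulates power over a hypothetical grid of Δ values and total sample sizes (equal groups, α = 0.05, 1-df χ² test without continuity correction). The grid values are illustrative choices, not from the original.

```python
from statistics import NormalDist


def noncentrality_2x2(s_control, s_treatment, n_per_group):
    """Pearson chi-square computed on the 2 x 2 table of counts expected under HA."""
    table = [[n_per_group * s_control, n_per_group * (1 - s_control)],
             [n_per_group * s_treatment, n_per_group * (1 - s_treatment)]]
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / total) ** 2 / (rows[i] * cols[j] / total)
               for i in range(2) for j in range(2))


def power_1df(delta_nc, alpha=0.05):
    """Power of a 1-df chi-square test via (Z + sqrt(delta_nc))**2."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = delta_nc ** 0.5
    return 1 - std.cdf(z_crit - shift) + std.cdf(-z_crit - shift)


S_CONTROL = 0.5                         # control survival used in the example
effects = [0.05, 0.1, 0.15, 0.2, 0.25]  # hypothetical grid of Delta values
totals = [50, 100, 200, 400, 800]       # hypothetical grid of total n (equal groups)

surface = {(d, n): power_1df(noncentrality_2x2(S_CONTROL, S_CONTROL + d, n // 2))
           for d in effects for n in totals}

print("Delta \\ n " + "".join(f"{n:>7}" for n in totals))
for d in effects:
    print(f"{d:>9} " + "".join(f"{surface[(d, n)]:7.2f}" for n in totals))
```

Reading across a row shows how many animals are needed to detect a given Δ with acceptable power; reading down a column shows the smallest Δ a fixed number of animals can reasonably detect.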
Take home message
The significance levels (P-values) reported in scientific publications are P[test statistic at least as extreme as observed | H0]
They are judged against α, the probability of committing a Type I error
What I have described today is something different
P[reject H0 | HA], the probability of not committing a Type II error, 1 − β
Imagine the impact of your expert testimony when you say:
"My analysis was able to detect a treatment effect of 0.1 with a probability of 0.9."
as opposed to
"I was unable to detect a treatment effect."
That should also grab the attention of a funding agent when reviewing grant proposals.
Erb, H.N. 1999. A non-statistical approach for calculating the optimum number of animals needed in research. Lab Animal March:45-49.
Engeman, R.M. and S.A. Shumake. 1993. Animal welfare and the statistical consultant. The American Statistician 47:229-233.
Johnson, D.H. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63:763-772.
Peterman, R.M. 1990. Statistical power analysis can improve fisheries research and management. Canadian Journal of Fisheries and Aquatic Sciences 47:2-15.
Peterman, R.M. 1990. The importance of reporting statistical power: The forest decline and acidic deposition example. Ecology 71:2024-2027.
Rotenberry, J.T. and J.A. Wiens. 1985. Statistical power analysis and community-wide patterns. American Naturalist 125:164-168.
Steidl, R.J., J.P. Hayes, and E. Schauber. 1997. Statistical power analysis in wildlife research. Journal of Wildlife Management 61:270-279.
Taylor, B.L. and T. Gerrodette. 1993. The uses of statistical power in conservation biology: The vaquita and northern spotted owl. Conservation Biology 7:489-500.
Toft, C.A. and P.J. Shea. 1983. Detecting community-wide patterns: Estimating power strengthens statistical inference. American Naturalist 122:618-625.
Underwood, A.J. 1997. Experiments in Ecology. Cambridge University Press, Cambridge. 504 pp. (see Chapter 5)