Choose Your Own Data-Analytic Adventure
Okay, we'll calculate sample size. For that, we need three things: an alpha level, a level of statistical power, and an effect size.
The alpha level is the probability that we will reject the null hypothesis WHEN the null hypothesis is true, in other words, WHEN the population effect size is zero. Since it's bad to reject the null WHEN it is true, we want a low alpha level. We'll go with the customary alpha level of .05.
Statistical power is the probability that we will reject the null hypothesis WHEN the specified (non-zero) effect size is true. Since it's good to reject the null WHEN it is false, we want a lot of statistical power. We'll go with the customary minimum statistical power of .80. |
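The two definitions can be read as one quantity evaluated at two different population effect sizes. The sketch below is a rough illustration in Python rather than R, and it makes simplifying assumptions: a two-sided, two-sample z-test with a known standard deviation, equal group sizes, and illustrative values (64 per group, a standardized effect of 0.5) that are not taken from this study. The function name `rejection_probability` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rejection_probability(effect_size, n_per_group, alpha=0.05):
    """Chance that a two-sided, two-sample z-test rejects the null.

    effect_size is the true standardized mean difference in the
    population (0 means the null hypothesis is true). Assumes equal
    group sizes and a known standard deviation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic under the true effect.
    shift = effect_size * sqrt(n_per_group / 2)
    # Reject when the statistic lands in either tail.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


# WHEN the effect is zero, the rejection probability is the alpha level.
alpha_check = rejection_probability(effect_size=0.0, n_per_group=64)

# WHEN the effect is the specified non-zero value, it is the power.
power_check = rejection_probability(effect_size=0.5, n_per_group=64)

print(round(alpha_check, 3), round(power_check, 3))
```

With a zero effect the function returns the alpha level by construction; with a non-zero effect, the same function returns the power for that effect and sample size.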
What do you mean "specified (non-zero) effect size"? |
That is the tricky question that we must tackle now. To calculate sample size, we've got to specify the effect size in the population. |
But, if we knew the effect size in the population, why would we even conduct this research? That's exactly what we are trying to find out. Isn't it? |
There lies the rub. We must guess, and then we use that guess (in conjunction with our alpha level and statistical power) to calculate sample size. |
Guess? |
Yes. Our guess should be fairly conservative, because if we guess too large a population effect, then we'll collect too small a sample. |
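To see why an optimistic guess leads to a small sample, here is a minimal Python sketch of the textbook normal-approximation formula for a two-group comparison, n per group = 2((z_alpha + z_power) / d)^2. The function name `n_per_group` and the guessed effect sizes are illustrative assumptions. The approximation runs slightly below an exact t-test calculation, so treat the numbers as ballpark figures.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test.

    d is the guessed standardized mean difference in the population.
    Uses the normal approximation, assuming equal group sizes.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# The larger the guessed effect, the smaller the sample we would collect.
for guessed_d in (0.2, 0.5, 0.8):
    print(guessed_d, n_per_group(guessed_d))
```

Because the guessed effect is squared in the denominator, halving the guess roughly quadruples the required sample, which is why an overly optimistic guess is costly.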
How do I guess the effect size? Am I supposed to guess the difference in MYPAS scores of the intervention and control groups? |
Actually, that's one way to do it. You can take that average difference and divide by the standard deviation (of either group, assuming homoscedasticity), and you'll get Cohen's d. You could use Cohen's d as your effect size in the sample size calculation. This measure of effect size works for regression of a continuous variable on a dichotomous variable (i.e., two-sample t-test). You can play around with the idea at this terrific site. |
You seemed surprised. Did you have something else in mind? |
Cohen's d is a very specialized measure of effect size. I generally prefer to use the Pearson product-moment correlation, r. But, whatever works for you. |
Whether we use the difference in means (divided by the standard deviation) or the correlation... I... um... I don't know. How do I even begin? Is one easier? How do I guess for either? |
It's daunting. You have to use substantive knowledge and/or prior research. Think of the average control patient; how will she score on the MYPAS? Think of the average intervention patient; how will she score on the MYPAS? |
Now that I'm thinking about it, there was a recent study on the effectiveness of family-centered preparation in reducing preoperative anxiety, and they found a difference of about 5 points on the MYPAS between the control group and the intervention group. I think that the full slate of child life services would be even more effective. |
Great. There's your difference in means. Do you have a standard deviation? |
They don't note the standard deviation in that study, but I have the standard deviation from the MYPAS validation study: sd = 8. |
It would be ideal to have the standard deviation for your exact population, or at least the sample in which they found the 5-point difference, but we'll take whatever information we can get. If we divide the difference by the standard deviation (5/8) we get a Cohen's d of .625.
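The arithmetic can be sketched in a few lines of Python. The sketch also converts d to a correlation with the standard formula r = d / sqrt(d^2 + 4), which assumes equal group sizes. The variable names and the helper `d_to_r` are illustrative, not part of any package.

```python
from math import sqrt

# Inputs from the conversation above.
mean_difference = 5.0      # MYPAS points, from the prior study
standard_deviation = 8.0   # from the MYPAS validation study

# Cohen's d: the mean difference in standard-deviation units.
d = mean_difference / standard_deviation


def d_to_r(d):
    """Convert Cohen's d to a correlation, assuming equal group sizes."""
    return d / sqrt(d ** 2 + 4)


r = d_to_r(d)
print(f"Cohen's d = {d:.3f}, equivalent r = {r:.3f}")
```

Either number can serve as the effect size in a sample size calculation, as long as it is matched to the corresponding test.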
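Jacob Cohen gives guidelines for effect sizes: for Cohen's d, .2 is small, .5 is medium, and .8 is large; for r, the corresponding values are .1, .3, and .5. By those benchmarks, a d of .625 falls between medium and large. |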
Okay. So, now we have everything we need to calculate sample size, right?
|
All we need to do now is fire up R and enter three lines of code:
|
You decide to:
Let's call it "good" and submit this data-analytic strategy.
or
Let's go at it from another angle and calculate the statistical power of your study.
or
Let's go at it from another angle and calculate the effect size for which your study is geared.