Precision and sample-size analysis in Stata 16

Precision and sample-size analysis for confidence intervals

ciwidth

January 23, 2020

Overview

• Introduction

• Precision and sample-size analysis

• Overview of ciwidth

• Inference using confidence intervals

• Examples

• One population mean

• One population variance

• Two paired means

• Two independent means

• Sensitivity analysis

• Custom tables and precision graphs

• Adding your own methods to ciwidth

• Summary

Introduction to precision and sample-size analysis

Precision and sample-size analysis

• Hypothesis tests for inference

• Test whether a population parameter of interest is equal to, smaller than, or larger than a specified value

• Power analysis estimates the required sample size for a future study to ensure that the test will have high power (chance) to detect when the parameter differs from the specified value

• Confidence intervals for inference

• Estimate an interval for the population parameter

• Precision analysis estimates the required sample size for a future study to ensure that the estimated interval will have the desired precision so that it is not too wide

• Like hypothesis tests, confidence intervals are data dependent and so their precision will vary across samples

• Use precision and sample-size analysis to account for the variability, and plan a study with the desired precision

• Precision and sample-size analysis for confidence intervals is analogous to power and sample-size analysis for hypothesis tests

Precision and sample-size analysis

• How many subjects would be required to ensure that the confidence interval for the mean is no wider than 2?

• If we only have enough resources to include 40 subjects in our study, what kind of precision would we expect our confidence interval to have?

• How would this precision change as we increase our sample size to 50, 60, and 70?

• The ciwidth command will help you answer these questions and more

Overview of ciwidth

• Perform precision and sample-size analysis for confidence intervals

• Population mean

• Population variance

• Comparison of means from independent samples

• Comparison of means from paired samples

• Compute

• Sample size

• CI precision

• Probability of CI precision

• Perform sensitivity analysis

• Present results graphically or in a table

• Customize graphs and tables

• Add your own method

• Easily create tables and graphs as you would with official ciwidth commands

Computation of a confidence interval

Confidence intervals

• Computation

• Confidence level

• 95%, 97%, other

• Sample size

• How many participants can you afford to have in your study?

• Standard deviation

• How did weight loss vary in the sample?

• Precision

• Measured by the CI width

• Ensured by the probability of CI width

Confidence interval for a population mean

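A 100(1 - α)% CI:

$$\left[\, \bar{x} - t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}\;,\;\; \bar{x} + t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}} \,\right]$$

x̄ = sample mean

α = significance level

s = sample standard deviation

n = sample size

t_{n-1,1-α/2} = the (1 - α/2) quantile of Student's t distribution with n - 1 degrees of freedom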

The width of this CI is the distance between its upper and lower limits: $2\, t_{n-1,\,1-\alpha/2}\, s/\sqrt{n}$

Confidence interval for a population mean

Sample mean   Width   Sample mean ± half-width   95% confidence interval
5             10      5 - 5, 5 + 5               [0, 10]
5             8       5 - 4, 5 + 4               [1, 9]
5             4       5 - 2, 5 + 2               [3, 7]

Choose the values of the confidence level (100(1 - α)%), sample standard deviation, and sample size that will provide the desired level of precision, given the resources you have
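As a rough illustration, the CI width implied by the formula for a population mean can be computed by hand in Stata. The sketch below plugs in a standard deviation of 6 and a sample size of 30, the values used in the air-freshener example later in the talk. It treats the sample standard deviation as if it were known in advance, which it is not in a prospective study. The scalar names are made up for this example.

```stata
* Width of a two-sided 95% CI for a mean: 2 * t_{n-1,1-alpha/2} * s / sqrt(n)
* sd = 6 and n = 30 are illustrative planning values, not estimates from data
scalar ci_alpha = 0.05
scalar ci_sd    = 6
scalar ci_n     = 30

* invt(df, p) returns the p-th quantile of Student's t with df degrees of freedom
scalar ci_tcrit = invt(ci_n - 1, 1 - ci_alpha/2)
scalar ci_width = 2 * ci_tcrit * ci_sd / sqrt(ci_n)

display "t critical value = " %6.4f ci_tcrit
display "CI width         = " %6.4f ci_width
```

With these inputs the width comes out at roughly 4.48. Because s varies from sample to sample, the width actually obtained in a study will vary too, which is why ciwidth also works with the probability of CI width.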

A first example

Precision analysis for a population mean

• How long do plug-in air fresheners last?

• How many air fresheners would we need to test to obtain a two-sided 95% CI for the mean scent duration with a width no larger than 4 days?

Computing sample size for a population mean

Computing CI width for a population mean

How big of an interval width would we expect if we could only afford to sample 30 air fresheners?

Computing probability of CI width for a population mean

What’s the probability that we’ll obtain a desired CI width of 4, if we only sample 30 air fresheners?

Syntax overview for a one-mean CI

• Estimate sample size

• ciwidth onemean, sd(6) probwidth(0.96) width(4)

• Estimate CI width

• ciwidth onemean, sd(6) probwidth(0.96) n(30)

• Estimate probability of CI width

• ciwidth onemean, sd(6) width(4) n(30)

General syntax for ciwidth

• Estimate sample size

• ciwidth method …, probwidth(numlist) width(numlist) [options]

• Estimate CI width

• ciwidth method …, probwidth(numlist) n(numlist) [options]

• Estimate probability of CI width

• ciwidth method …, width(numlist) n(numlist) [options]

method:         options:
onemean         table
onevariance     graph
twomeans        lower
pairedmeans     upper
usermethod      …
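Because the main options accept numlists, one command can cover several scenarios. The sketch below addresses the earlier question about sample sizes of 40, 50, 60, and 70. It reuses sd(6) and probwidth(0.96) from the one-mean syntax examples; pairing those values with these sample sizes is an assumption made only for illustration.

```stata
* CI width for several sample sizes at once
* (a table is displayed when an option receives more than one value)
ciwidth onemean, sd(6) probwidth(0.96) n(40 50 60 70)

* The same sample sizes written as a numlist range, with the results graphed
ciwidth onemean, sd(6) probwidth(0.96) n(40(10)70) graph
```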

Finite populations

Infinite vs. finite population

• We’ve been performing our analyses for a confidence interval for the mean scent duration of all plug-in air fresheners (infinite population).

• But the parameter you are estimating may be for a fixed population. For example, only 500 Bugatti Chiron cars were manufactured (finite population).

• Apply a finite population correction with the fpc() option.

Finite population

• Suppose you work for this car manufacturer and you’re designing a study to estimate a confidence interval for the maximum speed of this limited edition model.

• What is the largest CI width for the maximum speed that you can expect to obtain if you sample only 50 of the 500 cars that were manufactured, assuming a standard deviation of 16 mph?

Compute CI width using a finite population correction

Alternative specification for a finite population correction

Finite population correction
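A minimal sketch of the two ways to specify the correction for the car example. The standard deviation (16), sample size (50), and population size (500) come from the example. The probwidth(0.96) value is an assumption borrowed from the air-freshener commands, since the value used on the original slide is not preserved here.

```stata
* FPC given as the population size (a value greater than 1)
ciwidth onemean, sd(16) probwidth(0.96) n(50) fpc(500)

* Alternative: FPC given as the sampling rate, 50/500 = 0.1
ciwidth onemean, sd(16) probwidth(0.96) n(50) fpc(0.1)
```

Both commands describe the same design, so they should report the same CI width.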

Back to infinite populations of air fresheners, and on to one-sided confidence intervals

One-sided confidence intervals

• You may want one-sided confidence intervals if

• You are certain of the direction of an effect.

• Examples include toxicity studies and analysis of data on occurrences of adverse drug reactions (Chow, Shao, Wang, & Lokhnygina, 2017).

• You are mainly interested in either the lower or upper limit.

• An example would be product quality and reliability (Meeker, Hahn, & Escobar, 2017).

• In general, use caution with one-sided confidence intervals.

• There have been cases where the placebo effect is superior to the actual drug effect (Meeker, Hahn, & Escobar, 2017).

Lower 95% confidence interval

Two-sided 90% confidence level
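The two cases named above can be sketched as follows. The sd(6), probwidth(0.96), and n(30) values are assumptions carried over from the air-freshener example rather than values taken from these slides.

```stata
* Lower one-sided 95% CI: the width is measured from the estimate to the lower limit
ciwidth onemean, sd(6) probwidth(0.96) n(30) lower

* Two-sided 90% CI for comparison
ciwidth onemean, sd(6) probwidth(0.96) n(30) level(90)
```

Both intervals rely on the same t quantile (the 0.95 quantile with n - 1 degrees of freedom), so one would expect the one-sided width to be about half of the two-sided 90% width.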

Sensitivity analysis

• Precision and sample-size analysis is centered on a prospective study, and we are obtaining estimates based on values that are truly unknown

• Estimates may come from historical data or pilot studies

• Sensitivity analysis is used to investigate the effect of varying study parameters on CI precision, probability of CI width, sample size, and other components of a study.

• In Stata, you can perform sensitivity analysis with tables and graphs.

Computing probability of CI width for a population mean

Probability of CI width for a range of standard deviations

Table and graph for sensitivity analysis

Sensitivity analysis graph

Probability of CI width for ranges of multiple parameters
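A hedged sketch of such a sensitivity analysis. The width(4) and n(30) values follow the air-freshener example, while the range of standard deviations and the additional sample sizes are assumptions chosen only to show the mechanics.

```stata
* Probability of obtaining a CI width of at most 4 over a range of standard deviations
ciwidth onemean, sd(4(1)8) width(4) n(30)

* Vary the sample size as well, and request both a table and a graph
ciwidth onemean, sd(4(1)8) width(4) n(30 40 50) table graph
```

Each combination of values in the numlists is treated as a separate scenario, so the second command evaluates 15 scenarios.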

Tables for sensitivity analysis

Reordering table columns

Modifying column labels and widths

ciwidth

• We computed the following for the CI for a population mean

• Sample size

• Confidence-interval width

• Probability of confidence-interval width

• We created a graph and tables for sensitivity analysis

• We can perform precision and sample-size analysis for CIs for any of the following

• A population mean

• A population variance

• A difference between two independent means

• A difference between paired means

Precision analysis for a population variance

Computing probability of CI width for a population variance

Computing sample size for a population standard deviation

Sensitivity analysis for a population standard deviation

Paired samples

Paired means for bad (LDL) cholesterol levels

        Cholesterol before (mg/dL)   Cholesterol after (mg/dL)   Difference
        167                          140                         -27
        149                          138                         -11
        192                          194                         2
        200                          180                         -20
        162                          162                         0
        52                           41                          67.25
        …                            …                           …
Mean    129.58                       111.96                      -17.62
S.D.    52.27                        42.15                       6.47

CI for a paired-means difference

Fractional sample sizes

Sample-size determination using the correlation

CI for a difference of two means

• Let’s consider the cholesterol example, but instead randomly sample individuals who take cholesterol medication and individuals who don’t

• Two independent samples

• Group sizes may differ

CI for a difference of two means

Graphs for sensitivity analysis

Define the x axis

Label distinct values on the y axis

Specifying multiple values for multiple parameters

Create subgraphs for values of width

Unbalanced sample sizes

Control group (no medication)

Experimental group (medication)

Unbalanced sample sizes

Compute one sample size given another

Known standard deviations

Do you have another method in mind?

Add your own method to ciwidth, perform computations in the same way you would with other ciwidth commands, and create tables and graphs

Adding your own method to ciwidth

Adding your own methods to ciwidth is easy. Suppose you want to add a method called mymethod to ciwidth. Simply

1. write an r-class program called ciwidth_cmd_mymethod that computes sample size, probability of CI width, or CI width and follows ciwidth’s convention for naming common options and storing results; and

2. place the program where Stata can find it.

You are done. You can now use mymethod within ciwidth like any other official ciwidth method.

Program for computing CI width

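program ciwidth_cmd_mymean, rclass
    version 16.0
    // n() is required; level() defaults to the current confidence level, stddev() to 1
    syntax, n(integer) [ Level(cilevel) Stddev(real 1) * ]

    // normal-based CI width: 2 * z_{1-alpha/2} * sd / sqrt(n)
    tempname width
    scalar `width' = 2*invnormal(1/2+`level'/200)*`stddev'/sqrt(`n')

    // store results under the names ciwidth expects
    return scalar level  = `level'
    return scalar N      = `n'
    return scalar width  = `width'
    return scalar stddev = `stddev'
end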
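One way to check an evaluator like this before using it through ciwidth is to call it directly and compare the stored width with a hand calculation. The sketch below repeats the program definition so that it runs on its own in a fresh session. The inputs n(100) and stddev(6) are arbitrary values chosen for illustration.

```stata
* Define the evaluator (same logic as the program shown above)
capture program drop ciwidth_cmd_mymean
program ciwidth_cmd_mymean, rclass
    version 16.0
    syntax, n(integer) [ Level(cilevel) Stddev(real 1) * ]
    tempname width
    scalar `width' = 2*invnormal(1/2+`level'/200)*`stddev'/sqrt(`n')
    return scalar level  = `level'
    return scalar N      = `n'
    return scalar width  = `width'
    return scalar stddev = `stddev'
end

* Direct call: n = 100, sd = 6, 95% confidence level
ciwidth_cmd_mymean, n(100) stddev(6) level(95)
return list
display "stored width = " %6.4f r(width)

* Hand check against the normal-based formula
display "by hand      = " %6.4f 2*invnormal(0.975)*6/sqrt(100)
```

The two displayed values should agree (about 2.35 for these inputs). Once the program is saved where Stata can find it, the same computation is meant to be available as a ciwidth method named mymean.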

ciwidth mymean : compute CI width

ciwidth onemean : compute CI width

Naming conventions for stored results

return scalar level = `level'

return scalar N = `n'

return scalar width = `width'

ciwidth mymean : Specifying multiple values

We can also compute results for multiple sample sizes and confidence levels without any additional effort on our part:
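A sketch of what such calls might look like, assuming ciwidth_cmd_mymean has been defined and is visible to Stata. The particular sample sizes and confidence levels are illustrative, and stddev() is left at the program's default of 1.

```stata
* Every combination of two sample sizes and two confidence levels (four scenarios)
ciwidth mymean, n(10 20) level(90 95)

* A wider range of sample sizes, with the results plotted
ciwidth mymean, n(10(10)50) level(90 95) graph
```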

ciwidth mymean : Automatic graphs

Customizing your ciwidth command

• Add method-specific options and set them up to allow multiple values

• Tables

• Change column labels, formats, and widths to modify the look and contents of the table created by default

• Graphs

• Change the default column labels

• Use different symbols to label the results

Summary

• Perform precision and sample-size analysis for CIs for

• A population mean

• A population variance

• A difference between two independent means

• A difference between paired means

• Compute

• Sample size, CI width, and probability of CI width

• Perform sensitivity analysis graphically and with a table

• Implement your own method, and easily create tables and graphs as if it were an official ciwidth command

References

• Dixon, W. J., and F. J. Massey, Jr. 1983. Introduction to Statistical Analysis. 4th ed. New York: McGraw–Hill.

• Chow, S.-C., J. Shao, H. Wang, and Y. Lokhnygina. 2017. Sample Size Calculations in Clinical Research. 3rd ed. Boca Raton: Taylor & Francis.

• Meeker, W. Q., G. J. Hahn, and L. A. Escobar. 2017. Statistical Intervals: A Guide for Practitioners and Researchers. 2nd ed. Hoboken, NJ: Wiley.

Thank you !!