
Sample Size Given Desired Power (Probabilistic Bisection)
2026-03-04
Source: vignettes/articles/x_from_power_for_n_pb.Rmd
Introduction
This article is a brief illustration of how to use
n_from_power() and power4test() from the
package power4mome
to find, by simulation, the sample size whose power to detect an
effect is close to a desired level, given the population effect
size.
The illustration will use an indirect effect tested by Monte Carlo
confidence interval as an example, though the procedure is similar for
other tests supported by power4test().
NOTE: This is one of two nearly identical articles. This version introduces the probabilistic bisection algorithm (Waeber et al., 2013), which sometimes takes longer to run but can achieve a higher level of precision by using a larger number of replications. The other version uses the (informal) bisection algorithm (see Chalmers, 2024, for a review of this method) to identify the approximate sample size with a limited number of replications.
Prerequisite
Basic knowledge about fitting models by lavaan is
required. Readers are also expected to know how to use
power4test() (see this get-started for an introduction).
Scope
This is a brief illustration. More complicated scenarios and other
features of x_from_power() will be described in other
vignettes and articles.
Some sections are repeated from other vignettes and articles to make this vignette self-contained.
Workflow
Three functions, along with some methods, are sufficient for estimating the sample size, given the desired power, along with other factors such as the test, the model, and population values. This is the workflow:
1. Specify the model syntax for the population model, in lavaan style, and set the population values of the model parameters.
2. Call power4test() to examine the setup and the datasets generated. Repeat this and the previous step until the model is specified correctly.
3. Call power4test() again, with the test to do specified, using an initial sample size.
4. Call rejection_rates() to compute the power and verify that the test is specified correctly.
5. Call n_from_power() to estimate the sample size required for the desired power.
Mediation
Let’s consider a simple mediation model. We would like to estimate the sample size required to detect a mediation effect by Monte Carlo confidence interval, with a 95% level of confidence.
Because sampling (simulation) error is involved when there is no analytic solution for the sample size, instead of estimating the unknown sample size exactly, we can estimate, approximately, the region of sample sizes whose levels of power are:
Case 1: significantly lower than the target power, or
Case 2: significantly higher than the target power.
That is, we find the sample size whose confidence interval for power (95% by default) lies just below (Case 1) or just above (Case 2) the target power, approximately.
We will consider Case 1 first.
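Since the estimated power is a proportion of "significant" replications, its confidence interval can be computed with a method for proportions; power4mome reports intervals based on Wilson's (1927) method. Below is a self-contained sketch of that method (illustrative code only, not the package's implementation; wilson_ci is a hypothetical helper name):

```r
# Wilson's (1927) score interval for a rejection rate (illustrative
# sketch only; not power4mome's implementation)
wilson_ci <- function(k, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  p_hat <- k / n
  denom <- 1 + z^2 / n
  center <- (p_hat + z^2 / (2 * n)) / denom
  half <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / denom
  c(lower = center - half, upper = center + half)
}

# For example, 1628 'significant' replications out of 2000 (power = .814):
wilson_ci(1628, 2000)  # approximately [0.796, 0.830]
```

With a large number of replications, this interval is narrow enough to tell whether a sample size's power is clearly below or above the target.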
Specify the Population Model
This is the model syntax
mod <-
"
m ~ x
y ~ m + x
"Note that, even if we are going to test mediation, we do not need to
label the parameters and define the indirect effect as usual in
lavaan. This will be taken care of by the test functions,
through the use of the package manymome
(Cheung & Cheung, 2024).
Specify The Population Values
There are two approaches to do this: using named vectors or lists, or
using a multiline string similar to lavaan model syntax.
The second approach is demonstrated below.
Suppose we want to estimate the power when:
The path from x to m is “large” in strength.
The path from m to y is “medium” in strength.
The path from x to y is “small” in strength.
By default, power4mome uses this convention for
regression paths and correlations:
Small: .10 (or -.10)
Medium: .30 (or -.30)
Large: .50 (or -.50)
All these values are for the standardized solution (the so-called “betas”).
The following string denotes the desired values:
mod_es <-
"
m ~ x: l
y ~ m: m
y ~ x: s
"Each line starts with a tag, which is the parameter
presented in lavaan syntax. The tag ends with a colon,
:.
After the colon is a population value, which can be:
-
A word denoting the value. By default:
s: Small. (-sfor small and negative.)m: Medium. (-mfor medium and negative.)l: Large. (-lfor large and negative.)nil: Zero.
All regression coefficients and covariances, if not specified, are set to zero.
Call power4test() to Check the Model
out <- power4test(nrep = 2,
model = mod,
pop_es = mod_es,
n = 50000,
iseed = 1234)

These are the arguments used:

- nrep: The number of replications. In this stage, a small number can be used. It is more important to have a large sample size than to have many replications.
- model: The model syntax.
- pop_es: The string setting the population values.
- n: The sample size in each replication. In this stage, just for checking the model and the data generation, this number can be set to a large one, unless the model is slow to fit when the sample size is large.
- iseed: If supplied, it is used to set the seed for the random number generator. It is advised to always set this to an arbitrary integer, to make the results reproducible.
The population values can be shown by printing this object:
print(out,
data_long = TRUE)
#>
#> ====================== Model Information ======================
#>
#> == Model on Factors/Variables ==
#>
#> m ~ x
#> y ~ m + x
#>
#> == Model on Variables/Indicators ==
#>
#> m ~ x
#> y ~ m + x
#>
#> ====== Population Values ======
#>
#> Regressions:
#> Population
#> m ~
#> x 0.500
#> y ~
#> m 0.300
#> x 0.100
#>
#> Variances:
#> Population
#> .m 0.750
#> .y 0.870
#> x 1.000
#>
#> (Computing indirect effects for 2 paths ...)
#>
#> == Population Conditional/Indirect Effect(s) ==
#>
#> == Indirect Effect(s) ==
#>
#> ind
#> x -> m -> y 0.150
#> x -> y 0.100
#>
#> - The 'ind' column shows the indirect effect(s).
#>
#> ======================= Data Information =======================
#>
#> Number of Replications: 2
#> Sample Sizes: 50000
#>
#> ==== Descriptive Statistics ====
#>
#> vars n mean sd skew kurtosis se
#> m 1 1e+05 0.00 1 0.01 0.03 0
#> y 2 1e+05 0.01 1 0.01 0.00 0
#> x 3 1e+05 0.00 1 0.01 0.01 0
#>
#> ===== Parameter Estimates Based on All 2 Samples Combined =====
#>
#> Total Sample Size: 100000
#>
#> ==== Standardized Estimates ====
#>
#> Variances and error variances omitted.
#>
#> Regressions:
#> est.std
#> m ~
#> x 0.500
#> y ~
#> m 0.295
#> x 0.102
#>
#>
#> ==================== Extra Element(s) Found ====================
#>
#> - fit
#>
#> === Element(s) of the First Dataset ===
#>
#> ============ <fit> ============
#>
#> lavaan 0.6-21 ended normally after 1 iteration
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 5
#>
#> Number of observations 50000
#>
#> Model Test User Model:
#>
#> Test statistic 0.000
#> Degrees of freedom 0

The argument data_long = TRUE is used to verify the
simulation.
The population values for the regression paths in the section
Population Values are what we specified. So the model is
specified correctly.
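As a quick sanity check, the population indirect effect of 0.150 shown in the printout follows directly from the default label values (l = .50 for m ~ x, m = .30 for y ~ m): the indirect effect is the product of the two standardized paths.

```r
# Population values implied by the default effect-size labels
paths <- c(s = 0.10, m = 0.30, l = 0.50)
a <- paths[["l"]]  # m ~ x
b <- paths[["m"]]  # y ~ m
a * b              # indirect effect of x on y through m
#> [1] 0.15
```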
The section Descriptive Statistics, generated by
psych::describe(), shows basic descriptive statistics for
the observed variables. As expected, they have means close to zero and
standard deviations close to one, because the datasets were generated
using the standardized model.
The section Parameter Estimates Based on All Samples Combined shows the
parameter estimates when the population model is fitted to all the
datasets combined. When the total sample size is large, these estimates
should be close to the population values.
By default, the population model will be fitted to each dataset,
hence the section <fit>. This section just verifies
that the population model can be fitted.
The results show that the population model is the desired one. We can proceed to the next stage.
Call power4test() to Do the Target Test
We can now do the simulation to estimate power for an initial sample size, to verify the test we want to do. A large number of datasets (e.g., 500) of the target sample size are to be generated, and then the target test will be conducted in each of these datasets.
Suppose we would like to estimate the power of using Monte Carlo
confidence interval to test the indirect effect from x to
y through m, when sample size is 50. This is
the call, based on the previous one:
out <- power4test(nrep = 50,
model = mod,
pop_es = mod_es,
n = 50,
R = 199,
ci_type = "mc",
test_fun = test_indirect_effect,
test_args = list(x = "x",
m = "m",
y = "y",
mc_ci = TRUE),
iseed = 2345,
parallel = TRUE)

If our goal is to find a sample size for a specific level of power,
with sufficient precision, we do not need a large number of replications
(nrep) in this stage. We can use as few as 50. We can set
the target number of replications when calling the function
n_from_power(), which is a wrapper to the general function
x_from_power().
These are the new arguments used:
- R: The number of replications used to generate the Monte Carlo simulated estimates, 199 in this example. This may look small, but this value is acceptable because the goal is not to have a stable result in one replication, but to estimate the power across replications. The method proposed by Boos & Zhang (2000), detailed below, will be used to estimate the power.
- ci_type: The method used to generate estimates. Monte Carlo ("mc") and nonparametric bootstrapping ("boot") are supported. Although bootstrapping is usually used to test an indirect effect, it is very slow to draw R bootstrap samples in each of nrep datasets (the model will be fitted R * nrep times). Therefore, it is preferable to use Monte Carlo for the initial estimation.
- test_fun: The function used to do the test in each replication. Any function following a specific requirement can be used, and power4mome comes with several built-in functions for some tests. The function test_indirect_effect() is used to test an indirect effect in the model.
- test_args: A named list of arguments to be supplied to test_fun. For test_indirect_effect(), it is a named list specifying the predictor (x), the mediator(s) (m), and the outcome (y). A path with any number of mediators is supported. Please refer to the help page of test_indirect_effect().
- parallel: If the test to be conducted is slow, which is the case for tests done by Monte Carlo or nonparametric bootstrapping confidence intervals, it is advised to enable parallel processing by setting parallel to TRUE.
In power analysis involving resampling methods such as Monte Carlo
confidence interval and bootstrapping, the processing time can be very
long. Boos & Zhang (2000) proposed a
method to extrapolate the rejection rates of these methods using a
limited number of (re)samples. This method requires specific values of
R, determined by the level of significance (1 minus the
level of confidence, for confidence interval methods). For a level of
significance of .05, the default, these are the first few supported
values of R:
#> [1] 39 79 119 159 199 239 279 319 359
In our experience, an R of 199 is large enough for the
search.
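The supported values listed above are evenly spaced: 39, then steps of 40 (that is, 2 / .05 for a .05 level of significance). They can be reproduced with a one-line check (illustrative code, not taken from the package):

```r
# First few supported values of R for a .05 level of significance:
# 39, then steps of 40 (i.e., 2 / .05)
r_values <- seq(39, by = 40, length.out = 9)
r_values
#> [1]  39  79 119 159 199 239 279 319 359
```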
Note that the simulation can take some time to run. Progress will be printed when run in an interactive session.
This is the default printout:
print(out)
#>
#> ====================== Model Information ======================
#>
#> == Model on Factors/Variables ==
#>
#> m ~ x
#> y ~ m + x
#>
#> == Model on Variables/Indicators ==
#>
#> m ~ x
#> y ~ m + x
#>
#> ====== Population Values ======
#>
#> Regressions:
#> Population
#> m ~
#> x 0.500
#> y ~
#> m 0.300
#> x 0.100
#>
#> Variances:
#> Population
#> .m 0.750
#> .y 0.870
#> x 1.000
#>
#> (Computing indirect effects for 2 paths ...)
#>
#> == Population Conditional/Indirect Effect(s) ==
#>
#> == Indirect Effect(s) ==
#>
#> ind
#> x -> m -> y 0.150
#> x -> y 0.100
#>
#> - The 'ind' column shows the indirect effect(s).
#>
#> ======================= Data Information =======================
#>
#> Number of Replications: 50
#> Sample Sizes: 50
#>
#> Call print with 'data_long = TRUE' for further information.
#>
#> ==================== Extra Element(s) Found ====================
#>
#> - fit
#> - mc_out
#>
#> === Element(s) of the First Dataset ===
#>
#> ============ <fit> ============
#>
#> lavaan 0.6-21 ended normally after 1 iteration
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 5
#>
#> Number of observations 50
#>
#> Model Test User Model:
#>
#> Test statistic 0.000
#> Degrees of freedom 0
#>
#> =========== <mc_out> ===========
#>
#>
#> == A 'mc_out' class object ==
#>
#> Number of Monte Carlo replications: 199
#>
#>
#> =============== <test_indirect: x->m->y> ===============
#>
#> Mean(s) across replication:
#> est cilo cihi sig pvalue R nlt0 alpha bz_39 bz_79 bz_119 bz_159
#> 0.167 NaN NaN 0.548 0.129 199.000 16.860 0.050 0.496 0.523 0.528 0.536
#> bz_199
#> 0.540
#>
#> - The value 'sig' is the rejection rate.
#> - If the null hypothesis is false, this is the power.
#> - Number of valid replications for rejection rate: 50
#> - Proportion of valid replications for rejection rate: 1.000

As shown above, the settings are correct. The columns after
R (e.g., nlt0 and bz_39) are for
the method by Boos & Zhang (2000). The
presence of these columns confirms that the method by Boos & Zhang (2000) is used.
We can now call n_from_power() to do the search.
Call n_from_power() to Estimate the Sample Size
This is a simplified description of how n_from_power()
works when our goal is to find a sample size with its power close
to the target power, with a desired level of precision determined
by the number of replications (nrep).
The probabilistic bisection algorithm (Waeber et al., 2013) will be used. It takes some time to run, sometimes ten minutes or more, but it is particularly suitable for determining the sample size given a target power with a high level of precision, which requires a large number of replications. See Chalmers (2024) for an illustration of this method in power analysis.
This is the function call:
out_n <- n_from_power(out,
target_power = .80,
final_nrep = 2000,
final_R = 2000,
seed = 1234,
algorithm = "probabilistic_bisection")The argument used above:
target_power: The target power. Default is .80.final_nrep: The number of replications desired in the solution. Fornrep = 2000, the width of the 95% confidence limits based on normal approximation for a power of .80 is about .035. This should be precise enough for estimating the sample size required. Increase or decreasenrepto get the desired level of precision.final_R: It is preferable to use the method by Boos & Zhang (2000) with a smallRduring the search. Nevertheless, if a larger number ofRis desired during the final check, we can setfinal_Rto this value. If omitted, theRin the object frompower4test()will be used.seed: To make the search reproducible, if possible, set this seed to an integer.
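The ".035" figure quoted above can be verified with the normal approximation for a proportion (a quick check, not package code):

```r
# Width of the 95% normal-approximation CI for an estimated power of .80
# based on 2000 replications
p <- 0.80
nrep <- 2000
width <- 2 * qnorm(0.975) * sqrt(p * (1 - p) / nrep)
round(width, 3)
#> [1] 0.035
```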
By default, the progress of the search will be displayed on screen
(add progress = FALSE to hide the progress). Each iteration
only uses a small number of replications (nrep), but the
information will be accumulated across iterations to determine the
sample size. When a potential solution (the sample size with the desired
power) is found, a final check will be conducted using
final_nrep replications and final_R (if set).
If the estimated power of this solution is “close enough” to the target
power, the search will stop. If not, the search will continue. These
steps will be continued until some termination criteria are met.
(Details to be described in another article).
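The core idea of probabilistic bisection can be sketched as follows: maintain a posterior distribution over the location of the target sample size, query the posterior median, and update the posterior using the (noisy) outcome of each trial. Below is a minimal, hypothetical sketch, not power4mome's implementation; pb_search(), the oracle, and p are illustrative names and assumptions:

```r
# Minimal probabilistic bisection sketch (hypothetical; not power4mome code).
# 'oracle(x)' returns +1 if the target is believed to lie above x, -1 otherwise,
# and is assumed to answer correctly with probability p > 0.5.
pb_search <- function(oracle, grid, p = 0.8, n_iter = 200) {
  post <- rep(1 / length(grid), length(grid))  # uniform prior over the target
  for (i in seq_len(n_iter)) {
    j <- which(cumsum(post) >= 0.5)[1]         # query the posterior median
    z <- oracle(grid[j])
    above <- seq_along(grid) > j
    # Bayes update: up-weight the side the noisy oracle points to
    w <- if (z > 0) ifelse(above, p, 1 - p) else ifelse(above, 1 - p, p)
    post <- post * w
    post <- post / sum(post)
  }
  grid[which.max(post)]
}

# Toy example: the 'true' target is 103; the oracle errs 20% of the time.
set.seed(1)
noisy_oracle <- function(x) {
  truth <- if (x < 103) 1 else -1
  if (runif(1) < 0.8) truth else -truth
}
pb_search(noisy_oracle, grid = 50:200)
```

Because each query is answered only probabilistically, the same value can be queried repeatedly, and the posterior accumulates evidence across all trials; this is why the algorithm tolerates the simulation noise in estimated power.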
Examine the Output
The example above will take several minutes to run, depending on
factors such as whether parallel is TRUE and
the number of CPU cores used when calling power4test(), as
well as the target number of replications (final_nrep). The
larger the final_nrep is, the higher the precision of the
solution, but the longer the search.
This is the basic output:
out_n
#> Call:
#> power4mome::x_from_power(object = out, x = "n", target_power = 0.8,
#> what = "point", goal = "ci_hit", final_nrep = 2000, final_R = 2000,
#> seed = 1234, algorithm = "probabilistic_bisection")
#>
#> Setting
#> Predictor(x): Sample Size
#> Parameter: N/A
#> goal: close_enough
#> what: point
#> algorithm: probabilistic_bisection
#> Level of confidence: 95.00%
#> Target Power: 0.800
#>
#> - Final Value of Sample Size (n): 103
#>
#> - Final Estimated Power (CI): 0.814 [0.797, 0.831]
#>
#> Call `summary()` for detailed results.

The estimated sample size is 103. The estimated power based on simulation is 0.814, with its confidence interval equal to [0.797, 0.831].
To obtain more detailed results for the search, we can use the
summary() method:
summary(out_n)
#>
#> ====== x_from_power Results ======
#>
#> Call:
#> x_from_power(object = out, x = "n", target_power = 0.8, what = "point",
#> goal = "ci_hit", final_nrep = 2000, final_R = 2000, seed = 1234,
#> algorithm = "probabilistic_bisection")
#>
#> Predictor (x): Sample Size
#>
#> - Target Power: 0.800
#> - Goal: Find 'x' with estimated estimated power close enough to the
#> target power.
#>
#> === Major Results ===
#>
#> - Final Value (Sample Size): 103
#>
#> - Final Estimated Power: 0.814
#> - Confidence Interval: [0.797; 0.831]
#> - Level of confidence: 95.0%
#> - Based on 2000 replications.
#>
#> === Technical Information ===
#>
#> - Algorithm: probabilistic_bisection
#> - Tolerance for 'close enough': Within 0.01525 of 0.800
#> - The range of values explored: 50 to 2000
#> - Time spent in the search: 2.378 mins
#> - The final crude model for the power-predictor relation:
#>
#> Model Type: Linear Regression
#>
#> Call:
#> power_curve(object = by_x_1, formula = power_model, start = power_curve_start,
#> lower_bound = lower_bound, upper_bound = upper_bound, nls_args = nls_args,
#> nls_control = nls_control, verbose = progress)
#>
#> Predictor: n (Sample Size)
#>
#> Model:
#>
#> Call:
#> stats::lm(formula = reject ~ x, data = reject_df, weights = weights)
#>
#> Coefficients:
#> (Intercept) x
#> 0.7952892 0.0001529
#>
#>
#> - Detailed Results:
#>
#> [test]: test_indirect: x->m->y
#> [test_label]: Test
#> n est p.v nrep reject r.cilo r.cihi
#> 1 50 0.167 1.000 50 0.548 0.404 0.670
#> 2 50 0.145 1.000 51 0.451 0.323 0.586
#> 3 103 0.153 1.000 2000 0.814 0.797 0.831
#> 4 142 0.153 1.000 51 0.899 0.790 0.957
#> 5 216 0.159 1.000 51 1.000 0.930 1.000
#> 6 350 0.147 1.000 51 1.000 0.930 1.000
#> 7 591 0.150 1.000 51 1.000 0.930 1.000
#> 8 1025 0.152 1.000 51 1.000 0.930 1.000
#> 9 2000 0.150 1.000 51 1.000 0.930 1.000
#> Notes:
#> - n: The sample size in a trial.
#> - p.v: The proportion of valid replications.
#> - nrep: The number of replications.
#> - est: The mean of the estimates in a test across replications.
#> - reject: The proportion of 'significant' replications, that is, the
#> rejection rate. If the null hypothesis is true, this is the Type I
#> error rate. If the null hypothesis is false, this is the power.
#> - r.cilo,r.cihi: The confidence interval of the rejection rate, based
#> on Wilson's (1927) method.
#> - Refer to the tests for the meanings of other columns.

It also reports major technical information regarding the search, such as the range of sample sizes tried, the time spent, and the table with all the sample sizes examined, along with the estimated power levels and confidence intervals.
Note that it is normal for the probabilistic bisection method to try the same value (same sample size) several times. Each trial is independent of other trials, and provides unique information to the search.
It also prints the model, the “power curve”, used to estimate the relation between power and sample size. Note that this is only a crude model, intended for the values of sample size examined (9 in this example). It is not intended to estimate power for sample sizes outside the range studied.
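To see how crude this model is, we can plug the final sample size into the fitted linear power curve reported in the summary; the result is close to, but not identical with, the simulation-based estimate of 0.814:

```r
# Evaluate the crude linear power curve at n = 103
# (coefficients copied from the summary() output above)
intercept <- 0.7952892
slope <- 0.0001529
intercept + slope * 103
#> [1] 0.8110379
```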
Unlike the (informal) bisection method, the final solution is usually
validated by a large number of replications (final_nrep)
when probabilistic bisection is used. Therefore, it is not necessary to
use the search process or the power curve to evaluate the final
solution.
Advanced Features
This brief illustration only covers basic features of
n_from_power(). There are other ways to customize the
search, such as the range of sample sizes to examine, the level of
confidence for the confidence interval, and the termination criteria
(e.g., the maximum number of trials, 100 by default for probabilistic
bisection). Please refer to the help page of
x_from_power(), which n_from_power() calls,
for these and other options.