Bootstrap Confidence Interval for Standardized Solution in lavaan
Shu Fai Cheung
Source:vignettes/standardizedSolution_boot_ci.Rmd
Introduction
This document introduces the function standardizedSolution_boot_ci(),
and related helpers, from the package semhelpinghands.
What standardizedSolution_boot_ci() Does
In lavaan, even with se = "bootstrap", the confidence intervals in the
standardized solution are not bootstrap confidence intervals; they are
delta-method confidence intervals. This is a problem when researchers
want to form bootstrap confidence intervals for parameters such as a
standardized indirect effect.
The function standardizedSolution_boot_ci() addresses this problem. It
accepts a lavaan::lavaan-class object fitted with se = "bootstrap" (or
se = "boot") and forms percentile confidence intervals from the
bootstrap estimates stored in the object.
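To make the idea of a percentile interval concrete, here is a minimal sketch in base R. The vector boot_est is hypothetical: it is simulated with rnorm() only to stand in for the bootstrap estimates of one standardized parameter, and the package's own computation may differ in details such as how the quantiles are interpolated.

```r
# Hypothetical bootstrap estimates of one standardized parameter.
# In practice they would come from the fitted object, not from rnorm().
set.seed(1)
boot_est <- rnorm(5000, mean = 0.094, sd = 0.047)

level <- 0.95
alpha <- 1 - level

# Percentile interval: the alpha/2 and 1 - alpha/2 sample quantiles
boot_ci <- quantile(boot_est,
                    probs = c(alpha / 2, 1 - alpha / 2),
                    na.rm = TRUE)

# Bootstrap standard error: the standard deviation of the estimates
boot_se <- sd(boot_est, na.rm = TRUE)

boot_ci
boot_se
```

Setting na.rm = TRUE in this sketch drops failed bootstrap runs (stored as NA) before the quantiles are taken.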
Data and Model
A mediation model example modified from the official lavaan website is
used (https://lavaan.ugent.be/tutorial/mediation.html).
library(lavaan)
set.seed(1234)
n <- 100
# X drawn from a Chi-square distribution with df = 2
X <- (rchisq(n, df = 2) - 2) / sqrt(2 * 2)
M <- .40 * X + sqrt(1 - .40^2) * rnorm(n)
Y <- .30 * M + sqrt(1 - .30^2) * rnorm(n)
Data <- data.frame(X = X,
                   Y = Y,
                   M = M)
model <-
"
# direct effect
Y ~ c*X
# mediator
M ~ a*X
Y ~ b*M
# indirect effect (a*b)
ab := a*b
# total effect
total := c + (a*b)
"
This model is fitted with se = "bootstrap" and 5,000 replications.
(Change ncpus to a value appropriate for the system running it.)
fit <- sem(model,
           data = Data,
           se = "bootstrap",
           bootstrap = 5000,
           parallel = "snow",
           ncpus = 4,
           iseed = 1234)
(Note that having a warning for some bootstrap runs is normal. The failed runs will not be used in forming the confidence intervals.)
This is the standardized solution with delta-method confidence intervals.
standardizedSolution(fit)
#>     lhs op     rhs label est.std    se      z pvalue ci.lower ci.upper
#> 1     Y  ~       X     c   0.054 0.118  0.461  0.645   -0.176    0.285
#> 2     M  ~       X     a   0.370 0.098  3.768  0.000    0.178    0.563
#> 3     Y  ~       M     b   0.255 0.097  2.622  0.009    0.064    0.446
#> 4     Y ~~       Y         0.922 0.055 16.653  0.000    0.813    1.030
#> 5     M ~~       M         0.863 0.073 11.866  0.000    0.720    1.006
#> 6     X ~~       X         1.000 0.000     NA     NA    1.000    1.000
#> 7    ab :=     a*b    ab   0.094 0.045  2.093  0.036    0.006    0.183
#> 8 total := c+(a*b) total   0.149 0.108  1.375  0.169   -0.063    0.361
Bootstrap Percentile CIs for Standardized Solution
To form bootstrap percentile confidence intervals for the standardized
solution, simply use standardizedSolution_boot_ci() instead of
lavaan::standardizedSolution():
library(semhelpinghands)
ci_boot <- standardizedSolution_boot_ci(fit)
ci_boot
#>     lhs op     rhs label est.std    se      z pvalue ci.lower ci.upper
#> 1     Y  ~       X     c   0.054 0.118  0.461  0.645   -0.176    0.285
#> 2     M  ~       X     a   0.370 0.098  3.768  0.000    0.178    0.563
#> 3     Y  ~       M     b   0.255 0.097  2.622  0.009    0.064    0.446
#> 4     Y ~~       Y         0.922 0.055 16.653  0.000    0.813    1.030
#> 5     M ~~       M         0.863 0.073 11.866  0.000    0.720    1.006
#> 6     X ~~       X         1.000 0.000     NA     NA    1.000    1.000
#> 7    ab :=     a*b    ab   0.094 0.045  2.093  0.036    0.006    0.183
#> 8 total := c+(a*b) total   0.149 0.108  1.375  0.169   -0.063    0.361
#>   boot.ci.lower boot.ci.upper boot.se
#> 1        -0.171         0.286   0.117
#> 2         0.144         0.537   0.101
#> 3         0.061         0.443   0.097
#> 4         0.766         0.986   0.058
#> 5         0.712         0.979   0.070
#> 6            NA            NA      NA
#> 7         0.016         0.202   0.047
#> 8        -0.048         0.362   0.106
The bootstrap percentile confidence intervals are appended to the right
of the original output of lavaan::standardizedSolution(), in the
columns boot.ci.lower and boot.ci.upper. The standard errors based on
the bootstrap estimates (the standard deviation of the estimates) are
listed in the column boot.se.
As expected, the bootstrap percentile confidence interval of the
indirect effect, ab, is [0.016, 0.202], wider than the delta-method
confidence interval, [0.006, 0.183], and is shifted to the right.
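The comparison can be checked with simple arithmetic on the limits reported in the output above. The following sketch only copies those printed (rounded) limits into two vectors; the variable names are chosen for illustration.

```r
# Limits copied from the printed output for the indirect effect, ab
delta_ci <- c(lower = 0.006, upper = 0.183)
boot_ci  <- c(lower = 0.016, upper = 0.202)

# Interval widths: 0.177 (delta method) versus 0.186 (percentile)
delta_width <- unname(diff(delta_ci))
boot_width  <- unname(diff(boot_ci))

# Interval midpoints: 0.0945 (delta method) versus 0.109 (percentile)
delta_mid <- mean(delta_ci)
boot_mid  <- mean(boot_ci)

c(delta_width = delta_width, boot_width = boot_width)
c(delta_mid = delta_mid, boot_mid = boot_mid)
```

The delta-method interval is symmetric about the point estimate (0.094) by construction, whereas the percentile interval does not have to be, which is why its midpoint can lie to the right of the estimate.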
Print in a Friendly Format
The print method of the output of standardizedSolution_boot_ci()
supports printing the results in a text format similar to the summary
of lavaan output. Call print() directly and add output = "text":
print(ci_boot,
      output = "text")
#>
#> Standardized Estimates Only
#>
#>   Standard errors                               Bootstrap
#>   Confidence interval                           Bootstrap
#>   Confidence Level                                  95.0%
#>   Standardization Type                            std.all
#>   Number of requested bootstrap draws                5000
#>   Number of successful bootstrap draws               5000
#>
#> Regressions:
#>                   Standardized  Std.Err ci.lower ci.upper
#>   Y ~
#>     X         (c)        0.054    0.117   -0.171    0.286
#>   M ~
#>     X         (a)        0.370    0.101    0.144    0.537
#>   Y ~
#>     M         (b)        0.255    0.097    0.061    0.443
#>
#> Variances:
#>                   Standardized  Std.Err ci.lower ci.upper
#>    .Y                    0.922    0.058    0.766    0.986
#>    .M                    0.863    0.070    0.712    0.979
#>
#> Defined Parameters:
#>                   Standardized  Std.Err ci.lower ci.upper
#>     ab                   0.094    0.047    0.016    0.202
#>     total                0.149    0.106   -0.048    0.362
Note that this output replaces the results of the unstandardized solution with those from the standardized solution.
To print both the unstandardized and standardized results in the text
format, add standardized_only = FALSE when calling print().
Note
The function standardizedSolution_boot_ci() takes some time to run
because it retrieves the estimates of the unstandardized solution in
each bootstrap sample and computes the estimates in the standardized
solution. Therefore, if 5,000 bootstrap samples are requested, this
process is repeated 5,000 times. Nevertheless, it is still much faster
than fitting the model 5,000 times again.
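The two-step logic described above can be sketched in base R with a simple regression instead of a lavaan model. This is only an illustration under simplifying assumptions: unstd_est() and to_std() are hypothetical helpers written for this sketch, not functions from semhelpinghands or lavaan, and the package's internal code may work differently.

```r
set.seed(1234)
n <- 100
x <- rnorm(n)
y <- 0.4 * x + sqrt(1 - 0.4^2) * rnorm(n)
dat <- data.frame(x = x, y = y)

# Unstandardized estimates for y ~ x:
# slope (b), residual variance (ve), and variance of x (vx)
unstd_est <- function(d) {
  vx <- var(d$x)
  b  <- cov(d$x, d$y) / vx
  ve <- var(d$y) - b^2 * vx
  c(b = b, ve = ve, vx = vx)
}

# Step 1: bootstrap the unstandardized estimates.
# In an SEM this is the slow part, already done when the model was fitted.
R <- 2000
boot_unstd <- t(replicate(R, unstd_est(dat[sample(n, replace = TRUE), ])))

# Step 2: convert each stored set of unstandardized estimates
# to a standardized slope, without refitting anything.
to_std <- function(p) {
  unname(p["b"] * sqrt(p["vx"]) / sqrt(p["b"]^2 * p["vx"] + p["ve"]))
}
boot_std <- apply(boot_unstd, 1, to_std)

# Step 3: percentile interval of the standardized estimates
ci_std <- quantile(boot_std, probs = c(0.025, 0.975))
ci_std
```

Step 2 is repeated once per bootstrap sample, which is why the run time grows with the number of samples, but it involves only arithmetic on stored estimates rather than model fitting.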
Background
This function was originally proposed in an issue at GitHub, inspired
by a discussion at the Google group for lavaan. It is not a versatile
function and uses some “tricks” to do the work. A more reliable way is
to use a function like lavaan::bootstrapLavaan(). Nevertheless, this
simple function is good enough for the cases I encountered in my work.