Internal Workflow of semlbci()
Shu Fai Cheung
2024-06-07
Source:vignettes/articles/technical_workflow.Rmd
technical_workflow.Rmd
Introduction
This vignette is a simplified version of the technical appendix hosted at the OSF project site of Cheung & Pesigan (2023), associated with the
package semlbci
. It
presents the workflow used by ci_i_one()
and
ci_bound_wn_i()
to find a bound of a likelihood-based
confidence interval (LBCI) of a parameter in a fitted
lavaan
model.
ci_i_one()
and ci_bound_wn_i()
are internal
functions used by semlbci()
. They are not supposed to be
used by users. Nevertheless, developers can use them directly if so
desired.
Purposes of the Functions
ci_i_one()
is called by semlbci()
for each
confidence bound (limit), twice for each selected parameter, once for
the lower bound and once for the upper bound. It sets up necessary
arguments and then call the lowest level function responsible for
searching a bound.
Currently, only one such function is available,
ci_bound_wn_i()
, which implements the algorithm proposed by
Wu & Neale (2012), adapted by Pek & Wu (2015), without the part for
handling parameters with attainable bounds. To avoid confusion with the
method by Wu & Neale (2012) for
parameters with attainable bound, we denote this method WNPW
(Wu-Neale-Pek-Wu) method in this document.
Overall Workflow
We first present the overall workflow1 of
semlbci()
. We then present the workflow of
ci_i_one()
, called by semlbci()
once for each
bound of each parameter requested (twice for each parameter, once for
the lower bound and once for the upper bound). Last, we present the
workflows of the two stages of ci_bound_wn_i()
, the
minimization stage and the checking stage.
Details for the arguments mentioned in this section can be found from
the help pages of semlbci()
, ci_i_one()
, and
ci_bound_wn_i()
.
semlbci()
semlbci()
is the main function used by users. It is used
to find the LBCI of selected parameters. It is also responsible for
setting the equality constraint required by the WNPW method. If robust
LBCI proposed by Falk (2018) is requested,
it will also compute the scaling and shift factors to be used by the
robust likelihood ratio test by Satorra
(2000).
The general workflow of semlbci()
is shown below:
We describe below the two functions called by semlbci()
in the workflow, set_constraint()
and
:scaling_factor3()
.
set_constraint()
This function, called by semlbci()
, is used to set the
equality constraint used by the WNPW method.
As presented in Pek & Wu (2015), to find the lower bound, \(\theta_L\), of the 100(1-\(\alpha\))% LBCI of \(\hat{\theta}_j\), \(\theta_L\) is minimized with respect to all parameters, \((\theta_L, \boldsymbol{\theta}_q)\), \(\boldsymbol{\theta_q}\) being all parameters other than \(\theta_L\), subject to the following constraint:
\[ 0 = 2nF(\theta_L, \boldsymbol{\theta_q}) - (2nF(\hat{\theta}) + \chi_{(1, 1 - \alpha)}^2) \]
\(F(\hat{\boldsymbol{\theta}})\) is the value of the discrepancy function evaluated at the ML estimate \(\hat{\boldsymbol{\theta}}\). \(\chi_{(1, 1 - \alpha)}^2\) is the \(\chi^2\) critical value at \(df = 1\) and level of significance = \(\alpha\) (about 3.84 with \(\alpha = .05\)), and \(n\) is the sample size (total sample size in multisample models).
This constraint means that model \(\chi^2\) difference test between the model with \(\theta_j\) fixed to \(\theta_L\) and the original model with \(\theta_j\) freely estimated should have a p-value equal to \(\alpha\).
In lavaan
, \(F(\theta_L,
\boldsymbol{\theta}_q)\) is "fmin"
in the output of
lavaan::fitMeasures()
. When the estimator is
ML
and likelihood = "normal"
, the default,
multiplying "fmin"
by \(2n\) yields the model \(\chi^2\).
To find the upper bound, \(\theta_U\), \(-\theta_U\) is minimized (equivalently, \(\theta_U\) is maximized). The final value will be multiplied by \(-1\) to get the upper bound.
The WNPW method can easily be used when a model has one or more
equality constraints on the parameters. These constraints are simply
extracted from the lavaan
object and passed to
nloptr::nloptr()
, the function used to do the minimization,
along with the constraint by the WNPW method when minimizing the
objective function.
The function set_constraint()
returns a general version
of the constraint:
\[ 0 = 2nF(\theta_L, \boldsymbol{\theta}_q) - (2nF(\hat{\boldsymbol{\theta}}) + \frac{\chi_{(1, 1 - \alpha)}^2 - b}{a}) \]
In this general form, \(a\) is the
scaling factor and \(b\) is the shift
factor. When \(a = 1\) and \(b = 0\), it is the constraint in the WNPW
method. When robust LBCI is requested, the scaling and shift factors are
computed as in the same way by lavaan::lavTestLRT()
with
method = satorra.2000
and
A.method = "exact"
.2
semlbci:::scaling_factors3()
This internal function, not exported, computes the scaling and shift
factors used by lavaan::lavTestLRT()
with
method = satorra.2000
and A.method = "exact"
.
This is the method proposed by Falk (2018)
for robust LBCI. These factors can be used to do the robust likelihood
ratio test proposed by Satorra (2000)
without directly calling lavaan::lavTestLRT()
when doing
the minimization. Because these factors depend only on the parameter for
which the LBCI is to be searched, they can be computed once and reused
during the minimization.
Since version 0.6-13 of lavaan
,
lavaan::lavTestLRT()
stores the scaling and shift factors
in the output. Therefore, for future versions of semlbci
,
it is possible to remove semlbci:::scaling_factor3()
and
use lavaan::lavTestLRT()
to get the scaling and shift
factors directly.
ci_i_one()
The workflow of ci_i_one()
is presented below:
Its job is to call ci_bound_wn_i()
once or more than
once to find the bound of an LBCI of a parameter. It does not check the
validity or plausibility of the bound. The lowest level function,
ci_bound_wn_i()
, is responsible for doing the checks. If
something is wrong, ci_bound_wn_i()
will set
status
to a non-zero value.
The argument try_k_more_times
is used to determine
whether it will try “harder” for k more times if the first
attempt failed (e.g., status != 0
, meaning that the bound
found fails one or more checks).
ci_bound_wn_i()
The function ci_bound_wn_i()
is the lowest level
function that is responsible for searching the bound. It implements an
optimization method (the WNPW method in this case) and takes care of all
the technical details. It is not supposed to be used by users and the
interface is not designed to be user-friendly. Nevertheless, interested
users can use it directly to find a bound, bypassing
semlbci()
. Examples can be found in
vignette("technical_searching_one_bound", package = "semlbci")
.
The workflow of ci_bound_wn_i()
can be separated into
two stages, the minimization stage and the checking
stage.
Minimization Stage
The workflow of the minimization stage is presented below:
Optimization Algorithm
The function used to do the minimization is
nloptr::nloptr()
, using NLOPT_LD_SLSQP
as the
value for algorithm
. The parameters (x0
) are
all free parameters in the model. The objective function,
eval_f
is set to lbci_b_f()
, the function that
returns the bound in each iteration. The gradient function,
eval_grad_f
, is lbci_b_grad()
. The equality
constraint(s), eval_g_eq
, is the result of
set_constraint()
, supplied through the argument
f_constr
from semlbci()
, along with other
equality constraints in the model, if any.
The two functions created by ci_bound_wn_i()
in this
stage, lbci_b_f()
and lbci_b_grad()
, are
described below.
lbci_b_f()
In the WNPW method, if the parameter, \(\theta_j\), is a free parameter, and the lower bound is to be searched, this function simply returns the value of \(\theta_j\) (all other free parameters other than \(\theta_j\) are denoted by \(\theta_q\)):
\[ f(\theta_j, \boldsymbol{\theta}_q) = \theta_j \]
If the upper bound is to be searched, then this function is:
\[ f(\theta_j, \boldsymbol{\theta}_q) = -\theta_j \]
If the parameter is a function of some free parameters (e.g., an indirect effect, or a standardized coefficient such as correlation), denoted by \(h(\boldsymbol{\theta})\), \(\boldsymbol{\theta}\) being all free parameters (though some may not be used in \(h\)), and the lower bound is to be searched, then this function is:
\[ f(\boldsymbol{\theta}) = h(\boldsymbol{\theta}) \]
Similarly, if the upper bound is to be searched, this function is:
\[ f(\boldsymbol{\theta}) = -h(\boldsymbol{\theta}) \]
Therefore, in the optimization, minimization is conducted whether the lower or upper bound is to be searched.
lbci_b_grad()
The gradient of the object function, lbci_b_f()
, is
either precomputed (if the parameter for which the bound is being
searched is a parameter) or computed by
lavaan::lav_func_gradient_complex()
.
Tweaking the Optimization
The first attempt of optimization may fail, especially when the
target parameter is a function of free parameters (e.g., an indirect
effect, or a parameter in the standardized solution). The arguments
xtol_rel
, ftol_rel
, and lb
are
adjusted across attempts. This is done by ci_i_one()
for
the second and subsequent calls to ci_bound_wn_i()
.
Checking Stage
The workflow of the checking stage is presented below:
It will do the following checks, in this order:
-
Is the status code of
nloptr::nloptr()
equal to 0 (“success”)?If the status code of
nloptr::nloptr()
is not equal to 0, then the status code forci_bound_wn_i()
will be set to 1. -
In final solution, do the values of the free parameters result in an admissible solution? (Checked by
lavaan::lavInspect()
withwhat = "post.check"
).The values of the parameters in the final solution is used to fit the model, and then
lavaan::lavInspect()
is called to check the solution. Examples of inadmissible solution are negative variances and correlations greater than one in magnitude.If the solution is not admissible, then the status code for
ci_bound_wn_i()
will be set to 1. -
When the target parameter (free or derived, i.e., the function of parameters, such as an indirect effect or a standardized regression coefficient) is fixed to the bound found, is the p-value of the likelihood ratio test between this constrained model and the original model equal to 1 - confidence level (.05 for a 95% LBCI)?
If not, then the likelihood-based confidence bound is by definition invalid. The status code for
ci_bound_wn_i()
will be set to 1.The check is equivalent to using
lavaan::lavTestLRT()
. If robust LBCI is requested, then the check is equivalent to usinglavaan::lavTestLRT()
withmethod = satorra.2000
andA.method = "exact"
.
The bound is set to NA
if the solution fails any of the
three checks presented above, to prevent users from accidentally using a
bound that may be invalid.
In sum, the bound returned, if not NA
, has the following
characteristics:
Noted as “success” in the optimization by
nloptr::nloptr()
.The values of the parameters do not yield an inadmissible solution.
The bound is by definition valid.
Main Arguments
This section presents in details the arguments used by
ci_i_one()
and ci_bound_wn_i()
.
ci_i_one()
ci_i_one()
is an interface between user functions such
as semlbci()
and low level functions such as
ci_bound_wn_i()
. It is responsible for setting necessary
values to be used by ci_bound_wn_i()
.
These are the main arguments of ci_i_one()
:
-
i
:The position (row number) of the target parameters as appeared in the parameter table of the output of
lavaan::lavaan()
and its wrappers, such aslavaan::sem()
andlavaan::cfa()
. This uniquely identifies a parameter, which can be free, fixed, or user-defined. -
which
:Either
"lbound"
or"ubound"
, denoting lower bound and upper bound, respectively. The confidence bound (limit) to be searched. -
sem_out
:A
lavaan
-class object. the output oflavaan::lavaan()
and its wrappers, such aslavaan::sem()
andlavaan::cfa()
. -
method
:The method to be used to find a confidence bound. Currently, only the modified Wu-Neale method presented by Pek and Wu (2015) is supported (
"wn"
). Separating the low level function from this function allows for the possibility to develop low level functions for other methods, without the need to change the interface implemented inci_i_one()
. -
standardized
:Logical. Whether the confidence bound of the parameter in the standardized solution is to be searched. For example, for a covariance, whether the covariance, or the correlation, is to be used in searching the bound. Default is
FALSE
. -
robust
:Whether robust likelihood-based confidence bound is to be searched, and if yes, the method to be used. Currently only support
"none"
(robust method not used) or"satorra.2000"
, proposed by Falk (2018). -
sf_full
:Used when
robust
is"satorra.2000"
. IfNA
, the scaling and shift factors used in the likelihood ratio test will be computed internally. If supplied, it should be a list with two scalar elements,c_r
andc_rb
, the scaling factor and the shift factors. -
sf_args
:If
robust
is"satorra.2000"
andsf_full
isNA
, this is a named list of arguments to be passed tosemlbci:::scaling_factor3()
, an internal function for computing the scaling and shift factors proposed by Asparouhov & Muthén (2010). -
try_k_more_times
:How many more times to try if the status code is not zero. Default is 0 but
semlbci()
set this argument to 2 when calling this function. If set to an integer greater than zero, it will call the low level function until the status code is zero or until this number of additional calls have been attempted. In each successive call, some values will be modified to do the search using these new settings.
ci_bound_wn_i()
ci_bound_wn_i()
is the low level function called by
ci_i_one()
. This function implements the modified Wu-Neale
method presented by Pek & Wu (2015),
named Wu-Neale-Pek-Wu (WNPW) method in this document. This function is
not supposed to be used by users and the interface is not user-friendly.
Interested users can refer to
vignette("technical_searching_one_bound", package = "semlbci")
to see how to use ci_bound_wn_i()
directly.
These are the main arguments of ci_bound_wn_i()
:
-
i
:The position (row number) of the target parameters as appeared in the parameter table of the output of
lavaan::lavaan()
and its wrappers, such aslavaan::sem()
andlavaan::cfa()
. This id uniquely identifies a parameter, which can be free, fixed, or user-defined. -
npar
:The number of free parameters in the model, including those constrained to be equal. To be supplied by
ci_i_one()
. -
sem_out
:A
lavaan
-class object. the output oflavaan::lavaan()
and its wrappers, such aslavaan::sem()
andlavaan::cfa()
. -
f_constr
:The constraint function generated by
set_constraint()
. Created bysemlbci()
and then passed to it throughci_i_one()
. -
which
:Either
"lbound"
or"ubound"
, denoting lower bound and upper bound, respectively. The confidence bound (limit) to be searched. -
perturbation_factor
:The number by which the parameter estimates in
sem_out
will be multiplied, to set the starting values, because using the parameter estimates as starting values may lead to errors in the first few iterations. Default is .90. This argument is ignored ifwald_ci_start
, described below, isTRUE
. -
lb_var
:The lower bound for free parameters that are variances. If equal to
-Inf
, the default,lb_prop
andlb_se_k
, described below, will be used to set the lower bounds for free variances. If it is a number, it will be used to set the lower bounds for all free variances. -
wald_ci_start
:If
TRUE
, there are no equality constraints in the model, and the target parameter is not a user-defined parameter, the Wald or delta confidence bounds will be used as the starting values. -
standardized
:Logical. Whether the confidence bound of the parameter in the standardized solution is to be searched. For example, for a covariance, whether the covariance, or the correlation, is to be used in searching the bound. Default is
FALSE
. -
opts
:A named list of options to be passed to
nloptr::nloptr()
, the function used for the optimization. Default islist()
. This argument can be used to override internal settings used byci_bound_wn_i()
. -
ciperc
:The intended coverage probability for the confidence interval. Default is .95, and the bound for a 95% likelihood-based confidence interval will be sought.
-
ci_limit_ratio_tol
:The tolerance for the ratio of
a
tob
, wherea
is the distance between an bound of an LBCI and the point estimate, and theb
is the distance between the original confidence bound (by default the Wald or delta CI inlavaan::lavaan()
) and the point estimate. If the ratio is larger than this value or smaller than the reciprocal of this value, a warning is set in the status code. Default is 1.5. -
verbose
:If
TRUE
, the function will store more diagnostic information in the attributediag
. Default isFALSE
. -
sf
:A scaling factor. Used for robust confidence bounds. Default is 1. Precomputed by an internal function called by
semlbci()
orci_i_one()
whenrobust = "satorra.2000"
. -
sf2
:A shift factor. Used for robust confidence bounds. Default is 1. Precomputed by an internal function called by
semlbci()
orci_i_one()
whenrobust = "satorra.2000"
. -
p_tol
:The tolerance for checking the achieved level of confidence, that is, the p-value of the likelihood ratio test between the original model and the model with the parameter fixed to the bound found. If the absolute difference between the achieved level and
ciperc
is greater than this number, a warning is set in the status code and the bound is set toNA
. Default is 5e-4. -
xtol_rel_factor
:Multiply the internal default value of
xtol_rel
fornloptr::nloptr()
(1.0e-5) by this number, usually a positive number equal to or less than 1, to change the default termination criterion. Default is 1. This allows tweaking the settings for optimization without knowing the internal default value. -
ftol_rel_factor
:Multiply the internal default value of
ftol_rel
fornloptr::nloptr()
(1.0e-5) by this number, usually a positive number equal to or less than 1, to change the default termination criterion. Default is 1. This allows tweaking the settings for optimization without knowing the internal default value. -
lb_prop
:Used by an internal function to set the lower bound for free variances. Default is .05, setting the lower bound to (.05)(point estimate). Used only if the lower bound set by
lb_se_k
is negative. This constraint is used only in the optimization to prevent intermediate values too far away from the point estimates. The final check done by fitting the model inlavaan
will not implement this constraint. -
lb_se_k
Used by an internal function to set the lower bound for free variances. Default is 3, the estimate minus 3 \(\times\) standard error. If negative, the lower bound is set using
lb_prop
. This constraint is used only in the optimization to prevent intermediate values too far away from the point estimates. The final check done by fitting the model inlavaan
will not implement this constraint.
Please refer to the help page of ci_bound_wn_i()
to
learn about other arguments.