Case Residuals Computation
casewise_residuals.Rmd
(Work-In-Progress)
Introduction
This is a technical appendix outlining how the data used to generate the plots are computed.
Computing Casewise Residuals
The function casewise_residuals() first identifies the \(x\) variables, variables that point to at least one other variable, and the \(y\) variables, variables that have at least one other variable pointing to them. Note that a variable can be in both sets (e.g., a mediator). It then enumerates all pairwise combinations of \(x\) and \(y\) variables to see which of the following cases they belong to:
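The identification and enumeration step can be sketched in base R as follows. This is only an illustration, not the package's implementation: the `paths` table, its `lhs`/`rhs` columns, and the `in_model` flag are hypothetical names, and dropping a variable paired with itself is an assumption of this sketch.

```r
# Hypothetical path table: each row "lhs ~ rhs" means rhs points to lhs.
# Model: x -> m -> y, with no direct path from x to y.
paths <- data.frame(
  lhs = c("m", "y"),
  rhs = c("x", "m"),
  stringsAsFactors = FALSE
)

# x variables: point to at least one other variable.
x_vars <- unique(paths$rhs)
# y variables: have at least one other variable pointing to them.
y_vars <- unique(paths$lhs)

# All pairwise combinations of x and y variables.
# A variable paired with itself is dropped (assumption of this sketch).
pairs <- expand.grid(x = x_vars, y = y_vars, stringsAsFactors = FALSE)
pairs <- pairs[pairs$x != pairs$y, ]

# Flag pairs whose path x -> y is present in the model.
# Pairs not flagged correspond to paths that are absent (fixed to zero).
pairs$in_model <- paste(pairs$y, pairs$x) %in% paste(paths$lhs, paths$rhs)
pairs
```

Here the mediator `m` appears in both `x_vars` and `y_vars`, and the pair (`x`, `y`) is the only one without a path in the model.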
Free or Fixed-to-Nonzero Path
If a path is free, or fixed to a nonzero value, the residuals of \(x\) and \(y\) are computed by predicting each of them from all other variables that predict \(y\) in the model. The regression coefficients are computed by
\[ \beta_{x} = \Sigma_{x0}^{-1}\Sigma_{(x0, x)}, \] \[ \beta_{y} = \Sigma_{x0}^{-1}\Sigma_{(x0, y)}, \]
where \(\Sigma_{x0}\) is the model-implied covariance matrix of \(x0\), all other variables that predict \(y\), and \(\Sigma_{(x0, x)}\) is a column vector of model-implied covariances between \(x\) and \(x0\). Similarly, \(\Sigma_{(x0, y)}\) is a column vector of model-implied covariances between \(y\) and \(x0\).
The predicted values of \(x\) and \(y\), \(\hat{X}\) and \(\hat{Y}\), respectively, are computed by
\[ \hat{X} = X_{0}\beta_{x}, \] \[ \hat{Y} = X_{0}\beta_{y}, \]
where \(X_{0}\) is the matrix of the values of \(x0\), with one column for each variable in \(x0\). Let \(X\) and \(Y\) be column vectors of the values of \(x\) and \(y\).
The residuals of \(x\) and \(y\) are computed:
\[ E_{x} = X - \hat{X}, \] \[ E_{y} = Y - \hat{Y}. \]
That is, they are computed as in multiple regression, although the model-implied covariance matrix is used to compute the regression coefficients. Simple standardized residuals (standardized by their standard deviations, without correction as in studentized residuals in multiple regression) are also stored.
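The computation above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: the data are simulated, the variable names (`w1`, `w2`, `others`) are hypothetical, the variables are mean-centered so that no intercept is needed, and the sample covariance matrix stands in for the model-implied covariance matrix (which would normally come from the fitted model). Predictions are formed from the data on the other predictors of \(y\).

```r
set.seed(1)
n <- 200

# Simulated data: w1 and w2 play the role of x0 (other predictors of y).
w1 <- rnorm(n)
w2 <- rnorm(n)
x  <- 0.5 * w1 + 0.3 * w2 + rnorm(n)
y  <- 0.3 * w1 + 0.2 * w2 + 0.4 * x + rnorm(n)

# Mean-center the data (assumption of this sketch: no intercepts).
dat <- scale(cbind(w1 = w1, w2 = w2, x = x, y = y),
             center = TRUE, scale = FALSE)

# Stand-in for the model-implied covariance matrix (assumption).
sigma <- cov(dat)

others <- c("w1", "w2")
sigma_x0 <- sigma[others, others, drop = FALSE]

# Regression coefficients: Sigma_x0^{-1} Sigma_(x0, x) and Sigma_x0^{-1} Sigma_(x0, y).
beta_x <- solve(sigma_x0, sigma[others, "x", drop = FALSE])
beta_y <- solve(sigma_x0, sigma[others, "y", drop = FALSE])

# Predicted values from the other predictors, then residuals.
X0  <- dat[, others, drop = FALSE]
e_x <- dat[, "x"] - drop(X0 %*% beta_x)
e_y <- dat[, "y"] - drop(X0 %*% beta_y)

# Simple standardized residuals: divided by their standard deviations only.
z_x <- e_x / sd(e_x)
z_y <- e_y / sd(e_y)
```

Because the sample covariance matrix is used here, the residuals coincide with ordinary least squares residuals; with a model-implied covariance matrix from a restricted model they generally would not.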
Fixed-to-Zero Path
For a path fixed to zero, if add_path is FALSE (the default), casewise_residuals() does nothing. If add_path is TRUE and the \(y\) variable does not point to \(x\) in the model (this prevents adding a bidirectional path), then it adds a path from \(x\) to \(y\) and fits the model again. The residuals for \(x\) and \(y\) are then computed as for free paths in the model and stored.
Note that this is done simply by adding y ~ x to the model. Functions like lavaan::sem() may fix or free other parameters, and so the model being fitted may not differ from the original model by one degree of freedom. The purpose of this step is purely for exploration.
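The refitting step can be sketched as follows. This is an illustration only, not the package's code: the model syntax, the `paths` table, the variable names, and the simulated data are all hypothetical, and the refit runs only if the lavaan package is installed.

```r
# Hypothetical model: x -> m -> y, with the path from x to y fixed to zero.
model <- "
m ~ x
y ~ m
"
x_name <- "x"
y_name <- "y"

# Paths in the model, as lhs ~ rhs pairs (assumed known in this sketch).
paths <- data.frame(lhs = c("m", "y"), rhs = c("x", "m"),
                    stringsAsFactors = FALSE)

# Add x -> y only if y does not already point to x,
# which would otherwise create a bidirectional path.
y_points_to_x <- any(paths$lhs == x_name & paths$rhs == y_name)
model_new <- if (y_points_to_x) {
  model
} else {
  paste0(model, y_name, " ~ ", x_name, "\n")
}

# Optional refit with simulated data, guarded so the sketch runs without lavaan.
if (requireNamespace("lavaan", quietly = TRUE)) {
  set.seed(2)
  n <- 200
  x <- rnorm(n)
  m <- 0.5 * x + rnorm(n)
  y <- 0.4 * m + 0.2 * x + rnorm(n)
  dat <- data.frame(x = x, m = m, y = y)
  fit_old <- lavaan::sem(model, data = dat)
  fit_new <- lavaan::sem(model_new, data = dat)
  # Compare degrees of freedom to see how the refitted model differs.
  df_diff <- lavaan::fitMeasures(fit_old, "df") -
    lavaan::fitMeasures(fit_new, "df")
}
```

Comparing the degrees of freedom of the two fits is one way to check whether adding the path changed the model by more than the single added parameter.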
Limitations
Note that these residuals are model-based but lack the adjustments usually done in multiple regression. To our knowledge, these adjustments have not been extended to path analysis. Nevertheless, these residuals are still useful for graphical assessment of linearity and identification of unusual cases with respect to a model.