Computes the Mahalanobis distance for each case on all observed variables in a model.
Usage
mahalanobis_rerun(
fit,
emNorm_arg = list(estimate.worst = FALSE, criterion = 1e-06)
)
Arguments
- fit
It can be the output from lavaan, such as lavaan::cfa() and lavaan::sem(), or the output from lavaan_rerun().
- emNorm_arg
No longer used. Kept for backward compatibility.
Value
A md_semfindr-class object, which is a one-column matrix (a column vector) of the Mahalanobis distance for each case. The row names are the case identification values used in lavaan_rerun().
A print method is available for user-friendly output.
Details
mahalanobis_rerun() takes the output of lavaan_rerun() or lavaan::lavaan() and computes the Mahalanobis distance for each case on all observed variables.
If there are no missing values, stats::mahalanobis() will be used to compute the Mahalanobis distance.
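As a rough illustration of the complete-data case, the sketch below uses simulated data (the object `dat_demo` is hypothetical and not part of the package) to show the quantity that stats::mahalanobis() returns: the squared distance (x - center)' S^{-1} (x - center) for each row. It is computed by hand here only for comparison.

```r
# Hypothetical complete data: 50 cases on 3 variables
set.seed(1234)
dat_demo <- as.data.frame(matrix(rnorm(150), nrow = 50, ncol = 3))
colnames(dat_demo) <- c("x1", "x2", "x3")

center <- colMeans(dat_demo)
sigma <- stats::cov(dat_demo)

# Squared Mahalanobis distance from stats::mahalanobis()
md_stats <- stats::mahalanobis(dat_demo, center, sigma)

# The same quantity by hand: (x - center)' S^{-1} (x - center) for each row
dev <- sweep(as.matrix(dat_demo), 2, center)
md_manual <- rowSums((dev %*% solve(sigma)) * dev)

# Should report that the two versions agree
all.equal(unname(md_stats), unname(md_manual))
```

Note that stats::mahalanobis() returns the squared distance, so no square root is taken in either version.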
If there are missing values on the observed variables, the means and variance-covariance matrices will be estimated by maximum likelihood using lavaan::lavCor(). The estimates will be passed to modi::MDmiss() to compute the Mahalanobis distance.
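The general idea of a Mahalanobis distance with missing values can be sketched in base R. The code below is not the code of modi::MDmiss(). It is a simplified illustration, assuming the approach of dropping the missing dimensions of each case and rescaling by p/q, where p is the number of variables and q is the number of observed values for that case. The helper `md_miss_sketch()` and the data are hypothetical, and complete-case estimates stand in for the maximum likelihood estimates.

```r
# Hypothetical data with a few missing values
set.seed(5678)
x <- matrix(rnorm(120), nrow = 40, ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
x[2, 1] <- NA
x[5, c(2, 3)] <- NA

# Stand-ins for the maximum likelihood estimates (an assumption of this
# sketch): means and covariance matrix of the complete cases
x_cc <- x[stats::complete.cases(x), ]
center <- colMeans(x_cc)
sigma <- stats::cov(x_cc)

# Hypothetical helper: squared distance on the observed dimensions of
# each case, rescaled by p / q
md_miss_sketch <- function(x, center, sigma) {
  p <- ncol(x)
  apply(x, 1, function(row) {
    obs <- !is.na(row)
    q <- sum(obs)
    if (q == 0) return(NA_real_)
    dev <- row[obs] - center[obs]
    d2 <- drop(t(dev) %*% solve(sigma[obs, obs, drop = FALSE]) %*% dev)
    d2 * p / q
  })
}

md <- md_miss_sketch(x, center, sigma)
head(md)
```

For cases with no missing values, q equals p, so the sketch reduces to the usual squared distance from stats::mahalanobis().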
Supports both single-group and multiple-group models. For multiple-group models, the Mahalanobis distance for each case is computed using the means and covariance matrix of the group to which the case belongs. (Support for multiple-group models is available in version 0.1.4.8 and later.)
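The group-wise computation can be illustrated with a minimal base R sketch. The data frame `dat_mg` and the grouping variable `gp` are hypothetical, and the package's own computation may differ in its details. Each case is compared with the means and covariance matrix of its own group only.

```r
# Hypothetical two-group data with complete cases
set.seed(2468)
dat_mg <- data.frame(
  x1 = rnorm(60),
  x2 = rnorm(60),
  gp = rep(c("a", "b"), each = 30)
)
vars <- c("x1", "x2")

# For each group, use that group's own means and covariance matrix
md_by_group <- lapply(split(dat_mg[, vars], dat_mg$gp), function(d) {
  unname(stats::mahalanobis(d, colMeans(d), stats::cov(d)))
})

# Put the distances back in the original case order
md_mg <- unsplit(md_by_group, dat_mg$gp)
head(md_mg)
```

Splitting first and recombining with unsplit() keeps the distances aligned with the rows of the original data, which matters when the groups are interleaved.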
References
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India, 2, 49-55.
Author
Shu Fai Cheung https://orcid.org/0000-0002-9871-9448.
Examples
library(lavaan)
dat <- pa_dat
# The model
mod <-
"
m1 ~ a1 * iv1 + a2 * iv2
dv ~ b * m1
a1b := a1 * b
a2b := a2 * b
"
# Fit the model
fit <- lavaan::sem(mod, dat)
summary(fit)
#> lavaan 0.6.17 ended normally after 1 iteration
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 5
#>
#> Number of observations 100
#>
#> Model Test User Model:
#>
#> Test statistic 6.711
#> Degrees of freedom 2
#> P-value (Chi-square) 0.035
#>
#> Parameter Estimates:
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Regressions:
#> Estimate Std.Err z-value P(>|z|)
#> m1 ~
#> iv1 (a1) 0.215 0.106 2.036 0.042
#> iv2 (a2) 0.522 0.099 5.253 0.000
#> dv ~
#> m1 (b) 0.517 0.106 4.895 0.000
#>
#> Variances:
#> Estimate Std.Err z-value P(>|z|)
#> .m1 0.903 0.128 7.071 0.000
#> .dv 1.321 0.187 7.071 0.000
#>
#> Defined Parameters:
#> Estimate Std.Err z-value P(>|z|)
#> a1b 0.111 0.059 1.880 0.060
#> a2b 0.270 0.075 3.581 0.000
#>
# Fit the model n times. Each time with one case removed.
# For illustration, do this only for selected cases.
fit_rerun <- lavaan_rerun(fit, parallel = FALSE,
to_rerun = 1:10)
#> The expected CPU time is 0.37 second(s).
#> Could be faster if run in parallel.
# Compute the Mahalanobis distance for each case
out <- mahalanobis_rerun(fit_rerun)
# The Mahalanobis distances of the first few cases
head(out)
#> md
#> 1 1.9107778
#> 2 0.4442464
#> 3 3.7867385
#> 4 1.0653437
#> 5 1.9803351
#> 6 0.2875484
# Compute the Mahalanobis distance using stats::mahalanobis()
md1 <- stats::mahalanobis(dat, colMeans(dat), stats::cov(dat))
# Compare the results
head(md1)
#> [1] 1.9107778 0.4442464 3.7867385 1.0653437 1.9803351 0.2875484
# A CFA model
dat <- cfa_dat
mod <-
"
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f1 ~~ f2
"
# Fit the model
fit <- lavaan::cfa(mod, dat)
fit_rerun <- lavaan_rerun(fit, parallel = FALSE,
to_rerun = 1:10)
#> The expected CPU time is 0.63 second(s).
#> Could be faster if run in parallel.
mahalanobis_rerun(fit_rerun)
#>
#> -- Mahalanobis Distance --
#>
#> md
#> 3 21.492
#> 2 8.802
#> 1 8.178
#> 10 8.090
#> 9 6.552
#> 5 5.135
#> 8 3.415
#> 7 2.470
#> 6 2.382
#> 4 2.142
#>
#> Note:
#> - All stored cases are displayed.
#> - Cases sorted by Mahalanobis distance in decreasing order.
# A latent variable model
dat <- sem_dat
mod <-
"
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f3 =~ x7 + x8 + x9
f2 ~ a * f1
f3 ~ b * f2
ab := a * b
"
# Fit the model
fit <- lavaan::sem(mod, dat)
fit_rerun <- lavaan_rerun(fit, parallel = FALSE,
to_rerun = 1:10)
#> The expected CPU time is 0.65 second(s).
#> Could be faster if run in parallel.
mahalanobis_rerun(fit_rerun)
#>
#> -- Mahalanobis Distance --
#>
#> md
#> 6 17.521
#> 9 16.870
#> 4 13.337
#> 1 13.332
#> 3 9.792
#> 7 9.485
#> 2 8.445
#> 5 8.319
#> 8 5.332
#> 10 1.904
#>
#> Note:
#> - All stored cases are displayed.
#> - Cases sorted by Mahalanobis distance in decreasing order.