Technical Appendix: Workflow of `get_prod()` and Friends
Shu Fai Cheung & Sing-Hang Cheung
Source:vignettes/articles/get_prod.Rmd
get_prod.Rmd
Goal
This technical appendix describes how product terms in a model or dataset is identified in the package manymome (Cheung & Cheung, 2023).
The function, get_prod()
, is called internally by other
main functions. Users do not need to call it directly. Nevertheless,
advanced users may be interested in learning how it works.
get_prod()
This function is used to find all product term(s), if any, from
x
to y
, x
having a direct path to
y
.
Main arguments
x
: The name of the predictor.y
: The name of the outcome.fit
: A fit object. Currently only supports alavaan
-class object.est
: The output oflavaan::parameterEstimates()
. IfNULL
, the default, it will be generated fromfit
. If supplied,fit
will ge ignored.data
: A data frame. If supplied, it will be used to identify the product terms.operator
: The string used to indicate a product term. Default is":"
, used in bothlm()
andlavaan::sem()
for observed variables. If raw data is not available, this is needed for identifying product terms.
Output
This is a sample output:
It returns a list with these elements:
prod
: The name(s) of the product term(s) in the model or data. It can have more than one element if the effect ofx
ony
is moderated by more than one moderator.b
: The coefficient(s) of the product term(s).w
: The name(s) of the moderator(s).x
: The predictor,x
.y
: The outcome variable,y
.
Sample Data and Model
The sample dataset modmed_x1m3w4y1
from the package
manymome
will be used for illustration:
## x w1 w2 w3 w4 m1 m2 m3 y gp city
## 1 0.192 1.461 -0.280 0.936 1.239 10.20 9.25 9.435 17.2 mars gamma
## 2 0.362 1.302 -0.476 -1.494 0.677 10.46 9.05 6.527 12.7 mars alpha
## 3 -1.282 0.409 0.751 -0.938 -1.689 6.94 12.19 19.180 11.1 earth alpha
## 4 -0.321 -1.296 -0.439 -1.066 1.588 9.78 8.50 -0.168 11.3 earth gamma
## 5 1.941 2.409 1.416 1.589 -0.814 11.79 10.18 12.061 15.2 earth sigma
## 6 -0.598 0.861 0.918 1.289 1.713 9.36 11.21 16.945 17.7 earth beta
For illustration, some product terms will be formed manually, and
some product terms will be formed using the :
operator in
lavaan
:
## This is lavaan 0.6-19
## lavaan is FREE software! Please report any bugs.
mod <-
"
m1 ~ x + w1 + x_o_w1
m2 ~ m1 + w2 + m1:w2
m3 ~ m2
y ~ m3 + w4 + m3:w4 + x + w3 + x:w3 + x:w4
"
fit <- sem(model = mod,
data = dat,
meanstructure = TRUE,
fixed.x = FALSE)
Fit the model by lm()
:
Modes
It works in several modes.
fit
only, with raw data
A fit
object (lavaan
-class object) only is
supplied, which has raw data stored.
The parameter estimate table, est
, is extracted from
fit
.
Raw data is extracted from fit
to identify product
terms.
It then returns product terms that involve x
in
predicting y
, if any.
Case: A path with a moderator
get_prod(x = "x", y = "m1", fit = fit)
## $prod
## [1] "x_o_w1"
##
## $b
## x_o_w1
## 0.2337967
##
## $w
## [1] "w1"
##
## $x
## [1] "x"
##
## $y
## [1] "m1"
Case: A path with more than one moderator
get_prod(x = "x", y = "y", fit = fit)
## $prod
## [1] "x:w3" "x:w4"
##
## $b
## x:w3 x:w4
## 0.5576550 -0.4167135
##
## $w
## [1] "w3" "w4"
##
## $x
## [1] "x"
##
## $y
## [1] "y"
Case: A path without a moderator
get_prod(x = "m2", y = "m3", fit = fit)
## [1] NA
Case: x
does not have a direct path to
y
get_prod(x = "m1", y = "m3", fit = fit)
## [1] NA
est
and data
In this mode, raw data supplied directly is used to identify product terms.
The parameter estimate table est
is used to determine
the form of the model.
It then returns product term(s) that involve x
in
predicting y
, if any.
Case: A path with a moderator
get_prod(x = "x", y = "m1", est = lm_est$est, data = lm_est$data)
## $prod
## [1] "x:w1"
##
## $b
## x:w1
## 0.2337967
##
## $w
## [1] "w1"
##
## $x
## [1] "x"
##
## $y
## [1] "m1"
Case: A path with more than one moderator
get_prod(x = "x", y = "y", est = lm_est$est, data = lm_est$data)
## $prod
## [1] "x:w3" "w4:x"
##
## $b
## x:w3 w4:x
## 0.5576550 -0.4167135
##
## $w
## [1] "w3" "w4"
##
## $x
## [1] "x"
##
## $y
## [1] "y"
Case: A path without a moderator
get_prod(x = "m2", y = "m3", est = lm_est$est, data = lm_est$data)
## [1] NA
Case: x
does not have a direct path to
y
get_prod(x = "m1", y = "m3", est = lm_est$est, data = lm_est$data)
## [1] NA
est
only
In this mode, operator
needs to be set to identify
product terms from the parameter estimate table est
.
The table from est
is then used to determine the form of
the model.
It then returns product term(s) that involve x
in
predicting y
, if any.
This mode is useful when a product term is not formed from the raw data. For example, when the product term is a product of two latent variables, or when raw data is not available.
Case: A path with a moderator
get_prod(x = "x", y = "m1", est = lm_est$est, operator = ":")
## $prod
## [1] "x:w1"
##
## $b
## x:w1
## 0.2337967
##
## $w
## [1] "w1"
##
## $x
## [1] "x"
##
## $y
## [1] "m1"
Case: A path with more than one moderator
get_prod(x = "x", y = "y", est = lm_est$est, operator = ":")
## $prod
## [1] "x:w3" "w4:x"
##
## $b
## x:w3 w4:x
## 0.5576550 -0.4167135
##
## $w
## [1] "w3" "w4"
##
## $x
## [1] "x"
##
## $y
## [1] "y"
Case: A path without a moderator
get_prod(x = "m2", y = "m3", est = lm_est$est, operator = ":")
## [1] NA
Case: x
does not have a direct path to
y
get_prod(x = "m1", y = "m3", est = lm_est$est, operator = ":")
## [1] NA
find_all_products()
and
find_product()
How it works
The function find_all_products()
is an internal function
for identifying all product terms in a dataset. It requires the raw data
to work. It is called by get_prod()
when raw data is
available.
For each column in a dataset, it calls find_product()
to
check whether this column is the product of two other columns. If yes,
find_product()
returns the component column names. It then
returns a named list of the output of find_product()
for
columns of product term.
The search by find_product()
is done numerically,
checking whether a column can be formed from the product of any two
other columns (missing data allowed).
This approach is exact and simple. There is no need to rely on any special naming convention in the model to denote a product term.
find_all_products()
returns a named list of product term
components.
If expand
= TRUE, it will try to find all lowest order
components. Therefore, a product term of three or more columns can also
be identified.
Example
We first create a dataset with some product terms.
library(manymome)
set.seed(63224)
dat <- round(as.data.frame(MASS::mvrnorm(100, rep(0, 5), diag(5))), 3)
head(dat)
## V1 V2 V3 V4 V5
## 1 -1.291 0.166 0.384 0.270 1.469
## 2 0.516 0.497 0.071 1.538 -0.441
## 3 -1.262 -0.146 -1.103 -0.031 -0.664
## 4 1.586 -0.098 1.241 1.035 -0.004
## 5 2.072 0.384 -0.964 -1.068 -0.307
## 6 -0.428 -0.187 -0.717 1.519 0.416
dat$V2V1 <- dat$V2 * dat$V1
dat$V3V5 <- dat$V3 * dat$V5
dat$V1V2V3 <- dat$V1 * dat$V2 * dat$V3
print(head(dat), digits = 3)
## V1 V2 V3 V4 V5 V2V1 V3V5 V1V2V3
## 1 -1.291 0.166 0.384 0.270 1.469 -0.214 0.56410 -0.0823
## 2 0.516 0.497 0.071 1.538 -0.441 0.256 -0.03131 0.0182
## 3 -1.262 -0.146 -1.103 -0.031 -0.664 0.184 0.73239 -0.2032
## 4 1.586 -0.098 1.241 1.035 -0.004 -0.155 -0.00496 -0.1929
## 5 2.072 0.384 -0.964 -1.068 -0.307 0.796 0.29595 -0.7670
## 6 -0.428 -0.187 -0.717 1.519 0.416 0.080 -0.29827 -0.0574
manymome:::find_all_products(dat)
## $V2V1
## [1] "V1" "V2"
##
## $V3V5
## [1] "V3" "V5"
##
## $V1V2V3
## [1] "V3" "V1" "V2"
The output is a named list of character vectors. The names are the column names that are product terms. Each character vector stores the column names of its components.
Reference
Cheung, S. F., & Cheung, S.-H. (2023). manymome: An R package for computing the indirect effects, conditional effects, and conditional indirect effects, standardized or unstandardized, and their bootstrap confidence intervals, in many (though not all) models. Behavior Research Methods. https://doi.org/10.3758/s13428-023-02224-z