SEM (Structural Equation Modeling)

Author

HAN University of Applied Sciences

Published

October 14, 2024

1 Take-Aways

This module introduces Structural Equation Modeling (SEM).

SEM is a sophisticated statistical approach that enables researchers to analyze the relationships between observed variables and underlying latent constructs.

You will learn:

  1. The basic concepts of SEM.

  2. The key terms used in SEM: latent and observed variables; reliability; direct and indirect effects; and path diagrams.

  3. The types of models you can analyze (confirmatory factor analysis; path models; and models that combine factor analysis and regression models).

  4. How SEM as a general technique can be used to estimate regression models as a special case.

  5. How to evaluate and improve your model: goodness-of-fit statistics and modification indices.

  6. How to report the results of SEM, including a visualization.

  7. How to specify and run SEM-models in R’s lavaan package, or with user-friendly tools (JASP and Jamovi) which are based on lavaan.

2 What is Structural Equation Modeling?

2.1 Introduction

Structural equation modeling (SEM) is used for estimating causal relationships among variables.

SEM is a combination of two techniques discussed in other modules:
- Factor analysis, and
- Regression analysis.

2.1.1 Factor Analysis & Measurement

Typically, in social and economic research we make use of multiple item scales to measure concepts that are hard to measure directly.

Single-item measures are often problematic: they are vulnerable to biases of meaning and interpretation.

Multiple-item scales are designed to cover the full range of a construct, while single items tend to have narrow and/or ambiguous interpretations.

While it may seem easy for a hotel guest to answer a single question about their satisfaction with their stay, satisfaction is the result of differences between expected and perceived aspects (or dimensions) of service quality. Each of these aspects is indicated or measured by several items, as described in a general model of service quality like the SERVQUAL model (Parasuraman, Zeithaml, and Berry 1985).

The dimensions and the items to measure them are often based on extensive research, in which many items are proposed and analyzed using (exploratory) factor analysis.

Once there is agreement on the definition of several constructs and their measurement, then we can use questionnaires on these constructs for a quantitative assessment.

2.1.2 Regression Analysis and Causal Research

When doing applied research on (causal) relationships, we often start from existing theories. We can conduct a survey that makes use of a questionnaire based on, for instance, the SERVQUAL model.

We would still be interested in the reliability of the items measuring our constructs, but mainly to confirm the theory rather than to explore.

SEM models often combine measurement and causal research within one model. These models are visualized in path diagrams.

A path diagram for a research model on customer satisfaction and loyalty can look like this.

In this path diagram we see:

  • The constructs satisfaction and loyalty in ovals. Since these constructs are not measured directly, they are called latent variables. These constructs are also referred to as factors or dimensions.
  • The constructs are measured or indicated by items, which are the (observed) variables. Observed variables appear as rectangles in what we call a path diagram.

At first, it may be hard to understand why the arrows point from the latent variables to their measurements, rather than the other way around.

The idea is that the responses (observed scores on measurements) are caused by some latent variable. Each answer to an item (e.g., the respondent’s intention to recommend the hotel to others) is partly caused by the latent variable loyalty.

  • The circles with the Greek letters ε (pronounced epsilon) indicate that the items are not fully explained by the latent variable but have a unique, unexplained variance. Even though our respondent is very loyal to the hotel and plans to go back whenever he can, he may still have reasons not to recommend the hotel to family and friends (maybe he doesn’t have any)! But by and large, we would expect that items are positively correlated, and jointly form valid and reliable measurements of their constructs.
  • The above remarks are related to the measurement part of the model. The (causal) relation between satisfaction and loyalty is referred to as the structural part.
  • While in regression analysis, we are typically dealing with one dependent variable, and one or more independent variables, SEM-models can be more complex. In the figure below, overall customer satisfaction is a latent variable which explains customer loyalty but is also explained by other latent variables (perceived quality, perceived value, and customer expectations).

  • Complex models like this contain complex patterns of causality. Satisfaction is explained directly by customer expectations, and indirectly, via the impact of customer expectations on perceived quality. These direct and indirect effects are relevant from both a theoretical and practical point of view.
  • This model complexity calls for a different terminology from what we are used to in regression analysis. In regression analysis, variables are either dependent or independent. In SEM, variables (like overall customer satisfaction in the figure above) can be both dependent and independent! For that reason, we speak of endogenous and exogenous variables. Exogenous variables are all variables (latent or observed) that do not have an arrow pointing to them. Endogenous variables are all other variables, which are explained within the model.

In summary, structural equation models represent causal relationships between latent and observed variables.

  • Observed variables can be measured directly.

  • Latent variables cannot be measured directly.

Full SEM-models consist of two parts.

  • A measurement part, and

  • A structural part.

The measurement model specifies the relationships between observed variables and latent variables, while the structural model specifies the causal relationships between variables.

SEM-models may lack either a measurement part or a structural part!
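These two parts map directly onto the model syntax of lavaan, the R package mentioned in the take-aways. As a minimal sketch with placeholder names (F1, F2, x1–x3, and y1–y3 are illustrative, not variables from our data):

```r
# Minimal lavaan-style model syntax (placeholder variable names):
#   =~  "is measured by"  (measurement part)
#   ~   "is regressed on" (structural part)
exampleModel <- '
  # measurement part: two latent variables with three indicators each
    F1 =~ x1 + x2 + x3
    F2 =~ y1 + y2 + y3
  # structural part: F2 is explained by F1
    F2 ~ F1
'
```

A model with only the =~ lines is a confirmatory factor analysis; a model with only ~ lines between observed variables is a path analysis. Both special cases are discussed next.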

2.2 Confirmatory Factor Analysis (CFA)

Confirmatory Factor Analysis (CFA) is a type of SEM-model used to test the validity of measurement models. It verifies whether the observed data fit a pre-specified model. A structural part is missing.

An example of a CFA-model is depicted below. This example is based on the Holzinger-Swineford data used in many papers and books on SEM.

A short version of the data set consists of mental ability test scores, with 9 observed variables on 3 factors. The double-headed arrows between the latent variables indicate that the latent variables are correlated.

Note that this simplified graphical representation is incomplete: the observed variables are bound to have a unique variance which is not shown for the sake of simplicity.

2.3 Path Analysis

Path Analysis is a SEM-model that includes observed variables only.

In a graphical representation, a path analysis model only has rectangles, and no ovals (not counting the variances of the endogenous observed variables).

It is probably here that confusion sets in! 😕

Attitude, in the path model above, is a variable that is hard to measure directly and is typically based on some multiple-item scale. The same holds true for the other three variables in the model.

Note

Traditionally, in the pre-SEM era, researchers used regression models with (independent and/or dependent) variables based on sum scores.

Satisfaction, in the model at the beginning of this module, would be defined as the sum (or more likely, the average) of items 1 to 4. Technically, satisfaction as a sum score is then added as a new and seemingly observed variable to the data set, and subsequently used as such in analyses.

Sum scores are a second-best alternative in the absence of better ways to do it.

SEM, in contrast to regression analysis, allows us to include satisfaction and loyalty as latent variables, each measured by four items with their unique variances. Using sum scores, these variances unique to the items are absorbed in the new variable, while in SEM the latent variable is free of measurement error!

Important

Why then use sum scores in SEM models?

Well, preferably, we don’t!

However, if the sample size is small relative to the complexity of the model, then your software may encounter estimation problems.

One way out of this estimation issue is to reduce the complexity by falling back on sum scores. This takes away one of the advantages of using SEM (dealing with measurement error) but tends to be robust in the structural part of the model – which may be the most interesting part of the analysis.

3 Comparing Regression Analysis and SEM

From the previous section, we can conclude that SEM is a general technique that covers and integrates factor analysis and regression analysis.

In reverse, regression models are just special cases of models that can be estimated with SEM!

Since most of us are familiar with regression models, and how to interpret regression analysis output, we will proceed to show that, indeed, SEM can be used for regression analysis.

This will help you to familiarize yourself with typical SEM output, which looks very different from standard regression output.

Throughout this module, we will make use of a data set that is sampled from a much bigger data set on High-Performance Organizations (HPO; (Waal, Goedegebuure, and Hinfelaar 2015)).

3.1 The Data Set

The High-Performance Organizations (HPO) framework consists of five factors, which are measured by a total of 35 items.

We have sampled 500 responses from a large data set containing the responses to the questionnaire that was used to collect data from managers and employees in numerous organizations and countries across the globe.

We will use three out of the five factors. For each of these three factors we have picked the - in our view - four most salient items.

The three selected factors are:

  1. Continuous Improvement & Renewal (CI)

  2. Management Quality (MQ), and

  3. Long-Term Orientation (LT).

The 12 items used to measure these factors are the following.

Item Description
ci1 Our organization has adopted a strategy that sets it clearly apart
ci2 Everything that matters to performance is explicitly reported
ci3 We continuously innovate our core competencies
ci4 We continuously innovate our products, processes and services
mq1 My manager is a role model for organizational members
mq2 My manager coaches organizational members to achieve better results
mq3 My manager applies strong leadership
mq4 My manager holds organizational members responsible for their results
lt1 Our organization maintains good long-term relationships with stakeholders
lt2 Our organization is aimed at servicing the customers optimally
lt3 New management is promoted from within the organization
lt4 Our organization is a secure workplace

3.2 The SEM Model

Our ultimate aim in this module is to test the following model.

In the core of this diagram, CI is explained by both MQ and LT. LT is explained within the model by MQ. As a consequence, MQ has both a direct effect on CI, and an indirect effect on CI via LT.

CI, MQ and LT are latent variables (which is why they appear in ovals), measured by 4 items each (the observed variables in our data set). The latent variables are not measured, and therefore not part of the original data set!

The small circles that are connected to all (latent and observed) endogenous variables, are error variances. Error variances indicate that the endogenous variables may not be fully explained by the model.

3.3 A Regression Model in SEM

It is clear that the model above cannot be tested using one single regression analysis! In order to shed light on the full model, you would have to run at least two regression analyses: one model to regress LT on MQ, and another model to regress CI on MQ and LT. That’s not what we want!

But in order to show that regression analysis is a special case of SEM, we simplify the model:

  • We do away with the measurement part of the model.

  • We exclude the indirect effect of MQ on CI via LT.

This results in the following model:

We now have:

  • One dependent variable (CI), and

  • Two independent variables (MQ and LT).

These variables appear as observed variables, in rectangles. However, they are not (yet) in our data set, so we have to compute them from the observed 12 items that are part of the data set. Traditionally, we use sum scores, after making sure that the items are reliable measurements of the three factors.

Let’s go through the motions!

  1. We will check the reliability of the 12 items as measurements of the three factors.

  2. If reliability is OK, then we will proceed to compute sum scores for the three factors, and add them as new variables (MQ, LT, and CI) to the data.

  3. Lastly, we will regress CI on MQ and LT using both regression analysis and an equivalent model in SEM.

3.3.1 Step 1 - Checking Reliability

We use R to test the reliability of the items used to measure the three factors.

The most widely used measure for reliability, is Cronbach’s alpha (α) (Tavakol and Dennick 2011). It is a measure of internal consistency, indicating how closely related a set of items are as a group. It does so by comparing the shared variance (or covariance) among the items to the amount of overall variance.
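For a scale of k items, Cronbach’s α compares the sum of the item variances to the variance of the total score:

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{Y_i}}{\sigma^2_X}\right)
$$

where $\sigma^2_{Y_i}$ is the variance of item $i$ and $\sigma^2_X$ is the variance of the total (sum) score. The more variance the items share, the smaller the ratio of item variances to total-score variance, and the closer α gets to 1.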

We use the alpha() function in R’s psych package, to perform the reliability tests.

Let’s first read the data, which are available from GitHub, and summarize the variables ci1 to ci4.

Code
hpoData <- read.csv("https://raw.githubusercontent.com/statmind/jasp_sem/refs/heads/main/jasp_hpo_csv.csv", header=TRUE)
head(hpoData) # inspect the first rows of the data
Code
summary(hpoData[1:4])
      ci1              ci2             ci3              ci4        
 Min.   : 1.000   Min.   : 1.00   Min.   : 1.000   Min.   : 1.000  
 1st Qu.: 5.000   1st Qu.: 5.00   1st Qu.: 5.000   1st Qu.: 6.000  
 Median : 6.000   Median : 6.00   Median : 6.000   Median : 7.000  
 Mean   : 6.208   Mean   : 6.25   Mean   : 6.284   Mean   : 6.578  
 3rd Qu.: 7.000   3rd Qu.: 8.00   3rd Qu.: 8.000   3rd Qu.: 8.000  
 Max.   :10.000   Max.   :10.00   Max.   :10.000   Max.   :10.000  

The items used to measure CI are in the first four columns of the data set. We can compute the Cronbach’s α statistic as follows.

Code
# install.packages("psych") 
library(psych)
alpha(hpoData[,1:4]) # reliability of the ci-items

Reliability analysis   
Call: alpha(x = hpoData[, 1:4])

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
      0.85      0.85    0.82      0.59 5.8 0.011  6.3 1.6     0.57

    95% confidence boundaries 
         lower alpha upper
Feldt     0.83  0.85  0.87
Duhachek  0.83  0.85  0.87

 Reliability if an item is dropped:
    raw_alpha std.alpha G6(smc) average_r S/N alpha se   var.r med.r
ci1      0.83      0.84    0.78      0.63 5.1    0.013 0.00669  0.65
ci2      0.82      0.82    0.77      0.61 4.7    0.014 0.00594  0.58
ci3      0.78      0.78    0.70      0.54 3.6    0.017 0.00015  0.54
ci4      0.81      0.81    0.74      0.59 4.2    0.015 0.00334  0.58

 Item statistics 
      n raw.r std.r r.cor r.drop mean  sd
ci1 500  0.80  0.80  0.69   0.64  6.2 2.0
ci2 500  0.83  0.82  0.72   0.67  6.2 2.1
ci3 500  0.88  0.88  0.84   0.77  6.3 2.0
ci4 500  0.83  0.84  0.77   0.70  6.6 1.8

Non missing response frequency for each item
       1    2    3    4    5    6    7    8    9   10 miss
ci1 0.02 0.02 0.06 0.08 0.15 0.18 0.24 0.14 0.06 0.04    0
ci2 0.03 0.03 0.04 0.10 0.13 0.18 0.21 0.15 0.09 0.05    0
ci3 0.02 0.03 0.05 0.05 0.16 0.20 0.22 0.16 0.07 0.04    0
ci4 0.01 0.02 0.03 0.05 0.11 0.17 0.31 0.17 0.08 0.04    0

The output is massive. In addition to Cronbach’s α (.85), the output includes, among other things, confidence intervals for α and the impact on α of dropping an item.

A Cronbach’s α of .70 is considered acceptable, and .80 is considered good. It is tempting to go for excellent (.90 or higher), but values of α close to 1 reflect very high correlations between the items, which raises the question of whether some items are mere duplications, and therefore redundant. The α of items ci1 to ci4 is OK.

Dropping an item makes sense if doing so would increase reliability. According to the output, this is not the case for any of the items ci1 to ci4, and therefore we conclude that the items are internally consistent.

The video below summarizes the interpretation of Cronbach’s α.

We do the same for MQ and LT:

Code
alpha(hpoData[,5:8])

Reliability analysis   
Call: alpha(x = hpoData[, 5:8])

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
      0.87      0.87    0.84      0.63 6.8 0.009  6.7 1.7     0.63

    95% confidence boundaries 
         lower alpha upper
Feldt     0.85  0.87  0.89
Duhachek  0.86  0.87  0.89

 Reliability if an item is dropped:
    raw_alpha std.alpha G6(smc) average_r S/N alpha se   var.r med.r
mq1      0.82      0.82    0.77      0.61 4.7   0.0133 0.00968  0.58
mq2      0.82      0.82    0.76      0.60 4.5   0.0138 0.00696  0.58
mq3      0.82      0.82    0.76      0.60 4.4   0.0137 0.01310  0.53
mq4      0.88      0.88    0.83      0.71 7.5   0.0091 0.00039  0.72

 Item statistics 
      n raw.r std.r r.cor r.drop mean  sd
mq1 500  0.87  0.87  0.81   0.76  6.8 2.1
mq2 500  0.89  0.88  0.83   0.78  6.6 2.2
mq3 500  0.88  0.88  0.83   0.78  6.6 2.0
mq4 500  0.75  0.78  0.64   0.61  6.9 1.7

Non missing response frequency for each item
       1    2    3    4    5    6    7    8    9   10 miss
mq1 0.01 0.02 0.04 0.05 0.11 0.14 0.20 0.20 0.13 0.09    0
mq2 0.02 0.04 0.05 0.06 0.10 0.16 0.19 0.20 0.12 0.07    0
mq3 0.01 0.03 0.05 0.06 0.10 0.17 0.22 0.20 0.09 0.07    0
mq4 0.01 0.01 0.02 0.05 0.10 0.19 0.25 0.22 0.09 0.06    0
Code
alpha(hpoData[,9:12])

Reliability analysis   
Call: alpha(x = hpoData[, 9:12])

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
       0.8      0.81    0.77      0.51 4.2 0.015    7 1.6     0.52

    95% confidence boundaries 
         lower alpha upper
Feldt     0.77   0.8  0.83
Duhachek  0.77   0.8  0.83

 Reliability if an item is dropped:
    raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
lt1      0.75      0.76    0.69      0.51 3.1    0.019 0.0082  0.50
lt2      0.72      0.72    0.65      0.47 2.6    0.022 0.0076  0.50
lt3      0.81      0.81    0.75      0.59 4.3    0.015 0.0030  0.60
lt4      0.72      0.73    0.67      0.48 2.7    0.022 0.0205  0.42

 Item statistics 
      n raw.r std.r r.cor r.drop mean  sd
lt1 500  0.78  0.80  0.71   0.62  6.9 1.9
lt2 500  0.83  0.84  0.78   0.68  7.4 1.9
lt3 500  0.74  0.72  0.56   0.51  6.3 2.2
lt4 500  0.83  0.83  0.75   0.68  7.1 2.0

Non missing response frequency for each item
       1    2    3    4    5    6    7    8    9   10 miss
lt1 0.01 0.02 0.03 0.04 0.10 0.16 0.27 0.20 0.09 0.08    0
lt2 0.00 0.02 0.04 0.02 0.06 0.09 0.21 0.26 0.16 0.14    0
lt3 0.04 0.03 0.04 0.08 0.14 0.14 0.19 0.19 0.10 0.05    0
lt4 0.02 0.01 0.03 0.04 0.08 0.12 0.21 0.25 0.13 0.11    0

Although the output suggests that dropping some of the items would marginally increase the Cronbach’s α values on both factors, there is no compelling reason to do so. Since the items have been taken from a theoretical model that has been tested in many settings, dropping items would make our findings less comparable to similar studies.

3.3.2 Step 2 - Generate New Variables as Sum Scores

The Cronbach’s α scores for the three sets of items are OK. We can proceed to generate the scores for the three factors (computed here as item averages, a common variant of sum scores), as follows.

Code
hpoData$ci_avg <- rowMeans(hpoData[,1:4 ])
hpoData$mq_avg <- rowMeans(hpoData[,5:8 ])
hpoData$lt_avg <- rowMeans(hpoData[,9:12])
summary(hpoData[,13:15]) # the new variables are in columns 13 to 15
     ci_avg          mq_avg          lt_avg      
 Min.   : 1.00   Min.   : 1.00   Min.   : 1.000  
 1st Qu.: 5.25   1st Qu.: 5.75   1st Qu.: 6.000  
 Median : 6.50   Median : 7.00   Median : 7.000  
 Mean   : 6.33   Mean   : 6.73   Mean   : 6.955  
 3rd Qu.: 7.50   3rd Qu.: 8.00   3rd Qu.: 8.000  
 Max.   :10.00   Max.   :10.00   Max.   :10.000  

In our data set, we do not have any missing data. If you do have missing data, then you have to think of a strategy to deal with it! If some respondents miss out on one or two items then the missing data can be somehow imputed. If they miss out on most items, it is probably best to delete these cases altogether.
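As a toy sketch of one simple strategy (not needed for our complete data): base R’s rowMeans() with na.rm = TRUE averages over the items a respondent did answer, which implicitly imputes a missing item with the mean of that respondent’s other items.

```r
# Toy data: three respondents, four items; respondent 2 skipped one item,
# respondent 3 answered nothing
items <- data.frame(
  i1 = c(7, 6, NA),
  i2 = c(8, NA, NA),
  i3 = c(6, 5, NA),
  i4 = c(7, 6, NA)
)
# na.rm = TRUE averages over the available items only
avg <- rowMeans(items, na.rm = TRUE)
avg  # respondent 3 becomes NaN: a case best deleted altogether
```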

We are now ready to run our regressions, either using a function for regression analysis or SEM.

3.3.3 Step 3 - Regression Analysis

The variables to be used in regression analysis have been added to our data set, in the previous step.

We use R to perform a multiple regression analysis, regressing CI (the dependent variable) on the regressors MQ and LT.

3.3.3.1 Regression Analysis Using lm()

In R, we use the lm() function to do the regression analysis.

For easy interpretation and comparison, we use the lm.beta() function from the QuantPsyc package to give us the standardized coefficients which are obtained after standardizing all variables to have a mean of 0 and a standard deviation of 1.

Code
hpoReg <- lm(ci_avg ~ mq_avg + lt_avg, data=hpoData)
summary(hpoReg)

Call:
lm(formula = ci_avg ~ mq_avg + lt_avg, data = hpoData)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7253 -0.5859  0.1047  0.7145  3.7088 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.84675    0.23315   3.632 0.000311 ***
mq_avg       0.35334    0.03959   8.924  < 2e-16 ***
lt_avg       0.44648    0.04292  10.403  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.114 on 497 degrees of freedom
Multiple R-squared:  0.5406,    Adjusted R-squared:  0.5388 
F-statistic: 292.5 on 2 and 497 DF,  p-value: < 2.2e-16
Code
# install.packages("QuantPsyc") # if needed
library(QuantPsyc)
lm.beta(hpoReg)
   mq_avg    lt_avg 
0.3701947 0.4315377 

The regression coefficients turn out to be positive as expected, and significant.

The coefficient of determination, R2, is .5406, indicating that 54% of the variance of CI is explained by the model.

3.3.3.2 Regression Analysis Using SEM

Our aim is to replicate the above results using SEM, which should be possible if regression analysis is indeed a special case of SEM. What makes it special is that we have neither a measurement model with latent variables, nor any indirect effects.

SEM models can be estimated using the lavaan package.

Essentially, we first have to translate the model, as drawn in the diagram, into equations. Since our regression model is simple, the lavaan code is straightforward:

Code
# install.packages("lavaan")
library(lavaan)
hpoRegLavaan <- '
  # measurement model
  # regressions
    ci_avg ~ mq_avg + lt_avg
'

hpoRegLavaanFit <- sem(hpoRegLavaan, data = hpoData)
summary(hpoRegLavaanFit, standardized=TRUE)
lavaan 0.6-19 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         3

  Number of observations                           500

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ci_avg ~                                                              
    mq_avg            0.353    0.039    8.951    0.000    0.353    0.370
    lt_avg            0.446    0.043   10.434    0.000    0.446    0.432

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .ci_avg            1.234    0.078   15.811    0.000    1.234    0.459

Note that the lines starting with “#” are just comment lines which can be left out. Below the “# measurement model” comment, there is no code since we do not have latent variables measured by observed variables. We only have “observed variables” which in this case are sum scores of truly observed variables.

The output of both regression analyses is extensive. We will focus on the key statistics:

  1. The (unstandardized and standardized) regression coefficients.

  2. The significance of the regression coefficients.

  3. The R2, indicating how much of the variance of the dependent variable is explained by the model.

The table below summarizes the information.

The unstandardized and standardized regression coefficients in both approaches are identical. While regression analysis uses a t-test for the significance of the coefficients, SEM uses a z-test. If the sample size is large, like here, then the difference between the tests is small. The coefficients are significantly different from zero.

The model explains 54% of the variance in the dependent variable (CI). We can deduce that from the SEM output: the standardized error variance of ci_avg (the Std.all column) is .459, and R2 = 1 - .459 = .54.
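A sketch of that deduction, using the Std.all error variance of ci_avg from the output above (lavaan can also report R2 directly, via summary(..., rsquare = TRUE)):

```r
# R-squared is the complement of the standardized error variance of ci_avg
err_var_std <- 0.459
r2_ci <- 1 - err_var_std
round(r2_ci, 2)  # 0.54, matching the lm() result
```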

SEM indeed produces the same results as regression analysis 😅 !

Warning

Maybe the use of different tests for the significance of the regression coefficients worries you.

Since SEM models are usually way more complicated than regression models, testing the significance of the coefficients follows a different approach.

Some experts actually warn against the use of the z-test, as you see in this video!

4 A SEM model

Now that we are a bit familiar with the output of SEM, and how it relates to the output of regression analysis, we can proceed to applying SEM to the more complex model we wish to estimate.

We include:

  • A measurement part: latent variables (for CI, MQ, and LT), and the 12 observed variables (items) that measure them.

  • In the structural part, a path from MQ to LT (and hence, an indirect effect of MQ on CI, via LT).

Code
hpoSEM <- '
  # measurement model
    CI =~ ci1 + ci2 + ci3 + ci4
    MQ =~ mq1 + mq2 + mq3 + mq4
    LT =~ lt1 + lt2 + lt3 + lt4
  # regressions
    CI ~ MQ + LT
    LT ~ MQ
'

hpofit <- sem(hpoSEM, data = hpoData)
summary(hpofit, standardized = TRUE)
lavaan 0.6-19 ended normally after 34 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        27

  Number of observations                           500

Model Test User Model:
                                                      
  Test statistic                               163.619
  Degrees of freedom                                51
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  CI =~                                                                 
    ci1               1.000                               1.373    0.698
    ci2               1.162    0.075   15.410    0.000    1.595    0.754
    ci3               1.237    0.072   17.204    0.000    1.698    0.858
    ci4               1.025    0.065   15.804    0.000    1.407    0.776
  MQ =~                                                                 
    mq1               1.000                               1.703    0.819
    mq2               1.076    0.050   21.500    0.000    1.832    0.839
    mq3               1.044    0.046   22.494    0.000    1.778    0.869
    mq4               0.665    0.043   15.483    0.000    1.133    0.653
  LT =~                                                                 
    lt1               1.000                               1.410    0.759
    lt2               1.091    0.062   17.688    0.000    1.538    0.805
    lt3               0.867    0.072   12.036    0.000    1.223    0.559
    lt4               1.071    0.065   16.370    0.000    1.510    0.746

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  CI ~                                                                  
    MQ                0.231    0.061    3.804    0.000    0.287    0.287
    LT                0.571    0.081    7.015    0.000    0.586    0.586
  LT ~                                                                  
    MQ                0.667    0.044   15.035    0.000    0.805    0.805

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .ci1               1.979    0.142   13.956    0.000    1.979    0.512
   .ci2               1.927    0.146   13.216    0.000    1.927    0.431
   .ci3               1.032    0.100   10.276    0.000    1.032    0.264
   .ci4               1.312    0.102   12.830    0.000    1.312    0.398
   .mq1               1.426    0.115   12.405    0.000    1.426    0.330
   .mq2               1.415    0.120   11.817    0.000    1.415    0.296
   .mq3               1.028    0.097   10.644    0.000    1.028    0.246
   .mq4               1.731    0.119   14.599    0.000    1.731    0.574
   .lt1               1.464    0.115   12.709    0.000    1.464    0.424
   .lt2               1.285    0.111   11.564    0.000    1.285    0.352
   .lt3               3.293    0.223   14.800    0.000    3.293    0.688
   .lt4               1.813    0.140   12.946    0.000    1.813    0.443
   .CI                0.572    0.083    6.884    0.000    0.303    0.303
    MQ                2.900    0.268   10.823    0.000    1.000    1.000
   .LT                0.700    0.096    7.290    0.000    0.352    0.352

It is good practice to present the main results of the analysis in a path diagram.

Of course, you can draw the diagram yourself, e.g., in PowerPoint. Always follow the main rules:

  • Latent variables appear in ovals, observed variables in rectangles.

  • For the measurement part of the model, the arrows (normally) point from latent variables to observed variables.

  • For the structural part of the model, the arrows run from causes (e.g., MQ in our example) to effects (here, LT and CI).

  • Endogenous variables have an error variance, shown in small ovals/circles. In order to avoid clutter, the error variances are sometimes left out.

The figure below gives the SEM-diagram.

We produced the diagram in Stata. The big advantage of Stata is that you can draw the diagram in a so-called GUI (Graphical User Interface) and run the analysis directly from the diagram. Apart from the convenience (no coding needed), this ensures that the estimated model and the graphical representation are aligned.

You can check for yourself that the results of the analysis in Stata are the same as in lavaan.

The results indicate that:

  • The coefficients are positive, and significant.

  • Apart from the direct effect of MQ on CI, there is a strong indirect effect of MQ on CI via LT. The total effect of MQ on CI is the sum of the direct effect plus the indirect effect. While the direct effect (.29) is directly visible from the diagram, the indirect effect has to be computed. There are options to do so within the code, but it is easy to do it by hand: the indirect effect of MQ on CI via LT is the product of the effect of MQ on LT, times the effect of LT on CI (.80*.59 = .47). The total effect of MQ on CI then is .47+.29=.76, of which the largest part (.47/.76 = 62%) is indirect.

  • The error term (or unexplained variance) of CI is .30, implying that the remainder (70%) of the variance of CI is explained by the model. Likewise, 65% of LT is explained by the model. MQ is an exogenous variable, and has no error term.
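Instead of doing this computation by hand, you can let lavaan estimate the indirect and total effects (with standard errors), by labeling the structural coefficients and defining new parameters with the := operator. The labels a, b, and c below are our own choice; fitting works exactly as before, with sem(hpoSEMeffects, data = hpoData):

```r
# Same model as above, now with labeled paths and defined parameters
hpoSEMeffects <- '
  # measurement model
    CI =~ ci1 + ci2 + ci3 + ci4
    MQ =~ mq1 + mq2 + mq3 + mq4
    LT =~ lt1 + lt2 + lt3 + lt4
  # regressions, with labels on the structural coefficients
    CI ~ c*MQ + b*LT
    LT ~ a*MQ
  # defined parameters: indirect and total effect of MQ on CI
    indirect := a*b
    total    := c + a*b
'

# The by-hand check, with the standardized estimates from the output:
indirect_std <- 0.805 * 0.586          # about .47
total_std    <- 0.287 + indirect_std   # about .76
```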

5 Goodness-of-Fit Statistics

How well does the model fit our data?

SEM uses several goodness-of-fit measures.

SEM starts from a covariance matrix (a matrix of variances and covariances) of the variables in the model. This matrix is used to estimate all parameters in the model, based on an estimation method (mostly the maximum likelihood method).

In reverse, the estimated parameters imply a set of variances and covariances which then can be compared to the starting matrix.

The closer the starting and implied covariance matrices are, the better the model fit.

Over time, statisticians have developed a variety of goodness-of-fit measures based on the above principle, and guidelines for what makes up a good model.

An excellent, although somewhat dated, overview is provided by Hooper et al. (2008).

5.1 Chi-Square

Chi-Square (χ2) is the traditional measure for model fit. It reflects the difference between the starting covariance matrix and the one implied by the model. A good model has a χ2 close to zero. Since we hope to find a small value for χ2, it is sometimes called a badness-of-fit statistic.

χ2 is sensitive to sample size. It easily rejects the model if the sample is large.

Some prefer to use the relative version of this measure, which divides χ2 by the degrees of freedom. There is no consensus on an acceptable value for this statistic; recommendations range from 2 to 5.

χ2 is considered less relevant as a goodness-of-fit measure, but it is important in testing the significance of parameters and in improving the model, as we will see in the next sections.

5.2 RMSEA

The Root Mean Square Error of Approximation (RMSEA) is based on the same principle of comparing the observed covariance matrix to the implied one. It is considered informative because it is sensitive to model complexity. Low values of RMSEA indicate a good model fit: values close to .06 are considered good, with .07 as an upper limit.

An advantage of the RMSEA is that it comes with a confidence interval. In a well-fitting model the lower limit of the interval is close to 0 and the upper limit is below .08.

A related measure is pclose, the probability that RMSEA < .05. The higher pclose, the better!

5.3 SRMR

The Standardized Root Mean Squared Residual (SRMR) measures how closely the model reproduces the observed correlations, on average. An SRMR of, say, .05 indicates that the model reproduces the correlations with an average error of .05.

Well-fitting models have an SRMR below .05, but values up to .08 are acceptable.

5.4 CFI

Comparative fit indices compare χ2 to a baseline model.

The Comparative Fit Index (CFI) compares the fitted model to a null (baseline) model in which all observed variables are assumed to be uncorrelated.

CFI values range from 0 to 1, with values close to 1 indicating a good fit. A CFI of .95 or higher indicates a good fit.

5.5 Reporting Goodness-of-Fit

Hooper et al. (2008) recommend reporting:

  • χ2 including the degrees of freedom and the p-value.

  • RMSEA and its confidence interval, and/or pclose.

  • SRMR.

  • CFI.

A well-fitting model has the following goodness-of-fit scores:

Goodness-of-Fit Measure Values Comment
χ2 (df), p-value p>.05 A low χ2 relative to the degrees-of-freedom (df). Models are easily rejected in case of large samples.
χ2/df <5 No clear cut-off point. Values below 5 are considered acceptable.
RMSEA <.07 .07 is considered an upper limit. A value of .03 or lower indicates a good fit.
pclose >.05 If the value is larger than .05, then the hypothesis of an RMSEA close to .05 (indicative of a well-fitting model) is accepted.
SRMR <.08
CFI >.95
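In lavaan, these statistics can be extracted directly with the `fitMeasures()` function, by requesting the measures by name. A sketch, assuming `hpofit` is the fitted model object used in our example:

```r
# Request the recommended fit measures by name
fitMeasures(hpofit, c("chisq", "df", "pvalue",
                      "rmsea", "rmsea.ci.lower", "rmsea.ci.upper",
                      "srmr", "cfi"))
```

This is a compact alternative to scanning the full `summary()` output shown below.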

The Annex shows a helpful video on reporting SEM results.

5.6 Goodness-of-Fit Statistics in Our Example

In our example, we can obtain these and other goodness-of-fit statistics as follows.

Code
summary(hpofit, fit.measures=TRUE)
lavaan 0.6-19 ended normally after 34 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        27

  Number of observations                           500

Model Test User Model:
                                                      
  Test statistic                               163.619
  Degrees of freedom                                51
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                              3446.768
  Degrees of freedom                                66
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.967
  Tucker-Lewis Index (TLI)                       0.957

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -10991.910
  Loglikelihood unrestricted model (H1)     -10910.101
                                                      
  Akaike (AIC)                               22037.820
  Bayesian (BIC)                             22151.615
  Sample-size adjusted Bayesian (SABIC)      22065.915

Root Mean Square Error of Approximation:

  RMSEA                                          0.066
  90 Percent confidence interval - lower         0.055
  90 Percent confidence interval - upper         0.078
  P-value H_0: RMSEA <= 0.050                    0.009
  P-value H_0: RMSEA >= 0.080                    0.026

Standardized Root Mean Square Residual:

  SRMR                                           0.033

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  CI =~                                               
    ci1               1.000                           
    ci2               1.162    0.075   15.410    0.000
    ci3               1.237    0.072   17.204    0.000
    ci4               1.025    0.065   15.804    0.000
  MQ =~                                               
    mq1               1.000                           
    mq2               1.076    0.050   21.500    0.000
    mq3               1.044    0.046   22.494    0.000
    mq4               0.665    0.043   15.483    0.000
  LT =~                                               
    lt1               1.000                           
    lt2               1.091    0.062   17.688    0.000
    lt3               0.867    0.072   12.036    0.000
    lt4               1.071    0.065   16.370    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  CI ~                                                
    MQ                0.231    0.061    3.804    0.000
    LT                0.571    0.081    7.015    0.000
  LT ~                                                
    MQ                0.667    0.044   15.035    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ci1               1.979    0.142   13.956    0.000
   .ci2               1.927    0.146   13.216    0.000
   .ci3               1.032    0.100   10.276    0.000
   .ci4               1.312    0.102   12.830    0.000
   .mq1               1.426    0.115   12.405    0.000
   .mq2               1.415    0.120   11.817    0.000
   .mq3               1.028    0.097   10.644    0.000
   .mq4               1.731    0.119   14.599    0.000
   .lt1               1.464    0.115   12.709    0.000
   .lt2               1.285    0.111   11.564    0.000
   .lt3               3.293    0.223   14.800    0.000
   .lt4               1.813    0.140   12.946    0.000
   .CI                0.572    0.083    6.884    0.000
    MQ                2.900    0.268   10.823    0.000
   .LT                0.700    0.096    7.290    0.000

Summarizing the results:

Goodness-of-Fit Measure Value
χ2 (df); p-value χ2(51) = 163.6; p = .00 (poor fit)
χ2/df 163.6/51 = 3.2 (good)
RMSEA .066 (interval .055-.078). Acceptable or good.
pclose P(RMSEA < .05) = .009 (the probability that the RMSEA is close to that of a well-fitting model is small). P(RMSEA > .08) = .026 (the probability that the RMSEA is higher than that of a poorly fitting model is also small).
SRMR .033 (good)
CFI .967 (good)

Considering our remarks on χ2 as a measure of fit, the key statistics look OK.

6 Improving the Model

Since our hypothesized model is supported by the data, there is no need to make amendments.

Should the model not fit the data well, then we can dig deeper.

Although the model is not overly complicated, we have excluded a lot of potential paths!

For example, our latent variable CI is only measured by the four items ci1 to ci4, and not by any of the other observed variables (items). We assume, for the validity of our measurements, that the effect of latent variables MQ and LT on the responses ci1 to ci4 is zero.

By fixing a lot of such paths to zero, we reduce the number of parameters to be estimated, thereby increasing the degrees-of-freedom. By adding one path to the model, we will produce a better fit at the cost of giving up a degree of freedom.

We can use trial-and-error to add paths to the model, but in relatively complex models this would be a tedious job.

Luckily, SEM software can do the job for us, using the so-called modification indices option. For each omitted path, this option estimates the reduction in χ2 that would result from adding the path to the model.

Warning

It is highly recommended not to add paths blindly in search of a good model fit. There has to be a clear (theoretical) justification for adding a path!

In many cases, browsing through the modification indices output will give you an idea of items that for some reason cause problems.

Rather than adding paths that are hard to explain or justify, it is often better to leave out problematic items.

Although there is no need to modify our model, we will illustrate the modification indices below.

Code
modindices(hpofit, sort = TRUE, maximum.number = 8)

We have opted for a maximum of 8 paths in the output. By sorting in descending order on the size of the modification indices (mi, in the output), the function returns the paths which, if added, would lead to the largest reductions in χ2.

In the table we encounter something that we have not discussed before. Some pairs of variables are linked by a double tilde (“~~”). The double tilde signifies a covariance between variables or between error terms. For example, if we would add a covariance term between the error terms of mq1 and mq2, then χ2 would be reduced by an estimated 22, down from 163.
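In lavaan syntax, such a residual covariance would be added as a single extra line in the model specification, again using the double tilde. A sketch for this particular pair of items:

```r
# Sketch: one extra line in the model string adds a covariance
# between the error terms of mq1 and mq2
mq1 ~~ mq2
```

Refitting the model with this line included would show the estimated drop in χ2.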

But why would we add this covariance? Apart from the fact that our model performs fine, the issue is probably caused by the fact that responses to items mq1 and mq2 differ from the other mq items.

Item lt4 appears thrice in the list of 8 paths, and in relation to latent variables (CI) and measurements (mq4) that are not connected theoretically. If lt4 confuses respondents or can be considered a measurement of other latent variables in addition to LT, then it may be wise to drop the item. It is best to base such decisions on reasoning rather than on the wish to increase the model fit!

Tip

Measurement using multiple item scales can get very complicated. Some of the issues and how to treat them in SEM are well-documented (Goedegebuure and Adhikari 2023).

7 SEM the User-Friendly Way

We have covered the most important elements of SEM, and explained the key terms.

Everything you have learned in this module should enable you to apply SEM to your own data, using the software of your preference.

There is a lot of powerful software available for SEM.

We will go over some of the software packages, and their pros and cons.

7.1 SEM Using STATA or AMOS

In terms of user-friendliness, commercial software packages are hard to beat.

Like R, commercial packages such as STATA and IBM SPSS cover a broad range of basic and advanced statistical techniques.

STATA and AMOS (an IBM SPSS module) can be used for SEM via Graphical User Interfaces (GUIs). The GUIs allow you to focus on the contours of the model and thereby avoid error-prone and tedious coding. The coding is done behind the scenes.

In addition, commercial packages have excellent tutorials to help you on your way.

The main disadvantage is that commercial software is expensive.

7.2 SEM Using R/lavaan

It is fairly easy to run SEM using the lavaan package in R.

One of the good things about lavaan is its excellent documentation and tutorials, including easy-to-follow examples.

A drawback is that it is harder to obtain publication-quality path diagrams, let alone fit your model from a GUI as in STATA or AMOS. That is, coding cannot be avoided.

There are packages that can produce path diagrams showing the main results of the analysis, which helps, but this adds to the burden of coding.

7.3 SEM Using JASP

An alternative is to use JASP. JASP can be downloaded for free.

JASP positions itself as a fresh way to do statistics - which is a fair claim! JASP is definitely designed with the user in mind. It can perform a wide range of descriptive and inferential statistics, and is a great option for most researchers and for most statistical challenges.

JASP can run SEM-models with limited coding, and the option to generate a high-quality path diagram.

It makes use of R and lavaan but in the background.

The only coding involved is the lavaan-like model specification.

  • The measurement part of the model specifies the latent variables and their measurements; following our example: CI =~ ci1+ci2+ci3+ci4.

  • The structural part follows the regression notation of R: CI ~ MQ+LT

  • In some cases, it makes sense to add covariances (or correlations, in the standardized version) to error terms. Covariances are specified by a double tilde (~~).
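Putting these parts together, the full specification entered in JASP for our example would read as follows (a sketch following the lavaan conventions above; the optional residual covariance line is included only for illustration):

```r
# measurement part
MQ =~ mq1 + mq2 + mq3 + mq4
LT =~ lt1 + lt2 + lt3 + lt4
CI =~ ci1 + ci2 + ci3 + ci4
# structural part
LT ~ MQ
CI ~ MQ + LT
# optional: covariance between error terms
# mq1 ~~ mq2
```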

You can refer to the lavaan website and tutorials for examples of model specifications.

After reading your data into JASP, you only have to specify the model which follows the rules of lavaan. Optionally, the output contains a SEM diagram.

The following video shows you how to run SEM models in JASP.

The data, and the JASP-file, can be found on GitHub.

Once you have downloaded JASP you can download and open the file, and inspect the input and the output. Feel free to play around with the various options, and maybe to adapt the model!

Some things to be aware of:

  1. SEM does not show up in the standard menu of JASP. You can add SEM and many other techniques to the menu, easily.

  2. After loading your data into JASP, you have to check that the variables to be used in SEM are scale (interval or ratio) data rather than nominal or ordinal. SEM gives an error message when your variables are not scale (i.e., when they are nominal or ordinal).

  3. The SEM-diagram is great, but in the version of JASP used in this module the coefficients shown are the unstandardized ones. The standardized coefficients do appear in the results, but cannot be printed in the diagram. Hopefully this will be fixed in future versions of the software.

7.4 SEM Using Jamovi

There’s another tool which is remarkably similar to JASP, called Jamovi.

SEM can be added to Jamovi’s menu. And, like JASP, Jamovi also makes use of R and lavaan, in the background.

There are slight differences between SEM in JASP and Jamovi.

  1. SEM in Jamovi can be done interactively, implying even less coding.

  2. The SEM diagram of Jamovi does offer the option to print standardized estimates, which helps.

  3. Jamovi - at least in the interactive option - forces users to distinguish between exogenous and endogenous (latent) variables, but its use of these terms is inconsistent, or even irrelevant. Either way, it is confusing.

  4. While JASP can handle path models (without latent variables) easily, it is not very clear how to run path models in Jamovi's interactive mode, although it is probably possible.

You can download Jamovi here.

After downloading Jamovi, you can download and open the Jamovi file with the data and the SEM analysis from GitHub.

The screenshot below includes the only lines of lavaan-like coding that are required to run the model. The structural part of the model has to be specified in the custom model settings.

Any covariances between error terms can be added in the variances and covariances section. For the sake of the example, we have added a covariance between the error terms of mq1 and mq2, which (according to the output of lavaan, JASP, and Jamovi) would reduce the χ2 by 22, down from 164.

By adding one parameter, we decrease the degrees of freedom by one, from 51 to 50. Besides improving the other goodness-of-fit statistics, adding the parameter slightly changes the regression coefficients compared to the earlier model.

8 Annexes

8.1 SEM in R/lavaan (including path diagrams)

8.2 SEM in JASP

8.3 SEM in Jamovi

Part 1:

Part 2:

8.4 Reporting SEM Results

References

Goedegebuure, Robert, and Manorama Adhikari. 2023. “Factor Analysis: Dealing with Response Bias.” International Journal of Business and Management Research 11 (1): 25–33. https://doi.org/10.37391/ijbmr.110103.
Hooper, Daire, Joseph Coughlan, and Michael R. Mullen. 2008. “Structural Equation Modelling: Guidelines for Determining Model Fit.” Electronic Journal of Business Research Methods 6 (1): 53–60.
Parasuraman, A., Valarie A. Zeithaml, and Leonard L. Berry. 1985. “A Conceptual Model of Service Quality and Its Implications for Future Research.” Journal of Marketing 49 (4): 41–50. https://doi.org/10.1177/002224298504900403.
Tavakol, Mohsen, and Reg Dennick. 2011. “Making Sense of Cronbach’s Alpha.” International Journal of Medical Education 2 (June): 53–55. https://doi.org/10.5116/ijme.4dfb.8dfd.
Waal, André de, Robert Goedegebuure, and Eveline Hinfelaar. 2015. “Developing a Scale for Measuring High Performance Partnerships.” Journal of Strategy and Management 8 (1): 87–108. https://doi.org/10.1108/jsma-07-2014-0065.