Tuesday, 7 August 2018

Structural Equation Modeling and Its Implementation in R


Structural Equation Modeling (SEM) is a quantitative research technique that is used for the following:

1) Causal Modeling / Path Analysis - I always get confused between correlation and causation. This simple explanation from the abs.gov site always helps me:

Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect. 

Theoretically, the difference between the two types of relationships are easy to identify — an action or occurrence can cause another (e.g. smoking causes an increase in the risk of developing lung cancer), or it can correlate with another (e.g. smoking is correlated with alcoholism, but it does not cause alcoholism). In practice, however, it remains difficult to clearly establish cause and effect, compared with establishing correlation.
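
To make the distinction concrete, here is a small simulation sketch in base R. Everything in it is hypothetical (the variable names z, x and y and the 0.8 coefficients are assumptions chosen for illustration): x and y are both driven by a common cause z, so they are correlated even though neither one causes the other.

```r
set.seed(42)                # make the simulation reproducible
n <- 1000
z <- rnorm(n)               # hypothetical common cause (confounder)
x <- 0.8 * z + rnorm(n)     # x depends on z only
y <- 0.8 * z + rnorm(n)     # y depends on z only, never on x

# Marginal correlation: x and y move together because both follow z
r_xy <- cor(x, y)

# Regress y on x and z together: with z in the model,
# the coefficient on x should be close to zero
coef_x <- coef(lm(y ~ x + z))["x"]

print(r_xy)
print(coef_x)
```

The correlation between x and y is clearly positive, yet the coefficient on x shrinks towards zero once z is controlled for. That is the kind of pattern that makes correlation easy to establish and causation hard.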

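SEM estimates hypothesized causal relationships among latent variables and observed variables. It tests whether the data are consistent with the causal structure you specify; it does not discover causation on its own. These relationships also make up a path analysis (path analysis is used to describe the directed dependencies among a set of variables).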

2) Confirmatory Factor Analysis - It is used to test whether measures of a factor are consistent with a researcher's understanding of the nature of that construct (or factor). As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. A construct is generally measured by a set of questions that together represent a factor.
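
As a rough sketch of what a confirmatory factor analysis looks like in R, the snippet below uses cfa() from the lavaan package on lavaan's built-in PoliticalDemocracy data. The choice of two factors (ind60 and dem60) and their indicators is only an illustrative assumption; in a real study the measurement model comes from your own theory.

```r
library(lavaan)

# Measurement model only: each latent factor is defined by its indicators
cfa_model <- '
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
'

# Fit the hypothesized measurement model to the data
cfa_fit <- cfa(cfa_model, data = PoliticalDemocracy)

# A few common fit indices to judge whether the data fit the model
fit_idx <- fitMeasures(cfa_fit, c("chisq", "df", "cfi", "rmsea"))
print(fit_idx)
```

The fit indices summarize how well the hypothesized factor structure reproduces the observed covariances; summary(cfa_fit, fit.measures = TRUE) prints the loadings alongside them.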


3) Partial least squares path modeling - allows estimating complex cause-effect relationship models with latent variables.

4) Latent growth modeling - used to estimate growth trajectories. It is a longitudinal analysis technique that estimates growth over a period of time. It is widely used in behavioral science, education and social science.
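
A minimal latent growth sketch, assuming the lavaan package and its bundled Demo.growth example data (four repeated measures t1 to t4). The linear time coding 0, 1, 2, 3 is an assumption about equally spaced measurement occasions.

```r
library(lavaan)

# i = latent intercept (starting level), s = latent slope (growth rate)
# Loadings are fixed: all 1 for the intercept, 0..3 for linear time
growth_model <- '
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
'

# growth() is lavaan's wrapper for latent growth curve models
growth_fit <- growth(growth_model, data = Demo.growth)
summary(growth_fit)
```

In the output, the estimated mean of i is the average starting level and the mean of s is the average change per time step; their variances describe how much individuals differ in each.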

Symbols used to represent structural relationships


R has the lavaan package, which is still in beta release (as of 08/2018). Let's use this package to run an SEM model. In our example, we will use the built-in PoliticalDemocracy dataset. Measured variables are typically survey questionnaire items that are constructed to capture a factor (latent variable). SEM estimates the multiple relationships among the variables (latent and observed) included in the model.

The figure below contains a graphical representation of the model that we want to fit

The figure can be read as follows:
  •  x1, x2, x3, y1, y2, y3, y4, y5, y6, y7 and y8 are measured variables (MV).
  •  ind60, dem60 and dem65 are latent variables (LV).
  •  ind60 is directly related to the MVs x1, x2 and x3 and to the LVs dem60 and dem65.
  •  dem60 is directly related to the MVs y1, y2, y3 and y4 and to the LV dem65.
  •  dem65 is directly related to the MVs y5, y6, y7 and y8.
  •  y1 has a residual correlation with y5 that is not explained by their latent variables.
  •  y2 has residual correlations with y4 and y6 that are not explained by their latent variables.
  •  y3 has a residual correlation with y7 that is not explained by their latent variables.
  •  y4 has a residual correlation with y8 that is not explained by their latent variables.
  •  y6 has a residual correlation with y8 that is not explained by their latent variables.

Complete code to run SEM in R (the PoliticalDemocracy data ships with the lavaan package itself):


library(lavaan) # only needed once per session
model <- '
  # measurement model
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'
fit <- sem(model, data=PoliticalDemocracy)
summary(fit, standardized=TRUE)
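
After running the above code we get the fitted model, with estimates for every relation we specified in the structural model. The results can be read much like linear regression output: a p-value less than .05 is conventionally taken as evidence of a statistically significant relationship between variables. The strength of a relationship is indicated by the standardized estimate, not by the p-value.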
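
If the full summary is too long to scan, the sketch below pulls out just the regression paths and a few overall fit indices. It refits the same model so that it runs on its own; parameterEstimates() and fitMeasures() are lavaan functions, while the particular columns and indices selected here are only one reasonable choice.

```r
library(lavaan)

model <- '
  # measurement model
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'
fit <- sem(model, data = PoliticalDemocracy)

# All parameter estimates as a data frame, including standardized values
est <- parameterEstimates(fit, standardized = TRUE)

# Keep only the regression paths between latent variables (operator "~")
regressions <- est[est$op == "~",
                   c("lhs", "rhs", "est", "se", "pvalue", "std.all")]
print(regressions)

# Overall model fit
print(fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr")))
```

Each row of regressions is one path: pvalue tells you whether the path is statistically distinguishable from zero, and std.all gives its standardized size.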


For a graphical explanation of regression assumptions, see: Graphical analysis of regression assumption

For the relation between time series and regression, see: Regression and Time Series









