Tuesday, January 1, 2008

Handling Non-Normal Data in SEM

Question:
I am having trouble getting my hypothesized structural equation model to fit my data. Someone told me that non-normal data are a problem for SEM models; this person suggested using the generalized least-squares (GLS) estimator to fit my model instead of the default maximum likelihood (ML) estimator. What is the best way to handle non-normal data when fitting a structural equation model?

Answer:
The hypothesis tests conducted in the structural equation modeling (SEM) context fall into two broad classes: tests of overall model fit and tests of the significance of individual parameter estimates. Both types of tests assume that the fitted structural equation model is true and that the data used to test the model arise from a joint multivariate normal (JMVN) distribution in the population from which you drew your sample. If your sample data are not JMVN distributed, the chi-square test statistic of overall model fit will be inflated and the standard errors used to test the significance of individual parameter estimates will be deflated. Practically, this means that with non-normal data you are more likely to reject models that are not in fact false and to conclude that particular parameter estimates are statistically significantly different from zero when in fact they are not (a Type I error). Note that this type of assumption violation is also a problem for confirmatory factor analysis models, latent growth models (LGMs), path analyses, and any other type of model that is fit using structural equation modeling programs such as LISREL, EQS, AMOS, and PROC CALIS in SAS.

How can you correct for non-normal data in SEM programs? There are three general approaches used to handle non-normal data:
1. Use a different estimator (e.g., GLS) to compute goodness of fit tests, parameter estimates, and standard errors
2. Adjust or scale the obtained chi-square test statistic and standard errors to take into account the non-normality of the sample data
3. Make use of the bootstrap to compute a new critical chi-square value, parameter estimates, and standard errors

Estimators
Most SEM software packages offer the data analyst the opportunity to use generalized least squares (GLS) instead of the default maximum likelihood (ML) to compute the overall model fit chi-square test, parameter estimates, and standard errors. Under joint multivariate normality, when the fitted model is not false, GLS and ML return identical chi-square model fit values, parameter estimates, and standard errors (Bollen, 1989). However, research by Ulf H. Olsson and his colleagues (e.g., Olsson, Troye, & Howell, 1999) suggests that GLS underperforms relative to ML in the following key areas:

1. GLS accepts incorrect models more often than ML
2. GLS returns inaccurate parameter estimates more often than ML

A consequence of (2) is that modification indices are less reliable when the GLS estimator is used. Thus, we do not recommend the use of the GLS estimator. A second option is to use Browne's (1982) Asymptotic Distribution Free (ADF) estimator, available in LISREL. Unfortunately, ADF requires very large samples (typically well over 1,000 cases) and relatively small models because of the computational demands of the estimation procedure. As Muthén (1993) concludes, "Apparently the asymptotic properties of ADF are not realized for the type of models and finite sample sizes often used in practice. The method is also computationally heavy with many variables. This means that while ADF analysis may be theoretically optimal, it is not a practical method" (p. 227).

For these reasons, the standard recommendation is to use the ML estimator (or one of the variants described below) when fitting a model to variables that are assumed to be normally and continuously distributed in the population from which you drew your sample. By contrast, if your variables are inherently categorical in nature, consider using a software package designed specifically for this type of data. Mplus is one such product. It uses a variant of the ADF method mentioned previously, weighted least squares (WLS). WLS as implemented in Mplus for categorical outcomes does not require the sample sizes that ADF demands for continuous, non-normal data. Further discussion of the WLS estimator is beyond the scope of this FAQ; interested readers are encouraged to consult Muthén, du Toit, and Spisic (1997) and Muthén (1993) for further details.

Robust scaled and adjusted chi-square tests and parameter estimate standard errors
A variant of the ML estimation approach is to correct the model fit chi-square test statistic and the standard errors of individual parameter estimates. This approach was introduced by Satorra and Bentler (1988) and incorporated into the EQS program as the ml,robust option. The ml,robust option in EQS 5.x provides the Satorra-Bentler scaled chi-square statistic, also known as the scaled T statistic, which tests overall model fit. Curran, West, and Finch (1996) found that the scaled chi-square statistic outperformed the standard ML estimator under non-normal data conditions. Mplus also offers the scaled chi-square test and accompanying robust standard errors via the estimator option mlm, as well as a similar test statistic, the mean- and variance-adjusted chi-square statistic, via the estimator option mlmv.

An adjusted version of the scaled chi-square statistic is presented in Bentler and Dudgeon (1996). Fouladi (1998) conducted an extensive simulation study that found that this adjusted chi-square test statistic outperformed both the standard ML chi-square and the original scaled chi-square test statistic, particularly in smaller samples. Unfortunately, the adjusted test statistic is not available in EQS 5.x.

The robust approaches work by adjusting, usually downward, the obtained model fit chi-square statistic based on the amount of non-normality in the sample data: the larger the multivariate kurtosis of the input data, the stronger the adjustment applied to the chi-square test statistic. Standard errors of parameter estimates are adjusted in much the same manner to appropriately control the Type I error rate of individual parameter estimate tests. Although the parameter estimate values themselves are the same as those from a standard ML solution, the standard errors are adjusted (typically upward), with the end result being a more appropriate test of the hypothesis that the parameter estimate is zero in the population from which the sample was drawn.
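
To give a concrete sense of the kurtosis measure involved, here is a minimal Python sketch (an illustration added to this FAQ, not the full Satorra-Bentler computation, which also requires the model's derivative matrices) that computes Mardia's multivariate kurtosis, the statistic whose excess over its normal-theory expectation drives the size of the adjustment:

import numpy as np

def mardia_kurtosis(X):
    # Mardia's multivariate kurtosis b2p for an (n x p) data matrix X.
    n, p = X.shape
    Xc = X - X.mean(axis=0)                  # center each variable
    S = (Xc.T @ Xc) / n                      # ML covariance matrix
    # Squared Mahalanobis distance of each case from the centroid.
    d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)
    return (d2 ** 2).mean(), p * (p + 2)     # statistic and its large-sample expectation under JMVN

rng = np.random.default_rng(0)
print(mardia_kurtosis(rng.standard_normal((500, 4))))   # close to p(p + 2) = 24
print(mardia_kurtosis(rng.exponential(size=(500, 4))))  # well above 24: kurtotic, non-normal data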

Bootstrapping
The robust scaling approach described above adjusts the obtained chi-square model fit statistic based on the amount of multivariate kurtosis in the sample data. An alternative way to deal with non-normal input data is to leave the obtained chi-square test statistic alone and instead adjust the critical value of the chi-square test. Under JMVN, and if the fitted model is not false, the expected value of the chi-square test of model fit equals the model's degrees of freedom (DF). For example, if you fit a true model with 20 DF to JMVN input data, you would expect the chi-square test of model fit to equal 20, on average. Non-normality in the sample data, however, can inflate the obtained chi-square to a value that exceeds the DF, say 30. The robust scaled and adjusted chi-square tests described in the previous section work by lowering the value of the obtained chi-square to correct for the non-normality; in this example a reasonable value for the robust scaled or adjusted chi-square might be 25 instead of 30. Ideally the adjusted chi-square would be closer to 20, but the adjustments are not perfect.
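
As a quick sanity check of that expectation (a toy simulation added here, not part of the original answer), the mean of repeated draws from a central chi-square distribution with 20 degrees of freedom is indeed about 20:

import numpy as np

rng = np.random.default_rng(1)
draws = rng.chisquare(df=20, size=100_000)   # chi-square statistics under a true null with 20 DF
print(draws.mean())                          # approximately 20: the expected value equals the DF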

Bootstrapping works by generating a new critical value for the chi-square test of overall model fit from the data themselves. In our example, instead of the JMVN expected chi-square value of 20, a critical value generated via the bootstrap might be 27. The obtained chi-square statistic for the fitted model (e.g., 30) is then compared with the bootstrap critical value (e.g., 27) rather than with the original model DF (e.g., 20), and a p-value is computed from that comparison.

How is the bootstrap critical chi-square value generated? First, the input data are treated as the total population of responses, and the bootstrap program repeatedly draws samples of size N, with replacement, from this pseudo-population. Before sampling, the input data are transformed so that your fitted model holds exactly in the pseudo-population. This step is necessary because the critical chi-square value must come from a central chi-square distribution, and a central chi-square distribution assumes the null hypothesis is true. The same assumption is made when you use the standard ML chi-square to test model fit: the obtained chi-square is expected to equal the model DF when the null hypothesis is true.
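
To make the transformation step concrete, here is a minimal numpy sketch (an illustration, not the AMOS implementation); sigma_model stands in for whatever covariance matrix your fitted model implies:

import numpy as np

def sym_power(A, power):
    # Matrix power of a symmetric positive definite matrix via its eigendecomposition.
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** power) @ vecs.T

def bollen_stine_transform(X, sigma_model):
    # Rotate the centered (n x p) data X so its ML covariance equals sigma_model
    # exactly, making the null hypothesis true in the bootstrap pseudo-population.
    Xc = X - X.mean(axis=0)
    S = (Xc.T @ Xc) / len(X)                 # ML sample covariance
    return Xc @ sym_power(S, -0.5) @ sym_power(sigma_model, 0.5)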

Next, the model is fit to each bootstrap sample and the obtained chi-square statistic is saved. At the conclusion of the bootstrap sampling, the program collects the chi-square model fit statistics from all of the samples and computes their mean. This mean becomes the critical value for the chi-square test from the original analysis.
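
Putting the pieces together, the following self-contained toy (again an illustration, not any package's implementation) runs the full procedure. To stay runnable without SEM software, the fitted "model" is the independence model (all variables uncorrelated), whose ML chi-square has the closed form -(N - 1) * log|R|, with R the sample correlation matrix and p(p - 1)/2 degrees of freedom; in practice you would replace chisq_independence with a call to your SEM program. It reports both the mean bootstrap chi-square used as the critical value and the proportion of bootstrap statistics that exceed the observed one, which is the usual way a bootstrap p-value is computed:

import numpy as np

def sym_power(A, power):
    # Matrix power of a symmetric positive definite matrix.
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** power) @ vecs.T

def chisq_independence(X):
    # ML model fit chi-square for the independence model: -(N - 1) * log|R|.
    sign, logdet = np.linalg.slogdet(np.corrcoef(X, rowvar=False))
    return -(len(X) - 1) * logdet

rng = np.random.default_rng(2)
n, p = 300, 4
X = rng.exponential(size=(n, p))             # skewed, non-normal input data
t_obs = chisq_independence(X)

# Bollen-Stine step: transform the data so the independence model holds exactly.
Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / n
sigma_model = np.diag(np.diag(S))            # model-implied (diagonal) covariance
Z = Xc @ sym_power(S, -0.5) @ sym_power(sigma_model, 0.5)

# Resample with replacement, refit the model, and save each chi-square.
boot_t = np.array([chisq_independence(Z[rng.integers(0, n, size=n)])
                   for _ in range(1000)])

df = p * (p - 1) // 2
print(f"model DF = {df}, observed chi-square = {t_obs:.1f}")
print(f"bootstrap critical value (mean chi-square) = {boot_t.mean():.1f}")
print(f"proportion of bootstrap chi-squares >= observed = {(boot_t >= t_obs).mean():.3f}")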

The procedure detailed above is credited to Bollen and Stine (1993) and is implemented in AMOS. AMOS allows the data analyst to specify the number of bootstrap samples drawn (typically 250 to 2,000) and outputs the distribution of the chi-square values from the bootstrap samples, the mean chi-square value, and a Bollen-Stine p-value based upon a comparison of the original model's obtained chi-square with the distribution of chi-square values from the bootstrap samples.

AMOS also computes individual parameter estimates, standard errors, confidence intervals, and p-values for tests of significance of individual parameter estimates based upon various bootstrap methods, such as the bias-corrected and percentile methods. Mooney and Duval (1993) and Davison and Hinkley (1997) describe these methods and their properties, whereas Efron and Tibshirani (1993) provide an introduction to the bootstrap. Fouladi (1998) found in a simulation study that the Bollen-Stine test of overall model fit performed well relative to other methods of testing model fit, particularly in small samples.

Cautions and notes
One of the corollary benefits of the bootstrap is the ability to obtain standard errors, and therefore p-values, for quantities for which normal theory standard errors are not defined, such as R-square statistics. A primary disadvantage of the bootstrap and the robust methods mentioned previously is that they require complete data (i.e., no missing data are allowed). Use of the bootstrap also requires the data analyst to set the scale of each latent variable by fixing a factor loading to 1.00 rather than by fixing the factor's variance to 1.00, because under the latter scenario bootstrapped standard error estimates may be artificially inflated by factor loadings switching between positive and negative signs across bootstrap samples (Hancock & Nevitt, 1999).

If you have any questions, send e-mail to stats@its.utexas.edu
