1 Introduction

Measurement invariance is a desired property of scales scores generated with measurement models fitted onto polytomous items responses, to allow comparisons among groups (Tse, Lai, & Chang, 2024). These groups often include meaningful factors such as sex, age, the language of the test or survey, and the participating country, among other groups of interest. Ideally, latent variable models can be used to provide evidence (or the lack of it) that comparisons among these groups can be made on the generated scale scores.

There are different approaches to assess if the assume comparability is tenable. The most popular approaches include differential item functioning (DIF), and factorial invariance (Thissen, 2024), two model based strategies to assess comparability among groups. The simplest version of differential item functioning is uniform DIF. Uniform DIF consist of the study of item location parameters conditional to group membership of the observations. DIF is a procedure to produce evidence that the expected response onto items, is equivalent among groups when the level of the attribute is the same among the contrasting groups. Thus, if two (or more) groups of the same attribute level do differ on the expected responses to an item, this scenario is taken as evidence of DIF (e.g., Wu et al., 2016). There are other forms of DIF such as non-uniform DIF that consist of group interactions involving, not only item location parameters, but also item slopes (or factor loadings depending on the measurement model parameterization) (Meulders & Xie, 2004), and more complex forms of DIF that account for variability on the measurement model parameters, not only due to groups (i.e., categorical variables), but conditional to continuous variables (Bauer, 2016).

On the other hand, factorial invariance or measurement invariance is also a model-based strategy, where different measurement models’ specifications are fitted onto the groups of interest, to assess the equivalence of the measurement model parameters besides group latent mean (Millsap, 2011). As such, comparisons of interest among groups includes not only location item paratemers, but also factor loadings, and residuals or item uniqueness (Kline, 2023). Traditionally, a descriptive model is specified (i.e. configural model), and a sequence of more constrained models are fitted, where different measurement model parameters are held equal among groups till the most equivalent model specification is fitted where factor loadings, item location and item uniques parameters are held equal, and only latent mean differences are allowed to vary (e.g., Dimitrov, 2010).

Current developments on factorial invariance, specifically developments in confirmatory factor analysis (CFA) fitted onto ordinal indicators have recommendations, that depart from traditional CFA measurement invariance (Wu & Estabrook, 2016; Svetina, Rutkowski & Rutkowski, 2020; Tse, Lai & Chang, 2024). Two points of contention are of special relevance for the present guideline. These are the order in which the different model specifications should be fitted; and if the different invariance model specifications described for CFA with continuous indicators are applicable for measurement models with ordinal indicators.

The present guideline is focused on the model specifications included in the library(rd3c3). This is an R library, with a collection of different wrapper functions that helps to speed up the process of writing syntax to fit different latent variable models onto large scale assessment studies. library(rd3c3) follows the work of Wu & Estabrook (2016) on model identification and starts first with item threshold invariance as the first step to build a model sequence to assess measurement invariance. It also follows Svetina et al. (2020), Tse et al. (2024) and Grimm et al. (2016) on model specification with delta parameterization, to fit scalar, and strict invariance model solutions.

Additionally, library(rd3c3) provides wrapper functions to fit alignment methods. These are invariance model specification that relaxes the models parameter constrains among groups in search of the least discrepant model estimates across the compared groups (Muthén & Asparouhov, 2014).

The present guideline is structure as follows: we first review measurement invariance model specifications with the graded response model (section 1); we then proceed to discuss the limitations of partially invariant models (section 2); we described the library in general terms (section 3); we provide a full example of invariance analysis using the library (section 4); and finally we close the present guidelines with a section of intended uses (section 5).

1.1 References

Bauer, D. J. (2016). A More General Model for Testing Measurement Invariance and Differential Item Functioning. Psychological Methods. https://doi.org/10.1037/met0000077

Dimitrov, D. M. (2010). Testing for Factorial Invariance in the Context of Construct Validation. Measurement and Evaluation in Counseling and Development, 43, 121–149. https://doi.org/10.1177/0748175610373459

Kline, R. B. (2023). Principles and Practice of Structural Equation Modeling (5th ed.). Guilford Press.

Millsap, R. E. (2011). Statistical Approaches to Measurement Invariance. New York, NY: Routledge.

Meulders, M., & Xie, Y. (2004). Person-by-item predictors. In P. De Boeck & M. Wilson (Eds.), Explanatory Item Response Models (pp. 213–240). Springer New York. https://doi.org/10.1007/978-1-4757-3990-9_7

Muthén, B., & Asparouhov, T. (2014). IRT studies of many groups: the alignment method. Supplemental Material. Frontiers in Psychology, 5, 23–31. https://doi.org/10.3389/fpsyg.2014.00978

Svetina, D., Rutkowski, L., & Rutkowski, D. (2020). Multiple-Group Invariance with Categorical Outcomes Using Updated Guidelines: An Illustration Using Mplus and the lavaan/semTools Packages. Structural Equation Modeling: A Multidisciplinary Journal, 27(1), 111–130. https://doi.org/10.1080/10705511.2019.1602776

Thissen, D. (2024). A Review of Some of the History of Factorial Invariance and Differential Item Functioning. Multivariate Behavioral Research, 0(0), 1–25. https://doi.org/10.1080/00273171.2024.2396148

Tse, W. W. Y., Lai, M. H. C., & Zhang, Y. (2024). Does strict invariance matter? Valid group mean comparisons with ordered-categorical items. Behavior Research Methods, 56(4), 3117–3139. https://doi.org/10.3758/s13428-023-02247-6

Wang, J., & Wang, X. (2020). Confirmatory Factor Analysis. In Structural Equation Modeling: Applications Using Mplus (pp. 33–117). John Wiley & Sons, Inc. https://doi.org/10.4324/9781315832746-25

Wu, H., & Estabrook, R. (2016). Identification of Confirmatory Factor Analysis Models of Different Levels of Invariance for Ordered Categorical Outcomes. Psychometrika, 81(4), 1014–1045. https://doi.org/10.1007/s11336-016-9506-0

Wu, M., Tam, H. P., & Jen, T.-H. (2016). Educational Measurement for Applied Researchers. Springer Singapore. https://doi.org/10.1007/978-981-10-3302-5