Determining the dimensionality and gender invariance of the MACE work-to-family enrichment scale using bifactor and approximate invariance tests

Copyright: © 2021. The Authors. Licensee: AOSIS. This work is licensed under the Creative Commons Attribution License.


Introduction
Over the past few decades, work-family research has been dominated by the conflict perspective (Greenhaus & Beutell, 1985) according to which the fulfilment of multiple work and family roles leads to experiences of conflict and stress and their concomitant detrimental effects (Eby, Casper, Lockwood, Bordeaux, & Brinley, 2005). The conflict perspective has also been the focus of most work-family studies conducted in Africa (Dubihlela & Dhurup, 2013; Koekemoer, Mostert, & Rothmann, 2010; Mostert, 2011; Opie & Henn, 2013). However, because of the growing attention given to positive psychology, international work-family researchers have come to realise that resources may be generated when multiple roles are occupied, resulting in positive outcomes for employees, organisations and families (Greenhaus & Powell, 2006; Voydanoff, 2002). As a result, international scholars, organisations and human resource practitioners increasingly focus on the positive aspects of the work-family interface. Nevertheless, the number of studies emphasising this positive interaction between work and family within the South African context is limited (De Klerk, Nel, Hill, & Koekemoer, 2013; Jaga, Bagraim, & Williams, 2013).
Greenhaus and Powell (2006) defined work-family enrichment (WFE) as the 'extent to which experiences in one role improve the quality of life (namely performance and affect) in the other role' (p. 73). The main premise of their theory is that the generation of resources is a crucial driver for the enrichment process and that resources can be transferred from one domain to another, resulting in increased performance and affect in the receiving role (Greenhaus & Powell, 2006).
Based on this well-known model of Greenhaus and Powell (2006), Carlson, Kacmar, Wayne and Grzywacz (2006) developed their work-family enrichment scale (WFES), which, although widely used, has been criticised for not reflecting all the facets of resources that the WFE model proposes and for containing double-barrelled items (i.e. conveying different elements instead of single ideas) (Carlson, Grzywacz, & Zivnuska, 2009).
To improve on this well-known international instrument, the WFES, De Klerk et al. (2013) developed the MACE WFE instrument using a South African sample and obtained initial validation for their instrument (MACE is an acronym for the names of the authors). The MACE WFE instrument consists of two distinct bidirectional scales that can be used independently of each other, namely the MACE work-to-family enrichment scale (MACE-W2FE) and the MACE family-to-work enrichment scale. The distinction made between the two bidirectional scales is consistent with international WFE literature (Carlson et al., 2006; Frone & Yardley, 1997). In this article, we focus on the more widely used MACE-W2FE.
De Klerk et al. (2013) conceptualised the WFE construct as '[t]he extent to which various resources from work and family roles have the capacity to encourage an individual and to provide positive experiences, and thereby enhance that individual's quality of life in the other role (i.e. performance and positive affect)' (p. 4). Based on this conceptualisation, they included items in the MACE-W2FE that reflected four categories of resources gained, namely perspectives, affect, social capital and time management.
However, previous studies show that the MACE-W2FE's four-dimensional model might not be sufficiently supported by the data, as is evident in the dimensionality variations of the MACE-W2FE reported across studies. De Klerk, Nel and Koekemoer (2015) and Van Zyl (2020) reported that data supported a correlated four-dimensional factor model, whereas in other studies (Koekemoer, Strasheim, & Cross, 2017; Marais, De Klerk, Nel, & De Beer, 2014), a correlated four-dimensional measurement model was reported, but a good-fitting second-order (SO) factor model was used to alleviate multicollinearity in the exogenous part of a structural equation model (SEM). The SO factor models showed that a strong common factor underlies the MACE-W2FE, which can be consistent with either an approximate unidimensional factor model with trivial group-specific factors or a general factor underlying substantive group-specific factors. Koekemoer et al. (2017) and Marais et al. (2014) did not indicate clearly which of these two assumptions applied to the SO factor model reported. They argued that in the presence of multicollinearity, a good-fitting SO factor model justified the use of a single aggregated variable in the exogenous part of an SEM when supported in theory. However, according to Chen, West and Sousa (2006), the use of SO factor models often goes unchallenged or is glossed over in SEM studies and is not helpful in resolving the dimensionality question. Yet, in another study, the MACE-W2FE subscale scores formed the manifest indicators of a single latent factor that was incorporated in an SEM with external variables (Koekemoer, Olckers, & Nel, 2020). Similar dimensionality vacillations were reported for Carlson et al.'s (2006) WFES, indicating the possible existence of a common problem (Jiang & Men, 2017; Rastogi, Karatepe, & Mehmetoglu, 2018; Russo, Buonocore, Carmeli, & Guo, 2018; Siu et al., 2015; Timms et al., 2015).
With regard to these dimensionality issues, Garrido, González, Seva and Piera (2019) warn against treating substantively multidimensional scores as unidimensional (e.g. as a single latent factor). Such factor scores are expected to lead to biased item parameter estimates and loss of information when they cannot be univocally interpreted. Conversely, when factor scores can be univocally interpreted, treating the items as substantively multidimensional leads to factors of little theoretical interest and unclear interpretations. In conclusion, there is a clear need for clarity over the MACE-W2FE's dimensionality, for a lack of clarity could impact negatively on the validity of score inferences and WFE theory development in general.
A bifactor model analysis can effectively resolve model dimensionality uncertainties because a bifactor model is theoretically consistent with a correlated first-order multidimensional factor model and a SO factor model (Rodriguez, Reise, & Haviland, 2016a). Where a strictly unidimensional model is rejected by model fit indices, bifactor analysis is useful in determining the strength of the general factor that underlies a multicomponent measure and the strength of each component after controlling for the common factor. A multidimensional measure may be assumed where one or more components show sufficient strength in terms of reliable variance. An approximate or essentially unidimensional (i.e. a single breadth factor) measure may be assumed to the extent that the factor score is univocal with ignorable biasing effects of the multidimensional components (Rodriguez et al., 2016a). Local item misspecification analysis allows for the evaluation of the extent to which misspecifications show ignorable biasing effects on the factor score of an assumed essentially unidimensional model.
Furthermore, findings about differences in gender group experiences of WFE have been contradictory, and therefore more research on the topic is much needed (Rothbard, 2001; Van Steenbergen, Ellemers, & Mooijaart, 2007). Moreover, gender studies require the MACE-W2FE to show at least approximate measurement invariance for gender groups.
We argue that the strong emphasis on 'golden rules' for goodness-of-fit proposed by Hu and Bentler (1999) in deciding model fit, in the absence of an in-depth analysis of the measurement model, is the likely reason for the different measurement models used in the WFES and MACE-W2FE studies (Greiff & Heene, 2017; McNeish, An, & Hancock, 2018; Ropovik, 2015). Relying solely on confirmatory factor analysis (CFA) goodness-of-fit indices without additional analyses has proved to be ineffective in determining the dimensionality of a measure (Rodriguez et al., 2016a). In order to contribute to existing WFE literature and resolve the MACE-W2FE dimensionality and invariance issues, we followed a '[s]ubstantive-methodology synergy approach where methodological advances [i.e. in-depth analysis techniques and alignment optimisation] are applied to substantive areas of research in order to obtain more precise answers to complex questions' (Marsh & Hau, 2007, p. 152).
For our study, we formulated the following research questions: (1) Are the MACE-W2FE's subdimensions substantively unique constructs? or (2) Is the MACE-W2FE an essentially unidimensional construct? (3) Is the MACE-W2FE second-order model theoretically plausible and clearly interpretable? (4) Can the MACE-W2FE be considered an approximate invariant measure to use across gender groups?
In seeking answers to the research questions, we demonstrated the usefulness of what we called 'extended CFA analyses' which included the following: bifactor testing, local indicator misspecification analysis, and approximate measurement invariance testing.
Our study aimed to contribute to work-family literature by providing rigorous evidence relating to the dimensionality and scale invariance of the MACE-W2FE within a South African sample.
Firstly, we discuss the substantive issues regarding the WFE theoretical framework, the development of the MACE instrument and related validity evidence. Thereafter, we discuss the methodological issues of CFA in testing model dimensionality, the use of extended analysis to resolve the MACE-W2FE's dimensionality vacillations and approximate invariance testing.

Substantive issues: Theoretical background and the development of the MACE instrument

Theoretical background
In recent years, numerous researchers have shown interest in the measurement of WFE because of the realisation that organisations stand to benefit from recognising and accommodating employees' work-life needs (Shockley & Singla, 2011). Various models or frameworks to explain WFE have been put forward and the most prominent theories on which they are based are the theory of role accumulation (Sieber, 1974), the resource-gain-development perspective (Wayne, Grzywacz, Carlson, & Kacmar, 2007) and the work-home resources model (Ten Brummelhuis & Bakker, 2012).
When considering the theory of role accumulation, the literature attempts to explain how participation in multiple roles can produce positive outcomes for individuals, by putting forth three notions. The first notion is that work experiences and family experiences can have additive effects on well-being. In this sense, the argument is that individuals who participate in, and are satisfied with, work and family roles experience greater well-being than those who are dissatisfied with one or more of their roles. The second notion is that participation in both work and family roles can buffer individuals from distress in one of the roles. This notion dates back to the work of Sieber (1974), who stated that individuals who accumulate roles may compensate for failure in one role by falling back on gratification in another role. The third notion is that the experiences in one role can produce positive experiences and outcomes in the other role. It is this specific explanation that Greenhaus and Powell (2006) utilised when developing their well-cited model of WFE. According to these authors, this third mechanism best captures the concept of WFE as 'the extent to which experiences in one role improve the quality of life in the other role' (p. 73).
When considering the resource-gain-development perspective (Wayne et al., 2007), the basic premise is that individuals have natural tendencies to grow and develop. When individuals engage in a role, they obtain resources so that they can experience positive gains. When gains from one domain are applied, sustained and reinforced in another, the end results are improved system functioning or facilitation.
It is against this backdrop that Greenhaus and Powell (2006) developed their WFE model. Work-family researchers agree that this model is one of the most comprehensive and systematic models of all that explains within-domain and cross-domain effects (Zhang, Xu, Jin, & Ford, 2018). As mentioned earlier, the generation of resources is crucial in the enrichment process. The main premise of the WFE model of Greenhaus and Powell (2006) is that the resources acquired in one role can enrich the other role through instrumental and/or affective paths. According to Greenhaus and Powell, a resource is an asset that may be drawn on when needed to solve a problem or cope with a challenging situation. Their WFE model identifies five types of resources that can be generated in a role:
• Skills and perspectives: Skills refer to a broad set of task-related cognitive and interpersonal skills, coping skills, multi-tasking skills, and knowledge and wisdom derived from role experiences. Perspectives involve ways of perceiving or handling situations which, in short, allow one to expand one's 'world view'.
• Psychological and physical resources: These include positive self-evaluations such as self-efficacy, self-esteem, personal hardiness, positive emotions about the future (e.g. optimism and hope) and physical health.
• Social-capital resources: There are two social-capital resources, influence and information, which are derived from interpersonal relationships in work and family roles that may assist individuals in achieving their goals.
• Flexibility: This refers to the discretion to determine the timing, pace and location of meeting role requirements.
• Material resources: These include money and gifts obtained whilst fulfilling work and family roles.
Various instruments were developed based on Greenhaus and Powell's (2006) [...] De Klerk et al. (2013) found no differential item functioning between gender groups across the full set of items in the MACE-W2FE using Rasch modelling techniques. They also found that the gender groups' mean scores for all items on the MACE-W2FE did not differ. [...] (Hu & Bentler, 1999). It also showed high correlations between dimensions ranging from 0.54 to 0.63, suggesting shared variance ascribed to a common factor. Koekemoer et al. (2017) [...]

Methodological issues: Extending confirmatory factor analysis for evaluating scale dimensionality

Dimensionality issues
Theory testing in the social sciences is commonly associated with testing competing CFA measurement models (Marsh & Hau, 2007; Strauss & Smith, 2009). Routinely, a unidimensional measurement model is tested, followed by the testing of multidimensional models, as dictated by plausible theoretical conceptualisations of the construct of interest. However, accepting the results of CFA analyses at face value is potentially dangerous. For example, a one-factor model (see Figure 1a) containing numerous items and allowing large degrees of freedom hardly ever describes real data and is routinely rejected based on the results of statistical model fit indices (Bentler, 2009). When applying theory, the prospect of finding a perfectly unidimensional model in assessment data is nil (Reise, Moore, & Haviland, 2010). In contrast, when the same data are subjected to correlated first-order CFA models (see Figure 1b), the multidimensional model will almost always be supported (Reise et al., 2010). Correlated first-order factor models often deceptively show good model fit and salient group-specific factors. The deception of a good-fitting correlated first-order factor model is created by a substantive general factor running amongst all the items and by the differentiating effect of parallel item wording or method artefacts (Reise et al., 2010). Highly correlated group-specific factors in first-order factor models rarely reflect unique and substantive factor variance after partialling out the common variance from a substantive general factor (Rodriguez et al., 2016a). Cattell and Tsujioka (1964) argue that without skilful factor analysis, detecting pseudo-specific group factors consisting of narrow bloated specifics or systematic biases is hard. Bloated specifics with little substance are common occurrences in published scales and are difficult to detect (Rodriguez et al., 2016a).
Where the factors in multidimensional models correlate strongly, researchers often adopt a SO factor model (see Figure 1c) (Chen et al., 2006), especially when multicollinearity may be a concern, as was the case with the MACE studies in Koekemoer et al. (2017) and Marais et al. (2014). Gignac (2016) acknowledges that the SO (i.e. higher order) model is the only model where hypotheses with respect to the association between group-level factors and the general factor can be tested. However, a SO factor model constrains the ratio of each item's general-factor loading to its group-factor loading to be equal within each factor, which is known as the proportionality constraint. Imposed proportionality constraints on items in a SO factor model may represent an unnatural and difficult-to-interpret model solution despite obtaining good model fit according to conventional standards (Gignac, 2016). Gignac (2016) argued that '[e]mpirically and theoretically, researchers may find it difficult to explain why the nature of the general factor in a second-order model is such that each and every item within a specific factor can contribute variance to the general factor and the specific factors' residual in a perfectly equal proportional manner' (p. 65). Moreover, SO factor models do not give clear answers on the extent to which a measure is unidimensional versus multidimensional (Rodriguez et al., 2016a).
Unless researchers are mindful of the dimensionality issues that we have pointed out and of the limitations of global fit indices, defective measurement models may be accepted as close-fitting. Hayduk (2014) urges researchers to do a diagnostic assessment of models before accepting a model as sufficiently supported by the data. We now turn our attention to bifactor and local indicator misspecification analysis techniques that can be applied for diagnosing and resolving dimensionality issues.

Analyses to resolve dimensionality issues

Bifactor analysis
Bifactor modelling (see Figure 1d) allows researchers to simultaneously investigate unidimensionality and multidimensionality by placing the common factor and group factors on an equal conceptual footing to compete for item variance (Reise et al., 2010). The bifactor model specifies that each item loads simultaneously on a common factor and on a single group factor, so that each item's variance is partitioned between the two (Reise et al., 2010). Bifactor modelling is an effective technique for resolving whether a measure is essentially unidimensional or distinctly multidimensional. Obtaining clarity about the dimensionality of a measurement model can assist in avoiding multicollinearity problems when SEM or another form of multiple regression analysis with external variables is used (Rodriguez et al., 2016a).
Unlike SO factor models, the proportionality constraint does not apply in bifactor models and the item parameter estimates are freely estimated for both the general factor and the specific group factor. Where the data violate the proportionality constraint in the SO factor model, the bifactor model will always show a better fit that corresponds to the degree of violation (Gignac, 2016). Whereas the items in a SO factor model only indirectly affect the general factor via the specific group factor, the items in a bifactor model have a direct effect on both the general factor and the group factor.
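The contrast between the two models can be written out explicitly. The following is a sketch in our own notation (the symbols $\lambda_i$, $\gamma_k$, $\psi_k$, $\xi$, $\zeta_k$, $G$ and $S_k$ are assumptions introduced for illustration and do not appear in the sources cited above), for an item $x_i$ belonging to group factor $k$:

```latex
% Second-order (SO) model: item -> first-order factor -> second-order factor
x_i = \lambda_i \eta_k + \varepsilon_i, \qquad
\eta_k = \gamma_k \xi + \zeta_k, \qquad \operatorname{Var}(\zeta_k) = \psi_k

% Substituting the second equation into the first:
x_i = \underbrace{\lambda_i \gamma_k}_{\text{implied general loading}} \, \xi
    + \lambda_i \zeta_k + \varepsilon_i

% With the residual \zeta_k standardised, the implied group loading is
% \lambda_i \sqrt{\psi_k}, so for every item i of factor k:
\frac{\lambda_i \gamma_k}{\lambda_i \sqrt{\psi_k}} = \frac{\gamma_k}{\sqrt{\psi_k}}
\quad \text{(constant within factor } k\text{: the proportionality constraint)}

% Bifactor model: both loadings are free parameters, so no such ratio is imposed
x_i = \lambda_i^{G} G + \lambda_i^{S} S_k + \varepsilon_i
```

Under these assumptions, the SO model is the special case of the bifactor model in which the ratio of general to group loadings is forced to be the same for all items of a factor, which is why the bifactor model can only fit as well or better.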
Supporting bifactor strength indices (see the 'Methodology' section for the details on strength indices) can be applied to evaluate the extent to which the model supports essential unidimensionality and the plausibility of unique multidimensional factors after partialling out the common factors' variance (Reise et al., 2010; Rodriguez et al., 2016a; Rodriguez, Reise, & Haviland, 2016b). Unique multidimensional factors in bifactor models are also known as residualised factors (i.e. factors that show common variance after removal of the general factor's variance) (Rodriguez et al., 2016b).

Local item misspecification analyses
Researchers warn against overreliance on simplistic global model fit indices when determining the dimensionality of measures in the social sciences (Greiff & Heene, 2017). Because data are imperfect, models in this field are always simplifications of reality and are consequently always misspecified to some extent (Saris, Satorra, & Van Der Veld, 2009). Therefore, it is important to supplement global fit indices with local item misspecification analyses to avoid substantively irrelevant misspecifications leading to rejecting a model or overlooking substantively relevant misspecifications in model acceptance (Saris et al., 2009). According to Sellbom and Tellegen (2019), correlated residuals are very important sources of misspecification in CFA models and should be examined to avoid biased results when evaluating global model (mis)fit.

Figure 1 abbreviations: GF, general factor; P, work-family perspectives; A, work-family affect; TM, work-family time management; SC, work-family social capital; SOF, second-order factor. Note that the diagrams are only intended to be illustrative of the different model structures; larger labels would make the models too large to present.

Approximate invariance testing
Proven measurement invariance and scalar invariance are prerequisites for making valid statistical conclusions about scale mean differences of groups under varied conditions (Sass, 2011). The alignment method for multiple-group CFA can be used to compare factor means and variances of groups without requiring exact measurement invariance (Asparouhov & Muthén, 2014). The conventional multiple-group CFA without the alignment method is inclined to be too strict in the identification of non-invariant parameters, leading to a series of model adaptations that may be data-specific or misspecified (De Bondt & Van Petegem, 2015). In the alignment method, measurement invariance is estimated without the need to constrain factor loadings and intercepts to be equal, because the optimal measurement invariance pattern is discovered through alignment optimisation. The alignment optimisation procedure applies a simplicity function that works like the rotation criteria in exploratory factor analysis: it retains the unrestricted configural model (model zero) but minimises non-invariance without compromising model fit. According to Asparouhov and Muthén (2014), up to 25% of parameters may be non-invariant without adversely impacting on the reliable comparison of the factor means of groups. In other words, the alignment method does not require all differences in factor loadings (measurement invariance) and intercepts (scalar invariance) to be strictly zero before valid factor mean comparisons for groups can be made. The imperfect nature of item responses is a reality in the social sciences and this imperfection affects invariance and theory testing in SEM, but it can be accommodated through innovations such as the alignment optimisation method that allows for approximate measurement invariance (Asparouhov & Muthén, 2014).

Current study
In an attempt to stimulate future work-family studies from Africa, we investigated the dimensionality of the MACE-W2FE instrument and its gender invariance using extended CFA analysis techniques. Based on our literature discussion, we present the following hypotheses with respect to the South African sample surveyed:
H1a: We hypothesised an essentially unidimensional measurement model for the MACE-W2FE (see Figure 1a and 1e).
H1b: We further hypothesised that the multidimensional elements of the MACE-W2FE are not distinct and substantive constructs (see Figure 1b and 1d).
H2: We also hypothesised that the proportionality constraints for the SO factor model for the MACE-W2FE have been violated (see Figure 1c).
H3: Finally, we hypothesised that the MACE-W2FE will show approximate configural, measurement and scalar invariances for gender groups.
Furthermore, we demonstrated the use and value of bifactor (see Figure 1d) and local indicator misspecification analyses in resolving the MACE-W2FE's dimensionality vacillations and we tested approximate gender invariance at different levels of measurement using the alignment optimisation technique.

Research design
Using a quantitative cross-sectional research design, we collected survey data to investigate the dimensionality, invariance and model specifications of the MACE-W2FE.

Research sample
Cross-sectional survey data were obtained from a convenience sample (N = 786) of South African employees from industry sectors such as mining, engineering, IT, manufacturing, finance and education. The majority of the sample consisted of Caucasian (86%) female employees (70%), of whom 67% were married, 85% had children and 14% were single. Of the sample, 50% possessed a degree or a postgraduate degree.
We used an anonymous web-based survey to obtain respondents' biographical information and to administer the MACE work-family instrument. We informed the participants that their participation was voluntary, and we obtained their informed consent. Ethical approval was obtained from the Research Ethics Committee of the relevant higher education institution.

Analyses
We used the Mplus Statistical Software Version 8.3 and the maximum likelihood estimation method with robust standard errors (MLR) to test the measurement models included in this study. The MLR compensates for deviations from the multivariate normality assumption associated with Likert-type scales (Schmitt, 2011). To achieve the purpose of the study, we tested all the CFA models depicted in Figure 1, namely an essentially unidimensional model (Figure 1a), a four-factor model (Figure 1b), a SO factor model (Figure 1c), a bifactor model (Figure 1d) and an essentially unidimensional model with method artefacts (Figure 1e). The model depicted in Figure 1a was used to test the gender invariance of the MACE-W2FE. We assessed the sample size adequacy for the purposes of the analyses using the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO) and Bartlett's test of sphericity (KMO > 0.70, p < 0.01) (Cerny & Kaiser, 1977).
To evaluate the plausibility of the CFA models, we used the chi-square goodness-of-fit test (χ², p < 0.05), the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA) and the standardised root mean square residual (SRMR) as global indices of model fit. Model fit, according to the CFI and TLI indices, is considered acceptable and good when exceeding 0.90 and 0.95, respectively. The RMSEA and SRMR values of less than 0.05 and 0.08, respectively, reflect a close fit and a reasonable fit to the data (Hu & Bentler, 1999; Marsh, Hau, & Wen, 2004). In addition, we used the Akaike information criterion (AIC) to compare alternative models, whereby the model with the lowest AIC value is the better model. As indicators of a significant difference in model fit where nested models are compared, we relied on changes greater than 0.01 in CFI, TLI and RMSEA, and a statistically significant (p < 0.01) adjusted χ² with the Satorra-Bentler scaling correction formula (Chen, 2007).
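The adjusted χ² comparison of nested models can be illustrated with a short sketch. It assumes the commonly documented form of the Satorra-Bentler scaled difference test for MLR output, in which each model reports a scaled chi-square value, its degrees of freedom and a scaling correction factor; the function name and the input values are hypothetical and are not results from this study.

```python
def scaled_chi_square_difference(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler scaled chi-square difference for two nested models.

    Model 0 is the nested (more restricted) model, model 1 the comparison model.
    t0, t1   : scaled (MLR) chi-square values
    df0, df1 : degrees of freedom
    c0, c1   : scaling correction factors reported with each model
    Returns (scaled difference statistic, difference in degrees of freedom).
    """
    if df0 <= df1:
        raise ValueError("model 0 must be the more restricted model (df0 > df1)")
    # Difference-test scaling correction
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)
    if cd <= 0:
        raise ValueError("non-positive scaling correction; the test is not defined")
    # Multiplying by c recovers the uncorrected chi-square of each model
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, df0 - df1


# Hypothetical values, for illustration only
statistic, df_diff = scaled_chi_square_difference(100.0, 54, 1.2, 60.0, 50, 1.1)
print(round(statistic, 2), df_diff)
```

The returned statistic is referred to a χ² distribution with the returned degrees of freedom; the p-value lookup is left out here to keep the sketch free of third-party dependencies.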
We adopted the notion that the results of global model (mis)fit indices are preliminary and require an evaluation of local parameter misspecifications (which are a source of model misfit) before final conclusions on model fit can be made (Marsh et al., 2004). We used Jrule software for Mplus to evaluate the local parameter misspecifications on the correlated residuals (Oberski, 2009), these being the most important source of misspecification in measurement models. According to the Saris-Satorra-Van der Veld approach (Saris et al., 2009), the statistically overly sensitive modification indices (MI) should be considered alongside Cohen's (1992) criterion for sufficient statistical power (1 − β > 0.80). Substantive local misspecification is evident in the presence of a statistically significant (p < 0.05) modification index and low statistical power (1 − β < 0.80). However, when the modification index is statistically significant (p < 0.05) and statistical power is high (1 − β > 0.80), the expected parameter change (EPC) for that indicator needs to be outside the range of -0.10 to 0.10 to be considered substantively relevant. In the latter case, where the EPC is small (i.e. within the range of -0.10 to 0.10), it can be concluded that no relevant misspecification is present that deviates substantively from zero.
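The decision logic described above can be summarised as a small sketch. The function name and return labels are hypothetical; the two branches for a non-significant modification index are not spelled out in the text and follow our reading of the general Saris et al. (2009) scheme, so they should be treated as an assumption.

```python
def judge_misspecification(mi_significant, power, epc,
                           power_cutoff=0.80, epc_bound=0.10):
    """Classify one restricted parameter (e.g. a residual correlation).

    mi_significant : True if the modification index is significant (p < 0.05)
    power          : statistical power (1 - beta) of the MI test
    epc            : expected parameter change for the parameter
    """
    # Power exactly at the cut-off is treated as low here (a choice made
    # for this sketch; the text only covers values above and below 0.80)
    high_power = power > power_cutoff
    if mi_significant and not high_power:
        # Significant despite low power: substantive misspecification
        return "misspecified"
    if mi_significant and high_power:
        # The test is sensitive, so the size of the EPC decides
        return "misspecified" if abs(epc) > epc_bound else "negligible"
    if high_power:
        # Assumption (not in the text): non-significant MI with high power
        return "no misspecification"
    # Assumption (not in the text): non-significant MI with low power
    return "inconclusive"


# Hypothetical inputs, for illustration only
print(judge_misspecification(True, 0.95, 0.05))
print(judge_misspecification(True, 0.50, 0.02))
```

Separating the power check from the EPC check mirrors the point made in the text: a significant modification index on its own is not treated as evidence of a relevant misspecification.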
We used the bifactor model to evaluate the distinctiveness of the specific or group factors and the plausibility of an essential general factor for the MACE-W2FE. The bifactor analysis of measures is a good choice where both correlated factors and SO CFA models show a good fit (Reise, 2012). To determine whether the data sufficiently supported a distinct first-order group-factor model or whether a unidimensional model could be assumed, we used a variety of factor strength indices applicable for evaluating bifactor models (Reise et al., 2010; Reise, Bonifay, & Haviland, 2013; Reise, Scheines, Widaman, & Haviland, 2013; Rodriguez et al., 2016a, 2016b). Detailed definitions, formulas and discussions of the factor strength indices are beyond the scope of this article and are available in Reise et al. (2010) and Rodriguez et al. (2016a, 2016b). These indicators were the following: explained common variance (ECV); McDonald's (1999) omega reliabilities, namely omega (ω) and omega hierarchical (ωH/ωHS); construct replicability (H); factor determinacy (FD); and percentage of uncontaminated correlations (PUC). We used the absolute average relative parameter bias (ARPB) index at factor level and ARPB-I at item level to evaluate bias on factor loadings attributed to factor misspecifications.
An ARPB of below 10% to 15% between the factor loadings of the common factor of a bifactor model and a unidimensional model can be considered non-substantive, suggesting an essentially unidimensional model where ECV, PUC and ωH values are 0.70 or higher. The PUC moderates ECV when the two are considered concurrently. When PUC is high (> 0.80) and ECV is as low as 0.50, essential unidimensionality may still apply. However, when PUC is lower (< 0.80), ECV should be greater than 0.60 (e.g. PUC = 0.70 and ECV = 0.70) and ωH should be greater than 0.70 to assume essential unidimensionality. Where H and FD² are equivalent and exceed 0.80, essential unidimensionality is supported. Factor determinacy should exceed 0.90 before the use of factor scores instead of latent variables in an SEM is justified, and H exceeding 0.80 suggests good factor replicability. However, H and FD can be inflated by very narrow factors or bloated specifics and should be interpreted with caution. H and FD² values exceeding 0.70 could signify plausible group factors or subscales. A minimum value of 0.50, and preferably closer to 0.75, for ωHS suggests a substantive group factor and multidimensionality.
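To make the indices above concrete, the following sketch computes ECV, PUC, ω, ωH, ωHS and ARPB from standardised bifactor loadings, using the formulas as they are commonly given in the bifactor literature. The function names, the number of items per factor and all loading values are hypothetical and chosen only for illustration; they are not estimates from this study. The sketch assumes standardised loadings and orthogonal factors, so that each item's unique variance is one minus its squared loadings.

```python
def bifactor_indices(items_by_group):
    """items_by_group maps a group-factor label to a list of
    (general loading, group loading) pairs, one pair per item."""
    pairs = [pair for items in items_by_group.values() for pair in items]
    n_items = len(pairs)

    # Explained common variance: share of common variance due to the general factor
    sum_gen_sq = sum(g * g for g, _ in pairs)
    sum_grp_sq = sum(s * s for _, s in pairs)
    ecv = sum_gen_sq / (sum_gen_sq + sum_grp_sq)

    # PUC: share of item correlations that involve items from different group factors
    within = sum(len(v) * (len(v) - 1) / 2 for v in items_by_group.values())
    puc = 1 - within / (n_items * (n_items - 1) / 2)

    # Omega and omega hierarchical for the total score
    general_part = sum(g for g, _ in pairs) ** 2
    group_part = sum(sum(s for _, s in v) ** 2 for v in items_by_group.values())
    unique_part = sum(1 - g * g - s * s for g, s in pairs)
    total = general_part + group_part + unique_part

    # Omega hierarchical subscale: group-factor variance in each subscale score
    omega_hs = {}
    for name, v in items_by_group.items():
        g_part = sum(g for g, _ in v) ** 2
        s_part = sum(s for _, s in v) ** 2
        u_part = sum(1 - g * g - s * s for g, s in v)
        omega_hs[name] = s_part / (g_part + s_part + u_part)

    return {"ECV": ecv, "PUC": puc,
            "omega": (general_part + group_part) / total,
            "omega_H": general_part / total, "omega_HS": omega_hs}


def arpb(unidimensional, bifactor_general):
    """Average absolute relative bias of one-factor loadings against the
    general-factor loadings of the bifactor model (same item order)."""
    biases = [abs(u - g) / abs(g) for u, g in zip(unidimensional, bifactor_general)]
    return sum(biases) / len(biases)


# Hypothetical loadings: four group factors with three items each
example = {label: [(0.70, 0.30)] * 3 for label in ("P", "A", "TM", "SC")}
print(bifactor_indices(example))
print(arpb([0.74] * 12, [0.70] * 12))
```

The resulting values would then be read against the benchmarks given in the paragraph above.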
By comparing the nested bifactor model to the SO factor model, we determined if proportionality constraints had been violated in the SO factor model, and we relied on model change statistics to confirm significant violations (Yung, Thissen, & McLeod, 1999).
Finally, we estimated the approximate invariance of the MACE-W2FE for gender groups using MLR estimation and the alignment optimisation method (Asparouhov & Muthén, 2014). In alignment optimisation, the configural invariance model is used as the baseline model. Next, we conducted the factor loading and intercept invariance tests where the total amount of non-invariance is minimised using a simplicity function for every pair of groups and for every intercept and loading using a component loss function from EFA rotations (Muthén & Asparouhov, 2018).
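The simplicity function mentioned above can be illustrated with a small sketch. It assumes the component loss function described by Asparouhov and Muthén (2014), f(x) = sqrt(sqrt(x² + ε)), with each group pair weighted by the square root of the product of the group sizes; the value of ε, the group sizes and all parameter values below are illustrative assumptions, and the function names are hypothetical. The sketch evaluates the loss only and does not perform the optimisation itself.

```python
from math import sqrt


def component_loss(x, eps=0.01):
    """Component loss f(x) = sqrt(sqrt(x^2 + eps)); eps is an assumed small constant."""
    return sqrt(sqrt(x * x + eps))


def simplicity(loadings_a, loadings_b, intercepts_a, intercepts_b, n_a, n_b, eps=0.01):
    """Total non-invariance for one pair of groups, summed over loadings and intercepts."""
    weight = sqrt(n_a * n_b)  # pair weight based on the two group sizes
    total = 0.0
    for la, lb in zip(loadings_a, loadings_b):
        total += weight * component_loss(la - lb, eps)
    for ia, ib in zip(intercepts_a, intercepts_b):
        total += weight * component_loss(ia - ib, eps)
    return total


# Hypothetical four-item example with identical intercepts in both groups.
intercepts = [3.5, 3.6, 3.7, 3.8]
base = [1.0, 1.0, 1.0, 1.0]
concentrated = [1.8, 1.0, 1.0, 1.0]  # one clearly non-invariant loading
spread = [1.2, 1.2, 1.2, 1.2]        # the same total difference spread over all items

loss_concentrated = simplicity(base, concentrated, intercepts, intercepts, 300, 200)
loss_spread = simplicity(base, spread, intercepts, intercepts, 300, 200)
print(loss_concentrated < loss_spread)
```

The comparison shows why this loss, like a simple-structure rotation criterion in EFA, favours solutions with a few clearly non-invariant parameters and many approximately invariant ones over solutions with many moderately non-invariant parameters.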

Ethical consideration
The approval is subject to the researcher abiding by the principles and parameters set out in the application and research proposal in the actual execution of the research. The approval does not imply that the researcher is relieved of any accountability in terms of the Codes of Research Ethics of the University of Pretoria if action is taken beyond the approved proposal. If during the course of the research it becomes apparent that the nature and/or extent of the research deviates significantly from the original proposal, a new application for ethics clearance must be submitted for review.

Results
In this section, we present a summary of the descriptive statistics and the results of estimating the CFA models (one-factor, four-factor, second-order factor and bifactor).

Descriptive statistics
The descriptive statistics showed item mean scores that varied between 3.5 and 3.9, the average being 3.70. The standard deviations varied between 0.73 and 1.08, the mean standard deviation being 0.83. The item skewness varied between -1.01 and -0.50, the mean skewness being -0.72. The item kurtosis varied between -0.40 and 1.3, the mean being 0.35. The data signified a good approximation of the normal distribution (skewness and kurtosis between -1 and +1). The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy was 0.94 and Bartlett's test of sphericity was significant (p < 0.001). Therefore, the sample was considered adequate (KMO > 0.70, p < 0.01) to continue with the CFA analyses.

Estimated confirmatory factor analysis models
As shown in Table 1, none of the global fit indices (i.e. CFI, TLI, RMSEA and SRMR) supported a unidimensional (one-factor) model when using the golden rules for model fit (Hu & Bentler, 1999; Marsh, Hau, & Wen, 2004). The one-factor structure given in Table 2 can nevertheless be considered well defined (λ = 0.63-0.81; mean (M) = 0.70). Cronbach's alpha reliability for the one-factor model was 0.94. The correlated four-factor model, in contrast, showed a good fit on all the indices, although the sample-size-sensitive χ² was significant (p < 0.01). As indicated in Table 2, we obtained a well-defined factor structure for the four-factor model with overall high loadings […]
The bifactor CFA model (see Table 2) was further analysed using the appropriate factor strength indicators. The H indicator suggested that the GF was well defined […]
The Jrule for Mplus analysis on the unmodified one-factor model showed a total of 16% (24/153) substantive correlated residuals exceeding an EPC of 0.10, of which only eight item pairs (5%) exceeded an EPC of 0.10 at a 95% confidence level (see Figure 2 for a depiction of ranked correlated residuals). Interestingly, the 24 substantive correlated residuals were all item pairs located within the […] 'My family life is improved by maintaining good relationships with my colleagues' vs. sc3: 'My family life is improved by having good relationships at work'). These measurement method artefacts were consequently specified as unconstrained correlated residuals in the one-factor model (see Figure 1e). The global fit indices improved significantly to obtain marginal to reasonable goodness of fit (see Table 1). The values obtained for the correlated residuals in the one-factor model were all statistically significant and had a moderate-to-large effect size (r = 0.41-0.54) (see Table 2). Values exceeding 0.20 should be regarded as noticeable and values around 0.30 as important in terms of classical test theory (Muthén & Asparouhov, 2012).
The residual factors in the bifactor model can mostly be explained as item-specific method artefacts (e.g. 'Ask the same question and get the same answer'). All of the remaining correlated residuals (i.e. 146) showed trivial misspecifications with EPC values within the range of -0.10 to 0.10 after freeing the eight misspecified item pairs with the highest correlated residuals in the model (see Figure 3 for a depiction of ranked correlated residuals). Freeing the correlated residuals for the eight most important item-specific method artefacts had a trivial effect on the factor loadings of the one-factor model (see Table 2) and a large effect on the model fit indices (see Table 1). This demonstrates the sensitivity of the model fit indices to the eight most important model misspecifications, which are ascribed to bloated specifics in the highly restricted unidimensional model. Freeing the correlated residuals improved the model's overall factor loading bias (ARPB) from 0.07 to 0.06 (see Table 2). However, specifying method factors is preferred over specifying correlated residuals because method factors explicitly estimate construct-irrelevant sources of variance, whereas correlated residuals simply partial them out (Morin, Arens, & Marsh, 2016).
Note: CFA, confirmatory factor analysis; P, work-family perspectives; A, work-family affect; TM, work-family time management; SC, work-family social capital; δ, residual variance; 1-fact, one-factor model; 1-fact †, one-factor model with correlated residuals; Corr. resid., correlated residuals; GF, general factor; ARPB-I, item level parameter bias; ARPB, absolute mean relative parameter bias; ω, coefficient omega; SOF, second-order factor loadings; H, construct replicability; FD, factor determinacy; ωHS, coefficient omega hierarchical; ECV, explained common variance; PUC, percentage of uncontaminated correlations. †, One-factor model with the eight freed correlated residuals (method artefacts). *, Significant loadings (p < 0.01).
The group-specific factors in the bifactor model effectively represent method factors in this study. The ARPB-I values of items p2-p6 varied between -0.10 and -0.13 (M = -0.12), causing the most factor loading bias in the unidimensional model. These items from the work-family perspectives factor shared unique variance not shared by the remaining items in the one-factor model. However, the factor strength indices showed that the unique variance was trivial and insufficient to be interpreted substantively as a distinct factor and could therefore be included as part of the model without biasing the score interpretations.
Overall, the evidence suggested that model misfit in the highly restricted one-factor CFA model could be attributed mainly to the cumulative and combined effect of trivial substantive multidimensionality, item-specific method artefacts and random noise (i.e. white noise) ascribed to imperfect indicators typically obtained in self-report questionnaire data (Asparouhov & Muthén, 2017). This suggests that a plausible and parsimonious model was being rejected (i.e. type 1 error) by the goodness-of-fit indices because of large numbers of trivial model misspecifications aggravated by a large sample size.
In conclusion, adopting the more parsimonious essentially unidimensional factor model (i.e. the one with higher degrees of freedom = 135) for the MACE-W2FE instead of the more complex bifactor model (i.e. the one with lower degrees of freedom = 117) that contains an unbiased general factor can be considered justified and of practical value for applied researchers. Despite showing weak global model fit, the MACE-W2FE one-factor model showed negligible bias and can be used with confidence in subsequent SEM modelling with external variables. The results showed that a rejection of the unidimensional one-factor CFA model based on the values of global model fit indices alone would have been unjustified.
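The degrees of freedom quoted above can be reproduced with simple counting. The sketch below assumes an 18-item scale (inferred from the 153 item pairs reported for the correlated residuals), a covariance structure only, factor variances fixed to 1 and orthogonal bifactor factors; the function name is hypothetical.

```python
def cfa_degrees_of_freedom(n_items, n_free_parameters):
    """Unique variances and covariances minus freely estimated parameters."""
    n_moments = n_items * (n_items + 1) // 2
    return n_moments - n_free_parameters


n_items = 18  # assumption: inferred from 153 = 18 * 17 / 2 item pairs

# One-factor model: one loading and one residual variance per item.
df_one_factor = cfa_degrees_of_freedom(n_items, 2 * n_items)

# Bifactor model: general loading, group loading and residual variance per item.
df_bifactor = cfa_degrees_of_freedom(n_items, 3 * n_items)

# Number of distinct item pairs, i.e. possible correlated residuals.
n_item_pairs = n_items * (n_items - 1) // 2

print(df_one_factor, df_bifactor, n_item_pairs)
```

Under these assumptions the counts agree with the 135 and 117 degrees of freedom and the 153 item pairs reported in the text; the 18 additional group-factor loadings account for the difference between the two models.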

Gender invariance testing
The one-factor models tested for invariance represented a well-identified CFA model with high factor loadings for each gender group (see Table 3). The one-factor loadings after alignment optimisation (see AL column in Table 3) were used for comparison purposes. The global model fit indices for the male group were observably lower than those for the female group, and this could be ascribed to the large difference in sample size (Kyriazos, 2018). The probability of global fit indices rejecting a non-substantively misspecified unidimensional model increases with decreasing sample sizes (Marsh et al., 2004). Having considered the likely cumulative effect of numerous trivial correlated residual misspecifications on model fit, it would be reasonable to conclude that the measurement model sufficiently represented both groups' data. The difference in one-factor scale means between the gender groups (X̄females − X̄males = 0.027) was negligible and not statistically significant. The factor loading of only one item (i.e. sc1) was flagged as being non-invariant, representing 6% of all items. With such a low level of non-invariance (< 25%), estimating group-specific factor means and variances can be expected to produce accurate results. Excluding this item from the scale may also be considered without jeopardising scale validity because the item is in one of the item pairs showing a substantive correlated residual and, most likely, item redundancy. The omega reliability statistic (males = 0.934; females = 0.944) and the FD statistic (males = 0.971; females = 0.973) had approximately the same values for each group, showing high reliability and determinacy, respectively.

Discussion
This study had three objectives within the South African sample surveyed: to determine the dimensionality of the MACE-W2FE, to test the scale for gender invariance and to demonstrate the usefulness of extended CFA analysis techniques. This study supported hypothesis H1a in that the MACE-W2FE was essentially a unidimensional measurement model. Hypothesis H1b is supported in that the multidimensional elements of the MACE-W2FE are not distinct and substantive constructs. In addition, hypothesis H2 is supported in that the proportionality constraints of the SO factor model for the MACE-W2FE had been violated. Lastly, hypothesis H3 is supported in that the MACE-W2FE showed approximate configural, measurement and scalar invariance for gender groups.
In line with the substantive-methodology synergy framework, we discuss the substantive and methodological findings of this study in the 'Substantive findings' and the 'Methodological findings' sections, respectively. Thereafter we make concluding remarks about the study, refer to the study's limitations and make recommendations for further study.

Substantive findings
The study found that the unidimensional model best represented the general construct of WFE (Greenhaus & Powell, 2006), which can be defined as the extent to which a variety of resources from work and family roles have the capacity to encourage individuals and to provide positive experiences, which enhance the individuals' quality of life (performance and positive affect) in the other role.
It also found that Greenhaus and Powell's (2006) work-role resources (skills, perspectives, psychological, physical and social capital) that affected the family role were reflected in the four dimensions of the MACE-W2FE. The heterogeneous content from the four dimensions was reflected as shared variance in the unidimensional model of the MACE-W2FE and enhanced the construct validity of the scale. Moreover, the evidence suggested that the MACE-W2FE reflected a broader unidimensional construct and not the distinct multidimensional constructs for which it had been developed originally. In conclusion, the data indicated that the MACE-W2FE supported an essentially unidimensional model consisting of a variety of items that reflected the variety of resources proposed in the WFE model of Greenhaus and Powell (2006).
The high intercorrelations between homogeneous item groupings of the four-factor model of the MACE-W2FE suggested the WFE construct might be hierarchical (i.e. manifesting strong common variance for group-specific factors), a characteristic accepted almost universally as inherent to correlated multifactor psychological constructs (Clark & Watson, 1995). The hierarchical nature of item variances can be ascribed to people responding to items at multiple conceptual levels (i.e. general and specific levels) (Rodriguez et al., 2016a). As such, WFE can be understood as a general experience directed by particular events or outcomes. Concerning the hierarchical nature of item variances, it may be relevant to note that a researcher, when trying to measure a specific domain of a general construct, faces the challenge that the diversity of the manifestations of the construct in that specific domain diminishes quickly, resulting in the researcher running out of unique questions (Rodriguez et al., 2016a). Therefore, the researcher may include questions that differ little in content. Such subdomain item redundancy has been termed 'bloated specifics' (Cattell, 1978). This study showed that the group-specific factors in the four-factor model of the MACE-W2FE contained little substance after the common variance in the general factor had been partialled out. Such factors, which are (arti)factors with little substance, are common occurrences in published scales (Rodriguez et al., 2016a).
Previous criterion-related validation studies on the MACE also provided support for the plausibility of the MACE-W2FE consisting of an essentially unidimensional construct as opposed to four distinct constructs (De Klerk et al., 2015; Koekemoer et al., 2017; Marais et al., 2014). It was found that the group-specific four-factor model showed moderate correlations (i.e. r = 0.26-0.43) with measures of related constructs (i.e. job satisfaction and other WFE outcomes) (De Klerk et al., 2015). Consequently, researchers may argue that the subscales show differential correlates with related constructs and that they, therefore, show construct uniqueness. This contention is not accurate, as any two variables that are not perfectly correlated will show differential correlates with a third variable because each is a mixture of the same general factor and a distinct group-specific factor (Rodriguez et al., 2016a). The current study indicated that the group-specific factors of the MACE-W2FE showed little construct uniqueness and that any correlates with a third variable could be attributed to the underlying general factor. It would be reasonable to accept that the general factor in a higher order model depicts high levels of common variance shared by all the items in the group-specific factors and therefore shows high criterion-related correlates. Koekemoer et al. (2017) and Marais et al. (2014) supported this notion by showing that the correlations (i.e. r = 0.50-0.66) between the general factor of the SO factor model of the MACE-W2FE and the third variables (i.e. job satisfaction and other WFE outcomes) were much higher than correlations obtained for the group-specific four-factor model (i.e. r = 0.26-0.43) (De Klerk et al., 2015).
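The argument about differential correlates can be made concrete with a small sketch. It assumes standardised variables and orthogonal factors, so that the model-implied correlation between two variables is the sum of the products of their loadings on shared factors; all loadings and names are hypothetical and serve only to illustrate the reasoning of Rodriguez et al. (2016a).

```python
def implied_correlation(x_loadings, y_loadings):
    """Model-implied correlation of two standardised variables on orthogonal factors."""
    shared = set(x_loadings) & set(y_loadings)
    return sum(x_loadings[f] * y_loadings[f] for f in shared)


# Two hypothetical subscales: each mixes the general factor (G) with its own specific factor.
subscale_1 = {"G": 0.8, "S1": 0.4}
subscale_2 = {"G": 0.6, "S2": 0.5}

# A hypothetical criterion that relates to the general factor only.
criterion = {"G": 0.7}

r1 = implied_correlation(subscale_1, criterion)
r2 = implied_correlation(subscale_2, criterion)
r12 = implied_correlation(subscale_1, subscale_2)
print(round(r1, 2), round(r2, 2), round(r12, 2))
```

In this sketch the two subscales correlate differently with the criterion even though the criterion is unrelated to either specific factor; the difference arises solely from their different loadings on the general factor, so differential correlates alone do not demonstrate construct uniqueness.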
Thus, it would be reasonable to suggest that the essentially unidimensional model of the MACE-W2FE with its underlying general factor is, when compared with the group-specific four-factor model, a more robust representation of Greenhaus and Powell's (2006) conceptualisation of WFE theory.
The MACE-W2FE unidimensional measurement model clearly reflects Rodriguez et al.'s (2016a) notion that the social sciences can be best served by positing a strong theory for a general construct and having a thorough understanding of the construct and its links to the processes of item responses, thereby ensuring it is measured well.
The current study corroborated the findings of De Klerk et al. (2013) that gender groups were comparable on the MACE-W2FE and showed similar scores. Yet, Van Steenbergen et al. (2007) found that women experienced more WFE than men, whereas Rothbard (2001) found the opposite. It is clear that more studies are needed to obtain clarity about gender differences regarding WFE.

Methodological findings
The bifactor modelling, local indicator misfit analyses and approximate invariance testing proved to be useful tools for understanding the sources of item variances and the psychometric functioning of the proposed multidimensional or unidimensional model of the MACE-W2FE (Rodriguez et al., 2016a). Our study showed how the factor strength indices used in combination with bifactor analyses and local indicator analyses successfully resolved the dimensionality issues of the MACE-W2FE, whereas global CFA fit indices showed limited value in this regard. The data supported an essentially unidimensional model for the MACE-W2FE, although there was a minor element of multidimensionality. Our finding supported the finding of Rodriguez et al. (2016a) that the scores for the 50 measures of the unidimensional (reportedly multidimensional) models they studied were highly resilient to the biasing effects of multidimensionality. It is conceivable that researchers reject an essentially unidimensional model based on model fit indices alone because it contains a mixture of trivial multidimensional substantive elements, method artefacts and white noise, which, according to common belief, cause such a model to defy meaningful interpretation. However, Rodriguez et al. (2016a) alluded to the work of Gustafsson and Åberg-Bengtsson (2010) in stating that: [I]t is a myth [that essentially unidimensional models defy meaningful interpretation]: when correlated items are aggregated together, and they all share a single common factor, that the more items that are grouped, the more the total score reflects that common latent variable, regardless of the dimensionality (p. 232). Cronbach (1951) knew this; he demonstrated this principle in his original coefficient alpha paper, which has been widely cited. In addition, Bentler (2009) stated that global fit indices were unlikely to show good model fit for unidimensional CFA models where the number of items was large.
A CFA unidimensional model has large degrees of freedom and may be considered a highly restrictive model, but, when compared with an alternative model (i.e. a bifactor model) with lower degrees of freedom, it is the more parsimonious model.
We further showed how thoroughly considering local misspecification information could assist in adjudicating model fit. More specifically, we found that the cumulative effect of trivial correlated residual misspecifications could explain the misfit on the global fit indices for the one-factor model. Moreover, the statistical power of the tests for correlated residual misspecifications was acceptable in all cases (> 0.8), making type 1 error and the need for a verification study sample an issue of lesser concern. After considering all the information, we could reasonably conclude that the one-factor model represented the data reasonably well. This finding is consistent with arguments against the simplistic conceptualisation of the dimensionality of psychological data and against the value of global fit indices as a sole means of adjudicating model (mis)fit (Rodriguez et al., 2016a), arguments that are deemed especially relevant in the case of highly restrictive unidimensional models consisting of numerous items.
Moreover, the evidence was compelling that the eight most important item residual correlates in the one-factor model were method artefacts reflected in the residual factors of the bifactor model and contributed to the four-factor model of the MACE-W2FE being pseudo-specific and deceptive. Some researchers may argue that a good global model fit can be obtained by reducing the number of items in the one-factor model. However, it would be counter-productive to shorten a measure of a broadly defined construct such as WFE simply to comply with the cut-off criteria of goodness-of-fit indices (Marsh et al., 2004). This would surely jeopardise the coverage of all the subdomains of importance in a general construct such as WFE.
In addition, we provided strong methodological arguments and empirical evidence that the violation of proportionality constraints and the related challenges associated with score interpretation could make the use of SO factor models of the MACE-W2FE in particular (Koekemoer et al., 2017; Marais et al., 2014) and the WFE in general (Rastogi et al., 2018; Russo et al., 2018) less ideal.
The approximate invariance test technique proved helpful in making valid comparisons between gender groups without having to make questionable model modifications to obtain exact measurement invariance or seek partial invariance, which can be a cumbersome process (Asparouhov & Muthén, 2014).

Practical recommendations
This study showed that an essentially unidimensional measurement model of the MACE-W2FE should be included in further studies on WFE with external variables. The low goodness-of-fit indices of the essentially unidimensional measurement model may, however, adversely affect the overall fit of SEM models. Nevertheless, the factor strength indicators showed that the unidimensional model can be incorporated as an aggregated score in SEM models with negligible biasing effects on regression paths and a negligible reduction in measurement precision. Researchers may also consider forming item parcels by collapsing highly correlated item pairs or triplets from similar content subdomains so as to simplify the one-factor model for use in subsequent SEM analyses (Rodriguez et al., 2016a). Where model complexity and convergence are not an issue in an SEM model with external variables, researchers may consider including the bifactor measurement model and treating the group-specific factors as method factors.

Study limitations
An important limitation of the study is that a convenience sample was used and that the participants were limited to employees in the South African work environment. Therefore, sample homogeneity was promoted at the cost of external validity. A larger and randomly selected sample stretching across nationalities, industries, job types, work conditions and cultures would have better served the purposes of the study.
Furthermore, confirming the dimensionality and gender invariance of the MACE-W2FE does not render it a valid measure of WFE. The MACE-W2FE items may need reviewing for redundancy, and ongoing construct- and criterion-related research will be beneficial for the future use of the measure.

Conclusion
In this study, we thoroughly investigated the MACE-W2FE at different levels of analysis and used various statistical indicators. The rigour of the analyses enabled us to make an informed choice about a robust MACE-W2FE measurement model that best reflected the WFE theory.
With this study, we hoped to inspire applied researchers in South Africa to pursue a 'substantive-methodology synergy' approach by utilising advanced statistical tools with the power and flexibility to facilitate an in-depth and thorough analysis of hypothesised measurement models. Such rigour in scientific endeavour can only benefit the quality of the quantitative measures used for research in the management sciences.
Work-family enrichment research is on the increase because WFE has been shown to not only improve people's quality of life but also enhance work engagement, job satisfaction, work vigour, job dedication and general career satisfaction, which all contribute to human performance (De Klerk et al., 2013, 2015; Marais et al., 2014; Van Steenbergen et al., 2007). The need for a robust WFE measure backed by strong theory that will allow further studies to be conducted in the field has been well articulated (De Klerk et al., 2013). Finally, the MACE-W2FE appears to be gender invariant, which opens up opportunities for further research on gender differences in the domain of WFE in the future world of work.