A PSYCHOMETRIC ANALYSIS OF THE UTRECHT WORK ENGAGEMENT SCALE IN THE SOUTH AFRICAN POLICE SERVICE

The objectives of this research were to validate the Utrecht Work Engagement Scale (UWES) for the South African Police Service (SAPS) and to determine its construct equivalence and bias in different race groups. A cross-sectional survey design was used. Stratified random samples (N = 2396) were taken of police members of nine provinces in South Africa. The UWES and a biographical questionnaire were administered. Structural equation modelling confirmed a 3-factor model of work engagement, consisting of Vigour, Dedication and Absorption. These three factors have acceptable internal consistencies. Exploratory factor analysis with target rotations showed equivalence of the three factors for different race groups in the SAPS. No evidence was found for uniform or non-uniform bias of the items of the UWES for different race groups.

OPSOMMING
Die doelstellings van hierdie navorsing was om die Utrecht-werksbegeesteringskaal (UWES) te valideer vir die Suid-Afrikaanse Polisiediens (SAPD) en die konstrukekwivalensie daarvan vir verskillende rassegroepe te bepaal. ’n Dwarssnee opname-ontwerp is gebruik. Gestratifiseerde ewekansige steekproewe (N = 2396) is van polisielede uit nege provinsies geneem. Die UWES en ’n biografiese vraelys is afgeneem. Strukturele vergelykingsmodellering het ’n 3-faktormodel, bestaande uit Energie, Toewyding en Absorpsie, aangetoon. Hierdie drie faktore het aanvaarbare interne konsekwentheid getoon. Eksploratiewe faktoranalise met teikenrotasies het konstrukekwivalensie vir die drie faktore vir verskillende rassegroepe in die SAPD getoon. Bewyse is nie gevind vir uniforme of nie-uniforme sydigheid van die items van die UWES vir verskillende rassegroepe nie.

positive way. This could be done by focusing on the concept of work engagement or the different levels of engagement experienced by police officers.
It is important to use a valid and reliable instrument when work engagement is measured. Schaufeli, Salanova, González-Romá and Bakker (2002) developed the Utrecht Work Engagement Scale (UWES) and found acceptable reliability for it. Two recent studies using confirmatory factor analysis demonstrated the factorial validity of the UWES (Schaufeli et al., 2002; Schaufeli, Martinez, Pinto, Salanova, & Bakker, in press). However, the UWES has not yet been standardised for police officers in the SAPS and no information is available on its reliability and validity (see Rothmann, 2002). This makes it difficult to assess the levels of engagement of police officers, to compare the levels of engagement in various demographic groups, and to place research results in context. Therefore, it is necessary to validate the UWES for police officers in the SAPS.
South Africa is a multicultural society and the SAPS employs individuals of diverse cultural backgrounds. Within the South African context it cannot be taken for granted that scores obtained in one culture can be compared across cultural groups. Before comparing scores across cultural groups, equivalence and bias should be tested (Van de Vijver & Leung, 1997). Without a test of equivalence and bias it is impossible to know to what extent scores or constructs underlying an instrument can be compared across cultures.
The objectives of this study were to determine the construct validity and internal consistency of the UWES and to test its construct equivalence and bias for different race groups in the SAPS.

Work engagement
Research on the work engagement concept has taken two different but related paths. Maslach and Leiter (1997) rephrased burnout as an erosion of engagement with the job. Work that started out as important, meaningful and challenging becomes unpleasant, unfulfilling and meaningless. In the view of these authors, work engagement is characterised by energy, involvement and efficacy, which are considered the direct opposites of the three burnout dimensions, namely exhaustion, cynicism and lack of professional efficacy respectively. Therefore, they also assess work engagement by the opposite pattern of scores on the three Maslach Burnout Inventory (MBI) dimensions: low scores on exhaustion and cynicism, and high scores on efficacy, are indicative of engagement.
Schaufeli and his colleagues partly agree with Maslach and Leiter's (1997) description, but take a different perspective and define and operationalise work engagement in its own right. Schaufeli et al. (2002) consider burnout and work engagement to be opposite concepts that should be measured independently with different instruments. Furthermore, burnout and engagement may be considered two prototypes of employee well-being that are part of a more comprehensive taxonomy constituted by the two independent dimensions of pleasure and activation (Watson & Tellegen, 1985). Activation ranges from exhaustion to vigour, while identification ranges from cynicism to dedication. According to this framework, burnout is characterised by a combination of exhaustion (low activation) and cynicism (low identification), whereas engagement is characterised by vigour (high activation) and dedication (high identification).
Based on this theoretical reasoning and after in-depth interviews were carried out with engaged employees, Schaufeli and his colleagues have defined engagement as a positive, fulfilling, work-related state of mind that is characterised by vigour, dedication, and absorption. Rather than a momentary and specific state, engagement refers to a more persistent and pervasive affective-cognitive state that is not focused on any particular object, event, individual or behaviour. Work engagement consists of the following dimensions (Schaufeli et al., 2002):
Vigour is characterised by high levels of energy and mental resilience while working, the willingness to invest effort in one's work, not being easily fatigued, and persistence even in the face of difficulties.
Dedication is characterised by deriving a sense of significance from one's work, by feeling enthusiastic and proud about one's job, and by feeling inspired and challenged by it.
Absorption is characterised by being totally and happily immersed in one's work and having difficulties detaching oneself from it. Time passes quickly and one forgets everything else that is around.
Work engagement is also distinct from other established constructs in organisational psychology, such as organisational commitment, job satisfaction or job involvement (Maslach, Schaufeli & Leiter, 2001). Organisational commitment refers to an employee's allegiance to the organisation that provides employment. The focus is on the organisation, whereas engagement focuses on the work itself. Job satisfaction is the extent to which work is a source of need fulfilment and contentment, or a means of freeing employees from hassles or things causing dissatisfaction; it does not encompass the person's relationship with the work itself. Job involvement is similar to the involvement aspect of engagement with work, but does not include the energy and effectiveness dimensions (Maslach et al., 2001). Lastly, engagement (especially absorption) comes close to what has been called "flow", a term used by Csikszentmihalyi (1990) that represents a state of optimal experience that is characterised by focused attention, a clear mind and body unison, effortless concentration, complete control, loss of self-consciousness, distortion of time and intrinsic enjoyment. However, flow is a more complex concept that includes many aspects and refers to rather particular, short-term "peak" experiences instead of a more pervasive and persistent state of mind, as is the case with engagement (Schaufeli et al., 1999).

The measurement of work engagement
Regarding the measurement of work engagement, Schaufeli et al. (2002) disagree with Maslach and Leiter (1997), who stated that engagement is adequately measured by the opposite profile of MBI scores. Schaufeli et al. (2002) argue that, by using the MBI for measuring work engagement, it is impossible to study its relationship with burnout empirically since both concepts are considered to be opposite poles of a continuum that is covered by one single instrument (the MBI). Although they agree that work engagement is the positive antithesis of burnout, they acknowledge that the measurement and the structures of both concepts differ. Schaufeli et al. (2002) developed a self-report questionnaire to assess work engagement (the Utrecht Work Engagement Scale, UWES), which includes items such as: "I am bursting with energy in my work" (vigour); "My job inspires me" (dedication); "I feel happy when I'm engrossed in my work" (absorption).
Regarding the psychometric qualities of the UWES, preliminary results show that the three engagement scales have sufficient internal consistencies (Schaufeli et al., 2002; in press). For samples one (314 undergraduate students) and two (619 employees) respectively, the Cronbach α's were as follows: Vigour (9 items), α = 0,68 and 0,80; Dedication (8 items), α = 0,91 (both samples); Absorption (7 items), α = 0,73 and 0,75. In the student sample, the value of α for Vigour could be improved by eliminating three items (α = 0,78). The three scales are moderately to strongly related (mean r = 0,63 in Sample 1 and mean r = 0,70 in Sample 2). Also, the fit of the hypothesised three-factor model to the data is superior to a one-factor solution (Maslach et al., 2001; Schaufeli et al., 2002).
When  , 1997). Uniform bias refers to influences of bias on scores that are more or less the same for all score levels. Non-uniform bias refers to influences that are not identical for all score levels.
The above discussion leads to the following hypotheses:
H1: Work engagement, as measured by the UWES, is a three-dimensional construct and the UWES shows high internal consistency.
H2: Work engagement is an equivalent and unbiased construct for White, Black, Coloured and Indian police members.

METHOD

Research design
A survey design was used to reach the research objectives. The specific design is the cross-sectional design, where a sample is drawn from a population at one time (Shaughnessy & Zechmeister, 1997).

Study population
Random samples (N = 2396) were taken from police stations in the Limpopo Province, Gauteng, Free State, Mpumalanga, Northern Cape, Western Cape, Eastern Cape, KwaZulu-Natal and North-West Province. Stations were divided into small (fewer than 25 staff members), medium (25 to 100 staff members) and large (more than 100 staff members) stations. All police members at randomly identified small and medium stations in each of the provinces were asked to complete the questionnaire.
In the large stations stratified random samples were taken according to sex and race. Table 1 presents some of the characteristics of the participants. The sample was mostly male (77,08%), married, and had a high school education. The mean age of participants was 34,53 years, while the mean length of work experience was 12,96 years.

Measuring battery
The Utrecht Work Engagement Scale (UWES) (Schaufeli et al., 2002) was used to measure the levels of engagement. Although work engagement is conceptually seen as the positive antithesis of burnout, it is operationalised in its own right. Work engagement is a concept that includes three dimensions: vigour, dedication and absorption. Engaged workers are characterised by high levels of vigour and dedication, and they are immersed in their jobs. It is an (empirical) question whether engagement and burnout are endpoints of the same continuum or if they are two distinct but related concepts. The UWES is scored on a seven-point frequency rating scale, varying from 0 ("never") to 6 ("always"). The alpha coefficients for the three sub-scales varied between 0,68 and 0,91. The alpha coefficient could be improved (α varies between 0,78 and 0,89 for the three sub-scales) by eliminating a few items without substantially decreasing the scale's internal consistency.

Statistical analysis
The statistical analysis was carried out by means of the SAS program (SAS Institute, 2000). Cronbach alpha coefficients and inter-item correlation coefficients were used to assess the reliability of the UWES (Clark & Watson, 1995). Descriptive statistics (e.g. means, standard deviations, skewness and kurtosis) were used to analyse the data.
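The two reliability statistics named above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SAS procedure used in the study; the function names `cronbach_alpha`, `pearson` and `mean_inter_item_r` are hypothetical, and the data layout (one list of scores per item, respondents in the same order) is an assumption.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of item-score lists, one inner list per item,
    all with the same respondents in the same order (assumed layout).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total scale score per respondent.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def mean_inter_item_r(items):
    """Mean of the correlations over all pairs of items."""
    return mean(pearson(x, y) for x, y in combinations(items, 2))
```

The results would then be read against the guidelines cited in the text: Nunnally and Bernstein (1994) for alpha and Clark and Watson (1995) for the mean inter-item correlation.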
Construct (structural) equivalence was used to compare the factor structures of the UWES for the different cultural groups included in the study. Exploratory factor analysis and target (Procrustean) rotation were used to determine construct equivalence (Van de Vijver & Leung, 1997). According to Van de Vijver and Leung (1997), it is not acceptable to conduct factor analyses for different cultural groups to address the similarity of factor-analytic solutions because the spatial orientation of factors in factor analysis is arbitrary. Rather, prior to an evaluation of the agreement of factors in different cultural groups, the matrices of loadings should be rotated with regard to each other (i.e., target rotations should be carried out). The factor loadings of separate groups are rotated either to one target group or to a joint common matrix of factor loadings. After target rotation had been carried out, factorial agreement was estimated using Tucker's coefficient of agreement (Tucker's phi). This coefficient is insensitive to multiplications of the factor loadings, but is sensitive to a constant added to all loadings of a factor. The following formula is used to compute Tucker's phi (Van de Vijver & Leung, 1997):

$$p_{xy} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2}}$$

where $x_i$ and $y_i$ are the loadings of item $i$ on the factor in the two groups being compared. This index does not have a known sampling distribution; hence it is impossible to establish confidence intervals. Values higher than 0,95 are seen as evidence of factorial similarity, whereas values lower than 0,85 are taken to point to non-negligible incongruities (Van de Vijver & Leung, 1997). This index is sufficiently accurate to examine factorial similarity at a global level. However, if construct equivalence is not acceptable, bias analyses should be carried out to detect inappropriate items.
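Tucker's phi is the sum of the products of corresponding loadings, divided by the square root of the product of the two sums of squared loadings. A minimal Python sketch follows; the function name `tucker_phi` and the loadings in the example are hypothetical and are not taken from the study.

```python
from math import sqrt


def tucker_phi(x, y):
    """Tucker's coefficient of agreement between two loading vectors.

    x, y: loadings of the same items on one factor in two groups,
    after target rotation (assumed to be done beforehand).
    """
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Hypothetical loadings for illustration only.
target = [0.70, 0.60, 0.80]
scaled = [2 * v for v in target]      # multiplied by a constant
shifted = [v + 0.30 for v in target]  # constant added to every loading

print(tucker_phi(target, scaled))   # multiplication leaves phi unchanged
print(tucker_phi(target, shifted))  # an added constant lowers phi
```

The two printed values illustrate the property stated in the text: the coefficient is insensitive to multiplication of the loadings but sensitive to an added constant.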
An extension of Cleary and Hilton's (1968) use of analysis of variance was applied to identify item bias (Van de Vijver & Leung, 1997). Bias was examined for each item separately. The item score was the dependent variable, while race groups (four levels) and score levels were the independent variables. Score groups were composed on the basis of the total score on the UWES. A total of ten score levels were obtained by making use of percentiles identified through SAS UNIVARIATE. This made it possible to use score groups with at least 50 persons each. Two effects were tested through analysis of variance, namely the main effect of culture and the interaction of score level and culture. When both the main effect of culture and the interaction of score level and culture are non-significant, the item is taken to be unbiased.
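The data preparation for this item bias analysis can be sketched as follows. The sketch only builds the score levels and the group-by-level cell means that the analysis of variance would compare; it does not perform the ANOVA itself. The function names are hypothetical, and the rank-based split into ten levels is an assumption that may differ from the percentile definition used by SAS UNIVARIATE.

```python
from collections import defaultdict


def score_levels(totals, n_levels=10):
    """Assign each respondent a score level from 0 to n_levels - 1.

    Levels are equal-sized groups by rank of the total UWES score.
    Tied totals may fall into adjacent levels (simplifying assumption).
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    levels = [0] * len(totals)
    for rank, i in enumerate(order):
        levels[i] = rank * n_levels // len(totals)
    return levels


def cell_means(item_scores, groups, levels):
    """Mean item score for every (group, score level) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for score, group, level in zip(item_scores, groups, levels):
        sums[(group, level)] += score
        counts[(group, level)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}
```

In these terms, uniform bias would show up as group differences in the cell means that are similar at every score level (a main effect of group), and non-uniform bias as group differences that change across levels (a group by level interaction).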
Structural equation modelling (SEM) methods as implemented by AMOS (Arbuckle, 1997) were used to test the factorial model for the UWES, using the maximum likelihood method. Before performing SEM, the frequency distributions of the UWES were checked for normality and multivariate outliers were removed. However, the data did not have a multivariate normal distribution, one of the critically important assumptions associated with SEM. One approach to handling the presence of multivariate non-normal data is to use a procedure known as "the bootstrap" (West, Finch, & Curran, 1995; Yung & Bentler, 1996; Zhu, 1997).
Bootstrapping serves as a resampling procedure by which the original sample is considered to represent the population. Multiple subsamples of the same size as the parent sample are then drawn randomly, with replacement, from this population and provide the data for empirical investigation of the variability of parameter estimates and indexes of fit (Byrne, 2001). The underlying concept of the bootstrap technique is that it enables one to create multiple subsamples from an original database in order to examine parameter distributions relative to each of these spawned samples, thereby reporting values with a greater degree of accuracy (Byrne, 2001).
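The resampling idea can be illustrated with a short Python sketch. It bootstraps a simple statistic (the mean) rather than SEM parameter estimates, so it shows the principle only and not the AMOS procedure; the function name, the number of resamples and the example scores are all hypothetical.

```python
import random
from statistics import mean, stdev


def bootstrap_estimates(sample, statistic, n_resamples=1000, seed=0):
    """Recompute a statistic on resamples drawn with replacement.

    Each resample has the same size as the original sample, which is
    treated as a stand-in for the population.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return estimates


# Hypothetical scores for illustration only.
scores = [3.0, 4.5, 2.0, 5.0, 4.0, 3.5, 4.0, 2.5]
estimates = bootstrap_estimates(scores, mean)
print("bootstrap standard error of the mean:", stdev(estimates))
```

The spread of the resampled estimates serves as an empirical measure of the variability of the statistic without relying on a normality assumption.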
Hypothesised relationships are tested empirically for goodness of fit with the sample data. The χ² statistic and several other goodness-of-fit indexes summarise the degree of correspondence between the implied and observed covariance matrixes. Jöreskog and Sörbom (1993) suggest that the χ² value may be considered more appropriately as a badness-of-fit, rather than as a goodness-of-fit measure in the sense that a small χ² value is indicative of good fit. However, because the χ² statistic equals (N - 1)F_min, this value tends to be substantial when the model does not hold and the sample size is large (Byrne, 2001). A large χ² relative to the degrees of freedom indicates a need to modify the model to better fit the data.
Researchers have addressed the χ² limitations by developing goodness-of-fit indexes that take a more pragmatic approach to the evaluation process. One of the first fit statistics to address this problem was the χ²/degrees of freedom ratio (CMIN/DF) (Wheaton, Muthén, Alwin & Summers, 1977). These criteria, commonly referred to as "subjective" or "practical" indexes of fit, are typically used as adjuncts to the χ² statistic.
The Goodness of Fit Index (GFI) indicates the relative amount of the variances/co-variances in the sample predicted by the estimates of the population. It usually varies between 0 and 1, and a result of 0,90 or above indicates a good model fit. In addition, the Adjusted Goodness-of-Fit Index (AGFI) is given.
The AGFI is a measure of the relative amount of variance accounted for by the model, corrected for the degrees of freedom in the model relative to the number of variables.
Although both indexes range from zero to 1,00, the distribution of the AGFI is unknown; therefore, no statistical test or critical value is available (Jöreskog & Sörbom, 1986). The parsimony goodness-of-fit index (PGFI) addresses the issue of parsimony in SEM (Mulaik et al., 1989). The PGFI takes into account the complexity (i.e., number of estimated parameters) of the hypothesised model in the assessment of overall model fit and provides a more realistic evaluation of the hypothesised model. Mulaik et al. (1989) suggested that indexes in the 0,90s accompanied by PGFIs in the 0,50s are not unexpected; however, values > 0,80 are considered to be more appropriate (Byrne, 2001).
The Normed Fit Index (NFI) is used to assess global model fit.
The NFI represents the point at which the model being evaluated falls on a scale running from a null model to perfect fit. This index is normed to fall on a 0 to 1 continuum. Marsh, Balla and Hau (1996) suggest that this index is relatively insensitive to sample sizes. The Comparative Fit Index (CFI) represents the class of incremental fit indexes in that it is derived from the comparison of a restricted model (i.e., one in which structure is imposed on the data) with that of an independence (or null) model (i.e., one in which all correlations among variables are zero) in the determination of goodness-of-fit. The Tucker-Lewis Index (TLI) (Tucker & Lewis, 1973) is a relative measure of covariation explained by the model that was specifically developed to assess factor models. For these fit indexes (NFI, CFI and TLI), it is more or less generally accepted that a value of less than 0,90 indicates that the fit of the model can be improved (Hoyle, 1995), although a revised cut-off value close to 0,95 has recently been advised (Hu & Bentler, 1999).
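For reference, the three incremental indexes can be written in terms of the χ² values and degrees of freedom of the hypothesised model (subscript M) and the independence or null model (subscript B). These are the commonly used textbook definitions, given here as an aid to the reader; they are not quoted from the sources cited above, and the subscript notation is an assumption of this sketch.

```latex
\begin{align}
\mathrm{NFI} &= \frac{\chi^2_B - \chi^2_M}{\chi^2_B} \\
\mathrm{TLI} &= \frac{\chi^2_B / df_B - \chi^2_M / df_M}{\chi^2_B / df_B - 1} \\
\mathrm{CFI} &= 1 - \frac{\max(\chi^2_M - df_M,\ 0)}{\max(\chi^2_M - df_M,\ \chi^2_B - df_B,\ 0)}
\end{align}
```

All three compare the hypothesised model with the null model, which is why they are grouped as incremental fit indexes and share the same cut-off guidelines.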
To overcome the problem of sample size, Browne and Cudeck (1993) suggested using the Root Mean Square Error of Approximation (RMSEA) and the 90% confidence interval of the RMSEA. The RMSEA estimates the overall amount of error; it is a function of the fitting function value relative to the degrees of freedom. The RMSEA point estimate should be 0,05 or less and the upper limit of the confidence interval should not exceed 0,08. Hu and Bentler (1999) suggested a value of 0,06 to be indicative of good fit between the hypothesised model and the observed data. MacCallum, Browne, and Sugawara (1996) recently elaborated on these cut-off points and noted that RMSEA values ranging from 0,08 to 0,10 indicate mediocre fit, and those greater than 0,10 indicate poor fit.
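The RMSEA point estimate and the χ²/degrees of freedom ratio can both be computed directly from a model's χ² value, its degrees of freedom and the sample size. The sketch below uses the common formula with N - 1 in the denominator; some programs use N instead, so this choice is an assumption, as are the function names.

```python
from math import sqrt


def cmin_df(chi2, df):
    """Chi-square to degrees-of-freedom ratio (CMIN/DF)."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate, using N - 1 in the denominator (assumption)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Chi-square values and df as reported in the Results section;
# N = 2396 is assumed (the sample size after outlier removal is not stated).
print(rmsea(1978.79, 116, 2396), cmin_df(1978.79, 116))  # three-factor model
print(rmsea(2250.37, 119, 2396), cmin_df(2250.37, 119))  # one-factor model
```

Under the assumption that N = 2396, the one-factor values give an RMSEA of roughly 0,09, in line with the value reported in the Results.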

RESULTS
Structural equation modelling (SEM) methods as implemented by AMOS (Arbuckle, 1997) were used to test two factorial models for the UWES, a three-factor as well as a one-factor model of work engagement. It was assumed that the χ² goodness-of-fit statistics are not likely to be inflated if the skewness and kurtosis for individual items do not exceed the critical values of 2,0 and 7,0, respectively (West et al., 1995). Data analyses proceeded as follows: First, a quick overview of each model fit was done by looking at the overall χ² value, together with its degrees of freedom and probability value. Global assessments of model fit were based on several goodness-of-fit statistics (GFI, AGFI, PGFI, NFI, TLI, CFI and RMSEA). Secondly, given findings of an ill-fitting initially hypothesised model, analyses proceeded in an exploratory mode using both EFA and CFA. Possible misspecifications as suggested by the so-called modification indexes and standardised residual values were looked for and eventually a revised, re-specified model was fitted to the data.
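The item-level screening against the critical values of 2,0 (skewness) and 7,0 (kurtosis) can be sketched as below. The sketch uses simple moment-based estimators and takes kurtosis as excess kurtosis (0 for a normal distribution); both choices are assumptions, since the exact estimators used in the study are not stated, and the function names are hypothetical.

```python
def skewness_kurtosis(xs):
    """Moment-based skewness and excess kurtosis of a list of scores."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt


def within_critical_values(xs, max_skew=2.0, max_kurt=7.0):
    """True if both statistics stay within the critical values."""
    skew, kurt = skewness_kurtosis(xs)
    return abs(skew) <= max_skew and abs(kurt) <= max_kurt
```

An item whose scores fail this check would be a candidate for closer inspection before the χ² statistics are interpreted.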

Hypothesised three-factor model
The full hypothesised 3-factor model consisting of all 17 items was tested initially. The SEM analyses showed that the 3-factor solution was not admissible. Furthermore, the statistically significant χ² value of 1978,79 (df = 116; p = 0,00) revealed a poor overall fit of the originally hypothesised 3-factor UWES model. However, both the sensitivity of the likelihood ratio test to sample size and its basis on the central χ² distribution, which assumes that the model fits the population perfectly, have been reported to lead to problems of fit. Jöreskog and Sörbom (1993) pointed out that the use of χ² is based on the assumption that the model holds exactly in the population, which is a stringent assumption. A consequence of this assumption is that models that hold approximately in the population will be rejected in a large sample. Furthermore, the hypothesised model (Model 1) was also not that good from a practical perspective. The PGFI value of lower than 0,80, NFI, TLI and CFI values of lower than 0,95 and the RMSEA value of higher than 0,05 are indicative of failure to confirm the hypothesised model. Thus, it is apparent that some modification in specification is needed in order to determine a model that better represents the sample data.

Post hoc analyses
The fit statistics in Table 3 indicate a better fit for the re-specified model. Although the χ² value (df = 85; p = 0,00) is still high, it is considerably lower than that of Model 1. All the other fit statistics indicate acceptable fit of the measurement model to the data, although the RMSEA value is still a bit high. Since this model fit was satisfactory and the results agreed with the theoretical assumptions underlying the structure of the UWES according to Schaufeli et al. (2002), no further modifications of the model were deemed necessary. The correlations between the three engagement dimensions were high. Vigour and Dedication show the highest correlation of 0,97, followed by Vigour and Absorption with a correlation of 0,96, and Dedication and Absorption with a correlation of 0,90. The re-specified three-factor model is illustrated in Figure 1.
Following Schaufeli et al. (in press), a unidimensional model was assessed as well. This model assumes that all 17 UWES items load on one single factor. Table 4 presents fit statistics for the test of the original one-factor model. The statistically significant χ² value of 2250,37 (df = 119; p = 0,00) revealed a poor overall fit of the originally hypothesised UWES model. Again, this could be a result of the large sample size (Jöreskog & Sörbom, 1993). Furthermore, the PGFI value of lower than 0,80, NFI, TLI and CFI values of lower than 0,95 and a high RMSEA value of 0,09 are indicative of failure to confirm the hypothesised model. Therefore, modification indexes as well as standardised residuals were examined.

Post hoc analyses
Based on the high standardised residuals, it was decided to re-specify the 1-factor model with four items deleted (Items 3, 11, 15 and 16). After reviewing the modification indexes, it was decided that the model fit might be further improved by allowing error terms to correlate between Item 4 and Item 5 and between Item 8 and Item 9. In summary, this model was based on 13 of the original 17 items and included correlated errors. The goodness-of-fit statistics for this model are summarised in Table 5. The descriptive statistics, alpha coefficients and inter-item correlations of the three factors of the UWES are given in Table 6. The Cronbach alpha coefficients of the scales are considered to be acceptable compared to the guideline of α ≥ 0,70 (Nunnally & Bernstein, 1994). Furthermore, the inter-item correlations are considered acceptable compared to the guideline of 0,15 < r < 0,50 (Clark & Watson, 1995). It appears that the scales have acceptable levels of internal consistency.
Although it seems as if the 1-factor model fitted the data better than the 3-factor model, this is based only on slightly better goodness-of-fit indices, and after four items were deleted. Therefore, these results provide support for Hypothesis 1.
Next, exploratory factor analysis and target (Procrustean) rotation were used to determine the construct equivalence of the UWES. The factor loadings of race groups were rotated to one target group. Factorial agreement was estimated using Tucker's coefficient of agreement (Tucker's phi). The Tucker's phi coefficients for the four race groups are given in Table 7. Inspection of Table 7 shows that the Tucker's phi coefficients for White, Black, Coloured and Indian police members were acceptable. Consequently, further bias analyses were carried out on the items of the UWES.
The results of the item bias analyses that were carried out through analysis of variance for the 15 items of the adapted UWES are reported in Table 8.

DISCUSSION
The current study examined, for the first time in South Africa, the psychometric properties of the UWES, an instrument constructed to measure the engagement levels of employees.
The objectives were to determine the construct validity and internal consistency of the UWES and to test its construct equivalence and bias for different race groups in a sample of police officers.
In order to obtain a factor structure that best represents the UWES, exploratory factor analysis was used to assess the factorial structure. However, the solution yielded factors that could not be interpreted meaningfully. Because the preliminary research of Schaufeli and colleagues (2002, in press) concluded that work engagement is a multidimensional construct comprising three dimensions, it was decided to test a three-factor model, using structural equation modelling.
The hypothesised three-factor model of the UWES fitted the data, albeit after removing two unsound items, based on their high standardised residuals, and after allowing some error terms to correlate. The two items that were deleted in the three-factor model were item 4 ("I feel strong and vigorous in my job") and item 14 ("I get carried away by my work").
Because the specification of correlated error terms for purposes of achieving a better-fitting model is not an acceptable practice and error terms were allowed to correlate between items belonging to different subscales (vigour and absorption), the fit of an alternative unidimensional model was assessed as well.
This model was also rejected on both substantive and statistical grounds. Additional exploratory work revealed substantial improvement in model fit with the deletion of four items (item 3, "Time flies when I'm working", item 11, "I am immersed in my work", item 15, "I am very resilient, mentally, in my job" and item 16, "It is difficult to detach myself from my job"). Error terms were also allowed to correlate in order to improve model fit (Byrne, 2001).
Although Schaufeli et al. (2002, in press) confirmed a three-dimensional construct in previous studies, the three-factor structure is by no means to be considered self-evident in this sample of police officers. The three-factor model represented the data quite well. However, the one-factor model that included a specification of correlated errors to account for the shared domain-specific variances fitted the data better than the revised three-factor model. This is evident from the lower χ² value and goodness-of-fit indexes that indicated better fit, as well as better construct equivalence for the proposed one-factor model.
These results are in contrast to the findings of Schaufeli et al. (in press). Although their hypothesised three-factor model also did not fit the data of any of the three samples well, the fit of a one-factor model was inferior to that of a three-factor model in all three samples. It must be mentioned that they allowed error terms to correlate in all three subscales.
In examining the factor structure, some undesirable psychometric characteristics were found to be associated with several items in the UWES. Items 4 and 14 (in the three-factor model) and items 3, 11, 15, and 16 (in the one-factor model) showed high standardised residual errors. Additionally, these items had the highest modification indexes. These findings suggest that the items may require either deletion or content modification, of which the latter is preferable.
The particular items may be problematic because they do not correspond to the conceptual domain of the particular dimension (in the case of the three-factor model). However, it is more likely that they are somewhat ambiguous, or that they are either sample- or country-specific. Also, the problems with some of these items may be related to words that some of the participants could have found difficult to understand and/or interpret (e.g. vigorous, immersed and resilient). This is highly likely, because only 11 percent had English as mother tongue.
The prominent correlated errors in this study present an important problem. In general, the specification of correlated error terms for the purpose of achieving a better-fitting model is not an acceptable practice. Correlated error terms in measurement models represent systematic, rather than random, measurement error in item responses. They may derive from characteristics specific either to the items or the respondents (Aish & Jöreskog, 1990). For example, if these parameters reflect item characteristics, they may represent a small omitted factor.
However, as may be the case in this instance, correlated errors may represent respondent characteristics that reflect bias such as yea-/nay-saying, social desirability (Aish & Jöreskog, 1990), as well as a high degree of overlap in item content (when an item, although worded differently, essentially asks the same question) (Byrne, 2001).
However, previous research with psychological constructs in general (e.g.Jöreskog, 1982;Newcomb & Bentler, 1988;Tanaka & Huba, 1984), and with measuring instruments in particular (Byrne, 1988(Byrne, , 2001)), has demonstrated that the specification of correlated errors can often lead to substantially better fitting models.Bentler and Chou (1987) also argue that the specification of a model that forces these error parameters to be uncorrelated is rarely appropriate with real data.Therefore, it was considered more realistic to incorporate the correlated errors in this study, rather than to ignore their presence.
It is believed that this confusing state of affairs regarding the UWES does not reflect weaknesses inherent in the instrument, but is rather due to more general factors.First, the UWES is a recently constructed measuring instrument.Therefore, relatively few studies have critically reviewed its psychometric properties.In order to study the construct validity of work engagement in greater detail, additional theory-driven research is needed.Secondly, the UWES is an instrument that was originally constructed from data based on samples of individuals in the Netherlands (Schaufeli & Bakker, 2001).Therefore, valid research that compares levels of work engagement in South Africa is lacking and a thorough psychometric evaluation of this instrument in our specific national context will be influenced by the specific culture of the country (or more specifically, the culture of the police organisation).Schaufeli et al. (in press) also found that the hypothesised three-factor model of work engagement was invariant across Spanish, Dutch and Portuguese samples.Also, the dimensionality of the UWES could be influenced because of the high reported correlations between the three dimensions.Explicit theory indicating exactly how the three sub-scales relate to one another and to other variables must be developed before one can evaluate thoroughly the theoretical validity of a three-component conceptualisation.
Internal consistencies were computed for the three engagement scales, which revealed that all three subscales are sufficiently internally consistent according to the guideline of Nunnally and Bernstein (1994). The alpha coefficient of 0,92 for the one-factor model was considerably higher than the coefficients of the three subscales.
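The internal consistency referred to above is Cronbach's alpha. A minimal sketch of how the coefficient could be computed from raw item responses is given here. The function name and the demonstration ratings are hypothetical and are not taken from the study; only the standard alpha formula is assumed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings of five respondents on three items (illustration only)
demo = [[5, 4, 5], [2, 2, 3], [6, 5, 6], [1, 2, 1], [4, 4, 3]]
print(round(cronbach_alpha(demo), 3))
```

The resulting value would then be compared with the chosen guideline (a cut-off of 0,70 is commonly attributed to Nunnally and Bernstein, but the threshold is left to the user here).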
Construct (structural) equivalence was used to compare the factor structures of the UWES for the different cultural groups included in the study. Equivalence was acceptable for White, Black, Coloured and Indian police members. Furthermore, bias analyses were carried out on the items of the UWES. Bias was examined for each item separately. In this analysis, it was found that the means of the race groups did not differ in a systematic way. It can be deduced that the UWES items do not show uniform or non-uniform bias. Therefore, it seems acceptable to use the UWES to compare the work engagement of different race groups.
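Agreement between the factor loadings of two groups after target rotation is often summarised with Tucker's coefficient of congruence (phi). The text above does not name the index that was used, so the sketch below is an assumption about how such a comparison could be made; the function name is hypothetical.

```python
from math import sqrt


def tucker_phi(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two columns of factor loadings.

    phi = sum(a * b) / sqrt(sum(a^2) * sum(b^2))
    """
    if len(loadings_a) != len(loadings_b):
        raise ValueError("both groups must have loadings for the same items")
    numerator = sum(a * b for a, b in zip(loadings_a, loadings_b))
    denominator = sqrt(sum(a * a for a in loadings_a) *
                       sum(b * b for b in loadings_b))
    if denominator == 0:
        raise ValueError("loadings must not all be zero")
    return numerator / denominator
```

The coefficient is computed per factor and per pair of groups. Values close to 1 indicate that the loadings in the two groups are nearly proportional. Values above roughly 0,90 are often treated as evidence of equivalence, although that threshold is a convention rather than a result of this study.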
In conclusion, the data strongly suggest that the one-factor model fits the data better than the three-factor model. However, there is, as yet, insufficient evidence to conclude that a one-factor model is superior to a three-factor model. Thus, although the one-factor model fits the data better, the three-factor model also fits the data well. Based on the results obtained in this study, it seems that the UWES must undergo intensive psychometric evaluation before it can be used as a suitable instrument for measuring the engagement of police members in the SAPS.
This study had several limitations. First, self-report measures were exclusively relied upon. This is a particular problem in validation studies, because at least part of the common variance of the measures has to be attributed to method variance (Schaufeli, Maslach & Marek, 1993). The use of a cross-sectional study design also represents a limitation, namely that causal assumptions regarding the engagement syndrome cannot be tested. Longitudinal data would allow for a better understanding of the true nature of work engagement. Also, item errors were allowed to correlate in the model specification. This may pose interpretation problems because, as correlated error terms are added to the model, the correspondence between the posited construct of interest and the empirically defined factor becomes unclear (Gerbing & Anderson, 1984).

RECOMMENDATIONS
Several research issues flow from this study and require attention in order to increase both our understanding of work engagement and the usefulness of this concept. Clearly, further construct validity research is needed to establish more fully the factorial validity of the UWES. None of the solutions could be regarded either as effectively confirming the authors' proposed three-subscale structure, or as an adequate replication of the factor structures found in their studies (Schaufeli et al., 2002, in press).
The second issue relates to problem items. Individual items of the UWES may need to be carefully examined when they are used in South African samples. This issue can also be clarified in future research that compares samples from different occupations. Because different problem items emerged with different models, it is all the more evident that further construct validity research is needed in order to establish more fully the psychometric soundness of the UWES. The findings of this study also suggest the need for possible improvement to item content. This implies that the wording of certain items must be modified in order to make them more appropriate for the specific context. It also seems important to work towards improving the UWES for South African circumstances by identifying a core set of items that could most validly measure the concept of work engagement.
Five suggestions for future research derive from the present findings. First, research is needed to determine the reliability and validity of the UWES in other samples in South Africa. Secondly, research is needed in other occupations to establish norms for engagement levels in occupations other than police work. Thirdly, future studies should use large samples and adequate statistical techniques (e.g. structural equation modelling). Large sample sizes might provide increased confidence that study findings would be consistent across other similar groups. Researchers contemplating future validation of the UWES are urged to utilise statistical programs that can yield a measure of multivariate normality and provide appropriate estimation procedures when data are found to be non-normal. Fourthly, in order to overcome the problem of systematic measurement error in item responses, it is recommended that the items of the MBI-GS and UWES be combined in a single questionnaire for research purposes. Finally, in future studies structural equation modelling could be used to test the construct equivalence of the UWES. In testing for these equivalencies, sets of parameters (i.e. factor loading paths, factor variances/covariances and structural regression paths) could be tested by increasing restrictions in every step.
Table 2 presents fit statistics for the test of the original model.

Table 8 shows no practically significant eta square values. This indicates that the means of the race groups for the different score levels do not differ from zero in a systematic way. No uniform or non-uniform bias exists regarding the items of the UWES for Whites, Blacks, Coloureds and Indians. These results provide support for Hypothesis 2.
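The eta square values referred to above can be illustrated with a small sketch of an item bias analysis. It assumes that each item is analysed with a two-way analysis of variance in which group and score level are the factors: the group main effect is read as uniform bias and the group by level interaction as non-uniform bias. The function name, the data layout and the restriction to a balanced design (an equal number of respondents in every cell) are simplifying assumptions and do not describe the analysis actually run in the study.

```python
from statistics import mean


def bias_eta_squared(cells):
    """Eta squared for the group effect and the group x level interaction.

    cells[g][s] holds the item scores of group g at score level s.
    Assumes a balanced design: every cell has the same number of scores.
    """
    n_groups, n_levels = len(cells), len(cells[0])
    n = len(cells[0][0])  # respondents per cell
    scores = [x for row in cells for cell in row for x in cell]
    grand = mean(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)
    if ss_total == 0:
        raise ValueError("item scores have zero variance")
    group_means = [mean([x for cell in row for x in cell]) for row in cells]
    level_means = [mean([x for row in cells for x in row[s]])
                   for s in range(n_levels)]
    ss_group = n * n_levels * sum((m - grand) ** 2 for m in group_means)
    ss_level = n * n_groups * sum((m - grand) ** 2 for m in level_means)
    ss_cells = n * sum((mean(cell) - grand) ** 2
                       for row in cells for cell in row)
    ss_interaction = ss_cells - ss_group - ss_level
    return {
        "uniform": ss_group / ss_total,            # group main effect
        "non_uniform": ss_interaction / ss_total,  # group x level interaction
    }
```

An item would be flagged only if one of the two values exceeded the threshold chosen for practical significance; that threshold is left to the analyst.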