PROBLEMS WITH THE FACTOR ANALYSIS OF ITEMS: SOLUTIONS BASED ON ITEM RESPONSE THEORY AND ITEM PARCELLING

The factor analysis of items often produces spurious results in the sense that unidimensional scales appear multidimensional. This may be ascribed to failure in meeting the assumptions of linearity and normality on which factor analysis is based. Item response theory is explicitly designed for the modelling of the non-linear relations between ordinal variables and provides a strong alternative to the factor analysis of items. Items may also be combined in parcels that are more likely to satisfy the assumptions of factor analysis than do the items. The use of the Rasch rating scale model and the factor analysis of parcels is illustrated with data obtained with the Locus of Control Inventory. The results of these analyses are compared with the results obtained through the factor analysis of items. It is shown that the Rasch rating scale model and the factoring of parcels produce superior results to the factor analysis of items. Recommendations for the analysis of scales are made.

Requests for copies should be addressed to: D De Bruin, Institute for Child and Adult Guidance, RAU, PO Box 524, Auckland Park, 2006

SA Journal of Industrial Psychology, 2004, 30 (4), 16-26

In the second place, the relations between items are often non-linear, which violates the assumptions of linearity and normality underlying factor analysis (Bernstein & Teng, 1989; Waller et al., 1996). The problems with non-linearity, which are reflected in significant univariate skewness, univariate kurtosis and multivariate kurtosis, manifest in so-called "difficulty factors", where items with similar distributions tend to form clusters or factors irrespective of their content (Finch & West, 1997; Gorsuch, 1997; McDonald, 1999). Such factors are often spurious, with little if any psychological meaning.
In the third place, the intervals between the scale points of items are likely to be fewer, larger, and less equal than those of scales (Little et al., 2002). Bandalos (2002) described the intervals between scale points of items as "coarse categorisations". The lack of equal intervals violates the assumption that the input variables are continuous and measured on at least an interval scale level (Finch & West, 1997).
In confirmatory factor analysis the consequences of the violation of the assumptions are reflected in inflated likelihood chi-square tests of fit, reduced standard errors, and inflated error variances (Finch & West, 1997). However, these consequences become less acute when the item response scales contain more scale points or categories. For instance, an item with an ordered seven-point response scale is more likely to approximately satisfy the assumptions of factor analysis than a dichotomous item. Byrne (2001) pointed out that when categorical variables approximate a normal distribution the number of categories does not appreciably influence the chi-square test of fit between the model and the data. Furthermore, under these conditions factor loadings and factor correlations are only modestly underestimated. Overall, research suggests that items with five or more ordered response categories perform relatively well in confirmatory factor analyses when responses to these items follow an approximate normal distribution (Byrne, 2001).
Two approaches to dealing with non-normality and nonlinearity in the analysis of items will be discussed in the paragraphs that follow, namely (a) using measurement models from item response theory, and (b) using item parcels rather than individual items as the basic units of factor analysis. Item response theory techniques are useful in the analysis of unidimensional scales, whereas the factor analysis of item parcels is appropriate when the research focuses on the relations between latent variables or factors rather than the items themselves.
Item analysis using item response theory based methods

Item response theory focuses explicitly on the non-linear relations between items and the hypothetical latent trait that underlies the items. There are several competing item response theory models, of which the most popular are (a) Rasch's (1960) logistic model, which is also sometimes called the one-parameter logistic model, (b) the two-parameter logistic model, and (c) the three-parameter logistic model (Embretson & Reise, 2000). In the present study the focus falls on the Rasch model.
The Danish mathematician, Georg Rasch, developed a mathematical model where the probability of a correct or incorrect response to a dichotomous item may be predicted as a function of an individual's standing on the latent trait (or ability) that is measured by the items. The probability that an individual will endorse or correctly answer an item depends on two aspects only, namely (a) the ability, or whatever is being measured, of the individual (θ), and (b) the difficulty of the item (β). In Rasch analysis person ability and item difficulty are expressed on the same logit scale, which allows for a direct comparison of persons and items. If an individual's ability matches the difficulty of an item, the Rasch model predicts that he or she will have a 50% probability of answering the item correctly or endorsing the item. If, however, the individual's ability exceeds the item difficulty, there is a greater than 50% chance that he or she will answer the item correctly or endorse the item. Similarly, if the item's difficulty exceeds the individual's ability, there is a less than 50% chance that he or she will answer the item correctly or endorse the item (Bond & Fox, 2001). These relationships can be expressed mathematically by the following formula:

P_{ni}(x_{ni} = 1 \mid \theta_n, \beta_i) = \frac{e^{\theta_n - \beta_i}}{1 + e^{\theta_n - \beta_i}}

where P_{ni}(x_{ni} = 1 | θ_n, β_i) is the probability of person n scoring a correct (x = 1) response on item i, given person ability (θ_n) and item difficulty (β_i), and e is the base of the natural logarithm. Andrich (1978a, 1978b) extended the Rasch model for dichotomous items to a rating scale model for ordered category items. In the rating scale model each item is described by a single item location or difficulty parameter (β). In addition, an item with m + 1 ordered categories or response options is modelled as having m thresholds or category intersection parameters (δ). Each threshold corresponds with the difficulty of making the step from one category to the next.
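As a minimal illustration of the dichotomous Rasch model described above, the endorsement probability can be computed directly from a person ability and an item difficulty on the logit scale (a sketch; the function name is our own):

```python
import math

def rasch_probability(theta, beta):
    """Probability of endorsing/correctly answering a dichotomous item
    under the Rasch model: P = e^(theta - beta) / (1 + e^(theta - beta))."""
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))
```

When ability exactly matches difficulty (θ = β) the probability is 0,5; an ability one logit above the item difficulty gives a probability of about 0,73.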
In the rating scale model the same set of category intersection parameters is estimated for all the items in the scale (this requires that all items have the same number of categories). The item difficulty parameter (β) serves to move the item thresholds up or down the logit scale. The probability of person n endorsing category j on item i is estimated by the following formula:

P_{ni}(x_{ni} = j \mid \theta_n, \beta_i) = \frac{\exp \sum_{k=1}^{j} (\theta_n - \beta_i - \delta_k)}{\sum_{l=0}^{m} \exp \sum_{k=1}^{l} (\theta_n - \beta_i - \delta_k)}

where P_{ni}(x_{ni} = j | θ_n, β_i) is the probability of person n endorsing category j (x = j) on item i, given person ability (θ_n), item difficulty (β_i) and the category intersection parameters (δ_k); the empty sum for l = 0 is taken as zero, and e is the base of the natural logarithm.
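The rating scale model's category probabilities can be sketched as follows, with the convention that the cumulative sum for category 0 is zero (the function name and parameterisation are our own illustration):

```python
import math

def rsm_category_probs(theta, beta, deltas):
    """Category probabilities under the rating scale model for an item
    with m thresholds (deltas) and m + 1 ordered categories (0..m).
    P(x = j) is proportional to exp(sum_{k=1}^{j} (theta - beta - delta_k)),
    with the empty sum for category 0 taken as zero."""
    numerators = [1.0]            # category 0: exp(0) = 1
    cumulative = 0.0
    for delta in deltas:
        cumulative += theta - beta - delta
        numerators.append(math.exp(cumulative))
    total = sum(numerators)
    return [n / total for n in numerators]
```

With a single threshold fixed at zero the model reduces to the dichotomous Rasch model, which is a useful sanity check.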
Person ability and item difficulty may be estimated by joint, marginal, or conditional maximum likelihood procedures. In the present study all parameters are estimated with the Winsteps programme (Linacre, 2003), which uses an unconditional or joint maximum likelihood method. One of the attractive theoretical features of the Rasch model is that the raw scores for persons and items are sufficient statistics for the estimation of person and item parameters (Embretson & Reise, 2000). The property of sufficient statistics leads to a condition called specific objectivity, which holds that person ability can be estimated separately from item difficulty and vice versa. This means that an individual's ability estimate is independent of the particular sample of items that were chosen and that an item's difficulty estimate is independent of the particular persons that were chosen for the calibration of the items (Andrich, 1989;Embretson & Reise, 2000;Fischer, 1995).
The estimated person and item parameters can be used to estimate the probability of each individual endorsing a particular item. These probabilities may then be compared with the actual data, and on the basis of this comparison the fit of the items and persons to the rating scale model may be computed. Commonly used fit statistics are the INFIT mean square and the OUTFIT mean square (Wright & Masters, 1982). For each individual an expected item score, E_ni, is calculated, which is then subtracted from the observed item score, X_ni, to produce a score residual, Y_ni, which is standardised to give a standardised score residual, Z_ni. By summing the squared standardised residuals a chi-square statistic is obtained, which, when divided by N, gives the OUTFIT mean square. The INFIT statistic weights the squared residuals of individual responses by their model variances, rendering it more sensitive to deviations from the measurement model for on-target items (i.e. when the difficulty of an item matches the ability of an individual). In contrast, the unweighted OUTFIT mean square is more sensitive to deviations from the measurement model for off-target items. INFIT and OUTFIT mean squares range between zero and infinity and have an expected value of 1,0. Values below 1,0 indicate that the person or item overfits the model (i.e. there is less variation in the observed responses than was modelled), whereas values above 1,0 indicate a less than desirable fit (i.e. there is more variation in the observed responses than was modelled). Generally, fit values below 1,0 are of less concern than fit values greater than 1,0. The Rasch model presents a mathematical ideal and it is unrealistic to expect that items or persons will fit the model exactly.
Hence, following the recommendations of Linacre and Wright (1994) for the analysis of rating scales, items with INFIT and OUTFIT mean squares between 0,7 and 1,4 may be regarded as demonstrating adequate fit. When the items fit the model it provides evidence that all the items are measuring the same latent trait.
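The two mean squares can be sketched as below, assuming the expected scores E_ni and model variances W_ni have already been obtained from the estimated parameters (the exact computations in programmes such as Winsteps may differ in detail):

```python
import numpy as np

def item_fit_mean_squares(observed, expected, variance):
    """INFIT and OUTFIT mean squares for one item over N persons.
    observed: the X_ni; expected: the model-implied E_ni;
    variance: the model variance W_ni of each response."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    variance = np.asarray(variance, dtype=float)
    sq_resid = (observed - expected) ** 2
    z_sq = sq_resid / variance                   # squared standardised residuals
    outfit = z_sq.mean()                         # unweighted mean square (chi-square / N)
    infit = sq_resid.sum() / variance.sum()      # information-weighted mean square
    return infit, outfit
```

Because INFIT divides the summed squared residuals by the summed variances, responses near a person's ability level (where the variance is largest) dominate it, which is exactly the on-target sensitivity described above.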
Note that the Rasch model does not include an item discrimination parameter to be estimated. Hence, the model proceeds on the requirement that all items discriminate equally well. Items that do not satisfy this requirement may be measuring something in addition to the trait of interest and will not fit the rating scale model properly (as indicated by INFIT and OUTFIT). Wright (1999) demonstrated that the introduction of a discrimination parameter destroys the property of specific objectivity and therefore the separation of item and person parameters.
The properties described in the paragraphs above suggest that the Rasch model may be fruitfully applied in the analysis of items. Specifically, a Rasch analysis can show whether (a) the items in a scale fit the requirements of the model and therefore measure the same trait, (b) the categories of the rating scale function appropriately, (c) the items succeed in separating individuals with different standings on the trait of interest, and (d) the items form a meaningful hierarchy in terms of the probability of endorsement. Furthermore, a Rasch analysis produces standard errors for each item calibration and person measure, which may be used to construct confidence intervals around individual observations. The standard errors for persons may be plotted against the person measures to show how precisely the scale measures over different levels of the latent trait.

Item parcelling
Although a Rasch analysis may shed important light on the functioning of items within a unidimensional scale, researchers may be interested in the multidimensional structure of a set of items. A common strategy is to subject the items of the scales to a factor analysis (Gorsuch, 1997). As pointed out in the preceding paragraphs, however, items violate the assumptions of factor analysis: they are ordinal, have non-linear relations with each other, and are relatively unreliable.
Some researchers deal with the problems associated with the factor analysis of items by using item parcels rather than individual items as the basic units of analysis. An item parcel may be defined as "an aggregate-level indicator comprised of the sum (or average) of two or more items …" (Little et al., 2002, p. 152). Parcels are more reliable than individual items, have more scale points, and are more likely to have linear relations with each other and with factors (Comrey, 1988;Little et al., 2002;Kishton & Widaman, 1994). Hence, one would expect the factor analysis of parcels to provide more satisfactory factor analytical results with improved model-data fit.
The proponents of parcelling view it as an attempt to iron out the inevitable empirical "wrinkles" caused by the unreliability of items, the non-linear relations between items, the unequal intervals between scale points, the smaller ratio of common variance to unique variance, and the tendency for unique variances to be correlated in confirmatory factor analyses. Such "wrinkles" may lead to unsatisfactory factor analytic results and the rejection of useful measurement models (Little et al., 2002).
When items are aggregated their shared variance is pooled, which means that the proportion of common variance increases relative to the proportion of unique variance. This leads to stronger factor loadings and communalities. Furthermore, the distributions of parcels are likely to be more normal than the distributions of individual items. Further advantages are that the number of scale points in parcels is increased and that the distances between scale points are likely to be reduced. Bandalos (2002) demonstrated that when items within a particular scale have a unidimensional structure, the factor analysis of parcels leads to improved model-data fit and less biased structural parameters. When the items have a multidimensional structure, however, the factor analysis of parcels may mask the multidimensionality and lead to the acceptance of misspecified models. Furthermore, under these conditions parcelling may lead to biased structural parameters. Hence, it is recommended that parcelling should be used only when the items within a scale have a unidimensional structure (Bandalos, 2002;Little et al., 2002).
The general practice of parcelling is criticised by some authors (see Bandalos, 2002; Little et al., 2002). The critics, whom Little et al. (2002) described as philosophically empirical-conservative, argue that parcelling distorts reality and that it serves as a smoke screen that clouds the issues of incorrect model specification and/or poor item selection. These critics believe that all sources of variance in an item should be reflected in a confirmatory factor analysis. In contrast, the proponents of parcelling, described as philosophically pragmatic-liberal, take the view that it is impossible to account a priori for every possible source of variance in each item (Little et al., 2002).
Three methods of parcelling are briefly described in the paragraphs that follow, namely (a) random assignment of items to parcels, (b) a priori parcel construction, and (c) empirical assignment of items to parcels. Random assignment of items to parcels is justified when the items form an essentially unidimensional scale. Under this condition each item may be seen as an alternative and equivalent indicator of the construct or factor. Here the researcher first decides on the number of parcels he or she prefers and then randomly assigns (without replacement) items to the parcels. The random assignment of items to parcels is the method used in the present study.
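The random-assignment approach can be sketched in a few lines: shuffle the item labels and deal them out so that parcel sizes differ by at most one (a minimal illustration; the function name is our own):

```python
import random

def random_parcels(items, n_parcels, seed=None):
    """Randomly assign items (without replacement) to n_parcels parcels
    by shuffling and then dealing the items out in round-robin fashion."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::n_parcels] for i in range(n_parcels)]
```

Each parcel score is then the sum (or mean) of the responses to its items; for example, a 26-item scale assigned to five parcels yields four parcels of five items and one of six.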
A second approach to parcelling is to intentionally construct homogeneous sets of items that are aggregated to form parcels. This approach requires the researcher first to specify the number of parcels and the content or meaning of the parcels. Homogeneous sets of items are then written for each parcel. Comrey (1970) followed this approach in the construction of the Comrey Personality Scales (note, however, that Comrey used a combined empirical and rational approach in determining the content of each parcel).
In the last place, parcels may also be formed empirically, where the total pool of items is subjected to a factor analysis. Clusters of highly correlating items are then combined to form parcels, which then serve as the input variables for further analyses (see Cattell & Burdsal, 1975;Gorsuch, 1997;Schepers, 1992).
The primary aim of the empirical part of this study is to demonstrate techniques that may be used to deal with the problems associated with the factor analysis of items. The techniques are demonstrated in terms of responses to the items of the Locus of Control Inventory (Schepers, 1995). A secondary aim is to shed more light on the construct validity of the Locus of Control Inventory.

METHOD

Participants
Participants were 1662 first-year students who completed the Locus of Control Inventory (Schepers, 1995) as part of a larger test battery. The test results are used for counselling and research purposes and are dealt with confidentially.

Instrument
The Locus of Control Inventory (Schepers, 1995) consists of 80 items that measure three constructs, namely External Control, Internal Control, and Autonomy. On the basis of a previous item analysis, three items were rejected due to poor item characteristics, resulting in a total of 77 items (J.M. Schepers, personal communication). The reliabilities of the three scales for the present group of participants, as estimated by means of Cronbach's coefficient alpha, may be described as satisfactory: External Control (25 items), α = 0,84; Internal Control (26 items), α = 0,83; and Autonomy (26 items), α = 0,87. Each item is endorsed on a seven-point scale. All negatively phrased items were reflected for the purposes of the Rasch analyses in the present study.
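Coefficient alpha values such as those reported above follow directly from the item and total-score variances; a minimal sketch:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's coefficient alpha for an (N persons x k items) array:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    sum_item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - sum_item_var / total_var)
```

When all items are perfectly correlated the coefficient reaches its ceiling of 1,0, and it decreases as the items share less variance.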

RESULTS
The first step in the analysis process was to investigate the distributions of the items. The Mardia coefficient of multivariate kurtosis for the items was 914,99 (normalised multivariate kurtosis = 169,09), which clearly indicated a violation of the assumption of multivariate normality. Table 1 shows the skewness and kurtosis coefficients for each of the items. Inspection of Table 1 shows that several of the items were not normally distributed.
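Mardia's coefficient of multivariate kurtosis can be computed as the mean of the squared Mahalanobis distances of the observations. The sketch below uses one common formulation (the biased, divide-by-n covariance matrix and the asymptotic normalisation); software packages may apply slightly different small-sample corrections, so values need not match a given package exactly:

```python
import numpy as np

def mardia_kurtosis(X):
    """Mardia's multivariate kurtosis b_{2,p} for an (n x p) data matrix,
    plus its normalised (approximately standard-normal) form.
    Under multivariate normality b_{2,p} is close to p(p + 2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centred = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    # squared Mahalanobis distance of each observation
    d_sq = np.einsum('ij,jk,ik->i', centred, S_inv, centred)
    b2p = np.mean(d_sq ** 2)
    normalised = (b2p - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    return b2p, normalised
```

For a large multivariate normal sample with p = 3 the coefficient is close to 15 and the normalised value close to zero, so the values reported above (914,99 and 169,09 for the 77 items) indicate a gross violation of multivariate normality.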
Principal axis factor analysis of the Locus of Control Inventory items

To provide a basis for comparison, the 77 selected items of the Locus of Control Inventory were subjected to an unrestricted principal axis factor analysis. The eigenvalues-greater-than-unity criterion, which is often used as a guide to the number of factors that should be extracted, suggested that 19 factors should be extracted from the intercorrelations of the 77 items. However, on theoretical grounds, as reflected in the scoring key, one would have expected only three factors.
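The eigenvalues-greater-than-unity count is obtained directly from the eigenvalues of the item correlation matrix; a minimal sketch:

```python
import numpy as np

def kaiser_count(data):
    """Number of eigenvalues of the correlation matrix greater than
    unity -- the eigenvalues-greater-than-one retention criterion."""
    R = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    return int((np.linalg.eigvalsh(R) > 1.0).sum())
```

As the results above illustrate, this criterion can grossly overestimate the number of psychologically meaningful factors when the input variables are ordinal items with skewed distributions.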
Separate factor analyses of the items within each of the three scales obtained by Schepers (1995) were also conducted. The eigenvalues-greater-than-unity criterion suggested five factors for the Autonomy items, six for the External Control items, and five for the Internal Control items. On face value, these findings suggest that the three scales are multi-dimensional and that the existing scoring key of the Locus of Control Inventory, which treats each of the three scales as unidimensional, might be inappropriate. As explained in the introduction, however, these results may reflect methodological artefacts rather than psychologically meaningful and replicable factors.

Rasch rating scale analysis
An important goal of the Rasch rating scale analysis was to determine whether the items of each of the three Locus of Control Inventory scales form a unidimensional scale. From the Rasch perspective, the investigation of unidimensionality proceeds by diagnosing idiosyncratic response patterns using item fit statistics. The item calibrations and fit statistics for the Autonomy items are given in Table 2. Inspection of the INFIT and OUTFIT mean squares shows that only one item did not fit the rating scale model, namely item 62 (OUTFIT mean square = 1,44). This item should be scrutinised to identify the reason for the misfit. The mean INFIT value was 1,01 (SD = 0,20) and the mean OUTFIT value was 1,04 (SD = 0,22), suggesting a reasonable fit between the data and the model as a whole. The difficulty calibrations of the 25 items ranged between -0,91 (item 66) and 0,74 (item 72), indicating a reasonable spread of item difficulties. The standard error of each item difficulty calibration was low (either 0,02 or 0,03), indicating that the calibrations were precise. The range of item-score correlations was relatively small (between 0,33 and 0,57), which shows that the items related similarly to the latent trait. The person separation reliability, which is similar in interpretation to Cronbach's alpha coefficient, was 0,85, suggesting that the items succeeded in separating individuals with different trait levels.
Three items of the External Control Scale had INFIT or OUTFIT mean squares greater than 1,40, namely items 4, 78, and 52 (see Table 3). Note that these items had relatively low item-score correlations, suggesting that they measure something different from the other items in the scale. The mean INFIT value was 1,02 (SD = 0,23) and the mean OUTFIT value was 1,03 (SD = 0,23), which showed good overall fit between the External Control items and the rating scale model. The item difficulty calibrations ranged between -0,99 (item 9) and 0,64 (item 52) and the standard errors of the calibrations were low (0,02 for each item). The person separation reliability was 0,82, which may be described as satisfactory.
Five items of the Internal Control Scale had INFIT or OUTFIT mean squares greater than 1,40, namely items 16, 59, 26, 76 and 60 (see Table 4). Note that the OUTFIT mean square for item 16 was particularly high (OUTFIT mean square = 1,83), suggesting that this item detracts from the measurement quality of the Internal Control scale. The mean INFIT value was 1,03 (SD = 0,22), which might be described as satisfactory. The mean OUTFIT value was 1,11 (SD = 0,26), which is less satisfactory and shows that some of the items were responded to in an inconsistent way. The item difficulty calibrations ranged between -0,73 (item 19) and 0,78 (item 26) and the standard errors of the calibrations ranged between 0,02 and 0,03. The person separation reliability was 0,79, which although lower than that of the Autonomy and External Control scales, might still be regarded as satisfactory.
Overall, the Rasch rating scale analysis suggested that the majority of the Locus of Control Inventory items showed adequate fit to the Rasch model. A reasonable spread of item difficulty calibrations was observed for each scale and the standard errors of the calibrations were very small. Furthermore, the person separation reliabilities of the three scales were satisfactory. Hence, it was concluded that each scale measures an essentially unidimensional construct. However, some items were identified that did not fit the model very well. Although one might decide to eliminate these items, it may be more fruitful to study them in order to identify the reasons for their poor fit. Close scrutiny of these items may reveal the reasons for the misfit and may provide some illumination as to the meaning of the constructs that are measured by the scales.
Unrestricted maximum-likelihood factor analysis of the item parcels

On the basis of the Rasch analyses each of the three Locus of Control Inventory scales was treated as unidimensional. Within each scale the items were randomly assigned to one of five item parcels, giving a total of 15 parcels (parcels A1 to A5 represented the Autonomy items, parcels E1 to E5 the External Control items, and parcels I1 to I5 the Internal Control items). Each parcel contained between five and seven items. Mardia's coefficient of multivariate kurtosis for the 15 parcels was 35,72 (normalised multivariate kurtosis = 32,24), which showed that the violation of multivariate normality was less extreme than for the items. The skewness and kurtosis coefficients of each of the 15 parcels are reflected in Table 5. Comparison of this table with Table 1 also shows that the parcels deviated less severely from normality than did the items.
The 15 item parcels were subjected to an unrestricted maximum-likelihood factor analysis with oblique Promax rotation (k = 4). The scree plot suggested that three factors should be extracted (see Figure 1), which jointly explained 63,52% of the variance. Although the significant likelihood chi-square suggested that more factors might be extracted, χ²(63) = 229,61, p < 0,001, inspection of the residual matrix showed only two residuals > 0,05 (see Table 6). The overall smallness of the residuals showed that the extraction of more factors was not warranted. Moreover, the extraction of only three factors was consistent with the theoretical measurement model that underlies the Locus of Control Inventory.
The Promax rotated factor pattern matrix is presented in Table 7. Inspection of this table shows that each factor was well defined: Factor 1 by item parcels A1 to A5, Factor 2 by item parcels I1 to I5, and Factor 3 by item parcels E1 to E5. The primary factor pattern coefficients ranged between 0,53 (I1 on Factor 2) and 0,82 (A2 on Factor 1). The highest secondary factor pattern coefficient of any parcel was 0,16 (A1 on Factor 2), suggesting that each parcel was a relatively pure indicator of its respective factor.
Overall, the findings of the unrestricted factor analysis of the item parcels are consistent with the postulated structure of the Locus of Control Inventory and provide support for the construct validity of the three scales.

Figure 1. Scree plot of eigenvalues for the item parcel solution
Maximum-likelihood confirmatory factor analysis

The construct validity of the postulated factor structure of the Locus of Control Inventory was also examined with a maximum-likelihood confirmatory factor analysis. The first step in the confirmatory factor analysis was to specify the measurement model (see Figure 2). This model, which was labelled Model 1, postulated that parcels E1 to E5 were indicators of an External Control factor, parcels I1 to I5 were indicators of an Internal Control factor, and parcels A1 to A5 were indicators of an Autonomy factor. Model 1 is consistent with the scoring key of the Locus of Control Inventory. In accordance with common factor theory, each parcel was also influenced by a unique factor that represented error variance and specific variance. The unique variances and the loadings of the factors on their respective indicators were freely estimated from the data. The loadings of a factor on the parcels that do not serve as indicators of that factor were constrained to zero (for instance, the loading of Autonomy on parcel E1 was hypothesised to be equal to zero). The correlations between the factors were also freely estimated from the data. To statistically identify the model, the variances of the factors and the regression weights of the parcels on the unique factors were fixed to unity. In the last place, the correlations between all unique factors were constrained to be equal to zero. Although the hypothesis of an exact fit was rejected, the GFI, AGFI, TLI, CFI, RMSEA, and SRMR suggested satisfactory fit between Model 1 and the data. The rejection of the hypothesis of exact fit was not unexpected, because with a sample size of 1662 the chi-square test was rendered so powerful that even very small discrepancies would have led to a significant chi-square.
Inspection of the standardised residual matrix shows that, for the most part, the residuals were small (see Table 8). It does seem, however, that the External Control and Autonomy parcels share some variance that is not adequately modelled. The statistical fit of Model 1 could be improved by estimating the correlations between the External Control and Autonomy unique variances. This was not done, however, because in the absence of theoretical justification for such correlations, they would have been difficult to explain.
The standardised estimated factor loadings of Model 1 are summarised in Table 9. Each of the factors had high and statistically significant loadings on their respective parcels, which shows that the parcels are good indicators of the constructs. The loadings ranged from 0,65 (I1 on the Internal Control factor) to 0,84 (A5 on the Autonomy factor). Note that these loadings, and therefore also the communalities, are higher than what would have been obtained with individual items as indicators.

To provide a further basis for comparison, the confirmatory factor analysis was also conducted with the 77 items as the units of analysis (Model 2). Each item was assigned to a factor in accordance with the scoring key. Factor loadings and error variances were freely estimated, but the variances of the factors and the regression weights of the items on the unique factors were fixed to unity. The goodness of fit indices for Model 2 were as follows: χ²(2846) = 12374,31; GFI = 0,80; AGFI = 0,79; TLI = 0,66; CFI = 0,67; RMSEA = 0,052 (0,052 - 0,053); and SRMR = 0,06. For all the indices, except the RMSEA and the SRMR, Model 2 (item model) fit the data substantially worse than Model 1 (parcel model). To allow for quick comparison, the fit of the two models is summarised in Table 10. Overall, the confirmatory factor analysis of the item parcels revealed a good fit between the model and the data. The item parcels were shown to be strong indicators of their respective factors. The correlations between the factors were moderately high to high. Note that Autonomy and Internal Control shared approximately 50% of their reliable variance, suggesting that they might be combined into a single factor. Inspection of the modification indices, however, showed that the fit of the model could not be improved by allowing the indicators of the two factors to have cross loadings or by allowing the unique factors of these indicators to be correlated.
Moreover, a model in which the correlation between the Autonomy and Internal Control factors was constrained to unity (Model 3) showed relatively poor fit: χ²(88) = 1170,46; GFI = 0,91; AGFI = 0,88; TLI = 0,88; CFI = 0,90; and RMSEA = 0,086 (0,082 - 0,090). Because Model 3 was nested within Model 1, the difference between their respective chi-squares could be tested for significance. This difference was statistically significant, χ²(1) = 542,13, suggesting that the original three-factor model (Model 1) fit the data significantly better than Model 3. These findings provide support for the construct validity of the three scales of the Locus of Control Inventory.
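The chi-square difference test used above can be sketched for the one-degree-of-freedom case, exploiting the fact that a chi-square variate with 1 df is a squared standard normal, so P(χ² > x) = erfc(√(x/2)) (a minimal stdlib-only sketch; the function name is our own):

```python
import math

def chisq_diff_p_1df(chisq_nested, chisq_full):
    """Chi-square difference test for nested models differing by one
    degree of freedom (e.g. one factor correlation fixed to unity).
    Returns the chi-square difference and its p-value; for 1 df,
    P(chi2 > x) = erfc(sqrt(x / 2))."""
    delta = chisq_nested - chisq_full
    return delta, math.erfc(math.sqrt(delta / 2.0))
```

A difference of 3,84 on 1 df sits exactly at the conventional p = 0,05 boundary, so the difference of 542,13 reported above corresponds to a p-value far below 0,001.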

DISCUSSION
The purpose of this article was to examine problems encountered in the factor analysis of items and to demonstrate two methods that may be used to address these problems, namely item response theory models and the factor analysis of item parcels rather than individual items. It was pointed out in the introduction that the problems might be attributed to the violation of some of the assumptions on which factor analysis is based. The first assumption is that the input data are continuous and measured at an interval level, but items provide ordinal data that typically contain only a limited number of ordered categories. Secondly, the distributions of item scores are often non-normal, which violates the assumption of normality. Thirdly, the relations between items and the traits that underlie them are non-linear, which violates the assumption of linear relations. Furthermore, in comparison to scales, individual items are unreliable, which leads to low communalities, poor factor solutions, and correlated unique factors.
The factor analysis of items, the Rasch rating scale model, and the factoring of item parcels were applied to the items of the Locus of Control Inventory. This inventory consists of three scales, namely Autonomy, Internal Control and External Control.
A central focus of this study was to examine the degree to which the different analytic methods support the construct validity of the three scales.
Unrestricted principal axis factor analysis of the Locus of Control Inventory items
On theoretical grounds one would expect three factors to explain the covariances of the 77 items of the Locus of Control Inventory. However, when subjected to a principal factor analysis, the eigenvalues-greater-than-unity criterion suggested 19 factors. When the items of the three scales were analysed separately, the eigenvalues-greater-than-unity criterion suggested five factors for the Autonomy scale, five factors for the Internal Control scale, and six factors for the External Control scale. On the basis of these results one might conclude that the scales are multidimensional rather than unidimensional and that their scoring keys may have to be revised to reflect this multidimensionality. However, it should be kept in mind that the observed multidimensional structure might be a methodological artefact. Nunnally and Bernstein (1994) warned in this regard: "Ordinary approaches to factoring items (i.e. those that may be appropriately applied to scale-level analyses) are almost guaranteed to produce spurious results. Such spurious results may lead to inappropriate criticism of sound scales or, what is basically the same thing, lead an investigator to falsely believe that the scale that he or she has developed is inappropriately multidimensional when in fact it is not" (Nunnally & Bernstein, 1994, p. 316).
Rasch rating scale analysis
The Rasch model represents a mathematical ideal for measurement, which requires that all the items should relate in a consistent way to the trait of interest. Only two factors influence an individual's response to an item in the Rasch model: (a) the individual's standing on the latent trait that the item measures, and (b) the difficulty or endorsability of the particular item.
From this it follows that if the data fit the model, then the items constitute an essentially unidimensional scale.
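To make the model's structure concrete, the sketch below computes the category probabilities of Andrich's rating scale model, the polytomous Rasch model used in this study. The parameter names (`theta` for the person measure, `b` for item endorsability, `taus` for the shared thresholds) are illustrative labels, not symbols taken from the article.

```python
import math

def rating_scale_probs(theta, b, taus):
    """Andrich rating scale model: probability of each response category.

    theta : person location on the latent trait
    b     : item difficulty / endorsability
    taus  : threshold parameters shared by all items (tau_1 .. tau_m)
    Returns a list of m + 1 category probabilities that sum to 1.
    """
    numerators = [1.0]  # exp(0) for category 0
    cum = 0.0
    for tau in taus:
        # Each step up adds (theta - b - tau_k) to the cumulative logit.
        cum += theta - b - tau
        numerators.append(math.exp(cum))
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical five-category item: person slightly above the item's location.
probs = rating_scale_probs(theta=0.5, b=0.0, taus=[-1.5, -0.5, 0.5, 1.5])
```

Consistent with the two influences noted above, only `theta` and `b` (together with the fixed thresholds) determine the response probabilities; raising `theta` shifts probability mass toward the higher categories.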
The Rasch rating scale analysis of the Locus of Control Inventory items showed that, with the exception of a small number of items, the fit between the data and the model was satisfactory for all three scales. Hence, it is concluded that each of the Autonomy, Internal Control and External Control scales measures a unidimensional trait and that the items in each scale function properly. These results contrast with those of the principal factor analysis of the same data, which suggested that the scales are multidimensional. A possible reason for the divergent results is that the Rasch model was explicitly designed for the analysis of ordinal items and explicitly models the non-linear relations between items and the latent trait that they measure, whereas factor analysis is more appropriate for the analysis of continuous, normally distributed data.
Some authors argue that it is not necessary to employ the Rasch or other item response theory models, because the person measures produced by these models correlate very strongly with ordinary summated total scores (Fan, 1998). One should note, however, that a very strong correlation is only observed if the data fit the Rasch model. Under these conditions the total score contains all the information necessary to estimate a person's standing on the latent trait (Andrich, 1989). When the data do not fit the model, the total score is not a sufficient statistic for the estimation of a person's standing on the latent trait, and the correlation between total scores and the Rasch person measures will be lower. From this perspective the Rasch model provides justification for the calculation of total scores if the data fit the model. Rasch measures are nonetheless to be preferred over total scores because total scores constitute ordinal-level measurement, whereas Rasch measures are at an interval level. Furthermore, Rasch measures are independent of the particular sample of items and, as a consequence, are not adversely affected by missing data. Finally, a Rasch analysis allows for the identification of individuals whose responses do not fit the model and for whom the total score might not be an adequate indicator of their standing on the latent trait.
Unrestricted factor analysis of the item parcels
The 77 items of the Locus of Control Inventory were reduced to 15 parcels through the random assignment of items within a particular scale to a parcel. Each parcel contained five or six items and each of the External Control, Internal Control, and Autonomy scales was represented by three parcels. Note that the parcelling was only performed after the Rasch analysis had supported the unidimensionality of the three scales. Hence, each parcel might be considered to be a mini-version of the full scale to which it belongs.
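The random-assignment step can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the function name, the seed, and the 17-item example scale are hypothetical, but the round-robin deal reproduces the property described above, namely parcels of five or six items each.

```python
import random

def make_parcels(item_ids, n_parcels, seed=0):
    """Randomly assign one scale's items to parcels of near-equal size."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    items = list(item_ids)
    rng.shuffle(items)
    # Deal items out round-robin so parcel sizes differ by at most one.
    return [items[i::n_parcels] for i in range(n_parcels)]

# Hypothetical example: a 17-item scale split into three parcels.
parcels = make_parcels(range(1, 18), n_parcels=3)
# A parcel score is then the sum (or mean) of its member items' responses,
# and the parcel scores serve as input variables for the factor analysis.
```

Because the assignment is random within a scale, this sketch should only be applied after unidimensionality has been established, as the text emphasises.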
The unrestricted factor analysis of the 15 item parcels with Promax rotation provided strong support for the validity of a three-factor solution for the Locus of Control Inventory. These factors corresponded with the External Control, Internal Control, and Autonomy scales. The residual covariances of the parcels were very small, indicating that no additional substantive factors could be extracted from them. These results contrast with those of the unrestricted principal axis factor analysis of the 77 items described in the paragraphs above. The factor analysis of the parcels produced results that are in accordance with the theory that underlies the Locus of Control Inventory.

Confirmatory factor analysis of the item parcels
The effect of item parcelling in confirmatory factor analysis was investigated by comparing the results of an item-level confirmatory model with those of a parcel-level confirmatory model. In the item-level model the 77 items served as indicators of the Internal Control, External Control, and Autonomy factors, and in the parcel-level model the 15 parcels served as the indicators of these three factors. A comparison of the two models showed that the parcel-level model fit the data much better than the item-level model. The fit of the parcel-level model was very good, indicating that the covariances of the parcels were adequately explained by the three postulated factors of the Locus of Control Inventory. In contrast, the results of the item-level analysis indicated poor fit.
The superior fit of the parcel-level model in the unrestricted and confirmatory factor analyses may be ascribed to the fact that parcels are more reliable, have more scale points, more closely approximate an interval-scale level, more closely approximate normality, and therefore more closely satisfy the assumptions of factor analysis than do individual items. In addition, parcels have proportionally smaller unique variances, which lessens the likelihood of correlations between unique factors.
One should consider the possibility that the parcelling procedure might have masked poorly fitting items and model misspecification, but in this study the parcels were formed after the Rasch analysis had confirmed that a common thread runs through all the items in a particular scale. Hence, it appears safe to conclude that the random assignment of items to parcels was justified.

RECOMMENDATIONS
Taking into consideration the results of this study and the work of others, the following three-step strategy is recommended for the analysis of questionnaires or inventories with more than one scale (of which the Locus of Control Inventory is an example). This recommendation is based on the assumption that the scales were constructed on the basis of strong theory and that the researcher has a very clear idea of the constructs that each of the items serve to indicate.
As a first step, one should determine whether each of the scales measures a single dominant trait. A satisfactory fit between the data and the Rasch model, which was explicitly designed for the analysis of items, provides strong justification for the presence of such a dominant trait. The Rasch model may also be used to identify weak items that are in need of revision or items that should be eliminated from the scale.
As a second step, researchers may randomly assign the items within a scale to parcels. Note that the random assignment of items to parcels is only justified if the items within a scale measure a unidimensional or dominant trait. In the absence of unidimensionality it is not clear what parcels formed by random assignment represent and any further analysis of the parcels will be meaningless.
As a third step, the parcels may serve as the input variables for unrestricted or confirmatory factor analysis. If the results of the factor analysis correspond with the anticipated structure, this provides support for the construct validity of the scales. In addition, the preceding analyses will have confirmed the quality of the items that comprise the scales. However, if the results do not correspond with the anticipated structure, the construct validity of the scales should be questioned. It is possible that the original items do not serve as adequate indicators of the relevant constructs, or the theory on which the scales are based may need to be revised.