A PSYCHOMETRIC INVESTIGATION INTO THE USE OF AN ADAPTATION OF THE GHISELLI PREDICTABILITY INDEX IN PERSONNEL SELECTION

The magnitudes of validity coefficients typically encountered in validation studies are disappointingly low. Validity coefficients typically fall below 0,50 and only very seldom reach values as high as 0,70. Numerous possibilities have been considered on how to effect an increase in the magnitude of the validity coefficient. A thought-provoking alternative to the usual multiple-regression based attempts may be found in the work of Ghiselli (1956, 1960a, 1960b). The objective of this article is to propose and evaluate a modification to the original Ghiselli procedure. Encouragingly positive results were obtained. Recommendations for future research are made.

The validity coefficients typically encountered in validation studies are disappointingly low. Validity coefficients typically fall below 0,50 and only very seldom reach values as high as 0,70 (Campbell, 1991; Guion, 1998). Selection instruments thus typically explain no more than 25% of the variance in the criterion (Campbell, 1991). The validity ceiling first identified by Hull (1928) seemingly still persists. Numerous possibilities have been considered on how to effect an increase in the magnitude of the validity coefficient (Campbell, 1991; Ghiselli, Campbell & Zedeck, 1981; Guion, 1991; Guion, 1998; Wiggins, 1973). Most of these attempts revolved around modifications and/or extensions to the regression strategy (Gatewood & Feild, 1994).
A thought-provoking alternative to the usual multiple-regression based attempts may be found in the work of Ghiselli (1956, 1960a, 1960b). Rather than elaborating on the basic mathematical model of multiple regression, Ghiselli chose to attack the problem of improved prediction directly through the use of empirical procedures (Ghiselli, 1956, 1960a, 1960b). The essence of the proposed procedure revolves around the development of a composite predictability index that explains variance in the prediction errors or residuals resulting from an existing prediction model. It would, however, appear as if the procedure has found very little, if any, practical acceptance. The actuarial nature of the procedure could probably to a large extent account for it not being utilized in the practical development of selection procedures. The lack of general acceptance must, however, also be attributed in part to the fact that the predictability index originally proposed by Ghiselli (1956, 1960a, 1960b) failed to significantly explain unique variance in the criterion when added to a model already containing one or more predictors (Wiggins, 1973). The predictability index only serves the purpose of isolating a subset of individuals for whom the model provides relatively accurate criterion estimates. The selection problem, however, requires the assignment of each and every member of the total applicant sample (and not only a subset of the applicant group) to at least an accept or a reject treatment (Cronbach & Gleser, 1965) based on their estimated criterion performance.
Based on the original idea proposed by Ghiselli (1956, 1960a, 1960b), the objective of this research is to investigate the possibility that the differentiation between subjects on the basis of the predictability of their criterion performance could be used to increase the accuracy of the criterion estimates for the total applicant sample. More specifically, the objectives of the study are (a) to propose a modification to the Ghiselli procedure that would solve the aforementioned problem experienced by Ghiselli (1956, 1960a, 1960b) in his original studies, (b) to corroborate the earlier finding of Ghiselli (1956, 1960a, 1960b) that the development of a predictability index that significantly explains variance in the criterion residual is practically possible, (c) to demonstrate that the proposed modification to the Ghiselli procedure did in fact solve the problem experienced by the predictability index (based on absolute residuals) originally proposed by Ghiselli (1956, 1960a, 1960b), (d) to examine the factor structure of the modified predictability index to establish whether substantive theoretical meaning could be attached to it, (e) to examine the incremental validity resulting from the inclusion of the modified predictability index in the prediction model, and (f) to examine the impact of the inclusion of the modified predictability index in the prediction model on selection utility.
Theoretical rationale for the development of a predictability index
Measurement data, once obtained, are translated into decisions in accordance with some strategy for decision-making (Cronbach, 1960). A decision strategy describes how scores from tests are to be combined with non-test information, and what decision will be made for any given combination of facts. A strategy is thus a rule for arriving at selection decisions used by a decision maker in any possible contingency (Cronbach & Gleser, 1965). It consists of a set of specified conditional probabilities (typically either zero or unity), which reflects the policy of the decision maker. In the final analysis it is the selection decision strategy that should be evaluated in terms of its predictive validity, in other words in terms of the correspondence that exists between the criterion-referenced inferences made via the decision rule from the available predictor information and the actual criterion performance achieved (Gatewood & Feild, 1994).
Several selection decision-making strategies exist that range from pure clinical to pure mechanical combinations of data available to the decision maker (Gatewood & Feild, 1994; Grove & Meehl, 1996; Kleinmuntz, 1990; Murphy & Davidshofer, 1988). Clinical prediction involves combining information from test scores and measures obtained from interviews and observations covertly in terms of an implicit combination rule embedded in the mind of a clinician to arrive at a judgment about the expected criterion performance of the individual being assessed (Gatewood & Feild, 1994; Grove & Meehl, 1996; Murphy & Davidshofer, 1988). Mechanical prediction involves using the information overtly in terms of an explicit combination rule to arrive at a judgment about the expected criterion performance of the individual being assessed (Gatewood & Feild, 1994; Murphy & Davidshofer, 1988). An actuarial system of prediction represents a mechanical method of combining information to arrive at an overall inference about the expected criterion performance of an individual that was objectively derived via statistical or mathematical analysis from actual criterion and predictor data sets (Meehl, 1957; Murphy & Davidshofer, 1988). An actuarially derived decision rule should, therefore, more likely reflect the nature of the relationship that exists between the various predictor variables and the criterion construct. Regression analysis provides the basis of an actuarial decision-making strategy by regressing performance assessments on a weighted linear combination of predictors. The multiple regression strategy minimizes error in prediction and combines the predictors optimally to yield the most efficient estimate of criterion status (Berenson, Levine & Goldstein, 1983; Hair, Anderson, Tatham & Black, 1995; Gatewood & Feild, 1994; Howell, 1992).
The accuracy with which prediction models estimate criterion performance can be enhanced in a number of ways. Essentially two classes of approaches can be distinguished. The first category of approaches could be termed substantive theory approaches insofar as they originate from contemplating the manner in which variance in performance could be substantively explained in terms of theory. The second category of approaches could be termed operational design approaches insofar as they originate from reflecting on the degree of success with which the validation design measures the relevant latent variables and samples the relevant applicant population. The various arguments falling under these two categories of approaches essentially describe different but probably simultaneously operating processes that explain why existing prediction models make prediction errors and thus why the criterion performance of some individuals is predicted more accurately than the performance of others.
Under a substantive theory approach it would be argued that effective selection is possible because the performance level achieved by any individual on the job or in training is not a random event. There exists a systematic, albeit complex, relationship between specific person-centred characteristics, specific variables characterizing the job or training situation, and the level of success achieved on the job or in training. Effective selection is possible under a construct-orientated approach (Binning & Barrett, 1989) to the extent to which the identity of the person-centred determinants of job or training performance is known and the manner in which they collectively combine in the criterion is accurately captured in a nomological network or latent structure (Campbell, 1991; Kerlinger & Lee, 2000). These person-centred determinants of criterion performance could serve in combined form as a suitable substitute measure for the still to be realised actual criterion scores. The way measures of these determinants of performance should be combined is suggested by the way these determinants are linked in the nomological network (Theron, 1999). Typically the assumption is made that the linkages in the nomological network are linear. This need not, however, necessarily be the case.
To the extent that the linearity assumption is in error, the accuracy of prediction will suffer. To the extent that influential determinants of criterion performance are excluded from the prediction model, the accuracy of prediction will suffer. The accuracy with which prediction models estimate criterion performance can therefore be enhanced by building additional determinants of criterion performance into the model and/or by making provision for non-linearity in the model by including product or quadratic terms in the regression equation, which allows the model to remain linear in the partial regression coefficients (making provision for moderator variables would be a specific example of this strategy) or by formulating an equation which is non-linear in the regression coefficients (Berenson, Levine & Goldstein, 1983;Hair, Anderson, Tatham & Black, 1995;Gatewood & Feild, 1994;Howell, 1992).
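The point that product and quadratic terms leave the model linear in its partial regression coefficients can be illustrated with a minimal Python sketch. The variable names, the simulated data and the coefficient values below are illustrative assumptions and are not taken from the study; the criterion is generated without error so that the recovered coefficients can be compared directly with the generating values.

```python
import numpy as np

def design_matrix(x1, x2):
    """Columns: intercept, X1, X2, X1*X2 (moderator term), X1**2 (quadratic term)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2])

def fit_ols(design, y):
    """Ordinary least squares; the model is non-linear in X but linear in the coefficients."""
    coefs, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coefs

# Hypothetical data: two predictors and an error-free criterion.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
true_b = np.array([1.0, 0.5, 0.3, 0.4, -0.2])   # assumed generating coefficients
y = design_matrix(x1, x2) @ true_b

b_hat = fit_ols(design_matrix(x1, x2), y)
print(np.round(b_hat, 3))
```

Because the product and squared terms enter as additional columns of the design matrix, the same least-squares machinery used for the purely linear model applies unchanged.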
The assumption that the relationship between the latent predictor variables and the latent criterion is linear (so as to simplify analysis) or at worst curvilinear, but expressible in terms of a familiar and solvable mathematical function, could, however, still be insufficient to accurately model the relationship. If a highly contorted hyperplane defining the value of an endogenous latent criterion variable (η) over a space of n exogenous latent predictor variables (ξi) were assumed, such that for any combination of conditions of the exogenous predictor variables the endogenous latent criterion variable has a specific value, the reaction of η to changes in ξi would seem random, even though η is strictly determined by ξi. One would thus have strict determinism masquerading as chaos, so to speak (Theron, 2001). Should such a situation exist, it would suggest the building of neural networks as the methodological avenue to pursue, rather than the conventional approach of fitting known, normally linear, mathematical models to the data via regression analysis (Abdi, Valentin & Edelman, 1999; Smith, 1993).
An operational design approach, however, would attack the problem of how to enhance the accuracy with which prediction models estimate criterion performance differently. Under this approach the argument would be that when developing a selection procedure the objective is to model the relationship between the latent criterion construct and fallible measures of the predictor constructs that determine job performance as it exists in the applicant population on which the selection procedure will eventually be used. In reality, however, the relationship between a fallible measure of the criterion construct and fallible measures of the predictor constructs is modelled on a biased sample selected from the applicant population. The extent to which the operationalized criterion and/or the operationalized predictor contain systematic measurement error (i.e., bias) will distort the validity coefficient (Nunnally & Bernstein, 1994; Thorndike, 1982). The nature of the effect will depend on the patterns of correlations found between the contaminating variable, the predictor and the intermediate criterion. Hierarchical regression analysis, suppressor variables and partial correlation coefficients constitute options to address measurement bias, provided the source of the bias can be measured (Berenson, Levine & Goldstein, 1983; Hair, Anderson, Tatham & Black, 1995; Howell, 1992). The extent to which the operationalized criterion contains random measurement error and the extent to which the validation sample is too homogeneous, and thus an unrepresentative, biased sample from the applicant population, will adversely affect the validity coefficient (Campbell, 1991; Crocker & Algina, 1986; Lord & Novick, 1968; Messick, 1989; Schepers, 1996). Both of the latter factors will attenuate the validity coefficient.
It thus follows that, to the extent that the aforementioned two factors did operate in the validation study but do not apply to the actual area of application, the obtained validity coefficient cannot, without formal consideration of these factors, be generalised to the actual area of application. The obtained validity coefficient thus cannot, without appropriate corrections, be considered an unbiased estimate of the actual validity coefficient of interest.
Appropriate formulas to correct the validity coefficient for criterion unreliability and restriction of range have been derived from classical measurement theory (Crocker & Algina, 1986; Lord & Novick, 1968; Kaplan & Saccuzzo, 2001; Schepers, 1996; Theron, 1999). If these corrections were applied, the validity coefficient would be adjusted, but the prediction equation in terms of which the criterion estimates are derived would remain unaffected. The prediction equation actually used to derive the expected criterion estimates for decision-making is thus still the one derived from the validation study data, which is, however, not fully representative of the actual applicant population (Theron, 1999).
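As a hedged illustration of the two corrections referred to above, the following Python sketch implements the classical correction for attenuation due to criterion unreliability and the correction for direct restriction of range on the predictor (commonly labelled Thorndike's Case 2). The function names and the input values in the example are assumptions chosen for illustration; the study itself does not report which variants of the formulas, if any, were applied.

```python
import math

def correct_for_criterion_unreliability(r_xy, r_yy):
    """Disattenuate a validity coefficient for random error in the criterion.

    r_xy: observed validity coefficient; r_yy: criterion reliability coefficient.
    """
    return r_xy / math.sqrt(r_yy)

def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Correct for direct restriction of range on the predictor (Case 2 assumed).

    sd_unrestricted: predictor SD in the applicant population;
    sd_restricted: predictor SD in the selected validation sample.
    """
    u = sd_unrestricted / sd_restricted
    return (r_restricted * u) / math.sqrt(1.0 - r_restricted ** 2 + (r_restricted ** 2) * (u ** 2))

# Hypothetical inputs: observed r = 0.40, criterion reliability = 0.64,
# predictor SD of 15 in the applicant population and 10 in the sample.
print(correct_for_criterion_unreliability(0.40, 0.64))
print(correct_for_range_restriction(0.40, 15.0, 10.0))
```

Both functions adjust only the coefficient; in line with the argument above, neither alters the regression weights of the prediction equation fitted on the validation sample.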
The approach suggested by Ghiselli (1956, 1960a, 1960b) seems to straddle the aforementioned two categories of approaches in terms of which the accuracy of prediction models can be enhanced. Classic psychometric theory holds that errors of measurement and of prediction are characteristics of the measuring device rather than the testee and that these errors are distributed randomly across individuals. Interactive effects between the measuring device and the person being assessed are not recognized, and the psychological structure of all individuals is taken to be the same. To increase reliability and validity of measurement, attention is then entirely focused on the improvement of measurement devices. However, a substantial body of evidence indicates there are systematic individual differences in error, and in the importance that a given trait has in determining a particular level of performance (Ghiselli, 1963). Ghiselli (1960b) proposed a method whereby a moderator variable may be developed for a specific prediction situation. Ghiselli (1956) investigated the possibility of differentiating by some other means, perhaps another test, those individuals whose predicted and actual criterion scores show small absolute discrepancies from those individuals whose predicted and actual criterion scores are markedly different. In a derivation sample, the absolute differences between predicted and actual criterion scores are obtained. Correlation analysis is subsequently performed to identify items from a separate item pool that discriminate between high and low predictability (i.e., items that correlate with the absolute differences between predicted and actual criterion scores). The items that correlate significantly with the absolute residual are then linearly combined in a predictability index.
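The derivation steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a reconstruction of Ghiselli's own computations: the function names, the critical value `r_crit`, the unit weighting of the retained items and the reflection of negatively correlating items are all illustrative choices, and the data are simulated so that the first item is related to the size of the prediction error.

```python
import numpy as np

def absolute_residuals(x, y):
    """Absolute differences between actual and predicted criterion scores."""
    slope, intercept = np.polyfit(x, y, 1)          # simple regression of y on x
    return np.abs(y - (intercept + slope * x))

def build_index(items, target, r_crit):
    """Unit-weighted sum of the items whose |r| with `target` reaches r_crit.

    Negatively correlating items are reflected (an assumption), so that a high
    index score always signals a high value on `target`.
    """
    r = np.array([np.corrcoef(items[:, j], target)[0, 1] for j in range(items.shape[1])])
    keep = np.flatnonzero(np.abs(r) >= r_crit)
    index = items[:, keep] @ np.sign(r[keep])
    return keep, index

# Hypothetical derivation sample: 400 cases, 20 dichotomous items.
rng = np.random.default_rng(1)
n = 400
items = rng.integers(0, 2, size=(n, 20)).astype(float)
x = rng.normal(size=n)
error_sd = 0.5 + 1.5 * items[:, 0]                  # item 0 governs the error variance
y = 0.5 * x + error_sd * rng.normal(size=n)

abs_res = absolute_residuals(x, y)
keep, index = build_index(items, abs_res, r_crit=0.2)
print(keep, np.corrcoef(index, abs_res)[0, 1])
```

In an applied setting the critical value would be derived from the chosen significance level and the sample size, and the index would have to be cross-validated on a holdout sample because the item selection capitalises on chance.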
To the extent that the predictability index correlates with the absolute residuals, it should be possible to separate those subjects for whom the regression model provides accurate criterion estimates from those for whom the model performs less well. The index of predictability should therefore function as a moderator (Anastasi & Urbina, 1997; Wiggins, 1973). Knowledge of the predictability of an individual's criterion score should have considerable practical value. In an actual applicant sample, applicants would be ordered on the predictability index, and predictions would be made from the original predictors for the most predictable subset of applicants only. As predictions are limited to an increasingly smaller proportion of the applicant sample, the validity of the predictor should approach unity. Selection procedures, therefore, can be improved not only by the addition of highly valid predictors to present procedures, but also by the addition of devices to screen out individuals whose levels of aptitude and job proficiency show little correspondence. Ghiselli (1956, 1960a, 1960b, 1963) has provided a number of convincing demonstrations of the utility of this approach and of variations on it (Wiggins, 1973).
However, it appears (Wiggins, 1973) that a combination of predictor and predictability index scores in multiple regression does not improve prediction over that given by the predictor scores alone. The value of predictability index scores lies solely in providing an index of the extent to which prediction of criterion scores from a particular test will be in error. The method does not provide an alternative means of making predictions for those individuals who have been screened out because of their low predictability. Personnel selection, however, requires that each and every applicant should be assigned to either an accept or a reject treatment (Cronbach & Gleser, 1965).
An important aspect of the original Ghiselli proposal that could hold the key to overcoming this shortcoming is the direction of the differences between actual and predicted scores of performance. Ghiselli viewed this as unimportant, as both over- and underestimates count as "errors" (Wiggins, 1973). However, the question arises whether the direction of the prediction error should not be taken into account when developing a predictability index. The addition of such an index to a selection battery could conceivably add to the predictive validity of the battery. What is required to improve predictive accuracy is the addition of a predictor to the regression model which functions, by way of analogy, like an observation post adjusting the distance and angle of mortar or artillery fire onto a target. The predictors in the model provide criterion estimates that are in most cases too high or too low. If a predictive index could be developed which would provide feedback on the magnitude of the prediction error derived from the regression model as well as the direction of the error, then the inclusion of such an index in the regression equation as an additional main effect should logically enhance the predictive validity of the selection battery. This would, however, mean that the predictive index should be developed from the real differences between actual and predicted criterion scores of subjects, rather than the absolute differences as Ghiselli (1956, 1960a, 1960b, 1963) originally proposed. If the direction of the prediction error were taken into account when developing a predictability index, large positive values on the index would signal large positive residuals (underestimation) and large negative values (or low positive values) would signal large negative residuals (overestimation), assuming a positive correlation between the predictability index and the real residuals (Y - E[Y|Xi]).
The addition of this index to a regression model should enhance the predictive validity of the selection procedure because its values would provide feedback on the magnitude of the prediction error derived from the regression model as well as the direction of the error. The partial regression coefficient associated with the predictability index in the expanded regression model should be positive. An initial estimate derived from the original model that is too low (an underestimate) should therefore be elevated in the subsequent estimate derived from the expanded regression model due to the influence of the positive predictability index value. Conversely, an initial estimate derived from the original model that is too high (an overestimate) should be lowered in the subsequent estimate derived from the expanded regression model due to the influence of the negative predictability index value. The same principle should still apply even if the predictability index scale were linearly transformed to run from zero to some positive upper limit.
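The contrast between an index that tracks the real residuals and one that tracks the absolute residuals can be illustrated with a small simulation. The Python sketch below is idealised and rests on loudly stated assumptions: the data are simulated, and the two indices are not harvested from items but are simply constructed as the respective residual plus independent noise. It therefore shows only the logic of the argument, not the result to be expected in real data.

```python
import numpy as np

def r_squared(predictors, y):
    """R^2 from an OLS fit of y on the given predictor columns plus an intercept."""
    design = np.column_stack([np.ones(len(y))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coefs
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(42)
n = 2000
x1 = rng.normal(size=n)                       # existing predictor (hypothetical)
y = 0.4 * x1 + rng.normal(size=n)             # criterion (hypothetical)

slope, intercept = np.polyfit(x1, y, 1)
residual = y - (intercept + slope * x1)       # real, algebraic residual

# Idealised stand-ins for the two indices (assumption): target plus noise.
x2 = residual + rng.normal(size=n)            # tracks the real residuals
x3 = np.abs(residual) + rng.normal(size=n)    # tracks the absolute residuals

r2_base = r_squared([x1], y)
r2_real = r_squared([x1, x2], y)
r2_abs = r_squared([x1, x3], y)
print(round(r2_base, 3), round(r2_real, 3), round(r2_abs, 3))
```

Under these assumptions the index based on the real residuals adds substantially to the variance explained, whereas the index based on the absolute residuals adds almost nothing, because the absolute residual carries no information on the direction of the error when the residuals are symmetrically distributed. All three values are in-sample estimates and would be expected to shrink on cross-validation.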
The foregoing argument, however, still provides no substantive theoretical explanation as to why the proposed modification to the original Ghiselli procedure would assist in enhancing the predictive accuracy of an existing prediction model. The proposed modification to the original Ghiselli procedure is implicitly based on an argument as to why an existing prediction model predicts the criterion performance of some applicants more accurately than the performance of other applicants. Neither does the foregoing argument shed light on the related question of why specific items would demonstrate the ability to reflect and even anticipate the prediction errors made by an existing prediction model. Systematic variance in the criterion is induced by systematic differences in a complex nomological network of person-centred and situational latent variables. Criterion performance is determined by the push and pull forces of a large number of variables. Criterion performance is a hyperplane responding to changes in p-1 performance determinants in a p-dimensional space. To the extent that influential determinants of criterion performance are excluded from the prediction model, the accuracy of prediction will suffer because the push and/or pull effect of numerous influential variables on criterion performance is ignored. The extent to which prediction accuracy will suffer will, however, vary across individuals. For some individuals the omitted variables exert a marked push or pull force that dramatically adjusts the effect on criterion performance of the predictor(s) currently taken into account by the prediction model. For others the effect of the omitted variables on criterion performance is less dramatic. Could it be that the proposed modification to the original Ghiselli procedure essentially sniffs out item indicators of some of the latent variables that were not included in the prediction model but that do in fact influence performance?
Accuracy of prediction in and by itself is not the ultimate objective of research in personnel selection. The ultimate purpose of personnel testing is to arrive at substantiated qualitative decisions (Cronbach & Gleser, 1965). The challenge for any study into the improvement of personnel testing therefore ultimately lies in demonstrating that the quality of decision-making benefits from the proposed improvement. Several utility models can be distinguished to determine the total utility of a selection procedure, of which the best known are those of Taylor and Russell (1939), Naylor and Shine (1965), Brogden (1946) and Cronbach and Gleser (1965). Brogden (1946; 1949a; 1949b) and Cochran (1951) have shown that selection utility is a linear function of test validity, and that total selection utility could therefore be enhanced by an improvement in total validity. This increase in utility would in the final analysis determine whether the use of the proposed predictability index would contribute to the ultimate aim of effective selection in organisations, namely to contribute to the efficiency of the business in terms of monetary value.
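The linear dependence of utility on validity can be made concrete with a minimal Python sketch of the expected criterion gain per selected applicant, in the form in which the Brogden result is usually stated: validity multiplied by the criterion standard deviation and by the mean standardised predictor score of those selected. The sketch assumes a normally distributed predictor, strict top-down selection and zero testing cost; the function names and the monetary value of the criterion standard deviation are hypothetical.

```python
from statistics import NormalDist

def mean_z_of_selected(selection_ratio):
    """Mean standard predictor score of those selected top-down (normal predictor assumed)."""
    z_cut = NormalDist().inv_cdf(1.0 - selection_ratio)
    return NormalDist().pdf(z_cut) / selection_ratio

def gain_per_selectee(validity, sd_y, selection_ratio):
    """Expected criterion gain per selected applicant; linear in the validity coefficient."""
    return validity * sd_y * mean_z_of_selected(selection_ratio)

# Hypothetical values: criterion SD worth 10 000 monetary units, selection ratio 0.5.
base = gain_per_selectee(0.40, 10000.0, 0.5)
improved = gain_per_selectee(0.50, 10000.0, 0.5)
print(round(base, 2), round(improved, 2), round(improved / base, 2))
```

Because the gain is proportional to validity, any increment in validity obtained by adding a predictability index translates into a proportional increment in expected utility, before the cost of the additional measurement is taken into account.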

METHOD
Participants
To serve the analytical purposes of this study, the data had to meet a number of specific requirements. The data set, firstly, had to contain an explicit criterion measure and a predictor measure that correlates significantly with the criterion. The data set, secondly, had to contain the results of a second predictor, but in this case measures were required on the item level. The items of the second predictor had to provide the data from which the predictability indices would be harvested. No specific requirements were posed with regard to the nature of the latent variable measured by the second, donor predictor. It was thus not required that the donor predictor should measure one or more latent variables that could theoretically be expected to explain variance in the criterion construct. This rather liberal approach should, however, probably be questioned as somewhat naïve insofar as it completely ignored the question of why specific items correlate with real or absolute residuals. The data set, thirdly, had to be large enough to allow the formation of a derivation sample on which the predictability index would initially be developed, and a holdout sample on which the predictability index would be cross-validated.
A data set was obtained from the data archives of Psytech SA that satisfied the first two of the aforementioned requirements. Psytech SA obtained data from the Gordon's Institute of Business (GIBS) on 101 MBA students between 1990 and 1991. A highly selected non-probability sample was chosen from students with average or above average interim MBA performance levels. The variance in the MBA examination scores was therefore typically low. Average interim MBA performance was utilized as the criterion in the study. The Ability, Processing of Information and Learning battery (Apil-B) (Taylor, 1994) was utilized as the predictor. Descriptive statistics on the criterion and the predictor are shown in Table 1. The Organisational Personality Profile (OPP) Questionnaire (Psytech, 2003), along with the Critical Reasoning Test Battery Version 2 (CRTB2) (Psytech, 2003), was also administered to the sample. The initial intention was to use only the items of the OPP for the development of the two predictability indices. It, however, subsequently became necessary to also use the items of the CRTB2 for the development of the predictability index based on absolute residuals. More detailed information regarding the sampling methodology was not available from Psytech. The nature of the sampling methodology is, however, not critical in arriving at valid and credible conclusions on the merits of the modifications proposed to the original Ghiselli procedure.
The data set obtained from Psytech was too small to permit the formation of a derivation sample and a holdout sample. In terms of Cohen's statistical power tables (Cohen, 1988), however, the sample size of 101 for the derivation sample can be regarded as adequate. The required number of participants to achieve statistical power of 0,80 in testing the significance of a sample product-moment r, given a medium effect size of r = 0,30, a 5% significance level and a directional alternative hypothesis, is n = 68. At a 1% significance level the required n increases to 107. For a non-directional alternative hypothesis the Cohen tables recommend sample sizes of 84 (p = 0,05) and 124 (p = 0,01), assuming the same effect size as before.
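The order of magnitude of these sample sizes can be checked with the familiar Fisher z approximation. The Python sketch below is an approximation only and is not the procedure by which Cohen's tables were compiled; the function name and its defaults are assumptions. For the four conditions mentioned above it returns values that agree with the tabled values to within one case.

```python
from math import atanh, ceil
from statistics import NormalDist

def required_n(r, alpha=0.05, power=0.80, two_sided=False):
    """Approximate n needed to detect a population correlation r (Fisher z method)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)

for alpha, two_sided in [(0.05, False), (0.01, False), (0.05, True), (0.01, True)]:
    print(alpha, two_sided, required_n(0.30, alpha=alpha, two_sided=two_sided))
```

With 101 cases the derivation sample thus exceeds the requirement for a directional test at the 5% level by a comfortable margin, but falls short of the requirement at the 1% level.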

Statistical hypotheses
Hypothesis 1: Average MBA performance (Y) is significantly related to learning potential as measured by the Apil-B (X1).
Hypothesis 2: A predictability index (X2) can be developed from the items of a personality measure that shows a significant correlation with the real, algebraic residuals (Y - E[Y|X1]) (Yres) computed from the regression of the criterion (Y) on a learning potential predictor (X1).
Hypothesis 3: The addition of the predictability index, based on the real, algebraic values of the residuals (X2), to the regression model will significantly explain unique variance in the criterion measure (Y) that is not explained by the learning potential predictor (X1).
Hypothesis 4: A predictability index (X3) can be developed from the items of a personality measure that shows a significant correlation with the absolute residuals |Y - E[Y|X1]| (|Yres|) computed from the regression of the criterion (Y) on a learning potential predictor (X1).
Hypothesis 5: The addition of the predictability index, based on the absolute values of the residuals (X3), to the regression model will not significantly explain unique variance in the criterion measure (Y) that is not explained by the learning potential predictor (X1).
Postulate 1: The factor structure underlying the items comprising the predictability index (X2) provides evidence that a clear substantive theoretical interpretation could be attached to the predictability index.
Postulate 2: If the addition of the predictability index, based on the real, algebraic values of the residuals (X2), to the regression model significantly explains unique variance in the criterion measure (Y) that is not explained by the learning potential predictor (X1) and thereby increases the predictive validity of the selection procedure, the addition of the predictability index, based on the real, algebraic values of the residuals (X2), will increase selection utility.

Statistical analyses
The Statistical Package for the Social Sciences (SPSS) version 11.0 was used to analyse the data. The specific analyses performed and the logic underlying the sequence of analyses are outlined below.

RESULTS
To be able to investigate the feasibility of the proposed modifications to the original Ghiselli procedure, a significant linear relationship between a criterion and at least one predictor is required. It had been hypothesized that MBA performance should be systematically related to learning potential as measured by the Apil-B. Hypothesis 1 was tested by calculating the zero-order product-moment correlation between average MBA performance and performance on the Apil-B and the corresponding conditional probability P[|rij| ≥ rc | H0: ρ[Y,X1] = 0]. Given a 5% significance level and a directional alternative hypothesis, H01 will be rejected if P[|rij| ≥ rc | H01: ρ[Y,X1] = 0] < 0,05. The matrix of zero-order Pearson correlation coefficients and the corresponding conditional probabilities is portrayed in Table 2.
The convention proposed by Guilford (cited in Tredoux & Durrheim, 2002, p. 184) has been used to interpret sample correlation coefficients. Although somewhat arbitrary and although it ignores the normative question about the magnitude of values typically encountered in a particular context, it nonetheless fosters consistency in interpretation.
The moderate positive correlation of the Apil-B ability test (X 1 ) and the MBA performance (Y) (r = 0,46; p < 0,05) confirmed that the Apil-B can be used as the primary predictor of MBA performance. H 01 can therefore be rejected. The substantial relationship between learning potential and MBA performance can thus be used as a platform to empirically investigate the proposed modifications to the original Ghiselli procedure. .000 Apil general learning Pearson Correlation 0,416 1 potential score (X1) Sig. (1-tailed) .000 .
Average MBA performance was subsequently regressed on the Apil-B ability test (X1) by fitting a simple linear regression of Y on X1 to the data. The results of the standard regression analysis are presented in Table 3. Approximately 17% of the variance in the criterion (MBA performance) can be explained in terms of performance on the Apil-B (the primary predictor). The real, algebraic unstandardized residuals (Y - E[Y|X1]) and the absolute unstandardized residuals (|Y - E[Y|X1]|) were subsequently derived from the fitted regression model and written to the active data file. The real, algebraic unstandardized residuals are plotted against the predictor in Figure 1. From Figure 1 it appears that the linearity, normality and homoscedasticity assumptions underlying the linear model have been reasonably well satisfied (Tabachnick & Fidell, 1989). Satisfaction of the homoscedasticity assumption would, moreover, imply that accuracy of prediction is not a function of learning potential. Accuracy of prediction is, however, a (linear) function of criterion performance, with the strength of the relationship inversely related to the predictive validity of the predictor. Large positive real residuals tend to be associated with high MBA averages, while large negative real residuals are associated with low MBA averages (not shown). Knowing this, however, has very little practical value in improving prediction accuracy other than to underline the need to increase predictive validity. The absolute unstandardized residuals are plotted against the predictor in Figure 2.

The 98 individual items of the OPP personality questionnaire were subsequently correlated with the real and absolute residuals computed from the fitted regression model. The OPP items that correlated significantly with the real residuals at the 0,05 level were flagged for inclusion in the predictability index (X2).
Nine items correlated significantly with the real residuals at this level (minimum r = 0,196; maximum r = 0,315; average r = 0,220). In the case of the absolute residuals, however, only a single OPP item presented itself as a significant predictor of the absolute prediction errors made by the fitted regression model. This clearly created a dilemma as far as the calculation of the second predictability index (X3) is concerned. The possibility of harvesting items from the Critical Reasoning Test Battery (CRTB2) was consequently examined. The 62 items of the CRTB2 subtests were therefore correlated with the absolute residuals in the same way as the OPP items. Again the yield was rather disappointing. Only three CRTB2 items correlated significantly with the absolute residuals at the 0,05 level: two items from the Verbal subscale and one item from the Numerical subscale (minimum r = 0,208; maximum r = 0,388; average r = 0,329). It is worthy of note that the CRTB2 items yielded eight significant predictors of the real residuals (minimum r = 0,245; maximum r = 0,362; average r = 0,273). A further sobering fact is that, although the number of items in the OPP and the CRTB2 that correlate significantly with the real residuals exceeded the number of significant correlations one could expect by chance at a 0,05 significance level (4,9 and 3,1 respectively), this is not the case with regard to the absolute residuals. Since approximately five of the nine items harvested from the OPP could have been selected by chance alone, the danger exists that the predictability index took advantage of idiosyncrasies in the specific data set that are unlikely to repeat themselves in subsequent samples taken from the same population. The likelihood that the predictability index would cross-validate successfully thus diminishes.
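The chance expectations quoted above (4,9 and 3,1) follow from multiplying the number of items screened by the significance level. The sketch below reproduces that arithmetic and, as an additional illustration not reported in the article, computes the binomial probability of finding nine or more significant items among 98 when no item is truly related to the residuals. It assumes that the item-level tests are independent, which real questionnaire items will only approximate; the function names are hypothetical.

```python
from math import comb


def expected_by_chance(n_items, alpha=0.05):
    """Expected number of significant correlations if every null hypothesis is true."""
    return n_items * alpha


def prob_at_least(k, n_items, alpha=0.05):
    """Binomial P(at least k of n_items tests significant), assuming independent tests."""
    return sum(
        comb(n_items, j) * alpha**j * (1 - alpha) ** (n_items - j)
        for j in range(k, n_items + 1)
    )


opp_expected = expected_by_chance(98)      # 98 OPP items screened
crtb2_expected = expected_by_chance(62)    # 62 CRTB2 items screened
p_nine_or_more = prob_at_least(9, 98)      # nine OPP items were actually flagged

print(f"OPP expected by chance:   {opp_expected:.1f}")
print(f"CRTB2 expected by chance: {crtb2_expected:.1f}")
print(f"P(9 or more of 98 by chance): {p_nine_or_more:.3f}")
```

Under the independence assumption, a yield of nine items exceeds the chance expectation but not by a wide margin, which is consistent with the caution expressed about cross-validation.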
The selected nine OPP items correlating with the real residuals were subsequently combined in an unweighted linear composite by taking the mean of the qualifying items, to form the predictability index (X2) based on real residuals. The selected three CRTB2 items were likewise combined in an unweighted linear composite by taking the mean of the qualifying items, to form the predictability index (X3) based on absolute residuals. The eight CRTB2 items significantly correlating with the real residuals and the single OPP item correlating significantly with the absolute residuals could also have been utilized in the formation of X2 and X3 respectively. It was, however, decided to restrict the harvesting of items to a single donor instrument so as not to run the risk of uncovering an obvious underlying factor structure reflecting nothing more than the nature of the instruments contributing items to the index when investigating postulate 1.
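The item-harvesting step described above can be sketched as follows. This is a minimal illustration rather than the authors' SPSS procedure: the function names, the toy data and the threshold argument `r_crit` are assumptions, and the value 0.196 is used only because it is the smallest significant correlation reported in the text. How negatively correlating items were keyed is not described in the source, so no reverse scoring is applied here.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def build_index(items, residuals, r_crit):
    """Flag items whose |r| with the residuals reaches r_crit and average them.

    items: one list of scores per item, each aligned with `residuals`.
    Returns the positions of the flagged items and the unweighted mean
    of the flagged items for every person (empty if nothing is flagged).
    """
    flagged = [
        i for i, scores in enumerate(items)
        if abs(pearson(scores, residuals)) >= r_crit
    ]
    if not flagged:
        return flagged, []
    index = [
        sum(items[i][person] for i in flagged) / len(flagged)
        for person in range(len(residuals))
    ]
    return flagged, index


# Toy data (hypothetical): six people, two items.
residuals = [-2, -1, 0, 1, 2, 0]
items = [
    [1, 2, 3, 4, 5, 3],   # tracks the residuals perfectly
    [1, 5, 3, 5, 1, 3],   # unrelated to the residuals
]
flagged, index = build_index(items, residuals, r_crit=0.196)
print(flagged, index)
```

Because the same sample is used both to select the items and to evaluate the resulting composite, the correlation between such an index and the residuals is optimistically biased, which is the cross-validation concern raised in the text.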
The predictability index based on the real residuals (X2) and the predictability index based on the absolute residuals (X3) were subsequently correlated with the unstandardized real and absolute residuals to determine the success with which the two predictability indices had been developed. In anticipation of the addition of the predictability indices to the basic regression model, the correlations of the two indices with the primary predictor and with the criterion were determined as well. The results are presented in Table 5.

Table 5 shows that the predictability index based on real residuals (X2) correlated moderately (0,509) and significantly (p < 0,05) with the real residuals derived from regressing the MBA averages on the Apil-B ability predictor. H02 can therefore be rejected in favour of Ha2. It is possible to develop a predictability index (X2) from the items of a personality measure that shows a significant correlation with the real, algebraic residuals (Y - E[Y|X1]) computed from the regression of the criterion on a learning potential predictor. Table 5, in addition, reveals that the predictability index based on the absolute residuals (X3) correlated moderately (0,508) and significantly (p < 0,05) with the absolute residuals. H04 can therefore be rejected in favour of Ha4, provided that the initial assumption that the OPP would yield a sufficient number of items for the index is waived. It is possible to develop a predictability index (X3) from the items of a critical reasoning measure that shows a significant correlation with the absolute residuals (|Y - E[Y|X1]|) computed from the regression of the criterion on a learning potential predictor.
As expected, the predictability index based on real residuals (X2) correlated low (-0,002) and insignificantly (p > 0,05) with the absolute residuals derived from regressing the MBA averages on the Apil-B ability predictor. Likewise, the predictability index based on absolute residuals (X3) correlated low (-0,047) and insignificantly (p > 0,05) with the real residuals. Table 5 furthermore suggests that the inclusion of X2 alongside X1 in a multiple regression model is more likely to be meaningful than the addition of X3 to a regression model already including X1. X2 correlated low (0,056) and insignificantly (p > 0,05) with the Apil-B results while correlating moderately (0,487) with the criterion. The predictability index based on real residuals (X2) therefore seems to explain unique variance in the criterion not explained by the primary predictor. X3 correlated low (0,242) but statistically significantly (p < 0,05) with the predictor while correlating low (0,058) and statistically insignificantly (p > 0,05) with the criterion. The predictability index based on absolute residuals (X3) therefore seems not to explain unique variance in the criterion.

Table 5 indicates that the unstandardized real residuals correlate very highly (0,909) and statistically significantly (p < 0,05) with the MBA average. This could be interpreted to mean that the real residual and the criterion are essentially the same variable. Since the modified predictability index is constructed from items correlating with the real residual, one could argue that the whole exercise essentially boils down to using a variable to predict itself. This line of reasoning, however, ignores the fact that the total criterion sum of squares (Σ(Y_i - E[Y])²) can be partitioned into a sum of squares due to regression (Σ(E[Y|X_i] - E[Y])²) and a residual sum of squares (Σ(Y_i - E[Y|X_i])²).
The total variance can thus be partitioned into a proportion of criterion variance that can be explained in terms of the Apil-B (0,416²) and a proportion of criterion variance that cannot be explained in terms of the Apil-B (1 - 0,416²). The very high correlation observed between MBA average and the real residual is therefore simply an alternative expression of the fact that the Apil-B explains only a small proportion (0,416² = 0,173) of the variance in MBA average performance. The remaining proportion of the variance in MBA average performance (0,909² = 0,827) is explained by the real residual.

Table 5 finally also indicates that learning potential is not related to the accuracy of prediction (0,000; p > 0,05). This is also graphically portrayed in Figure 1 through the rectangular spread of real residuals across the range of Apil-B scores observed.
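The relationship between the two correlations quoted above is an algebraic property of least-squares regression: within the sample on which the model is fitted, the criterion correlates sqrt(1 - r²) with the real residual, and the predictor correlates zero with it. The sketch below illustrates this on hypothetical toy data (the scores and helper names are assumptions, not the study's data) and checks that sqrt(1 - 0,416²) is approximately 0,909.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    return slope, mean_y - slope * mean_x


# Hypothetical predictor (x) and criterion (y) scores.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

slope, intercept = fit_line(x, y)
real_residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
absolute_residuals = [abs(e) for e in real_residuals]

r_xy = pearson(x, y)               # validity of the predictor
r_ye = pearson(y, real_residuals)  # criterion with real residual
r_xe = pearson(x, real_residuals)  # predictor with real residual

print(round(r_ye, 6), round(sqrt(1 - r_xy**2), 6))
print(round(sqrt(1 - 0.416**2), 3))
```

The first two printed values coincide, and the last line gives the value implied by the reported validity of 0,416.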
Descriptive statistics for the two predictability indices are provided in Table 6. Two dummy variables (X2D and X3D) were subsequently created by dichotomising the index distributions into high and low prediction accuracy groups. Since X2 reflects the magnitude and direction of prediction error (i.e., real residuals), a low prediction error group centred on zero had to be isolated. X3, in contrast, reflects only the magnitude of prediction error; to isolate a high prediction accuracy group, the cases falling at or below the median were therefore flagged. On X2 the cases falling between the twenty-fifth and seventy-fifth percentiles were classified as high prediction accuracy cases. On X3 cases with an index score on or below the fiftieth percentile were classified as high prediction accuracy cases. The relationship between the criterion and the predictor is portrayed graphically in Figure 3 and Figure 4 for the two levels of each dummy variable separately.

Figure 3: MBA average performance as a function of learning potential depicted for high (X2D = 1) and low predictability (X2D = 0) groups separately (predictability index based on real residuals).

Figure 4: MBA average performance as a function of learning potential depicted for high (X3D = 1) and low predictability (X3D = 0) groups separately (predictability index based on absolute residuals).
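The two dichotomisation rules can be sketched as follows. The function names and the toy scores are hypothetical, and two details are assumptions because the text does not specify them: the percentile boundaries are treated as inclusive, and the quartiles are computed with the inclusive interpolation method of Python's `statistics.quantiles`, which need not match the percentile definition used by SPSS.

```python
from statistics import median, quantiles


def flag_middle_half(scores):
    """X2D-style rule: 1 if the score lies between the 25th and 75th percentiles."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return [1 if q1 <= s <= q3 else 0 for s in scores]


def flag_at_or_below_median(scores):
    """X3D-style rule: 1 if the score lies on or below the 50th percentile."""
    cut = median(scores)
    return [1 if s <= cut else 0 for s in scores]


# Hypothetical index scores for nine cases.
index_scores = list(range(1, 10))
x2d = flag_middle_half(index_scores)
x3d = flag_at_or_below_median(index_scores)
print(x2d)
print(x3d)
```

The first rule keeps the central half of the distribution, in line with an index that carries the sign of the prediction error; the second keeps the lower half, in line with an index that carries only its magnitude.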
Figures 3 and 4 seem to suggest that the predictability index based on the absolute residuals (X3) is more effective in isolating a subset of individuals for whom the model provides more accurate criterion estimates than the predictability index based on real residuals (X2). The two indices both correlate moderately strongly (0,51) with the residuals from which they are derived. The superiority of one index over the other in separating the more accurately predictable cases from the less accurately predictable cases is thus somewhat surprising.

Table 7 reveals that the addition of the predictability index based on the real values of the residuals (X2) to the basic regression model significantly (p < 0,05) explains unique variance in the criterion measure that is not explained by the learning potential predictor. H03 can thus be rejected in favour of Ha3. The original predictor still significantly (p < 0,05) explains variance in the criterion not explained by the predictability index. The expanded regression model explains approximately 39% of the variance in the criterion, compared to the approximately 17% explained by the basic model. The addition of the predictability index thus effected a substantial increase in the proportion of criterion variance explained.

Table 7 reveals that the unique variance in the predictability index (X2) explains approximately 26% (0,510²) of the unique variance in the criterion after controlling for variance due to the Apil. The unique variance in the predictability index (X2) explains approximately 22% (0,464²) of the total variance in the criterion. Judged by the standardized partial regression coefficients and the partial and semi-partial correlation coefficients, the predictability index is the more influential predictor in the regression model. No convincing substantive theoretical explanation for this finding could be offered.
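The multiple correlation and the partial and semi-partial correlations quoted above can be checked against the zero-order correlations reported earlier (0,416 between criterion and Apil, 0,487 between criterion and X2, and 0,056 between Apil and X2). The sketch below applies the standard two-predictor formulas; the variable names are illustrative, and small differences in the third decimal are to be expected because the inputs are themselves rounded.

```python
from math import sqrt

# Zero-order correlations as reported in the text.
r_y1 = 0.416   # criterion with Apil (X1)
r_y2 = 0.487   # criterion with predictability index (X2)
r_12 = 0.056   # Apil (X1) with predictability index (X2)

# Squared multiple correlation for two predictors.
r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
multiple_r = sqrt(r_squared)

# Semi-partial (part) correlation of X2: X1 is removed from X2 only.
semipartial_x2 = (r_y2 - r_y1 * r_12) / sqrt(1 - r_12**2)

# Partial correlation of X2: X1 is removed from both X2 and the criterion.
partial_x2 = semipartial_x2 / sqrt(1 - r_y1**2)

print(f"R = {multiple_r:.3f}, R squared = {r_squared:.3f}")
print(f"semi-partial = {semipartial_x2:.3f}, partial = {partial_x2:.3f}")
```

These values agree with the reported 0,623, 0,464 and 0,510 to within rounding, and the squared multiple correlation corresponds to the approximately 39% of criterion variance explained by the expanded model.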
Table 8 reveals that the addition of the predictability index based on the absolute values of the residuals (X3) to the basic regression model does not significantly (p > 0,05) explain unique variance in the criterion measure that is not explained by the learning potential predictor. H05 can thus not be rejected in favour of Ha5. It could, however, be contended that the analysis is inappropriate in as far as an X3 by learning potential interaction effect should have been added to the model rather than an index main effect. Although no supporting evidence is presented here, this study also finds that the addition of a term representing the interaction between X3 and the Apil does not significantly (p > 0,05) explain unique variance in the criterion measure that is not explained by the learning potential predictor.

Table 8 (coefficient rows)
Predictor                                     B        Std. error   Beta     t        Sig.    Zero-order   Partial   Part
Apil general learning potential score (X1)    0,182    0,040        0,427    4,512    0,000   0,416        0,415     0,414
X3                                            -0,436   0,910        -0,045   -0,479   0,633   0,058        -0,048    -0,044

Given that the addition of the predictability index based on the real values of the residuals (X2) to the basic regression model significantly explains unique variance in the criterion measure that is not explained by the learning potential predictor (X1), the question arises whether substantive meaning could be attached to the index scores. The objective was to determine whether any theoretical meaning could be attached to the common factors underlying the index, if any were identified, and whether these interpretations would make sense in terms of the criterion. To shed light on this matter an exploratory principal component analysis was performed on the OPP items combined in the predictability index.
The rotated component matrix should indicate whether the items comprising the predictability index systematically measure one or more underlying common constructs that could be linked to specific personality constructs, or whether the predictability index is nothing more than an incoherent, meaningless collection of items that have nothing more in common than their correlation with the regression residuals. The eigenvalue-greater-than-one rule was used to decide on the number of factors to extract. Varimax rotation was used to rotate the obtained solution to simple structure.
Based on the eigenvalue-greater-than-one rule and the scree plot, four factors were extracted and orthogonally rotated (Table 9). The first four factors account for approximately 63% of the variance in the items. These results, however, fail to provide a clear, convincing, and credible answer to the question whether substantive meaning could be attached to the index scores. The borderline Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (0,552) casts some doubt on the factorability of the correlation matrix (Tabachnick & Fidell, 1989). Extracting this many factors from only nine items and a sample size of 101 also seems somewhat questionable, especially given the unconvincing KMO statistic. No clear-cut picture, moreover, emerges from Table 9. Although each item loads reasonably high on a single factor only, the common theme amongst the items loading on the same factor tends to be somewhat debatable. The first principal component could possibly be interpreted as a focus-intensity factor, the second principal component possibly as a compulsiveness factor and the third principal component possibly as a driven factor. These suggestions are, however, at best tenuous. Despite their questionable nature, these themes could conceivably play a role in the level of performance MBA students achieve. With the wisdom of hindsight this could, however, probably have been said for any of the OPP items. It should finally be conceded that it probably would have been more appropriate to have performed a common factor analysis rather than a principal component analysis, given the intention to identify common factors. The pattern matrix obtained through principal axis factor analysis with oblique rotation roughly replicates the structure obtained through the principal component analysis, though it is somewhat less clear-cut. The small entries in the factor correlation matrix (< |0,20|) suggest that a single second-order factor is highly unlikely.
The available evidence thus seems to suggest that the items combined in the predictability index do not reflect a single underlying factor, but it fails to convincingly rule out the possibility that the predictability index is very little more than an incoherent, meaningless collection of items that have nothing more in common than their (possibly chance) correlation with the regression residuals. The most prudent option would probably be to regard the available evidence as too ambivalent to take any definite decision on postulate 1. Item analyses were nonetheless performed on the set of nine items derived from the correlation between the OPP personality measure and the real residuals, taking into consideration the results of the principal component analysis. The results of the item analyses (α(C1) = 0,5241; α(C2) = 0,4919; α(C3) = 0,5289) indicate modest internal consistency for the three sets of items loading on the first three principal components. This finding is, however, not surprising given the limited number of items involved. Given the findings on the underlying structure it would not be meaningful to directly calculate a coefficient alpha for the nine items combined in the predictability index. The reliability of an unweighted linear composite (Nunnally & Bernstein, 1994) comprising the eight items loading on the first three principal components could, however, be calculated from the reliabilities and the variances of the three components. As could be expected, a rather modest value of 0,601 is obtained.
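The composite reliability referred to above is usually written in the following form for an unweighted sum of k components. The symbols are chosen here for illustration and are not taken from the article: r_ii is the reliability of component i, σ_i² its variance, and σ_y² the variance of the composite. Since the component variances are not reported in this section, the value of 0,601 cannot be recomputed from the text alone.

```latex
r_{yy} \;=\; 1 \;-\; \frac{\sum_{i=1}^{k} \sigma_i^{2} \;-\; \sum_{i=1}^{k} r_{ii}\,\sigma_i^{2}}{\sigma_y^{2}}
```

When the components are positively intercorrelated, σ_y² exceeds the sum of the component variances, so the composite reliability can exceed each of the three component alphas, as it does here.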
A definite increase in the proportion of criterion variance explained was found when adding the predictability index based on real residuals to the basic regression model. The question is what effect this increase in predictive validity has on the quality of selection decision-making. The Taylor-Russell (Cascio, 1991), Naylor-Shine (Cascio, 1991) and Brogden-Cronbach-Gleser (Brogden, 1949; Cascio, 1991; Cronbach & Gleser, 1965) utility models were subsequently employed to describe the effect of the incremental validity of the predictability index on the quality of selection decision-making.
The addition of the predictability index resulted in an increase in predictive validity from 0,416 (Table 5) to 0,623 (Table 7). Translating this increase in predictive validity into increases in decision quality in terms of the aforementioned three utility models, however, requires additional data on the other selection parameters characterizing the three models. Since such data were not available for the validation sample, realistic illustrative values had to be assumed for the other parameters in each of the utility models. The choice of specific parameter values was essentially an arbitrary one. An applicant pool of 2000 and 100 vacancies were consequently assumed. Average tenure was assumed to be 5 years. The per-applicant cost associated with the Apil battery was assumed to be R250 and that of the OPP, R350. The standard deviation of the criterion distribution expressed in a R-c metric was assumed to vary between 35% and 45% of average salary (Cascio, 1991). Average salary was arbitrarily set at R100 000 per annum. It was assumed that 50% of the applicant pool could succeed if selected. Bivariate normality was assumed.
The selection ratio φ would therefore equal 0,05 and the resulting λ value (the ordinate of the standardised normal distribution at the corresponding cut-off), obtained from the standardised normal probability table, would equal 0,103. The base rate (BR) would be 0,50.
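The two values above can be reproduced without a printed table. The sketch below derives the selection ratio from the assumed applicant pool and number of vacancies, finds the standard normal cut-off that leaves that proportion in the upper tail, and evaluates the height of the normal curve at the cut-off. The variable names are illustrative.

```python
from statistics import NormalDist

applicants = 2000
vacancies = 100

selection_ratio = vacancies / applicants           # proportion of applicants selected
standard_normal = NormalDist()                     # mean 0, standard deviation 1

# Cut-off score above which the top `selection_ratio` of applicants fall.
cutoff_z = standard_normal.inv_cdf(1 - selection_ratio)

# Height (ordinate) of the standard normal density at the cut-off.
ordinate = standard_normal.pdf(cutoff_z)

print(f"selection ratio = {selection_ratio:.2f}")
print(f"cut-off z       = {cutoff_z:.5f}")
print(f"ordinate        = {ordinate:.3f}")
```

The cut-off agrees with the value of 1,64485 used in the success-ratio calculation, and the ordinate rounds to the reported 0,103.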
The improvement in the proportion of the selected applicants succeeding on the criterion (i.e., the success ratio, S_v) effected by the inclusion of the predictability index in the regression model would, under the aforementioned assumptions, be given by equation 1:

$$\Delta S_v = S_v[X_1, X_2] - S_v[X_1] \qquad (1)$$

S_v[X1,X2] and S_v[X1] were calculated via SPSS by calculating P[Z_y ≥ 0 and Z_x ≥ 1,64485]/P[Z_x ≥ 1,64485] for the two validity coefficients, assuming multivariate normality. The addition of the predictability index (X2) to the basic regression model would therefore, under the abovementioned scenario, result in an approximate 12% increase in the percentage of selectees who are successful. This percentage would increase if larger increases in the validity coefficient could be effected.
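The conditional probability used for the success ratio can be approximated numerically. The sketch below is not the authors' SPSS calculation: it assumes a standard bivariate normal distribution for predictor and criterion, integrates the joint upper-tail probability with Simpson's rule, and divides by the selection ratio. The function name, the integration limit of 8 standard deviations and the number of steps are choices made for this illustration, and the validity must be smaller than 1 in absolute value.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def success_ratio(validity, selection_ratio=0.05, base_rate=0.50, steps=4000):
    """P[Zy >= y_cut | Zx >= x_cut] under bivariate normality (steps must be even)."""
    x_cut = STANDARD_NORMAL.inv_cdf(1 - selection_ratio)
    y_cut = STANDARD_NORMAL.inv_cdf(1 - base_rate)
    spread = sqrt(1 - validity**2)   # SD of Zy given Zx

    def integrand(x):
        # Density of Zx at x times P[Zy >= y_cut | Zx = x].
        p_success = 1 - STANDARD_NORMAL.cdf((y_cut - validity * x) / spread)
        return STANDARD_NORMAL.pdf(x) * p_success

    upper = 8.0                      # the density beyond this point is negligible
    h = (upper - x_cut) / steps
    total = integrand(x_cut) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(x_cut + k * h)
    joint = total * h / 3            # P[Zy >= y_cut and Zx >= x_cut]
    return joint / selection_ratio


basic = success_ratio(0.416)         # Apil only
expanded = success_ratio(0.623)      # Apil plus predictability index
print(f"basic = {basic:.3f}, expanded = {expanded:.3f}, gain = {expanded - basic:.3f}")
```

Under these assumptions the gain in the success ratio is of the same order as the approximately 12% reported in the text.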
The improvement in the mean standardized criterion performance of the selected group effected by the inclusion of the predictability index in the regression model, assuming a selection ratio of φ = 0,05, will under the abovementioned scenario be given (in standard deviation units) by equation 2:

$$\Delta \bar{Z}_y = \frac{\lambda}{\phi}\,\bigl(R(Y, E[Y|X_1,X_2]) - r(Y,X_1)\bigr) = \frac{0{,}103}{0{,}05}\,(0{,}623 - 0{,}416) \approx 0{,}43 \qquad (2)$$

The addition of the predictability index (X2) to the basic regression model would therefore, under the abovementioned scenario, result in an increase in average performance of approximately 0,43 standard deviation units. This might seem rather trivial, but when extrapolated over selectees and time periods, and multiplied by the performance unit value of one standard deviation, it could amount to an impressive quantity.
The R-c value of the improvement in the mean standardized criterion performance of the selected group effected by the addition of the predictability index to the basic regression model is given (in R-c) by equation 3:

$$
\begin{aligned}
\Delta U &= T N_s\, R(Y, E[Y|X_1,X_2])\, SD_y\, (\lambda/\phi) - (C_1 + C_2) N_a - T N_s\, r(Y,X_1)\, SD_y\, (\lambda/\phi) - C_1 N_a \\
&= T N_s\, SD_y\, (\lambda/\phi)\,\bigl(R(Y, E[Y|X_1,X_2]) - r(Y,X_1)\bigr) - N_a (C_2 + 2C_1) \qquad (3)
\end{aligned}
$$

Where:
ΔU = the increase in utility due to the addition of the predictability index;
T = the average predicted tenure of the selected applicants;
N_s = the number of people selected for a position using a selection battery to which the index computed in the study has been added;
R(Y, E[Y|X1,X2]) = the correlation coefficient obtained by adding the index to a selection battery already containing the ability predictor;
SD_y = the standard deviation of the criterion distribution expressed in a R-c metric;
λ = the height of the ordinate cutting off an area under the standardised normal distribution corresponding to a selection ratio φ;
φ = the selection ratio;
C_1 = the per-applicant cost of the Apil;
r(Y,X1) = the validity coefficient of the basic regression model; and
C_2 = the per-applicant cost of the OPP.
The addition of the predictability index (X 2 ) to the basic regression model would therefore, under the abovementioned scenario, result in an increase in average performance worth R8 443 400-00 over the average tenure of 5 years. This is a somewhat overoptimistic estimate in as far as it fails to reflect the time value of future earnings and the tax liability higher performance earnings would imply (Cascio, 1991). The estimate, in conjunction with the other two utility estimates, nonetheless provides support for postulate 2.
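The arithmetic behind the reported figure can be traced with a short calculation. The sketch below evaluates equation 3 as printed, with the cost term (C2 + 2C1), and assumes a criterion standard deviation of 40% of the R100 000 average salary. The function and argument names are hypothetical. The head count entered in the cost term is a further assumption: a value of 100 is used here only because it reproduces the reported amount, whereas entering the full applicant pool of 2000 would give a lower estimate.

```python
def incremental_utility(tenure, n_selected, sd_y, ordinate, selection_ratio,
                        r_expanded, r_basic, cost_head_count, c1, c2):
    """Equation 3 as printed: performance gain minus the cost term N(C2 + 2*C1)."""
    gain = (tenure * n_selected * sd_y * (ordinate / selection_ratio)
            * (r_expanded - r_basic))
    cost = cost_head_count * (c2 + 2 * c1)
    return gain - cost


delta_u = incremental_utility(
    tenure=5,              # years
    n_selected=100,        # vacancies filled
    sd_y=40_000,           # assumed: 40% of R100 000
    ordinate=0.103,
    selection_ratio=0.05,
    r_expanded=0.623,
    r_basic=0.416,
    cost_head_count=100,   # assumption chosen to match the reported figure
    c1=250,                # per-applicant cost of the Apil
    c2=350,                # per-applicant cost of the OPP
)
print(f"R{delta_u:,.0f}")
```

The gain term alone amounts to R8 528 400; the utility estimate is therefore dominated by the validity increment, with the testing costs playing a comparatively small role under any of these assumptions.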
To illustrate the linear relationship between utility and the increase in validity effected by the predictability index, equation 3 has been solved for a range of possible values for SD_y and R(Y, E[Y|X1,X2]), while fixing the remaining utility parameters at their initially chosen values. Schmidt and Hunter's (in Cascio, 1991) estimate of the standard deviation of the criterion distribution expressed in a R-c metric as 40% of annual salary was varied by five percentage points up and down, resulting in the use of three values, i.e. 35%, 40% and 45%. The value of R(Y, E[Y|X1,X2]) was essentially varied in steps of 0,10 (see Table 10 and Figure 5).

Figure 5: Incremental utility as a function of R(Y, E[Y|X1,X2]) and SD_y

Figure 5 illustrates the resultant increase in monetary utility as the correlation coefficient R(Y, E[Y|X1,X2]) increases from 0,416, as well as the acceleration in that increase when the standard deviation of the criterion distribution expressed in a R-c metric increases from 35% of annual salary to 40% and to 45%.

DISCUSSION
The main findings of this study regarding the development of a predictability index are fourfold. It is possible to develop a predictability index that correlates with the real, algebraic residuals derived from the regression of a criterion on one or more predictors. The addition of such a predictability index to the original regression model can produce a significant increase in the correlation between the selection battery and the criterion. This increase can trigger a substantial and useful increase in the utility of the selection battery. The potential benefits especially apply to companies selecting large numbers of employees per year at small selection ratios from even larger applicant pools. Although it is possible to develop a predictability index that correlates with the absolute residuals derived from the regression of a criterion on one or more predictors, the addition of such a predictability index to the original regression model does not produce a significant increase in the correlation between the selection battery and the criterion.
To be able to convincingly demonstrate the feasibility of enhancing selection utility through the use of predictability indices would require the cross validation of the results obtained on a derivation sample on a holdout sample selected from the same population. The following two vital issues are at stake. The predictability index, developed on the derivation sample should still correlate significantly with the real, algebraic residuals obtained from fitting a new basic regression model on a representative holdout sample taken from the same population. Furthermore, the addition of the predictability index, developed on the derivation sample, to the holdout regression model should still significantly explain unique variance in the criterion measure that is not explained by the predictor(s) in the basic model. The first aspect is probably the Achilles heel of the proposed procedure. If the predictability index developed on the derivation sample would succeed in predicting the real prediction errors made by a newly fitted regression model on a second sample taken from the same population, then the second issue most likely will not present a problem. This study failed to investigate these two rather crucial aspects due to the limited size of the data set it had at its disposal. There is, moreover, a related question, which this study also failed to investigate. More in line with traditional cross validation of regression equations the question also arises to what extent the expanded regression model developed on the derivation sample would accurately predict the criterion when applied on the holdout sample data. In terms of the eventual regular use of predictability indices in selection research this clearly is an important issue.
The possibility of using bootstrapping to solve the problem of finding large enough initial samples to allow the division into derivation and holdout samples should be considered (Diaconis & Efron, 1983;Efron, 1982;Efron & Tibshirani, 1993). This procedure seems to present a feasible way of investigating the first two issues mentioned above. Whether it presents a solution to the more traditional cross validation problem seems somewhat more debatable.
Predictability indices most likely are highly situation specific. Each prediction model would most likely require the development of a unique predictability index. The fact that it was possible to develop a predictability index for one prediction model does not necessarily mean it would be practically possible to do so for another. The question therefore also arises how common the successful development of a predictability index actually is. Moreover, it is not clear whether any criteria should be set for the type of donor predictor that would increase the likelihood of finding suitable donor items and, if so, what these criteria should be.