Technology-based simulation exercises are popular assessment measures for the selection and development of human resources.

The primary goal of this study was to investigate the construct validity of an electronic in-basket exercise using computer-based simulation technology. The secondary goal of the study was to investigate how re-sampling techniques can be used to recover model parameters using small samples.

Although computer-based simulations are becoming more popular in the applied context, relatively little is known about the construct validity of these measures.

A quantitative

Support was found not for the entire model but for only one of the dimensions, namely the Interaction dimension. Multicollinearity between most of the dimensions was problematic for the factor analyses.

This study holds important implications for assessment practitioners who hope to develop unproctored simulation exercises.

This study aims to contribute to the existing debate regarding the validity and utility of assessment centres (ACs), as well as to the literature concerning the use of technology-driven ACs. In addition, the study aims to make a methodological contribution by demonstrating how re-sampling techniques can be used in small AC samples.

A number of selection instruments are available for the selection of personnel. They include personality questionnaires, targeted interviews, situational interviews, situational judgement tests, aptitude and ability tests, previous job roles and simulated exercises. Some of these instruments are more effective than others, while some have higher predictive validity than others (Sackett, Lievens, Van Iddekinge, & Kuncel,

This may be the reason why organisations still employ ACs when selecting and developing their employees. Although literature supports the strong link between standardised tests of general mental ability and job performance, behaviour-based assessment provides a richer and more nuanced view of managerial potential (Lievens & Thornton,

Despite the large-scale application of electronic in-baskets, applied research has not kept up with the prolific changes in industry. Relatively little is still known regarding the internal structure of electronic in-baskets compared to traditional in-baskets. More specifically, are the construct-related problems associated with traditional ACs still problematic within the technology-enabled simulations? The current study aims to answer these research questions through the investigation of a large-scale electronic in-basket used for development purposes. Finally, the study demonstrates how re-sampling techniques can be used to augment small samples that typically plague AC research. Bias-corrected bootstrapped confidence intervals and Monte Carlo re-sampling strategies were used to produce parameter estimates from empirically derived bootstrapped confidence intervals.

The popularity of ACs in practice stems largely from the large body of literature that supports the link between dimension ratings and on-the-job performance (Arthur et al.,

The consistent finding that exercise effects dominate AC ratings has prompted numerous researchers to put forth plausible explanations for the findings. Lievens (

In light of these persistent mixed findings, three notable large-scale reviews dealing with construct-related validity have aimed to find solutions for the statistical challenges facing AC research using Confirmatory Factor Analysis techniques (Bowler & Woehr,

Given the mixed findings it is difficult to anticipate which source of variance will be dominant in the current investigation. This may also not be the most important question. Rather, judgements regarding the validity of ACs are largely dependent on the design intention of the simulation. If the AC was developed to tap into a distinct aspect of that dimension’s construct space across exercises, finding strong support for dimension effects will probably underscore the construct validity of the assessment. However, if the simulation was developed to tap into different elements of the dimensions across exercises, finding strong dimension factors at the expense of systematic dimension–exercise interaction effects would not be evidence of construct validity. Based on the fact that the current AC was designed to tap dimension-specific behavioural consistency across exercises, the construct validity of the assessment should rightly be investigated from the perspective of correlated dimensions.

Although the debate has recently moved beyond the exercise-versus-dimension debate (Lievens & Christiansen,

An additional problem with AC research is that samples are typically small because of the cost of administration. Normally, ACs are administered to a small number of participants at the end of a multiple-hurdle assessment approach. The lack of construct validity, when defined as cross-exercise dimension-based behaviour congruence, may be explained in part by the lack of statistical power associated with the small sample sizes. Modern multivariate statistical techniques, especially confirmatory factor analytical approaches, require large samples and normally distributed data (Byrne,

The main research objective of this study was to design and implement computer-based simulation technology (CBST) as an electronic in-basket exercise (depicting the day-to-day activities of a supervisor) in an assessment development centre (ADC) for a major manufacturing enterprise in the United States. The goal of the ADC was to identify leadership potential – those individuals who may be ready to be promoted to higher levels in the organisation. This process incorporated group and individual online simulation exercises, which were used to measure behavioural and organisational competencies and performance areas on strategic and tactical levels. A secondary goal of the study was to investigate if re-sampling techniques can be used to assess model fit and to estimate confidence intervals around model parameters. Thus, the overarching research question can be described as follows:

Based on the foregoing research question, the primary objectives of this study are to:

examine the construct validity of the CBST in-basket exercise;

demonstrate the use of re-sampling techniques in evaluating the quality of model parameters.

Recent research in employee selection has shifted the focus from traditional selection paradigms to more dynamic and flexible delivery methods. This is mainly driven by the higher fidelity of technology-enabled platforms and their associated cost savings. There is an increased interest in different selection methods such as situational judgement tests and the role of technology and the Internet in recruitment and selection. Social networking websites and video résumés have become part of selection procedures (Nikolaou, Anderson, & Salgado,

This study made use of a computer-delivered in-basket exercise. By its very nature we can think of this assessment as a situational judgement test (SJT) rather than an AC because the assessment was made up of only a single exercise. Based on best practice guidelines for the use of the AC method in South Africa (Meiring & Buckett,

The electronic in-basket contained a computer-based in-basket exercise with multiple case studies, some of which had open-ended response formats while others made use of machine-driven scoring options. The scoring key was developed by an independent team of behavioural experts in collaboration with line managers and human resources in the given organisation. The response options reflect the desired behaviours of the supervisor in degrees of appropriateness (1 – least appropriate to 5 – most appropriate). Most of the responses were machine scored. There were some open-ended sections in the in-basket exercise that required direct input from the respondents. These responses were scored by a team of trained behavioural experts. The final overall assessment rating was a weighted combination of the scores achieved in the two sections of the same in-basket exercise. Thus, the in-basket exercise complies with the criteria for a traditional AC, at least as far as data integration is concerned, although only one simulation exercise was used.

As the acceptance and widespread use of competency-based assessments have increased in the last two decades, various interest groups have published practice and research guidelines (International Task Force on Assessment Centre Guidelines,

The popularity of ACs stems from their many strengths, including that they have little adverse impact and predict a variety of performance criteria (Thornton & Rupp,

However, research evidence concerning the internal structure of ACs shows much less support for the construct validity of AC dimensions

In the context of ACs, the International Task Force (

Assessment centre research has largely used multitrait–multimethod (MTMM) analyses as a framework for the analysis of the internal structure of AC ratings (Campbell & Fiske,

Within the AC literature, Sackett and Dreher (

Studies focusing on dimension-based ACs (DBACs) indicate that results relating to dimensions across exercises are most meaningful in decisions pertaining to candidates. The dimension-based focus is the most commonly used, most commonly researched and most commonly discussed AC perspective (Thornton & Rupp,

More recently, authors have argued that cross-exercise correlations of dimensions will remain elusive because behaviour is exercise-dependent (Hoffman,

Because the construct validity of ACs is largely concerned with the internal structure of the AC and is related to either a task-based orientation or a dimension-based orientation, Campbell and Fiske’s (

Despite the frequent use of MTMM matrices, Hoffman (

More recently, approaches such as generalisability theory and other variance decomposition approaches have been used to investigate the relative contributions and interactions of assessor, dimension or exercise variance as sources of legitimate variance in AC ratings (Bowler & Woehr,

Nonetheless, the CFA approach has dominated investigations of the internal construct validity of ACs and is ideal for large samples of data (

However, ACs have generally been plagued by small samples (Lievens & Christiansen,

This method was independently tested by Hoffman, Melchers, Blair, Kleinmann and Ladd (

The Monte Carlo family of re-sampling techniques may be fruitfully used to test the appropriateness of model parameters, standard errors, confidence intervals and even fit indices under various assumptions. Because of the low statistical power in small samples, standard errors may be overestimated, which may lead to significant effects being missed. In contrast, if standard errors are underestimated, significant effects may be overstated (Muthén & Muthén,

Monte Carlo features include saving parameter estimates from the analysis of real data to be used as population and/or coverage values for data generation in a Monte Carlo simulation study. Monte Carlo simulations involve identifying a mathematical model of the activity or process to be researched and defining the parameters such as mean and standard deviation for each factor in the model (Lance et al.,

An alternative re-sampling technique known as residual bootstrapping (Bollen & Stine,

Bootstrapping complements traditional confidence intervals by estimating standard errors of parameter estimates over a large number of hypothetical sample draws (Bollen & Stine,

In the literature review, we focused on the historical debate regarding the construct validity and internal structure of ACs. Internet-delivered simulated exercises closely resemble the features of traditional sample-based assessment, yet the delivery and scoring platform differs significantly. However, relatively little is known about the internal structure of electronic simulations in general and in-baskets in particular. Thus, the overarching goal of the study is to assess the internal structure of an electronic in-basket. Furthermore, the potential benefits of re-sampling techniques were discussed in the context of ACs. The section concluded with a discussion of re-sampling techniques and the use of Monte Carlo and bootstrapping techniques.

A non-experimental, quantitative research design was used in the current study to empirically test the main research objectives. More specifically, an

Initially, the data were screened for multivariate outliers and out-of-range responses. Descriptive statistics were generated to investigate the distribution and central tendency of post-exercise dimension rating (PEDR) scores for each of the competency dimensions. Inferential statistics were generated by specifying a confirmatory factor analytical model. The internal structure of the electronic in-basket can be operationalised through the specification of fixed and freely estimated model parameters.

More specifically, the CBST measurement model can be defined in terms of a set of measurement equations, expressed in matrix algebra notation (see

Where:

Λ_{X} is a 19 × 5 matrix of factor loadings;

In addition, all the off-diagonal elements of the phi covariance matrix, denoting the covariance between the five latent competencies, were freed up to be estimated. Model parameters of the CFA model were estimated using maximum likelihood with robust standard errors and fit indices because of the non-normality of the sample data. For identification purposes, each of the five latent competencies was standardised and all error variances were specified to be uncorrelated. Fit indices and model parameters were estimated using Mplus 7.2 (Muthén & Muthén, ^{2}, the Comparative Fit Index (CFI; Bentler,
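For orientation, the constraints described in this paragraph can be written against the generic CFA measurement model. The block below is a sketch in conventional LISREL-style notation rather than a reproduction of the article's own equation; the symbols $\boldsymbol{\xi}$ (latent competencies), $\boldsymbol{\delta}$ (unique factors), $\Phi$ (latent covariance matrix) and $\Theta_{\delta}$ (error covariance matrix) are assumed labels.

```latex
\begin{aligned}
\mathbf{X} &= \Lambda_{X}\,\boldsymbol{\xi} + \boldsymbol{\delta}, \\
\Sigma(\theta) &= \Lambda_{X}\,\Phi\,\Lambda_{X}^{\top} + \Theta_{\delta},
\qquad \operatorname{diag}(\Phi) = \mathbf{1},
\qquad \Theta_{\delta}\ \text{diagonal}.
\end{aligned}
```

Here $\operatorname{diag}(\Phi) = \mathbf{1}$ expresses the standardisation of the latent competencies, the free off-diagonal elements of $\Phi$ are their covariances, and a diagonal $\Theta_{\delta}$ expresses the uncorrelated error variances.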

A convenience sample of 89 supervisors was selected in a non-random fashion from a large multinational manufacturing organisation operating in the petroleum and rubber industry in North America. The sample was selected from incumbent supervisors, who were earmarked to partake in a larger leadership developmental programme in the organisation. The first step in the development programme was to complete the CBST to gain more insight into the strengths and development areas of each supervisor.

Six broad competencies were identified by the client organisation for inclusion in the CBST based on their proposed link to job performance as identified through the job analysis process. An external consulting organisation was contracted to develop the behavioural indicators and scoring method for each competency. A summary of the six meta-competency clusters and sub-dimensions is presented in

Meta-competencies and sub-dimensions.

| Meta-competencies | Sub-dimensions | Abbreviation |
|---|---|---|
| Vision | Visionary thinking | VIS_VT |
| | Strategic orientation | VIS_SO |
| | Innovation | VIS_INN |
| | Leading change | VIS_LC |
| Drive | Initiative | DRI_INI |
| | Leading and steering | DRI_LS |
| | Self-determination | DRI_SD |
| | Passion and commitment | DRI_PAS |
| Execution | Problem-solving | EXE_PS |
| | Decision-making | EXE_DM |
| | Delivering results | EXE_DR |
| | Assertiveness | EXE_ASS |
| Entrepreneurship | Customer orientation | ENT_CUST |
| | Profit orientation | ENT_PROF |
| | Quality orientation | ENT_QUAL |
| | Integrity (in business) | ENT_INT |
| Learning | Building up business acumen | LRN_BA |
| | Self-reflection | LRN_SR |
| | Handling feedback | LRN_HF |
| | Coaching others | LRN_CO |
| Interaction | Clear and open communication | INT_COM |
| | Networking | INT_NET |
| | Fostering teamwork | INT_FT |
| | Motivating others | INT_MO |
| | Intercultural sensitivity | INT_IS |
| | Promoting diversity | INT_PD |

Because all the meta-competencies were operationalised within the online in-basket, there was only one simulation format. For this reason, the current research cannot be regarded as an AC because competencies were not measured within multiple exercises (Meiring & Buckett,

The electronic in-basket was scored using a combination of machine scoring of the multiple-choice items and manual scoring, by trained raters, of the open-ended video vignettes. The scores were integrated as equally weighted averages of the open-ended and multiple-choice responses. The open-ended responses were scored by a team of trained behavioural experts, who examined the responses in accordance with the conceptual definitions. The assessors attended frame-of-reference training to accurately observe, record, classify and assess the responses to the open-ended questions. During the training session, assessors were shown example responses ranging from appropriate to less appropriate behaviour and were taught how to use the five-point behaviourally anchored rating scale (BARS) to assess responses. As part of the training, all assessors had to complete the CBST.

In this regard, the simulated electronic in-basket complied with the criteria of traditional ACs insofar as each competency was observed and scored by multiple raters and integrated into an overall score. Because all the competencies were operationalised in a single simulation format, the in-basket cannot be regarded as a traditional AC. However, we believe that the results of the study hold important implications for sample-based assessment, and specifically for those simulations that are delivered on an electronic platform. Completion of the computer-based simulation in-basket exercise took around 40 min. All participants completed the task in the allocated time. For this reason, there were no missing values in the data.

Managers who participated in the AC were identified for future promotion because of strong performance and competence in their incumbent positions. Because the purpose of the AC was for development, all the managers who participated in the study consented to partake in the AC. All participants were informed that their data may be used for research purposes. The identity of all participants was kept anonymous by converting the raw data into an encrypted file that was shared with the researchers. Thus, the final dataset contained no personal information other than the race, age and gender of the participants.

The internal structure of the CBST was assessed by specifying a confirmatory factor analytical model with Mplus 7.2 (Muthén & Muthén,

In addition, it was important to evaluate the overall fit of the proposed model to the observed data. If strong support was found for the model parameters and overall fit of the model to the data, it would be possible to conclude that the CBST has construct validity and may be used for diagnostic and selection purposes (Lievens & Christiansen,

Monte Carlo simulations were used to generate 1000 simulated datasets with statistical characteristics similar to those of the sample data. The approach used in the current study can be regarded as an external Monte Carlo study, insofar as parameters saved from the analysis of the real data are used as population values for the simulated data. Thus, a two-step approach is used: in step 1 the model parameters are estimated from the real data, and in step 2 these values are used as input to generate data. Because the simulated data are generated from the estimated model parameters alone, they may not capture the non-normality of the real data. However, when working with skewed data, robust maximum likelihood estimation (MLE) can be used.
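The data-generation half of step 2 can be sketched in a few lines. This is a minimal illustration and not the Mplus routine used in the study: it assumes a one-factor model with four indicators, the population values are hypothetical placeholders, and refitting the CFA model to each simulated dataset is left out.

```python
import numpy as np

# Hypothetical population values for a one-factor, four-indicator model.
# In an external Monte Carlo study these would be the estimates saved in step 1.
loadings = np.array([0.95, 0.47, 0.38, 0.91])        # factor loadings
residual_var = np.array([0.04, 0.01, 0.09, 0.27])    # unique (residual) variances
intercepts = np.array([5.6, 2.7, 2.8, 5.5])          # indicator intercepts


def simulate_datasets(n_reps, n_obs, rng):
    """Draw n_reps datasets of n_obs rows from the one-factor population model."""
    # Latent factor scores with variance fixed to 1 (standardised factor).
    factor = rng.standard_normal((n_reps, n_obs, 1))
    # Normally distributed unique factors, scaled to the residual variances.
    noise = rng.standard_normal((n_reps, n_obs, loadings.size)) * np.sqrt(residual_var)
    return intercepts + factor * loadings + noise


rng = np.random.default_rng(2024)
data = simulate_datasets(n_reps=1000, n_obs=89, rng=rng)

# Model-implied covariance matrix: Lambda Lambda' + Theta.
implied_cov = np.outer(loadings, loadings) + np.diag(residual_var)

# Average the sample covariance matrices over replications; the result should
# sit close to the implied matrix if the generator matches the population model.
mean_cov = np.mean([np.cov(d, rowvar=False) for d in data], axis=0)
max_abs_gap = np.max(np.abs(mean_cov - implied_cov))
print(f"largest gap between average and implied covariance: {max_abs_gap:.4f}")
```

Because the draws are normal, the sketch does not reproduce any skewness present in real data, which is exactly the limitation of model-based generation noted above.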

This may be particularly important when examining the critical value chi-square fit statistic in the parent and simulated samples (Curran, West, & Finch,

In addition, the Bollen–Stine bootstrap (residuals bootstrap) produces a correct bootstrapped sampling distribution for chi-square, and thus a correct bootstrapped

The Bollen–Stine bootstrap can be used to correct for standard error and fit statistical bias that occur in SEM applications because of non-normal data. Bollen and Stine (

It is also possible to use confidence intervals and bootstrapping to gain greater confidence in findings. This involves investigating ‘real’ sampling variability without assuming specific distribution for the data. Bollen–Stine bootstrap (residuals bootstrap) produces correct bootstrapped sampling distribution for chi-square, and thus correct bootstrapped

In each case, the results obtained from the original data were compared to the results generated with the Monte Carlo simulations and Bollen–Stine bias-corrected bootstrapping. The comparative results may contribute to the AC literature by demonstrating the utility of re-sampling techniques when working with relatively small sample sizes. It is important to emphasise that the re-sampling methods are not a ‘silver bullet’ for small sample sizes, as any sampling error contained in the sample from which re-sampling is drawn will be included in the bootstrapped sample (Enders,
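The residual (Bollen–Stine) bootstrap discussed above rests on one transformation: the centred data are rotated so that their sample covariance equals the model-implied covariance, which makes the null hypothesis true in the data being resampled. The sketch below shows only that transformation and the row resampling. The raw data and the implied covariance matrix are hypothetical, and the helper names are illustrative rather than part of any SEM package.

```python
import numpy as np


def sym_power(matrix, power):
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return (eigvecs * eigvals**power) @ eigvecs.T


def bollen_stine_transform(data, implied_cov):
    """Rotate centred data so that its sample covariance equals the implied covariance."""
    centred = data - data.mean(axis=0)
    sample_cov = np.cov(centred, rowvar=False)
    return centred @ sym_power(sample_cov, -0.5) @ sym_power(implied_cov, 0.5)


def bootstrap_samples(transformed, n_draws, rng):
    """Yield bootstrap datasets by resampling rows with replacement."""
    n_obs = transformed.shape[0]
    for _ in range(n_draws):
        yield transformed[rng.integers(0, n_obs, size=n_obs)]


rng = np.random.default_rng(7)
# Hypothetical raw data: 89 cases on four positively correlated indicators.
raw = rng.standard_normal((89, 4)) + 0.8 * rng.standard_normal((89, 1))
# Hypothetical model-implied covariance from a standardised one-factor model.
lam = np.array([0.9, 0.8, 0.7, 0.6])
implied = np.outer(lam, lam) + np.diag(1 - lam**2)

z = bollen_stine_transform(raw, implied)
# After the transformation the model holds exactly in the data being resampled.
print(np.allclose(np.cov(z, rowvar=False), implied))
draws = list(bootstrap_samples(z, n_draws=1000, rng=rng))
```

In a full analysis the model would be refitted to every draw, and the bootstrap p-value would be the proportion of draws whose chi-square exceeds the chi-square obtained from the original data.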

This article followed all ethical standards for research without direct contact with human or animal subjects.

The primary objective of this study was to examine the construct validity of a CBST in-basket exercise within an ADC. This objective involves proving the behavioural validity of the workplace simulation. It further implies that if construct validity is intact, then the exercises comply with the principles of the ADC and may lead to valid development and selection decisions.

The results of the study are discussed according to the following structure:

frequencies and descriptive statistics;

screening the data;

examining the appropriateness of the data for multivariate CFA;

specification and estimation of CFA model;

evaluating the model according to goodness-of-fit indices;

evaluating the model according to model parameters;

using Monte Carlo estimates;

using bootstrap (BS) bias-corrected bootstraps.

As with other multivariate linear statistical procedures, CFA requires that certain assumptions must be met with regard to the sample. Therefore, prior to formally fitting the CFA model to the data, the assumptions of multivariate normality, linearity and adequacy of variance were assessed. In general, no serious violations of these assumptions were detected in the data. However, the data did not follow a multivariate normal distribution and therefore robust maximum likelihood (RML) was specified as the estimation technique. Basic descriptive statistics were generated to assess the variability and central tendency of PEDRs. The means and standard deviations of PEDRs are presented in

Descriptive statistics.

| Meta-competencies | Abbreviation | N | Mean | Standard deviation |
|---|---|---|---|---|
| Interaction | INT_PD | 89 | 2.280 | 0.3792 |
| | INT_IS | 89 | 2.285 | 0.3859 |
| | INT_MO | 89 | 2.774 | 0.4575 |
| | INT_FT | 89 | 2.774 | 0.4575 |
| | INT_NET | 89 | 2.720 | 0.3720 |
| | INT_COM | 89 | 3.016 | 0.6144 |
| Learning | LRN_CO | 89 | 2.774 | 0.4575 |
| | LRN_HF | 89 | 2.774 | 0.4574 |
| | LRN_SR | 89 | 2.613 | 0.4179 |
| | LRN_BA | 89 | 3.484 | 0.6144 |
| Entrepreneurship | ENT_INT | 89 | 2.887 | 0.6085 |
| | ENT_QUAL | 89 | 2.608 | 0.4160 |
| | ENT_PROF | 89 | 2.382 | 0.5129 |
| | ENT_CUST | 89 | 2.608 | 0.4160 |
| Execution | EXE_ASS | 89 | 3.446 | 0.6188 |
| | EXE_DR | 89 | 3.484 | 0.6144 |
| | EXE_DM | 89 | 3.484 | 0.6144 |
| | EXE_PS | 89 | 2.849 | 0.4650 |
| Drive | DRI_PAS | 89 | 3.059 | 0.6463 |
| | DRI_SD | 89 | 3.059 | 0.6463 |
| | DRI_LS | 89 | 2.780 | 0.4512 |
| | DRI_INI | 89 | 2.565 | 0.4676 |
| Vision | VIS_LC | 89 | 2.882 | 0.6097 |
| | VIS_INN | 89 | 2.575 | 0.4420 |
| | VIS_SO | 89 | 2.790 | 0.6269 |
| | VIS_VT | 89 | 2.785 | 0.6273 |

INT_PD, Promoting Diversity; INT_IS, Intercultural Sensitivity; INT_MO, Motivating Others; INT_FT, Fostering Teamwork; INT_NET, Networking; INT_COM, Clear and Open Communication; LRN_CO, Coaching Others; LRN_HF, Handling Feedback; LRN_SR, Self-Reflection; LRN_BA, Building up Business Acumen; ENT_INT, Integrity (In Business); ENT_QUAL, Quality Orientation; ENT_PROF, Profit Orientation; ENT_CUST, Customer Orientation; EXE_ASS, Assertiveness; EXE_DR, Delivering Results; EXE_DM, Decision-Making; EXE_PS, Problem-Solving; DRI_PAS, Passion and Commitment; DRI_SD, Self-Direction; DRI_LS, Leading and Steering; DRI_INI, Initiative; VIS_LC, Leading Change; VIS_INN, Innovation; VIS_SO, Strategic Orientation; VIS_VT, Visionary Thinking.

The results in

When the total CBST model was specified as a CFA model, Mplus issued a warning that the sample covariance matrix may be singular and that the model could not converge. Because of the singular covariance matrix, it was impossible to specify and assess the total CBST. One possible remedy would be to collapse highly correlated sub-dimensions into broader competencies. In previous studies, Hoffman et al. (

Collapsing dimensions into broader competencies may make sense from a theoretical and methodological perspective. From a methodological perspective, treating dimension scores (PEDRs) as indicators of broader dimensions will increase the indicator-to-dimension ratio. Monahan et al. (

In

Bivariate correlations of the Interaction meta-competency.

| Variable | INT_PD | INT_IS | INT_MO | INT_FT | INT_NET | INT_COM |
|---|---|---|---|---|---|---|
| INT_PD | 1 | 0.954 | 0.603 | 0.603 | 0.502 | 0.715 |
| INT_IS | 0.954 | 1 | 0.584 | 0.584 | 0.504 | 0.634 |
| INT_MO | 0.603 | 0.584 | 1 | 1.000 | 0.663 | 0.632 |
| INT_FT | 0.603 | 0.584 | 1.000 | 1 | 0.663 | 0.632 |
| INT_NET | 0.502 | 0.504 | 0.663 | 0.663 | 1 | 0.626 |
| INT_COM | 0.715 | 0.634 | 0.632 | 0.632 | 0.626 | 1 |

INT_PD, Interaction Promoting Diversity; INT_COM, Interaction Clear and Open Communication; INT_NET, Interaction Networking; INT_FT, Interaction Fostering Teamwork; INT_MO, Interaction Motivating Others; INT_IS, Intercultural Sensitivity.

All off-diagonal correlations are significant at the 0.01 level (2-tailed).

Revised Interaction meta-competency bivariate correlations.

| Variable | INT_NET | INT_PDIS | INT_MOFT | INT_COM |
|---|---|---|---|---|
| INT_NET | 1 | 0.708 | 0.671 | 0.714 |
| INT_PDIS | 0.708 | 1 | 0.613 | 0.651 |
| INT_MOFT | 0.671 | 0.613 | 1 | 0.690 |
| INT_COM | 0.714 | 0.651 | 0.690 | 1 |

INT_NET, Interaction Networking; INT_PDIS, Interaction Promoting Diversity and Intercultural Sensitivity; INT_MOFT, Interaction Motivating Others and Fostering Teamwork; INT_COM, Interaction Communication.

All off-diagonal correlations are significant at the 0.01 level (2-tailed).

Because non-normal data can lead to biased fit indices and standard errors in the simulated data when using Monte Carlo, the normality of the observed variables was assessed with SPSS (Version 25, IBM,

Skewness of observed variables for the revised Interaction dimension.

| Variable | Skewness | Standard error | Statistic (absolute skewness/SE) |
|---|---|---|---|
| INT_NET | 0.366 | 0.255 | 1.43 |
| INT_PDIS | −0.077 | 0.255 | 0.30 |
| INT_MOFT | 0.640 | 0.255 | 2.50 |
| INT_COM | −0.047 | 0.255 | 0.18 |

INT_NET, Interaction Networking; INT_PDIS, Interaction Promoting Diversity and Intercultural Sensitivity; INT_MOFT, Interaction Motivating Others and Fostering Teamwork; INT_COM, Interaction Communication.
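The standard error and test statistics in the skewness table can be checked by hand. The sketch below uses the usual large-sample formula for the standard error of skewness under normality and divides each absolute skewness value by it. The 1.96 threshold is an assumed rule of thumb for flagging notable skew, not a criterion taken from the article.

```python
import math


def skewness_standard_error(n):
    """Standard error of sample skewness under normality for sample size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Skewness values as reported in the table above.
skewness = {"INT_NET": 0.366, "INT_PDIS": -0.077, "INT_MOFT": 0.640, "INT_COM": -0.047}

se = skewness_standard_error(89)
z_scores = {name: abs(value) / se for name, value in skewness.items()}

print(f"SE of skewness for n = 89: {se:.3f}")
for name, z in z_scores.items():
    flag = "exceeds 1.96" if z > 1.96 else "within 1.96"
    print(f"{name}: |skewness| / SE = {z:.2f} ({flag})")
```

With n = 89 the formula reproduces the tabled standard error of 0.255, and only INT_MOFT crosses the assumed threshold.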

Results from

The correlations in

Goodness-of-fit indices for the revised interaction dimension.

| Variable | Category | Value |
|---|---|---|
| Chi-square test of model fit | Value | 2.553 |
| | Degrees of freedom | 2.000 |
| | p-value | 0.279 |
| | Scaling correction factor for MLR | 0.982 |
| RMSEA | Estimate | 0.056 |
| | 90% CI | 0.000–0.226 |
| | Probability RMSEA ≤ 0.05 | 0.031 |
| CFI | | 0.997 |
| TLI | | 0.991 |
| SRMR | Value | 0.011 |

RMSEA, root mean square error of approximation; SRMR, standardised root mean square residual; CI, confidence interval; CFI, comparative fit index; TLI, Tucker-Lewis Index.

The overall model fit can be regarded as satisfactory based on the criteria and cut-off rules reported in the methodology section. The CFI and Tucker-Lewis Index (TLI) were in excess of 0.95, the RMSEA (0.056) was close to the normative cut-off value of 0.05 and the SRMR (0.011) was well below it.
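Two of the reported fit values can be recovered from the chi-square statistic alone, which is a useful sanity check on a fit table. The sketch below uses the closed-form chi-square tail probability for two degrees of freedom and the textbook RMSEA point estimate. It is not the exact Mplus computation, which may use n rather than n − 1 and applies a scaling correction under robust estimation.

```python
import math

chi_square = 2.553   # reported test statistic
df = 2               # reported degrees of freedom
n = 89               # sample size

# For df = 2 the chi-square survival function has the closed form exp(-x / 2).
p_value = math.exp(-chi_square / 2)

# Textbook RMSEA point estimate; software may use n instead of n - 1 and may
# apply further corrections under robust estimation.
rmsea = math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

print(f"p = {p_value:.3f}, RMSEA = {rmsea:.3f}")
```

Both results agree with the tabled values to three decimals, which suggests that the unlabelled 0.279 entry in the fit table is the chi-square p-value.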

The unstandardised and standardised results demonstrated that most of the model parameters were indicative of good model fit. This provides further support for the revised Interaction measurement model. A summary of the model parameters is presented in

Unstandardised and standardised parameter estimates of the revised Interaction dimension.

| Model results | Variable | Estimate | SE | Est/SE | Two-tailed p-value |
|---|---|---|---|---|---|
| Unstandardised model results | | | | | |
| Factor loadings | INT_MOFT | 0.947 | 0.081 | 11.684 | 0.000 |
| | INT_NET | 0.467 | 0.043 | 10.853 | 0.000 |
| | INT_COM | 0.375 | 0.054 | 6.940 | 0.000 |
| | INT_PDIS | 0.913 | 0.098 | 9.294 | 0.000 |
| Intercepts | INT_NET | 2.742 | 0.051 | 53.817 | 0.000 |
| | INT_COM | 2.798 | 0.051 | 54.519 | 0.000 |
| | INT_PDIS | 5.461 | 0.111 | 49.065 | 0.000 |
| | INT_MOFT | 5.584 | 0.103 | 54.351 | 0.000 |
| Variances | INT | 1.000 | 0.000 | 999.000 | 999.000 |
| Residual variances | INT_NET | 0.013 | 0.004 | 3.256 | 0.001 |
| | INT_COM | 0.094 | 0.025 | 3.707 | 0.000 |
| | INT_PDIS | 0.268 | 0.090 | 2.963 | 0.003 |
| | INT_MOFT | 0.042 | 0.020 | 2.117 | 0.034 |
| Standardised model results: Standardisation | | | | | |
| Factor loadings | INT_MOFT | 0.977 | 0.011 | 88.631 | 0.000 |
| | INT_NET | 0.972 | 0.010 | 96.948 | 0.000 |
| | INT_COM | 0.775 | 0.068 | 11.356 | 0.000 |
| | INT_PDIS | 0.870 | 0.040 | 21.866 | 0.000 |
| Intercepts | INT_NET | 5.705 | 0.476 | 11.984 | 0.000 |
| | INT_COM | 5.779 | 0.489 | 11.823 | 0.000 |
| | INT_PDIS | 5.201 | 0.500 | 10.405 | 0.000 |
| | INT_MOFT | 5.761 | 0.455 | 12.665 | 0.000 |
| Variances | INT | 1.000 | 0.000 | 999.000 | 999.000 |
| Residual variances | INT_NET | 0.054 | 0.020 | 2.787 | 0.005 |
| | INT_COM | 0.400 | 0.106 | 3.783 | 0.000 |
| | INT_PDIS | 0.243 | 0.069 | 3.511 | 0.000 |
| | INT_MOFT | 0.045 | 0.022 | 2.064 | 0.039 |

SE, standard error; Est/SE, Estimate divided by standard error; INT, Interaction; COM, Clear and Open Communication; NET, Networking; FT, Fostering Teamwork; MOT, Motivating others; IS, Intercultural Sensitivity; PD, Promoting Diversity.

Correlation is significant at the 0.01 level (2-tailed).
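The link between the unstandardised and standardised blocks of the table above can be verified with a short calculation. The sketch assumes the usual Mplus ordering, in which the first four rows of each block are factor loadings and the last four are residual variances. That grouping is an inference from the values and not something the flattened table states.

```python
import math

# Unstandardised estimates read from the table above. The row grouping is an
# assumption: loadings first, residual variances last (usual Mplus ordering).
loadings = {"INT_MOFT": 0.947, "INT_NET": 0.467, "INT_COM": 0.375, "INT_PDIS": 0.913}
residuals = {"INT_MOFT": 0.042, "INT_NET": 0.013, "INT_COM": 0.094, "INT_PDIS": 0.268}
factor_variance = 1.0  # latent variance fixed to 1 for identification

standardised = {}
for name, lam in loadings.items():
    # Model-implied indicator variance: lambda^2 * Var(factor) + residual variance.
    implied_variance = lam**2 * factor_variance + residuals[name]
    std_loading = lam * math.sqrt(factor_variance / implied_variance)
    # Standardised residual variance is the share of variance left unexplained.
    standardised[name] = (std_loading, 1 - std_loading**2)

for name, (std_loading, std_residual) in standardised.items():
    print(f"{name}: standardised loading {std_loading:.3f}, residual {std_residual:.3f}")
```

Under that assumption the computed values match the standardised loadings and residual variances in the table to within rounding, which supports the assumed row grouping.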

The results in

Mean, standard deviation, critical value of chi-square fit index across 1000 draws.

| Variable | Expected | Observed | Value |
|---|---|---|---|
| Proportions | 0.990 | 0.992 | - |
| | 0.980 | 0.984 | - |
| | 0.950 | 0.961 | - |
| | 0.900 | 0.913 | - |
| | 0.800 | 0.803 | - |
| | 0.700 | 0.711 | - |
| | 0.500 | 0.503 | - |
| | 0.300 | 0.300 | - |
| | 0.200 | 0.201 | - |
| | 0.100 | 0.099 | - |
| | 0.050 | 0.051 | - |
| | 0.020 | 0.024 | - |
| | 0.010 | 0.008 | - |
| Percentiles | 0.020 | 0.030 | - |
| | 0.040 | 0.054 | - |
| | 0.103 | 0.128 | - |
| | 0.211 | 0.242 | - |
| | 0.446 | 0.448 | - |
| | 0.713 | 0.765 | - |
| | 1.386 | 1.395 | - |
| | 2.408 | 2.391 | - |
| | 3.219 | 3.222 | - |
| | 4.605 | 4.565 | - |
| | 5.991 | 6.065 | - |
| | 7.824 | 8.273 | - |
| | 9.210 | 9.082 | - |
| Degrees of freedom | - | - | 2.000 |
| Mean | - | - | 2.008 |
| Standard deviation | - | - | 1.957 |
| Number of successful computations | - | - | 1000 |

Proportions: the probability that the chi-square value exceeds the critical percentile value.
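The "Expected" columns of this table follow directly from the chi-square distribution with two degrees of freedom, so they can be regenerated without any simulation. The sketch below uses the closed-form inverse of the tail probability that holds for df = 2 only; other degrees of freedom would need a numerical quantile function.

```python
import math

# Upper-tail probabilities listed in the expected proportions column.
proportions = [0.99, 0.98, 0.95, 0.90, 0.80, 0.70, 0.50,
               0.30, 0.20, 0.10, 0.05, 0.02, 0.01]


def chi2_df2_critical(upper_tail_prob):
    """Value exceeded with the given probability by a chi-square variate with df = 2."""
    # For df = 2, P(X > x) = exp(-x / 2), so x = -2 * ln(p).
    return -2.0 * math.log(upper_tail_prob)


expected = {p: round(chi2_df2_critical(p), 3) for p in proportions}
for p, value in expected.items():
    print(f"P(chi-square > {value:.3f}) = {p:.2f}")
```

Comparing these theoretical percentiles with the observed ones is what the table does: an observed proportion near 0.05 at the 5.991 critical value indicates that the chi-square test rejects at roughly its nominal rate in the simulated samples.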

Mean, standard deviation, critical value of Root Mean Square Error of Approximation fit index across 1000 draws.

| Variable | Expected | Observed | Value |
|---|---|---|---|
| Proportions | 0.990 | 1.000 | - |
| | 0.980 | 1.000 | - |
| | 0.950 | 1.000 | - |
| | 0.900 | 1.000 | - |
| | 0.800 | 1.000 | - |
| | 0.700 | 0.359 | - |
| | 0.500 | 0.328 | - |
| | 0.300 | 0.258 | - |
| | 0.200 | 0.208 | - |
| | 0.100 | 0.139 | - |
| | 0.050 | 0.095 | - |
| | 0.020 | 0.057 | - |
| | 0.010 | 0.040 | - |
| Percentiles | −0.039 | 0.000 | - |
| | −0.032 | 0.000 | - |
| | −0.023 | 0.000 | - |
| | −0.015 | 0.000 | - |
| | −0.002 | 0.000 | - |
| | −0.005 | 0.000 | - |
| | −0.015 | 0.000 | - |
| | −0.026 | 0.020 | - |
| | −0.034 | 0.035 | - |
| | −0.044 | 0.051 | - |
| | −0.052 | 0.064 | - |
| | −0.061 | 0.079 | - |
| | −0.068 | 0.084 | - |
| Mean | - | - | 0.014 |
| Standard deviation | - | - | 0.023 |
| Number of successful computations | - | - | 1000 |

Proportions: the probability that the RMSEA value exceeds the critical value.

Similar results are displayed in

The critical RMSEA value of 0.052 is exceeded in approximately 9.5% of the 1000 replications. Although the mean RMSEA value in the simulated data is indicative of good fit (0.014), the relatively large deviation between the expected and observed proportions containing the critical value raises concern regarding the approximate fit of the original CFA model, given the results of the Monte Carlo simulation.

The standard error, Monte Carlo-derived standard error, average standard deviation and average coverage values are presented in

Average population estimations with Mean Square Error, 95% coverage, proportion of replications equal to zero under H0, and parameter bias.

Model results: Estimates | Population | Average | Standard deviation | SE average | MSE | 95% Coverage | Percentage significant coefficients | Bias |
---|---|---|---|---|---|---|---|---|
INT_NET | 0.467 | 0.3498 | 0.3078 | 0.0156 | 0.1084 | 0.825 | 1.000 | 25% |
INT_COM | 0.375 | 0.2814 | 0.2478 | 0.0181 | 0.0701 | 0.819 | 1.000 | 24% |
INT_PDIS | 0.913 | 0.6850 | 0.6025 | 0.0369 | 0.4146 | 0.834 | 1.000 | 25% |
INT_MOFT | 0.947 | 0.7092 | 0.6243 | 0.0312 | 0.4459 | 0.830 | 1.000 | 25% |
INT_NET | 2.742 | 2.7414 | 0.0214 | 0.0214 | 0.0005 | 0.953 | 1.000 | 0.02% |
INT_COM | 2.798 | 2.775 | 0.0209 | 0.0216 | 0.0004 | 0.958 | 1.000 | 0.80% |
INT_PDIS | 5.461 | 5.4594 | 0.0458 | 0.0468 | 0.0021 | 0.955 | 1.000 | 0.02% |
INT_MOFT | 5.584 | 5.5828 | 0.0421 | 0.0432 | 0.0018 | 0.946 | 1.000 | 0.02% |
INT | 1.000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 1.000 | 0.000 | - |
INT_NET | 0.013 | 0.0129 | 0.0018 | 0.0018 | 0.0000 | 0.942 | 1.000 | 0.76% |
INT_COM | 0.094 | 0.0937 | 0.0061 | 0.0061 | 0.0000 | 0.943 | 1.000 | 0.31% |
INT_PDIS | 0.268 | 0.2659 | 0.0185 | 0.0181 | 0.0003 | 0.942 | 1.000 | 0.78% |
INT_MOFT | 0.042 | 0.0417 | 0.0068 | 0.0069 | 0.0000 | 0.957 | 1.000 | 0.71% |

INT_NET, Interaction Networking; INT_COM, Interaction Communication; INT_PDIS, Interaction Promoting Diversity and Intercultural Sensitivity; INT_MOFT, Interaction Motivating Others and Fostering Teamwork.
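The bias, MSE and coverage columns can be related to one another with a few lines of arithmetic. The sketch below assumes the usual Monte Carlo definitions: relative bias is the absolute difference between the average estimate and the population value divided by the population value, MSE is the sampling variance plus the squared bias, and coverage is the share of replication intervals that contain the population value. The function names are illustrative, and the exact formulas applied by the software used in the study may differ slightly.

```python
def relative_bias(population: float, average: float) -> float:
    """Absolute bias of the average estimate as a share of the population value."""
    return abs(average - population) / abs(population)


def mean_square_error(population: float, average: float, std_dev: float) -> float:
    """MSE across replications: sampling variance plus squared bias."""
    return std_dev ** 2 + (average - population) ** 2


def coverage(population: float, intervals) -> float:
    """Share of replication intervals (lower, upper) containing the population value."""
    hits = sum(lower <= population <= upper for lower, upper in intervals)
    return hits / len(intervals)


# First row of the table above (INT_NET, population value 0.467).
bias = relative_bias(0.467, 0.3498)
mse = mean_square_error(0.467, 0.3498, 0.3078)
print(f"bias = {bias:.1%}, MSE = {mse:.4f}")  # bias = 25.1%, MSE = 0.1085
```

Both values are close to the tabled 25% and 0.1084. Most of the MSE in this row comes from the large standard deviation across replications rather than from the squared bias.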

Against this background, the information in

Next, we discuss the results from the Bollen–Stine residual bootstrapped standard errors and bias-corrected confidence intervals generated with regard to the Interaction sub-scale with 1000 bootstrap draws. The intention of this analysis is to provide valid inferences from the sample data to some large universe of potential data; in other words, to provide information about the population from statistics generated with random smaller samples. Because it would be virtually impossible to obtain access to random samples from populations that have the same characteristics as the larger population, statistical methods have been developed to determine the confidence with which such inferences can be drawn, given the characteristics of the available sample (Cohen, Cohen, West, & Aiken,

Bootstrapping and other re-sampling techniques complement traditional confidence intervals by estimating standard errors of parameter estimates over a large number of hypothetical sample draws (Hancock & Nevitt,

The results of the bias-corrected bootstrapping indicate that the 95% confidence intervals around the model parameters are quite broad, which erodes confidence that the specific point estimates would replicate in the population. In addition, the difference between the population parameter estimates and the mean values recovered by the Monte Carlo draws indicates substantial bias in the parameter estimates. The same conclusion can be reached with regard to the standard errors.
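The logic of a bias-corrected bootstrap interval can be sketched for a simple statistic. The example below is not the Bollen–Stine residual bootstrap used in the study: it resamples cases with replacement, uses the sample mean as the statistic and applies the basic bias-corrected (BC) percentile adjustment without an acceleration term. The data and the function name `bc_bootstrap_ci` are hypothetical.

```python
import random
from statistics import NormalDist, mean


def bc_bootstrap_ci(data, stat, n_boot=1000, level=0.95, seed=0):
    """Bias-corrected percentile bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    estimate = stat(data)
    # Resample cases with replacement and recompute the statistic each time.
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    # Bias correction: where does the sample estimate sit in the bootstrap
    # distribution? The proportion is clamped to keep inv_cdf finite.
    below = sum(b < estimate for b in boots) / n_boot
    below = min(max(below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    normal = NormalDist()
    z0 = normal.inv_cdf(below)
    alpha = (1 - level) / 2
    p_lower = normal.cdf(2 * z0 + normal.inv_cdf(alpha))
    p_upper = normal.cdf(2 * z0 + normal.inv_cdf(1 - alpha))

    def quantile(p):
        index = min(n_boot - 1, max(0, int(p * n_boot)))
        return boots[index]

    return quantile(p_lower), estimate, quantile(p_upper)


# Hypothetical small sample of 40 ratings.
data_rng = random.Random(1)
sample = [data_rng.gauss(5.0, 2.0) for _ in range(40)]
lower, estimate, upper = bc_bootstrap_ci(sample, mean)
print(f"estimate = {estimate:.3f}, 95% BC interval = [{lower:.3f}, {upper:.3f}]")
```

With a sample this small the interval is wide relative to the estimate, which is the pattern described above for the model parameters.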

Considering all this information collectively, one would have to conclude that the construct validity evidence for the original CBST is limited. For example, the overall measurement model did not converge and was eventually abandoned because of high multicollinearity between dimension ratings. Consequently, only a small subsection of the measure was further investigated with additional analysis. Even these models of the Interaction dimension required extensive modification and manipulation before they showed acceptable fit to the data.

More supportive evidence for construct validity was found with regard to the revised Interaction meta-competency after the sub-dimensions of Motivating Others (INT_MO) and Fostering Teamwork (INT_FT), as well as the sub-dimensions of Promoting Diversity (INT_PD) and Intercultural Sensitivity (INT_IS), were combined. The newly combined sub-dimensions were labelled INT_PDIS (Promoting Diversity and Intercultural Sensitivity) and INT_MOFT (Team Motivation). Theoretically, it makes sense to group the dimension ratings of Promoting Diversity and Intercultural Sensitivity, as well as those of Motivating Others and Fostering Teamwork.

The primary research objective was to examine the construct validity of an electronic in-basket using CBST technology. In the end, only a revised version of one of the six meta-competencies could be assessed. The results suggest that AC methodologies packaged in interactive software applications are not immune to the problems that face traditional sample-based assessments. Multicollinearity remains a particularly thorny issue, in part because not enough consideration is given to the conceptual definition of competencies at the design stage. However, this problem does not seem to be unique to the current study. Hoffman et al. (

The lack of correspondence among observations of the same dimension across AC exercises has often been regarded as problematic, and several innovative interventions have been proposed to remedy the problem. Proponents of the exercise-centric ideology, in contrast, highlight the link between exercise effects and criterion scores. However, recent studies suggest that ACs have been misspecified and that, as a result, the contribution of dimensions has historically been underestimated in AC ratings. This holds important implications for practice because most AC applications are probably still expressed in dimension-centric discourse.

In the second round of data analysis, we investigated the model parameters by way of two re-sampling techniques, namely, Monte Carlo simulations and bias-corrected bootstrapping. Confidence intervals from the bias-corrected draws were used to assess the variability of the model parameters, given the calculated population standard errors. In accordance with the original results, we found the bootstrap confidence intervals to be quite wide and the coverage levels to be below the suggested level of 0.90. This provided further support that the results should be interpreted with caution because the estimates may be biased. We may have arrived at a different conclusion regarding the validity of the revised Interaction dimension, albeit not the whole in-basket, if these two re-sampling techniques had not been employed. In general, the CFA results of the revised Interaction model showed satisfactory fit, low residuals and robust factor loadings. However, the re-sampling techniques indicate that the results may not be trustworthy and may reflect type I errors. These two approaches therefore give AC practitioners and researchers, who often have to conduct research with very small samples, another set of tools to assess the validity of ratings.

Although the study provided many useful findings, some conflicting results and limitations need to be reported. One of the biggest limitations is that only one exercise type, namely, an in-basket, was used in the current study. This made the specification and estimation of method effects impossible. Typically, the size of the exercise effects provides important information regarding the functioning and internal structure of simulations. Based on best practice guidelines for the use of the AC method in South Africa, Meiring and Buckett (

Another limitation was the structure and design of the CBST. Ratings of competencies were reflected as PEDRs and not behavioural indicators. This greatly limited the number of data points to specify each of the meta-competencies. If meta-competencies were specified with behavioural indicators, the researcher could delete behavioural indicators that demonstrated collinearity, yet still measure the six meta-competencies. However, in the current study the researchers could only specify and evaluate the Interaction meta-competency because the other competencies had too few PEDR scores to combine into broader competencies.

An additional limitation of this study is that the performance ratings from managers, as well as the success of a follow-up supervisory development programme, could not be investigated. As a result, the criterion-related validity of the six meta-competencies with respect to job performance could not be investigated. It would have been interesting to see whether differences on the meta-competencies translated into significant criterion-related differences.

The research value and contribution of this study can best be described from multiple perspectives. From a practical perspective, this application of CBST demonstrates that faster, more accurate solutions exist for conducting ACs for the purposes of selection and development. From a theoretical perspective, the research results and learning points from the CBST in-basket exercise depict the real-life events of the manager and may act as a workplace or business simulation, which adds to the incremental validity of a selection or development strategy. From a corporate perspective, the accelerating rate of change and the increasing uncertainty in the outcomes of change are evident across the whole business arena, and this is compounded by the increased demand for experienced talent. From a research perspective, there is considerable bias in model parameters when using small samples. However, bias-corrected bootstrapping techniques and Monte Carlo simulations may be used productively to evaluate the bias in model parameters.

This study set out to evaluate the construct validity of an electronic in-basket by investigating the internal structure of the exercise. The selection of competencies was based on job analyses and each of the six meta-competencies has a number of sub-dimensions. This design is similar to traditional AC exercises. The initial goal of the study was to assess the internal structure of the entire in-basket with a CFA methodology using MTMM matrices. However, initial statistical screening of the data suggested that a large number of dimensions were highly correlated and lacked discriminant validity. To remedy this problem, dimensions were collapsed whenever it made theoretical sense to do so. In the end, only one meta-competency, the Interaction dimension, could be evaluated with a CFA approach. The results showed that the proposed model fitted the sample data well.

However, results from the two re-sampling techniques suggested that the model parameters were contaminated by bias and may lead to invalid inferences. This study demonstrates how these two techniques can be used when using CFA approaches in small samples. Finally, the study demonstrates that for all the potential benefits associated with electronic- and Internet-delivered simulations, the

The authors declare that they have no financial or personal relationship(s) that may have inappropriately influenced them in writing this article.

J.B. wrote the article and conducted the statistical analyses. D.M. conceptualised the literature review and J.H.v.d.W. collected the data and wrote the original master’s thesis which forms the basis of the current article.

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Data sharing is not applicable to this article as no new data were created or analysed in this study.

The views expressed in this article are the authors’ own and do not represent the position of any related institutions or funding agencies. No funding was received by the authors during the process of completing the research contained in the current article.

Bivariate correlations between assessment centres indicators.

Assessment centres indicators | Bivariate correlations |
---|---|
INT_PD AND INT_IS | 0.954 |
INT_FT AND INT_MO | 1.000 |
LRN_CO AND INT_MO | 1.000 |
LRN_CO AND INT_FT | 1.000 |
LRN_HF AND INT_MO | 1.000 |
LRN_HF AND INT_FT | 1.000 |
LRN_HF AND LRN_CO | 1.000 |
ENT_CUST AND ENT_QUAL | 1.000 |
EXE_ASS AND LRN_BA | 0.995 |
EXE_DR AND LRN_BA | 1.000 |
EXE_DR AND EXE_ASS | 0.955 |
EXE_DM AND LRN_BA | 1.000 |
EXE_DM AND EXE_ASS | 0.955 |
EXE_DM AND EXE_DR | 1.000 |
DRI_SD AND DRI_PAS | 1.000 |
DRI_LS AND INT_MO | 0.994 |
DRI_LS AND INT_FT | 0.994 |
DRI_LS AND LRN_CO | 0.994 |
DRI_LS AND LRN_HF | 0.994 |
VIS_LC AND ENT_INT | 0.996 |
VIS_VT AND VIS_SO | 0.997 |

INT_PD, Promoting Diversity; INT_IS, Intercultural Sensitivity; INT_MO, Motivating Others; INT_FT, Fostering Teamwork; INT_NET, Networking; INT_COM, Clear and Open Communication; LRN_CO, Coaching Others; LRN_HF, Handling Feedback; LRN_SR, Self-Reflection; LRN_BA, Building up Business Acumen; ENT_INT, Integrity (In Business); ENT_QUAL, Quality Orientation; ENT_PROF, Profit Orientation; ENT_CUST, Customer Orientation; EXE_ASS, Assertiveness; EXE_DR, Delivering Results; EXE_DM, Decision Making; EXE_PS, Problem Solving; DRI_PAS, Passion and Commitment; DRI_SD, Self-Direction; DRI_LS, Leading and Steering; DRI_INI, Initiative; VIS_LC, Leading Change; VIS_INN, Innovation; VIS_SO, Strategic Orientation; VIS_VT, Visionary Thinking.
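Correlations this close to 1.000 can be screened for before a CFA model is specified. The sketch below flags pairs of indicators whose absolute Pearson correlation reaches a cutoff. The ratings are made up for illustration and only borrow the indicator labels, and the cutoff of 0.90 is an assumed convention rather than the rule applied in the study.

```python
import math
from itertools import combinations


def pearson(x, y) -> float:
    """Pearson correlation between two equally long lists of ratings."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear(ratings: dict, cutoff: float = 0.90):
    """Return (indicator, indicator, r) for every pair with |r| >= cutoff."""
    flagged = []
    for first, second in combinations(ratings, 2):
        r = pearson(ratings[first], ratings[second])
        if abs(r) >= cutoff:
            flagged.append((first, second, round(r, 3)))
    return flagged


# Hypothetical ratings for six participants (not data from the study).
ratings = {
    "INT_MO": [3, 4, 2, 5, 4, 3],
    "INT_FT": [3, 4, 2, 5, 4, 3],   # identical to INT_MO, so r = 1.0
    "INT_NET": [2, 5, 3, 3, 1, 4],
}
print(flag_collinear(ratings))
```

Flagged pairs are candidates for being combined into one indicator or for one member of the pair being dropped, which is the kind of decision described for the Interaction sub-dimensions.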

Bias-corrected bootstrap results.

Confidence intervals of model results | Lower 0.5% | Lower 2.5% | Lower 5% | Estimate | Upper 5% | Upper 2.5% | Upper 0.5% |
---|---|---|---|---|---|---|---|
INT_MOFT | 0.740 | 0.789 | 0.817 | 0.947 | 1.094 | 1.117 | 1.168 |
INT_NET | 0.333 | 0.367 | 0.386 | 0.467 | 0.552 | 0.565 | 0.590 |
INT_COM | 0.232 | 0.272 | 0.290 | 0.375 | 0.462 | 0.478 | 0.506 |
INT_PDIS | 0.550 | 0.638 | 0.688 | 0.913 | 1.125 | 1.160 | 1.222 |
INT_NET | 2.606 | 2.639 | 2.657 | 2.742 | 2.823 | 2.837 | 2.869 |
INT_COM | 2.664 | 2.695 | 2.710 | 2.798 | 2.881 | 2.897 | 2.927 |
INT_PDIS | 5.157 | 5.221 | 5.261 | 5.461 | 5.630 | 5.661 | 5.725 |
INT_MOFT | 5.313 | 5.380 | 5.412 | 5.584 | 5.749 | 5.780 | 5.838 |
INT | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
INT_NET | 0.001 | 0.004 | 0.005 | 0.013 | 0.022 | 0.024 | 0.029 |
INT_COM | 0.043 | 0.053 | 0.059 | 0.094 | 0.150 | 0.162 | 0.183 |
INT_PDIS | 0.103 | 0.140 | 0.158 | 0.268 | 0.424 | 0.452 | 0.509 |
INT_MOFT | 0.012 | 0.004 | 0.011 | 0.042 | 0.075 | 0.081 | 0.093 |

INT, Interaction; COM, Clear and Open Communication; NET, Networking; FT, Fostering Teamwork; MOT, Motivating Others; IS, Intercultural Sensitivity; PD, Promoting Diversity.

Correlation is significant at the 0.01 level (2-tailed).