Addressing gender discrimination in cognitive assessment using the English Comprehension Test

Orientation: The empirically designed English Comprehension Test (ECT) is theorised to measure verbal reasoning and is currently undergoing validation. The test development produced two versions of the ECT, namely, ECT version 1.2 and ECT version 1.3. This study focuses on the latest test version, ECT version 1.3.


Introduction
Orientation
There is a need to develop tests locally for the diverse population of South Africa, which presents a unique combination of multilingualism and multiculturalism that innately affects performance in internationally created tests (Bekwa, 2016; Foxcroft, 2004; Foxcroft, Roodt, & Abrahams, 2013; Laher & Cockcroft, 2017; Arendse, 2018). The culturally complex context of South Africa makes this a formidable, yet imperative task. The researcher undertook this task and empirically created the English Comprehension Test (ECT). The ECT, a South African empirically developed test, was identified as a measure for verbal reasoning (Arendse & Maree, 2019; Arendse, 2018, 2020), but it is still in the validation process.

Research purposes and objectives
South African organisational contexts continuously require tests that are applicable to the South African workforce and serve as valid and reliable measurement instruments. As testing forms a large part of organisational selections for specific positions, it is essential that the tests used in these contexts are valid and reliable because they are used for decision-making purposes. More importantly, the assessments in the organisational context are often used to measure cognitive performance and functioning (Foxcroft, 2004; Foxcroft & Aston, 2006; Foxcroft et al., 2013; Laher & Cockcroft, 2017; Muleya, Fourie, & Schlebusch, 2017). As the ECT is considered a cognitive assessment, it is crucial to assess whether there is differential performance in the test because of gender. As language is a factor that has previously been found to affect performance (Foxcroft & Aston, 2006; Laher & Cockcroft, 2017), the interaction between gender and language may provide useful insights. The objectives of the study were as follows:
• Objective 1: To examine whether there are any differences between men and women with regard to performance in the ECT (cognitive assessment).
• Objective 2: To examine the interaction between gender and the different language groups in the ECT total score.

Literature review
The context of psychometric testing is crucial to understanding the existence of bias in testing, and thus, it is important to delve into the issues promoting this discrimination (Foxcroft, 2004;Foxcroft & Aston, 2006;Foxcroft et al., 2013;Muleya et al., 2017). Gender discrimination, particularly prejudice towards women, has persisted because of the dominant patriarchal influence in the global context. These patriarchal acts of injustice caused the rise of feminist movements as a means of challenging the hegemonic systems of thought (Phakeng, 2015;Stone & Coetzee, 2005). The responsiveness of feminists to patriarchal mechanisms of exclusion was vital in the fight for gender equality in the workplace and in university spaces (Phakeng, 2015;Stone & Coetzee, 2005). This awareness led to the criticism of Perry's college-stage theory (1970) for only focusing on males' cognitive development at university, and consequently, Belenky, Clinchy, Goldberg and Tarule (1986) developed a female perspective. Belenky et al. (1986) decided to replicate the study in order to develop an understanding of cognitive development of female college students. Belenky and Staunton (1998) and Belenky et al. (1986) identified seven positions through which women progress on their journey to acquiring new information. These positions describe the process by which a woman becomes actively involved in debates, knowledge production and confidence in her cognitive and moral development (Garrison, 2009).
One of the methods used for the exclusion of women in the workplace and university was psychometric assessment, which segregated women on the basis of theorised intelligence deficits. Globally, women were discriminated against through the application of psychometric instruments, which indicated that they were cognitively inferior to men and were thus not capable of occupying certain positions (Camarata & Woodcock, 2006; Hur, te Nijenhuis, & Jeong, 2017; Hyde, 1981; Miller & Halpern, 2014; Palejwala & Fine, 2015; Toivainen, Papageorgiou, Tosto, & Kovas, 2017). This discrimination was very powerful as it appeared to be scientifically proven that women were less intelligent than men in male-dominated spaces. It was found in several studies that men tended to score better in verbal analogies and spatial relations tasks (Camarata & Woodcock, 2006; Hur et al., 2017; Hyde, 1981; Miller & Halpern, 2014; Palejwala & Fine, 2015; Toivainen et al., 2017), whilst women were found to be cognitively stronger than men in certain domains, such as verbal ability (reasoning) and other verbal-related cognitive assessments, such as word memory, anagrams, reading, writing, general and mixed verbal ability assessments (Griskevica & Rascevska, 2009; Hur et al., 2017; Hyde, 1981; Miller & Halpern, 2014; Palejwala & Fine, 2015; Strand, Deary, & Smith, 2006; Toivainen et al., 2017; Wai, Hodges, & Makel, 2018; Wilsenach & Makaure, 2018). These gender differences were explained as biological differences between the male and female sexes because of hormones or genetic differences between men and women (Hur et al., 2017; Miller & Halpern, 2014; Toivainen et al., 2017; Wilsenach & Makaure, 2018). Other reasons provided were the different socialisation of men and women, particularly through school and cultural influences, which, therefore, led to differences in the observation of intelligence across the genders (Miller & Halpern, 2014; Wilsenach & Makaure, 2018).
Another aspect included the psychosocial influence that led to differential performance in intelligence measures for men and women, which was also informed by the gender stereotypes associated with men and women (Hur et al., 2017;Miller & Halpern, 2014). However, relative evidence existed, which indicated that there were no gender differences in general intelligence (Camarata & Woodcock, 2006;Griskevica & Rascevska, 2009;Hur et al., 2017;Palejwala & Fine, 2015;Strand et al., 2006;Toivainen et al., 2017).
Despite these diverse views on gender differences in cognitive assessment across numerous studies in American, European and Asian countries, very few studies have been conducted in Africa on differences in cognitive abilities between men and women (Hur et al., 2017). A recent study on differences in cognitive assessment in South Africa, however, noted a disconcerting trend amongst Grade 3 boys in this country, because they were consistently achieving much lower scores compared with girls in Grade 3 (Wilsenach & Makaure, 2018). Another study conducted by Bakhiet and Lynn (2015) using the Raven's Coloured Progressive Matrices (a non-verbal intelligence assessment) on Xhosa South African schoolchildren found that the intelligence scores of these schoolchildren were similar to those of Zulu South African schoolchildren who had performed the same test some years before. The intelligence scores obtained by the Xhosa and Zulu South African schoolchildren led the researchers to conclude that the education these schoolchildren were receiving was not increasing their cognitive level (Bakhiet & Lynn, 2015). This conclusion, however, raises issues regarding the appropriateness of tests that originated from different geographical locations as they do not always consider the multicultural and multilingual context of South Africa (Foxcroft, 2004; Laher & Cockcroft, 2017).
In South Africa, psychometric instruments previously served to perpetuate the ideology of the Apartheid government (Laher & Cockcroft, 2013). The combination of race and gender amplified the discrimination, which was not different in the American context, where African American and Mexican individuals were discriminated against in psychometric assessments (Kennedy, Allaire, Gamaldo, & Whitfield, 2012; Sireci & Parker, 2006). Moreover, a further implication of the diverse languages in South Africa was the intensified degree of discrimination in the use of psychometric assessments (Laher & Cockcroft, 2013). The level of bias, therefore, intersected on three levels: race, gender and language. The recognition of these injustices prompted the introduction of laws to prohibit discrimination and enforce fair testing practices for employment and educational purposes, such as the Employment Equity Act (Act 55 of 1998) and the Health Professions Act (Act 56 of 1974) of South Africa. Psychologists have thus made concerted efforts to limit bias in testing, and have adapted international tests for use in South Africa and created norms for the South African population as corrective procedures in testing (He & Van de Vijver, 2012; Laher & Cockcroft, 2013; Malda, Van de Vijver, & Temane, 2010; Muleya et al., 2017; Van de Vijver & Tanzer, 2004). There have also been substantial efforts to address the gender, racial and language discrimination associated with psychometric testing, which include the comprehensive validation of testing instruments to limit bias against any racial and gender group as far as possible (Foxcroft & Aston, 2006; He & Van de Vijver, 2012; Malda et al., 2010; Van de Vijver & Tanzer, 2004). Language has been a basis for discrimination against English additional-language individuals and has been identified as one of the most important factors affecting performance in tests (Foxcroft & Aston, 2006).
In terms of cognitive assessment, language has also been found to affect performance in tests related to verbal comprehension (Foxcroft & Aston, 2006). A recent study by Reilly, Neumann and Andrews (2019) found gender differences in reading and writing achievement scores and attributed some of these gender differences to language and culture. In light of these issues, gender cannot be considered in isolation but should instead be regarded as part of other factors that affect men and women in completing assessments.
The relevance of addressing racist and sexist research was recently emphasised in the retracted article by Nieuwoudt, Dickie, Coetsee, Engelbrecht and Terblanche (2019), in which they claimed that coloured women had an increased risk of low cognitive functioning and were presented with low education levels. The term 'coloured' refers to the legal classification of racially mixed individuals in South Africa. This legal classification was created during Apartheid in South Africa and has remained a legal classification in South Africa until the present (Adhikari, 2006;Isaacs-Martin, 2018). The article by Nieuwoudt et al. (2019) was petitioned and later retracted by the publishers as it was heavily criticised for perpetuating racist and sexist ideologies as well as colonial stereotypes, with Apartheid underpinnings, of coloured women. The uncritical use of the term coloured to homogenise a racially diverse group of women was found in the article (Boswell, Erasmus, Johannes, Mahomed, & Ratele, 2019). Moreover, the authors were criticised for applying the flawed methodology in addition to using an international test that was not culturally adapted to the South African population, which would, therefore, have provided potentially biased results.
The consequences of such results are far-reaching and cause psychological damage as the conclusions generated by this study were generalised to all coloured women (Boswell et al., 2019).
When reviewing the literature on the use of psychometric instruments in South Africa, our history reminds us that we need to be cautious of the manner in which assessments have been used to promote racist science. The uncritical use of measurement instruments has led to unfair assessment and biased conclusions. These conclusions, based on the literature, are not singular, but can represent multiple factors. In this manner, a feminist theory such as intersectionality (Crenshaw, 1988) becomes an important way of assessing the different aspects that affect performance in assessments. Intersectionality argues that individuals can be oppressed on multiple grounds simultaneously (Crenshaw, 1988). With reference to the literature, some of the aspects, such as gender, race, language and culture, can simultaneously oppress individuals completing cognitive assessments when the assessment has not been subjected to sufficient validation for their population and context. In this article, the use of intersectionality (Crenshaw, 1988) allows the author to focus predominantly on gender while also considering language as another factor that can affect performance in assessment. Although there have been important developments and interventions to guide test developers and test users on the fairness, validity and reliability of assessments, the retracted research study by Nieuwoudt et al. (2019) reminds us of the importance of continually validating assessments in the South African context. For this reason, this study was guided by two research questions, namely: are there any differences in the performance of men and women in the ECT? and what are the interaction effects of gender and language on the ECT total score?

Research approach
This study used a quantitative cross-sectional design. The ECT is theorised to measure verbal reasoning and is currently undergoing validation. The ECT was administered to a non-probability convenience sample of 882 individuals. The data were analysed by differential test functioning (DTF) in Winsteps and a two-way ANOVA in Statistical Product and Service Solutions (SPSS). Differential test functioning analysis was used to assess differences across gender groups in the ECT. This statistical analysis allows the researcher to assess whether the two gender groups, men and women, performed similarly in the test. It is worth noting that DTF is of critical relevance in cross-cultural and multilingual research (Sireci & Berberoglu, 2000). A two-way ANOVA was conducted to assess the interaction effects of gender and language on the ECT total score (Lee & Lee, 2018). This was anticipated to provide additional information on gender differences and to assess whether language had an interactional effect on gender.

Research participants
The participants were grouped into three language groups: (1) Afrikaans, (2) English and (3) African languages, as shown in Table 1. All participants had completed Grade 12 (highest school grade). It should be noted that the majority of the sample were relatively young and that the sample included considerably more men than women.

Measuring instrument
The ECT is an empirically created test that is theorised to measure verbal reasoning (Arendse & Maree, 2019; Arendse, 2018, 2020). It comprises comprehension and language sections, which include multiple-choice questions that are dichotomously scored. The language section also includes a written answer section with four sentence construction items. This test has been used on individuals from different linguistic and cultural backgrounds, as well as on different age groups in South Africa. The ECT is currently still in development and is, at present, undergoing validation. The development of the ECT, thus far, has led to the piloting of two test versions, namely, ECT 1.2 (39 items) with a time limit of 45 min and ECT 1.3 (42 items) with no time limit (Arendse & Maree, 2019; Arendse, 2018). As the ECT is still in development, it has only been used for research purposes. The ECT has been used as a screening tool for verbal reasoning in educational and organisational settings. It may assist organisational practitioners in screening for verbal reasoning, which is often required in organisational positions. This study is part of the validation and further development of the ECT. However, this article focuses only on ECT 1.3, the latest test version.

Research procedure and ethical considerations
A convenience sampling method was used to collect data as the participants were attending selections and were available after assessments had been completed. After individuals had completed the selection process, they had a lunch break. After the break, the participants were informed of the ECT for research purposes and their consent to participate in the research was requested, after which they completed the test. The intention behind carrying out the research after the selection process was to avoid the research having an impact on the performance of participants in the selection. It should be noted that fatigue must be considered because of the time when the research took place (Arendse & Maree, 2019; Arendse, 2018, 2020). As the sample comprised people seeking employment, the participants in the study can be considered to be job seekers from various backgrounds and ages and, therefore, regarded as relevant to organisational psychologists.
The ethical considerations for this study were anonymity, because no identifying information was required for the study, and confidentiality, because demographic variables were treated with confidentiality. All participants gave their written consent to participate in the study. Safeguarding information is important, and thus, only relevant project members are able to access the research.

Statistical analysis
Measurement invariance was explored by conducting a DTF analysis within a Rasch framework using Winsteps (Linacre, 2009). Although the sample sizes of the men and women differed considerably, DTF is able to evaluate differential performance in the test without this being affected by the sample size (Bond & Fox, 2007). Differential test functioning requires the use of item difficulties, referred to as item measures in Rasch, for gender comparison (Linacre, 2012). The analysis, therefore, is used to assess the performance of men and women to establish whether test items caused differential performance in the case of either gender. The two-way ANOVA assessed the interactional effects of gender and the different language groups on the ECT total score. The two-way ANOVA was conducted in SPSS, and the Tukey post hoc test was run for significant variables (Lee & Lee, 2018).
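To make the logic of the two-way ANOVA concrete, the following is a minimal Python sketch of how the F ratios for two factors and their interaction can be computed. It is an illustration only and not the SPSS procedure used in the study: it assumes a complete, balanced design (equal numbers per cell), whereas the study's groups were unequal in size. The function name, the factor labels and the example scores are hypothetical.

```python
def two_way_anova_balanced(cells):
    """Two-way ANOVA for a complete, balanced design.

    cells maps (level_of_factor_a, level_of_factor_b) -> list of scores.
    Returns {effect: (df_effect, df_error, F)}.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    na, nb = len(levels_a), len(levels_b)
    n = len(next(iter(cells.values())))  # observations per cell
    if len(cells) != na * nb or n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("need a complete, balanced design with n >= 2 per cell")

    cell_mean = {key: sum(v) / n for key, v in cells.items()}
    grand = sum(cell_mean.values()) / (na * nb)
    mean_a = {a: sum(cell_mean[(a, b)] for b in levels_b) / nb for a in levels_a}
    mean_b = {b: sum(cell_mean[(a, b)] for a in levels_a) / na for b in levels_b}

    # Sums of squares for the main effects, the interaction and the error term.
    ss_a = n * nb * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * na * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    ss_error = sum((x - cell_mean[key]) ** 2 for key, v in cells.items() for x in v)

    df_a, df_b = na - 1, nb - 1
    df_ab, df_error = df_a * df_b, na * nb * (n - 1)
    ms_error = ss_error / df_error
    return {
        "factor_a": (df_a, df_error, (ss_a / df_a) / ms_error),
        "factor_b": (df_b, df_error, (ss_b / df_b) / ms_error),
        "interaction": (df_ab, df_error, (ss_ab / df_ab) / ms_error),
    }


# Hypothetical scores (not study data): gender x language group, two per cell.
example = {
    ("men", "Afrikaans"): [0, 2],
    ("men", "English"): [1, 3],
    ("men", "African languages"): [2, 4],
    ("women", "Afrikaans"): [2, 4],
    ("women", "English"): [3, 5],
    ("women", "African languages"): [4, 6],
}
result = two_way_anova_balanced(example)
```

In this constructed example the cell means are purely additive, so the interaction F is zero, which corresponds to parallel lines in an estimated marginal means plot. For unbalanced data such as the study sample, statistical packages typically use adjusted sums of squares instead of the simple formulas shown here.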

Ethical considerations
Ethical clearance was obtained from the Faculty of Humanities Research Ethics Committee at the University of Pretoria for the PhD study (No. GW20150407HS) from which these results were obtained.

Results
The normality of ECT version 1.3 was assessed by the skewness coefficient (−0.256) and the kurtosis coefficient (−0.082). The skewness and kurtosis coefficients were within the commonly established range of −1.000 to +1.000. This indicates that the data are approximately normally distributed, and thus, the analyses of the study can be run (Arendse, 2020).
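The normality check described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the simple moment-based (population) formulas for skewness and excess kurtosis; packages such as SPSS apply small-sample corrections, so their values can differ slightly. The function names are hypothetical.

```python
def skewness_and_excess_kurtosis(scores):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def within_rule_of_thumb(skew, kurt, bound=1.0):
    """True when both coefficients fall inside the -bound to +bound range."""
    return abs(skew) <= bound and abs(kurt) <= bound


# The coefficients reported for ECT version 1.3 fall inside the range.
ect_ok = within_rule_of_thumb(-0.256, -0.082)
```

The rule of thumb is a screening heuristic and not a formal test of normality.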

Differential test functioning results
Comparison of the average performance of women and men is shown in Table 2. It can be observed that there are some items in which both genders have the same or similar means, whilst it also shows that men and women have higher means in respect of different items.
According to the DTF statistics (Table 1), the items with the largest t-values, whose absolute values exceed the 1.96 cut-off for a 95% confidence level, are items 4 (−3.07), 6 (−3.15), 32 (2.84), 36 (2.69) and 40 (−2.21). These items are statistically different for the two genders and can be considered as possibly biased.
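The flagging rule above can be illustrated with a short Python sketch. It assumes one common form of the between-group comparison, in which the difference between an item's two group-specific difficulty measures is divided by the joint standard error; the exact computation in Winsteps may differ in detail. The function names and the entry for item 1 are hypothetical, while the other t-values are those reported above.

```python
from math import sqrt


def dif_t(measure_a, se_a, measure_b, se_b):
    """t-value for the difference between two item measures (in logits)."""
    return (measure_a - measure_b) / sqrt(se_a ** 2 + se_b ** 2)


def flag_items(t_by_item, cutoff=1.96):
    """Return the items whose absolute t-value exceeds the cut-off."""
    return sorted(item for item, t in t_by_item.items() if abs(t) > cutoff)


reported_t = {
    1: 0.50,  # hypothetical non-flagged item, included for contrast
    4: -3.07,
    6: -3.15,
    32: 2.84,
    36: 2.69,
    40: -2.21,
}
flagged = flag_items(reported_t)
```

Because the rule uses the absolute value, the sign of t only indicates which group found the item relatively harder; it does not affect whether the item is flagged.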
In Table 3, the person and item infit and outfit Mean-Square Statistics (MNSQ) values for men and women were both acceptable as they were close to or equal to 1 (Linacre, 2002). These results indicated that both men and women presented a good model fit according to each person's ability and the item difficulty. The person separation values for the men and women are below 2 (Baghaei & Amrahi, 2011), which suggests that there is limited variation in the abilities of men and women. The item separation value is much higher than 2 (Baghaei & Amrahi, 2011), which suggests that there is a relative range of item difficulties across the genders in the test. It is, however, apparent that the item difficulties for the male group are more varied compared with the female group. The item reliability values across gender groups are considered excellent.
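The separation cut-off of 2 can be related to reliability through the standard Rasch relationship R = G² / (1 + G²), where G is the separation index. The short Python sketch below shows the conversion in both directions; the function names are hypothetical and the values are illustrative, not taken from Table 3.

```python
from math import sqrt


def reliability_from_separation(separation):
    """Rasch reliability implied by a separation index G: G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)


def separation_from_reliability(reliability):
    """Separation index implied by a reliability R: sqrt(R / (1 - R))."""
    return sqrt(reliability / (1 - reliability))


# A separation of 2 corresponds to a reliability of 0.80.
cutoff_reliability = reliability_from_separation(2.0)
```

Under this relationship, a separation below 2 implies a reliability below 0.80, whereas a reliability above 0.90 implies a separation of 3 or more.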
In Table 4, the empirical slope of 0.942 is considered acceptable and suggests that the items are relatively similar for both genders, with only a few item differences. The correlation of the male and female intercepts is 0.986, which suggests that the items are measuring the same construct across gender groups. The reliability for both genders is well over 0.90, which indicates that high internal consistency is present (Erguven, 2014; Nunnally & Bernstein, 1994; Suhr & Shay, 2009).
The test items across gender groups were acceptable in terms of their variation of difficulties. Although the majority of the test items did not show any bias across genders, five test items (items 4, 6, 32, 36 and 40) were identified as statistically different across the two genders. Both Figures 1 and 2 show the performance of women and men in the flagged (statistically different) items to be relatively similar.
In the assessment of the item content of these statistically different items, as shown in Table 5, it was observed that no recognisable gender discrimination is apparent in the item content. The respective means displayed in Table 5 indicated that women performed higher in three of the five items.

Two-way ANOVA results
In Table 6, the descriptive statistics for gender and the respective language groups are shown. The descriptive statistics for gender indicates the number of men and women in each language group. As expected, the largest numbers belong to the male group. It is, however, worth noting that the means across the different genders are similar, with a small difference across these average scores.
From Table 7, it can be observed that there was no statistically significant difference in the mean ECT total score between men and women, F (2, 54) = 0.672, p = 0.413. There was, however, a statistically significant difference in the mean ECT total score for the different language groups, F (2, 54) = 55.893, p < 0.001. The interaction between gender and language had a non-significant effect on the dependent variable, ECT total score, F (2, 54) = 1.234, p = 0.292.
In Figure 3, the estimated marginal means plot provides a graphical image of the study results. Based on the graph, the lines for men and women appear in a relatively parallel form, thus suggesting that there might not be an interaction effect in the data.
As the interaction between gender and language was statistically non-significant, the Tukey post hoc test was run on the language-group variable, which was previously reported as statistically significant. Table 8 indicates the results of the Tukey post hoc test for the language groups. Based on the post hoc test results, the differences between Afrikaans and English (p = 0.002), Afrikaans and African languages (p < 0.001) and English and African languages (p < 0.001) are all statistically significant.

Figure: Flagged items of the ECT version 1.3 (number of males; correct versus incorrect/missing)

Outline of the results
This study was aimed at exploring gender differences in the ECT by investigating the test through DTF analysis. The findings of the study indicated that the fit statistics across gender groups revealed high reliability values and good average infit and outfit MNSQ values, which gives some certainty that the gender performance in the test was not necessarily biased and that both genders performed similarly in the test items. The results also showed that the performance of the participants in the sample was problematic in terms of their limited variation of abilities across gender groups. The persons' limited ability levels imply that they were unable to perform better, but the items were not the cause of any specific bias linked to their ability. When observing the content of the items identified as statistically different (Table 5), there appears to be no obvious gender discrimination. The literature on gender differences relating to cognitive assessment suggests that women are more skilled at verbal tasks than men (Griskevica & Rascevska, 2009;Hur et al., 2017;Hyde, 1981;Miller & Halpern, 2014;Palejwala & Fine, 2015;Strand et al., 2006;Toivainen et al., 2017;Wai et al., 2018;Wilsenach & Makaure, 2018), but this finding cannot be concluded in respect of the ECT on the basis of three statistically different items.
An additional analysis was conducted to assess the interaction between gender and language groups. The reasoning for this was the fact that language is a factor that also affects performance in tests (Bekwa, 2016;Foxcroft & Aston, 2006;Reilly et al., 2019;Arendse, 2018). The two-way ANOVA was conducted to examine the effect of gender and the different language groups on the ECT total score. The ANOVA results indicated that there was no statistically significant interaction between the effects of gender and the different language groups on the ECT total score, F (2, 54) = 1.234, p = 0.292. The small mean differences were statistically non-significant, and the estimated marginal means plot was also indicative of negligible differences across gender groups. This relates to the findings of the DTF, in which gender differences were only found to affect five items in the test. This is a positive finding, as assessments should not be found to discriminate across genders. Furthermore, it indicates that the test is not biased towards individuals on the basis of their gender. Although gender was found not to discriminate in the ECT total score, there were statistically significant differences in the mean ECT total score for different language groups. When examining these language differences, mean differences were found across three language groups (Afrikaans, English and African language groups). This confirms findings in the literature that language may cause differential performance in assessments.

Practical implications
One way in which the five statistically different items from the DTF can be interpreted is that the differences observed across gender groups are indicative of language-related differences. This was supported by the two-way ANOVA, in which gender and the interaction between gender and the different language groups were found to be statistically non-significant. The findings, however, confirmed that the different language groups had statistically significant differences across their means. One cannot escape the presence of language inhibiting performance in a test, when the majority of the participants are black Africans and are predominantly English second- or third-language speakers. The official languages associated with black African members include IsiXhosa, IsiZulu, Sepedi, Setswana, Tshivenda, Sesotho, SiSwati, IsiNdebele and Xitsonga. These languages comprised the African languages group. The remaining two official languages, English and Afrikaans, formed the other two language groups. When attempting to make sense of the language differences observed in the two-way ANOVA, it should be noted that the African languages vary substantially from the English language, in that there are instances where there is no African equivalent for an English word (Schaap, 2011). This may have had an impact on how African language-speaking individuals interpreted the items in the test. Moreover, these items could have different social meanings attached to certain words across the individuals based on their first language (Radden, 2008). This may also be true for both the African and Afrikaans language groups. In addition, the language differences may be indicative of how individuals differ in their thinking because of language differences and semantic structures (Boroditsky, 2011; Gentner & Goldin-Meadow, 2003).
These differences may also connect to Vygotsky's emphasis on the influence of culture and language on cognitive development (Ormrod, 2008;Vygotsky, 1978) as the environment and learning opportunities can influence this (Van der Pool & Catano, 2008). The finding that gender discrimination was limited to five statistically significant items, which on further investigation indicated no statistically significant mean differences, is a positive one. It also points to the necessity of exploring gender differences as this affects performance in a test. The empirical ECT, which is still in development, can be considered gender neutral in terms of item content, as the five items were not prejudicing individuals because of associated gender knowledge. It is, however, of concern that language was found to affect performance, but this is also a common bias that most tests fall prey to in multilingual and multi-cultural contexts, such as South Africa (Bekwa, 2016;Foxcroft, 2004;Foxcroft et al., 2013;Laher & Cockcroft, 2017). This is a significant finding that will assist in further developing and validating the ECT.
This study relates to other cross-cultural studies that have found background factors to have an effect on the performance of individuals in cognitive assessments (He & Van de Vijver, 2012; Van de Vijver & Tanzer, 2004). Thus, when interpreting the results of this study in light of the language differences, it should be noted that in cross-cultural contexts, the background and culture of different persons completing the test need to be considered (Van de Vijver & Rothmann, 2004). In this study, culture and language are interrelated as words can be regarded as the overlap of race and class (Cooper, 2018). The intersection between race and class includes culture, which may explain the differences observed across gender groups and, more specifically, the differences between languages. This notion of language differences was also found by research conducted on the verbal scale of the Wechsler Intelligence Scale for Children, which showed higher loadings on linguistics and culture for aboriginal children (Flanagan & Ortiz, 2001).
African individuals in different contexts across the globe were discriminated against in cognitive assessment because of their perceived poor performance in cognitive assessments. Their poor performance was linked to historic and educational inequalities (Kennedy et al., 2012; Laher & Cockcroft, 2013). In light of numerous cross-cultural findings (Flanagan & Ortiz, 2001; Foxcroft, 2004; He & Van de Vijver, 2012; Van de Vijver & Rothmann, 2004), and this study in particular, these conclusions appear to be inadequate for explaining the findings across gender groups but may explain the differences across language groups. In South Africa, it can be deduced that language is influenced by race and culture (Cooper, 2018; Foxcroft & Aston, 2006; Ormrod, 2008; Vygotsky, 1978). Thus, the implication of language, which cannot be separated from race and culture, provides a more substantial reasoning for the observed gender differences in this study. This is because language is often influenced by a person's race and culture, which form part of the contextual aspects impacting on language (Cooper, 2018; Foxcroft & Aston, 2006; Ormrod, 2008; Vygotsky, 1978). The intersection of race and culture in language needs to be considered when examining the results of non-native English individuals in English cognitive assessments. This consideration will limit bias and prevent discrimination against individuals from different cultures in cognitive assessments (Flanagan & Ortiz, 2001; Laher & Cockcroft, 2013; Van de Vijver & Rothmann, 2004). This consideration also allows the use of intersectionality (Crenshaw, 1988) as a lens through which quantitative findings can be understood.

Limitations and recommendations
A limitation of this study is that the results are not generalisable, as convenience sampling was used. Although the sample covered a range of ages, the majority of the participants were young adults.
Having predominantly male participants in the sample is another limitation of the study, as the genders were not equally represented. In terms of the language group distribution, the majority of the sample comprised African language speakers, and thus, the language groups were not equally represented. Fatigue may also have affected the participants' performance in the test and should be considered.
It is recommended that the language issues associated with the possibly biased items identified in the ECT be examined further and that these items be either removed or rephrased for better interpretation. A factor analysis and reliability analysis of the male and female samples should also be performed to confirm whether the resulting factor structures and reliabilities correspond with the overarching factor structure and reliability of the ECT.
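The recommended subgroup reliability analysis could, for instance, be sketched as follows. The sketch assumes that Cronbach's alpha is the reliability coefficient of interest, which this section does not specify, and it uses a small set of invented item responses rather than ECT data. The function names `cronbach_alpha` and `alpha_by_group` are hypothetical and serve only as illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the respondents' total scores
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def alpha_by_group(records):
    """Compute alpha separately for each group label (e.g. gender)."""
    groups = {}
    for group, item_scores in records:
        groups.setdefault(group, []).append(item_scores)
    return {group: cronbach_alpha(rows) for group, rows in groups.items()}


# Invented dichotomous (0/1) item responses, NOT data from the ECT
sample = [
    ("male", [1, 1, 1]),
    ("male", [1, 0, 1]),
    ("male", [0, 0, 0]),
    ("male", [1, 1, 0]),
    ("female", [1, 1, 1]),
    ("female", [0, 0, 0]),
    ("female", [0, 1, 0]),
    ("female", [1, 1, 1]),
]

if __name__ == "__main__":
    for group, alpha in alpha_by_group(sample).items():
        print(f"{group}: alpha = {alpha:.3f}")
```

Comparing the coefficients obtained for the male and female subsamples with the coefficient for the full sample would indicate whether the reliability of the test holds across gender groups. In practice, an established statistical package would be preferable to this hand-rolled calculation.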

Conclusion
Psychometric assessments have been known to discriminate and were previously implicated in emphasising gender and racial differences on the basis of cognitive assessments. As the ECT can be considered a cognitive assessment because it measures verbal reasoning (Arendse, 2018), the identification of five biased items in the DTF analysis was a cause for concern. It was, however, argued that the item content did not suggest gendered knowledge but rather tapped into language differences across individuals. This finding, therefore, contradicts numerous international studies that found that women outperformed men in verbal cognitive assessments. The two-way ANOVA showed no statistically significant differences between men and women in the ECT total score. In addition, the interaction between gender and language had non-significant effects on the dependent variable, the ECT total mean score. This confirmed that no gender discrimination was observed in the ECT. The findings nevertheless indicated that there were statistically significant differences in the ECT total mean score for the different language groups. The post hoc analysis showed statistically significant mean differences amongst all the language groups, namely Afrikaans, English and African languages. The language issue has been plaguing psychometric testing and test development in South Africa for years, and it remains an immense task to ensure that the tests produced or used do not prejudice any individuals or limit their opportunities unfairly. The findings of this study are, therefore, a step towards rectifying the discrimination of the past, in terms of both gender and language. The identification of biased items is imperative to the further development and validation of the ECT and will require further investigation. This study promotes the use of DTF and ANOVA as a means of ensuring fairness in assessment practices across gender groups.
Consequently, this study contributes to cross-cultural test development. This study also highlighted the importance of incorporating intersectionality into quantitative studies to ensure that bias in cognitive assessment is addressed.