Developing and testing items for the South African Personality Inventory (SAPI)

Orientation: A multicultural country like South Africa needs fair cross-cultural psychometric instruments. Research purpose: This article reports on the process of identifying items for the South African Personality Inventory (SAPI) and provides a quantitative evaluation of these items. Motivation for the study: The study intended to develop an indigenous and psychometrically sound personality instrument that adheres to the requirements of South African legislation and excludes cultural bias. Research design, approach and method: The authors used a cross-sectional design. They measured the nine SAPI clusters identified in the qualitative stage of the SAPI project in 11 separate quantitative studies. Convenience sampling yielded 6735 participants. Statistical analysis focused on the construct validity and reliability of items. The authors eliminated items that performed poorly according to common psychometric criteria and selected the best-performing items for the final version of the SAPI. Main findings: The authors developed 2573 items for the nine SAPI clusters. Of these, 2268 items were valid and reliable representations of the SAPI facets. Practical/managerial implications: The authors developed a large item pool that measures personality in South Africa and that researchers can refine further for the SAPI. Furthermore, the project illustrates an approach that researchers can use in projects that aim to develop culturally informed psychological measures. Contribution/value-add: Personality assessment is important for recruiting, selecting and developing employees. This study contributes to current knowledge about the early processes that researchers follow when they develop a personality instrument, such as the SAPI, that measures personality fairly across cultural groups.


Introduction
Problem statement
In South Africa, personality research and assessment have gained momentum over the last 10 years (Meiring, Van de Vijver, Rothmann, & Barrick, 2005; Taylor, 2000; Visser & Viviers, 2010). The need for measures that are more sensitive to ethnic differences has heightened interest in personality assessment (Nel et al., 2012; Valchev et al., 2011). At the same time, the country's ethnic and cultural diversity makes fair and comparable measurement more challenging.
South African legislation also provides a legal framework for developing psychological tests. Section 8 of the Employment Equity Act (Act 55 of 1998) (Government Gazette, 1998) clearly states that all psychological measures in South Africa should measure concepts fairly and equally for all ethnic groups and should deal with cultural, linguistic and racial elements without introducing bias against any group. Differences in culture, the distribution of socioeconomic resources, education and employment status further contribute to the challenge of fair psychological measurement in South Africa (Foxcroft & Roodt, 2009). This challenge is particularly relevant when organisations use psychological measures for recruiting, selecting, placing and developing employees.
In an effort to overcome these problems, personality researchers from South Africa and the Netherlands initiated the South African Personality Inventory (SAPI) project eight years ago (Meiring, Van de Vijver & Rothmann, 2006; Nel et al., 2012; Valchev et al., 2011). The project initially used a combined etic-emic approach in which native speakers of each of the 11 official languages described persons they knew well (like parents and friends) and persons they did not know well (like teachers or neighbours). After a content analysis of these responses, the researchers identified unique facets (specific to certain languages) and common facets (shared by most or all languages) (Nel et al., 2012; Valchev et al., 2011; Valchev et al., 2013).
Further clustering of these facets yielded nine general personality clusters (extraversion, soft-heartedness, conscientiousness, emotional stability, intellect, openness, integrity, relationship harmony and facilitating). The derived personality structure had three hierarchical tiers: nine clusters at the top, 37 sub clusters (between two and six per cluster) in the middle and 190 personality facets at the lowest level.
Three of these nine clusters (extraversion, conscientiousness and emotional stability) corresponded with the big five factors (see John & Srivastava, 1999), two (openness and intellect) were reminiscent of the openness factor of the big five, and one (integrity) was similar to the Honesty-Humility factor of Ashton and Lee's (2007) HEXACO model. Three SAPI clusters, namely soft-heartedness, relationship harmony and facilitating, were unique to the SAPI, although some elements of the big five's agreeableness factor were present in these clusters.
The overall objectives of this article are to discuss the phase of the project in which the authors developed items using the responses they obtained during the qualitative phase and to investigate the construct validity and reliability of the instruments they developed to measure the nine clusters.
Because the SAPI is still in an early stage of development, the authors examined construct validity only to determine which items performed well within a specific cluster. According to Bayoglu, Unal, Elibol, Karabulut and Innocenti (2013), researchers should first establish the reliability and construct validity of a new questionnaire before attempting further validity studies. The information reported here therefore pertains to developing and testing items.

Literature review
Developing the South African Personality Inventory item pool
According to Saucier (2008), the way one defines personality is significant because it affects how researchers will select variables when they study personality.
When researchers define personality constructs in different cultural contexts, they usually use a top-down approach (the etic approach) because they investigate and test the generalisability of Western models and theories of personality in non-Western cultural contexts (Cheung, Van de Vijver & Leong, 2011). This approach assumes that the personality traits that these Western models and theories measure adequately represent personality dimensions in other cultures. However, Cheung et al. (2011, p. 595) discussed various substantive and methodological issues related to the etic approach, noting that researchers might miss: '... indigenous constructs that are salient in the local folk concepts of personality and in the local taxonomy of person descriptions' when they use imposed etic measures.
The aim of indigenous psychology (the emic approach) is to develop an insider's perspective of psychological phenomena in a culture. The approach is typified by the use of conceptions and methodologies that are embedded in, and derived from, the ethnic or cultural group being studied to generate knowledge and enrich mainstream psychology (Chui, Kim, & Wan, 2008; Ho, 1998; Ho, Peng, Lai, & Chan, 2001). Cheung et al. (2011) contended that researchers need combined etic and emic approaches in order to enlarge current conceptualisations of universal personality constructs, thereby possibly bridging the gap between mainstream and indigenous psychology and demarcating the universal and culturally specific aspects of psychological constructs. This combined etic-emic approach can include:
... (a) the use of a combination of etic and emic measurement, (b) studies in which universal and culture-specific aspects are delineated in an iterative process of data collections with continually adapted instruments, and (c) the use of mixed methods (e.g., the use of an etic measure combined with interviews for collecting information about culture-specific features not covered by the etic instrument). (Cheung et al., 2011, p. 597)
For a complete description of this convergent approach, in which researchers combine the etic and emic approaches, see Cheung et al. (2011) and Nel et al. (2012).
In this project, the SAPI research team used a combined etic-emic approach to identify culturally and linguistically adequate personality descriptive terms for all 11 official South African languages (Cheung et al., 2011;Nel et al., 2012). Current etic models, like the Five Factor Model (FFM) and the HEXACO model (Nel et al., 2012), partly informed the clustering of these emic terms.
The next step in the project was to use construct modelling to develop a culturally appropriate measuring instrument using these personality descriptive terms.
Construct modelling is a framework for developing a measuring instrument that rests on four building blocks (Wilson, 2005):
1. Explaining the construct to others with the help of the construct map.
2. Creating items believed to lead respondents to give responses that indicate levels of the construct map.
3. Trying out those items with a sample of respondents.
4. Analysing the resulting data to check whether the results are consistent with the initial intentions as expressed in the construct map.
Taking a combined etic-emic approach, the authors used an adapted version of Wilson's (2005) construct modelling framework to develop the item pool for each of the identified SAPI clusters. The process involved the following steps:
1. Grouping the original responses by facet and extracting content-representative responses.
2. Describing (i.e. defining) the nine SAPI constructs with the help of the content-representative responses.
3. Transforming qualitative responses into item stems.
4. Designing items that the authors believed tapped the facets.
5. Administering the items to a sample of respondents.
6. Analysing the resulting data to check whether the results are consistent with the initial intentions as stated in the construct map.
A detailed discussion of each of these building blocks, in relation to developing the SAPI items, follows.

Grouping the original responses and extracting content-representative responses
To extract content-representative responses, the authors first grouped all the original responses associated with each SAPI cluster by language (for example, the extraversion responses were grouped into Afrikaans, English, Swati, Ndebele, IsiXhosa and IsiZulu). They then grouped the responses from the 11 languages by sub cluster (for example, all the emotional stability responses from all the languages were grouped into balance, courage and ego strength). Next, they clustered the facets in each sub cluster as the various language groups presented them. Finally, the authors examined the original responses by facet and retained, as content-representative responses, only those that represented the facets.
The authors used these content-representative responses to develop the construct maps of the SAPI.

The construct maps of the South African Personality Inventory
A construct map represents a single, one-dimensional latent variable. Because the SAPI structure is multidimensional, Wilson (2005) suggested dealing with one dimension at a time, which makes it possible to use the construct modelling framework for developing the SAPI. A construct map contains a coherent and substantive definition of the content of the construct, together with the idea that the construct comprises an underlying continuum (Wilson, 2005). Similarly, Boyle, Matthews and Saklofske (2008) stated that the key to moving forward with psychometrically sound measurement rests with the definitions that researchers decide best represent a particular domain of behaviour, psychological disturbance (or wellbeing) or underlying traits, like extraversion or neuroticism.
The authors took into account that, in addition to being multidimensional, the SAPI is also hierarchical in its structure. It consists of clusters, sub clusters, facets and items.
With this in mind, the authors developed definitions for the clusters and facets in the SAPI. For a complete description of the definitions of the SAPI clusters, see Nel et al. (2012).
The authors developed the definitions of the various facets from the shared content of the content-representative responses. They discarded facets that consisted of four or fewer descriptions and facets that fewer than two language groups mentioned (for example, the authors discarded dutiful and deliberating from conscientiousness; talented and useless from intellect; prim and proper from openness; and satisfying others and wrathful from soft-heartedness).
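The two discard rules amount to a simple filter on each candidate facet. The sketch below is a hypothetical illustration, assuming each facet is stored with its descriptions tagged by language; the `Facet` class, the `retain_facet` function and the toy descriptions are invented for this example and are not part of the SAPI materials.

```python
from dataclasses import dataclass, field


@dataclass
class Facet:
    """Hypothetical record of a candidate facet and its (language, description) pairs."""
    name: str
    cluster: str
    descriptions: list[tuple[str, str]] = field(default_factory=list)


def retain_facet(facet: Facet, min_descriptions: int = 5, min_languages: int = 2) -> bool:
    """Keep a facet only if it has more than four descriptions
    and those descriptions come from at least two language groups."""
    n_descriptions = len(facet.descriptions)
    n_languages = len({language for language, _ in facet.descriptions})
    return n_descriptions >= min_descriptions and n_languages >= min_languages


# Toy facets with placeholder descriptions (not SAPI data).
wide = Facet("facet_a", "conscientiousness",
             [("Afrikaans", "d1"), ("IsiZulu", "d2"), ("English", "d3"),
              ("IsiZulu", "d4"), ("Afrikaans", "d5")])
narrow = Facet("facet_b", "conscientiousness",
               [("Sesotho", f"d{i}") for i in range(1, 9)])  # many descriptions, one language
print(retain_facet(wide), retain_facet(narrow))  # True False
```

Keeping the two thresholds as parameters makes it easy to see how many facets a stricter or looser rule would drop.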
During this process, three new facets emerged: gullible (in soft-heartedness), shamed (in emotional stability) and impulsive (in emotional stability). The authors divided the original introvert or extravert facet (in extraversion) into two separate facets and developed definitions for the remaining 184 facets.

Item stems
The next step involved transforming the content-representative responses into item stems (see Table 2 for an example and Nel et al. [2012] for examples of the original responses). The authors then used these item stems in the next building step: the process of designing the items.

The process of designing items
The process of designing items consisted of four parts (Hogan, 2007):
1. developing stimuli or items to which the participants respond
2. deciding on a response format or method
3. determining the conditions that govern how participants respond to the stimuli
4. establishing procedures for scoring the responses.
For the first part of the process, the authors rephrased the SAPI item stems as items that they could use in the final versions of the questionnaires. The development of the SAPI items closely followed the psycho-lexical tradition, an approach that researchers use widely when they examine personality (see Cheung et al., 2011). Researchers who work in the psycho-lexical tradition usually begin their research into the personality descriptors of a particular language group by analysing the group's dictionary. This yields a comprehensive collection of personality-descriptive terms, which they subsequently reduce according to a number of criteria (Angleitner, Ostendorf, & John, 1990).
The SAPI project used an adapted version of the psycho-lexical approach. Instead of using South African dictionaries for all 11 languages, the SAPI project derived its personality descriptors from the content-representative responses of the qualitative phase (i.e. the transcripts of the utterances about personality that the authors collected from interviews in the first qualitative stage of the SAPI project). They then transformed these responses into item stems. Whilst converting the item stems into items, they followed these general guidelines for writing items to ensure standardisation (cf. Hendriks, Hofstee, & De Raad, 1999):
1. Items had to be short, simple and clear.
2. Items were written in the first person, starting with 'I' followed by concrete behaviours, objects and contexts.
3. Negatives were not used in the main parts of items.
4. Items that described a single activity or habit were avoided.
5. Temporal qualifiers, like often, always and sometimes, were avoided.
6. Items had to be formulated in the direction of the construct.
7. Double-barrelled items were not allowed.
8. Items had to refer to concrete behaviours and not to beliefs, values or orientations.
9. Psychological trait terms were avoided.
10. Idioms, expressions and sayings were avoided to prevent confusion.
11. Items were written in English so that they could be translated.

Cluster | Sub cluster | Facet | Definition
Extraversion | … | … | Not allowing others to manipulate or control one's actions. Ensuring that everyone is aware of the boundaries that one has set and communicating one's views about the behaviour of others.
Extraversion | Expressiveness | Outspoken | The inclination to share one's feelings or problems with others as a way of sharing burdens, and being frank and forthright about one's feelings in a non-threatening manner.
Extraversion | Sociability | Extrovert | Enjoying the company, companionship and presence of other people; being loud, sociable and expressive; and being able to connect with others easily.
Facilitating | Guidance | Leading | Having leadership qualities; having and liking to take the role of a leader in one's close relations and in one's broader social environment; enjoying leading.
Facilitating | Encouraging Others | Having aspirations for others | Hoping for the best for others and that they will succeed in life. Liking to see other people progress and realise their potential. Promoting others' success and well-being. Having dreams and ambitions for others and wishing others well.
Integrity | Integrity | Morally Conscious | Having good morals and values, a sense of what is right or wrong, and being righteous.
Integrity | Fairness | Discriminative | Being prejudiced towards others with different orientations, backgrounds and beliefs.
Intellect | Aesthetics | Artistic | The inclination, liking or preference for engaging in or appreciating the arts.
Intellect | Reasoning | Intellect | Being bright, shrewd, informed and generating good ideas. Being able to understand concepts easily and learn quickly. Contemplating the content of questions in order to give precise and accurate answers in line with expectations.
Intellect | Skilfulness | Enterprising | Taking the initiative to exploit new business opportunities; having business knowledge and the creativity to make and sell items.
Intellect | Social Intellect | Perceptive | Being observant, reading social environments for cues and having insight into others.
Openness | Broadmindedness | Dreamer | Having dreams; dreaming of the future and of things one would like to see in the future; dreaming about others; pursuing one's dreams.
Openness | Epistemic Curiosity | Eager to learn | Being eager, willing and determined to learn; showing an interest in schoolwork and school-related academic activities.
Openness | Materialism | Materialistic | Being materialistic and concerned about wealth; spending money on expensive articles.
Openness | Openness to experience | Adventurous | Being an adventurous person who enjoys exploring things to gain new experiences.
Relationship harmony | Approachability | Accommodating | Knowing how to deal with others, making others feel at home and explaining things when others do not understand them.
Relationship harmony | Interpersonal Relatedness | Forgiving | Accepting apologies from others, not holding grudges, not taking revenge and preferring to make peace by talking about things.
Relationship harmony | Conflict-seeking | Troublesome | Causing trouble, arguments, fights and conflict between people.
Relationship harmony | Meddlesomeness | Interfering | Meddling and getting involved in other people's affairs when one's involvement is not needed; collecting information about other people's affairs that do not concern one; intruding and telling others what to do when it is none of one's business; invading others' privacy and prying into their affairs.
Soft-heartedness | Gratefulness | Appreciative | Expressing appreciation, acceptance and love. Liking, adoring, enjoying and being fond of objects, persons and/or situations in general.
Soft-heartedness | Active Support | Community Involvement | Taking an interest in, caring for or serving the community or its development; gaining respect by being a role model for the community.
Soft-heartedness | Hostility | Critical | Being critical and insulting towards others and opposing others' lifestyles. Being outspoken and looking for weaknesses in others.
Soft-heartedness | Amiability | Kind | Having a soft heart and being soft-spoken. Being gentle and nice towards people.
Soft-heartedness | Egoism | Greedy | Wanting more than what one already has. Not being satisfied with what one has.
Soft-heartedness | Empathy | Caring | Being aware of others' needs and doing things for others. Looking after others and being concerned about the welfare of others. Having others' best interests at heart; being interested in others.
A general rule of thumb when developing psychological instruments is to write, initially, two to three times as many items as the final instrument needs (Hogan, 2007). In total, the authors developed 2573 items for the nine clusters. Throughout the process, the authors revisited the qualitative data to ensure that the items derived from the original responses were relevant. They presented these items to cultural and language experts (representing all 11 official languages in South Africa) in order to identify:
1. items that could not be translated into a language other than English
2. items that were ambiguous or incomprehensible to speakers of a particular language
3. items that were not culturally appropriate for a certain language group.
The choice of item format (part two of the item design process) links to the nature of the construct one is measuring and to practical considerations (Foxcroft & Roodt, 2009).
Researchers usually design personality inventories for individual or group administration. These instruments therefore need to be easy to score, and they usually use a fixed-response format such as the popular Likert-type format (Foxcroft & Roodt, 2009; Hogan, 2007; Wilson, 2005). The authors chose a five-point Likert-type response format for the SAPI, with responses ranging from 1 (strongly disagree) to 5 (strongly agree).
The next part of the process of designing items involved determining the conditions that govern how participants respond to the stimuli. Consistent with common practice in personality assessment, participants had unlimited time to complete the questionnaire. The authors designed an answer sheet to accompany the test booklets; this answer sheet can be scored by hand or by computer.
Finally, establishing procedures for scoring the response is an important part of developing a measuring instrument.
When analysing Likert-type responses, researchers treat the responses as points on a numerical scale. They then either sum the items or carry out a factor or latent variable analysis, which yields a score for each respondent that reflects the characteristic the set of items has in common (Dittrich, Francis, Hatzinger, & Katzenbeisser, 2007). In this instance, developing the SAPI scale involved 'summing Likert-type items to yield a score that represents the degree to which the construct being measured is present in the respondent' (Fitzpatrick et al., 2004, p. 334).
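Summated scoring of this kind is straightforward to express in code. The sketch below is a minimal illustration for one respondent and one facet, assuming all items are keyed in the direction of the construct (so no reverse scoring is applied); the item identifiers and the `facet_score` helper are hypothetical.

```python
# Hypothetical sketch: summated scoring of five-point Likert-type items.
# Assumes every item is keyed in the direction of the construct (no reverse keying).
LIKERT_MIN, LIKERT_MAX = 1, 5  # 1 = strongly disagree, 5 = strongly agree


def facet_score(responses: dict[str, int], facet_items: list[str]) -> int:
    """Sum one respondent's responses to the items that belong to one facet."""
    total = 0
    for item in facet_items:
        value = responses[item]
        if not LIKERT_MIN <= value <= LIKERT_MAX:
            raise ValueError(f"{item}: response {value} is outside {LIKERT_MIN}-{LIKERT_MAX}")
        total += value
    return total


# Hypothetical item identifiers: item_04 belongs to a different facet and is ignored.
respondent = {"item_01": 4, "item_02": 5, "item_03": 3, "item_04": 2}
print(facet_score(respondent, ["item_01", "item_02", "item_03"]))  # 12
```

Validating the response range at scoring time catches data-entry errors (for example, a 6 on a five-point scale) before they distort the scale score.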
The last two building blocks in the construct modelling framework relate to testing the developed items in a sample of respondents and analysing the results to determine whether the items achieved the purpose for which the researchers developed them. Researchers achieve this by investigating the items' reliability and validity.
Table 3 shows that most participants were between 18 and 21 years old (67%) and had completed Grade 12 (83%). Most were female (59%), and 94% of the participants rated their English reading ability as good to very good. The most common home languages were Afrikaans (30%), English (16%) and IsiZulu (13%). The sample comprised African (56%) and White (36%) participants.

Measuring instruments
Developing and refining the items: The immediate objective of this stage was to develop an item pool to measure the facets, sub clusters and clusters using the construct modelling framework that Wilson (2005) described. The authors grouped the original responses the researchers obtained from the interviews in the qualitative phase of the SAPI. They then extracted content-representative responses and developed definitions for the various facets. They generated item stems using the facets' definitions and the content-representative responses.
The authors developed 2573 items for the nine proposed clusters of the SAPI and assembled them into separate questionnaires for administration to the various populations.
The authors encountered a few challenges in constructing and refining the items before they developed the 2573 items. Firstly, the dataset included more than 50 000 statements in 11 official languages. The authors had to evaluate them and turn them into items. Some of the personality facets they identified were common to all 11 official languages, whilst others were specific to some languages or to only one language. The challenge was to determine which responses were relevant for developing items. Some items seemed to be culture-specific and could have different connotations in different cultural groups. Examples are: 'bewitching people' (soft-heartedness) and 'wandering in the streets' (conscientiousness). Some groups could regard items of this nature as offensive or confusing.
Secondly, some items were too vague or abstract. The authors eliminated them in the process of generating items or revised them by contextualising the specific items. For example, an item pertaining to being 'outgoing' became 'go out to parties' (extraversion). During this phase, the authors eliminated items if they felt that the items did not measure a certain facet or if the context was vague.
Thirdly, the items had to exclude idiomatic expressions, like 'to jump the gun' (extraversion) or 'standing behind friends' (soft-heartedness). Including such items would have been problematic because previous South African studies, which focused specifically on the fair cross-cultural application of instruments, suggested that items should be free of content that different groups could understand in different ways (Meiring et al., 2005; Taylor & De Bruin, 2005).
As a result, the authors developed and retained 2573 items to measure the nine clusters. They prepared and administered these items in 11 different studies via paper-and-pencil instruments or internet-based surveys.
The authors measured each of the constructs using a separate questionnaire, except for the relationship harmony and soft-heartedness constructs. Because of the large number of items they generated for these two constructs (400 and 482 respectively), the authors developed two questionnaire versions for each of these clusters to avoid very long questionnaires (RH-1 and RH-2 for relationship harmony, and SH-1 and SH-2 for soft-heartedness). They identified anchor items and included them in both versions of each questionnaire, and divided the remaining items randomly between the two versions. Participants rated the items on a five-point Likert-type scale, with responses ranging from 1 (strongly disagree) to 5 (strongly agree).
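The anchor design can be sketched as follows. The number of anchor items is not reported here, so the 20 anchors, the item identifiers and the `split_into_versions` helper are all hypothetical. Dealing the shuffled non-anchor items alternately into the two versions is one simple way to divide them randomly and evenly; it is an illustration, not the authors' documented procedure.

```python
import random


def split_into_versions(items, anchors, seed=0):
    """Return two questionnaire versions that share all anchor items.

    Non-anchor items are shuffled and dealt alternately into the two
    versions, so each non-anchor item appears in exactly one version.
    """
    anchor_set = set(anchors)
    missing = anchor_set - set(items)
    if missing:
        raise ValueError(f"anchors not in item pool: {sorted(missing)}")
    remaining = [item for item in items if item not in anchor_set]
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    rng.shuffle(remaining)
    version_1 = list(anchors) + remaining[0::2]
    version_2 = list(anchors) + remaining[1::2]
    return version_1, version_2


pool = [f"RH_{i:03d}" for i in range(1, 401)]  # 400 relationship harmony items
anchors = pool[:20]                            # hypothetical choice of 20 anchors
rh_1, rh_2 = split_into_versions(pool, anchors, seed=2024)
print(len(rh_1), len(rh_2))  # 210 210
```

Because the anchors appear in both versions, they provide common items through which responses to the two versions can later be linked.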

Research procedure
The authors obtained permission to administer the questionnaires, as well as ethical clearance, from the different universities and training institutions. They explained the purpose of the SAPI project to the participants, who gave their informed consent by completing and returning a consent form. The authors adhered to all the necessary ethical guidelines when they collected the data through the surveys and assured the participants that their data would be handled confidentially.
The SAPI project members collected the data. The authors administered the conscientiousness, integrity, intellect, relationship harmony and soft-heartedness questionnaires in paper-and-pencil format, whilst they administered the emotional stability, extraversion, facilitating and openness questionnaires via the internet. Each participant completed only one of the 11 questionnaires. Data collection stretched over eight months. Student participants received course credit for participating in the study.

Statistical analysis
The authors performed the statistical analysis using the SPSS package (SPSS Inc., 2010). They inspected the data for missing and/or unexpected values. They checked the minimum and maximum values, as well as the means and standard deviations to determine their plausibility. The authors then investigated the skewness and the kurtosis coefficients of the items from the questionnaires and identified items with an absolute value for skewness of > 2 or for kurtosis of > 4. They excluded these items from further analyses.
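The distributional screen can be sketched in pure Python. The functions below use the bias-adjusted sample skewness (G1) and excess kurtosis (G2, where a normal distribution scores 0), the forms SPSS reports; the item names and response vectors are hypothetical. Items with no variance are flagged as well, because skewness and kurtosis are undefined for them.

```python
import statistics


def skewness(x):
    """Adjusted Fisher-Pearson skewness (G1)."""
    n = len(x)
    mean, sd = statistics.fmean(x), statistics.stdev(x)  # stdev uses n - 1
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in x)


def excess_kurtosis(x):
    """Bias-adjusted excess kurtosis (G2); 0 for a normal distribution."""
    n = len(x)
    mean, sd = statistics.fmean(x), statistics.stdev(x)
    z4 = sum(((v - mean) / sd) ** 4 for v in x)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def flag_non_normal(items, max_skew=2.0, max_kurt=4.0):
    """Names of items with |skewness| > max_skew or |kurtosis| > max_kurt."""
    flagged = []
    for name, responses in items.items():
        if statistics.stdev(responses) == 0:
            flagged.append(name)  # constant item: skewness and kurtosis undefined
        elif (abs(skewness(responses)) > max_skew
              or abs(excess_kurtosis(responses)) > max_kurt):
            flagged.append(name)
    return flagged


items = {
    "spread_item": [1, 2, 3, 4, 5] * 20,  # responses spread evenly over the scale
    "ceiling_item": [5] * 95 + [1] * 5,   # almost everyone strongly agrees
}
print(flag_non_normal(items))  # ['ceiling_item']
```

The ceiling item has a skewness of about -4.2, well beyond the cut-off of 2, which is the typical pattern of an item that nearly everyone endorses.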
The authors then investigated the correlations of the items with the total (cluster) score and performed a principal component analysis of the items in each cluster. They requested one component and inspected the component matrix to identify items with absolute loadings of < 0.20. Although they could have set more stringent criteria for the component matrix, they decided to be over-inclusive at this stage of the analyses and to remove only the weakest items systematically.
Next, the authors repeated the procedure at facet level. For each facet, they examined the items' correlations with the facet total score and entered only the items intended to represent that facet into a principal component analysis. The authors retained one component and inspected the item loadings. They expected all the items to have relatively large loadings (> 0.30) and removed items with low loadings (< 0.30), because such loadings gave early indications that the items were not functioning as expected.
Finally, the authors calculated Cronbach's alpha coefficients for the various facets to assess the reliability of the facets they measured. According to Frisbie (1988), highly acceptable reliability coefficients for standardised tests fall between 0.85 and 0.95. According to the guidelines of Cicchetti (1994), a reliability coefficient below 0.70 is unacceptable for clinical purposes, coefficients between 0.70 and 0.79 are fair, coefficients between 0.80 and 0.89 are good, and coefficients of 0.90 or above are excellent. However, a reliability coefficient of 0.70 or higher can be regarded as acceptable during research and instrument development (see Nunnally & Bernstein, 1994).
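Cronbach's alpha for a facet with k items is alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). The sketch below computes it in pure Python and maps the result onto the Cicchetti (1994) bands described above; the response data are hypothetical. Alpha requires at least two items, so the function rejects single-item facets.

```python
import statistics


def cronbach_alpha(columns):
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)."""
    k = len(columns)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(values) for values in zip(*columns)]  # one total per respondent
    item_var = sum(statistics.variance(col) for col in columns)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))


def cicchetti_label(alpha):
    """Cicchetti (1994) bands as summarised in the text."""
    if alpha < 0.70:
        return "unacceptable"
    if alpha < 0.80:
        return "fair"
    if alpha < 0.90:
        return "good"
    return "excellent"


# Hypothetical facet: three items answered by five respondents.
facet = [[1, 2, 3, 4, 5], [1, 3, 3, 4, 5], [2, 2, 3, 5, 5]]
a = cronbach_alpha(facet)
print(round(a, 3), cicchetti_label(a))  # 0.97 excellent
```

Here the item variances sum to 7.0 and the total-score variance is 19.8, so alpha = 1.5 * (1 - 7.0 / 19.8), which is about 0.97.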

Psychometric properties of the South African Personality Inventory items
The results show that, of the 2573 items the authors developed, 36 items from the conscientiousness, integrity, openness, relationship harmony and soft-heartedness clusters yielded an absolute skewness value of > 2 or a kurtosis value of > 4. Taking into account the items' correlations with the total scores of the various clusters, the authors eliminated 219 items because they shared less than 5% of their variance with the total score. After these items had been eliminated, only 17 items appeared not to represent the facet for which the authors had written them, and the authors excluded these items from further analyses. Finally, the authors excluded 33 items (from all the clusters except openness) because they decreased the reliability of their clusters. In total, the authors eliminated 305 items during this process (see Table 4).
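As a consistency check on the counts reported above (not an additional analysis), the four exclusion steps sum to the total number of eliminated items, and subtracting that total from the initial pool gives the number of retained items reported in the abstract:

```latex
\begin{aligned}
36 + 219 + 17 + 33 &= 305 \quad \text{items eliminated},\\
2573 - 305 &= 2268 \quad \text{items retained}.
\end{aligned}
```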

Reliability
The results showed that the Cronbach's alpha coefficients of 16 of the facets were lower than 0.70. These facets consisted of between one and four items and came mostly from the soft-heartedness and conscientiousness clusters. The authors excluded these facets from further analyses. The reliability coefficients of the remaining facets ranged between 0.71 (conscientiousness: rebellious) and 0.97 (openness: religiosity).

Factor analysis
The section below presents the factor solutions for each of the SAPI clusters. The authors used factor analysis to examine the construct validity of the clusters and facets.
Firstly, the authors performed a simple principal component analysis on the facets in each cluster to determine the number of factors to extract, basing this decision on the clusters' eigenvalues, scree plots and parallel analysis outcomes. Before performing the principal component analyses, the authors assessed the suitability of the data for factor analysis using the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity. Table 5 summarises the findings for the nine clusters.
The Kaiser-Meyer-Oklin values for all the scales were acceptable and the Bartlett's Test of Sphericity was significant in all instances. This confirms that the authors could perform exploratory factor analyses on the various data sets.
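Both suitability checks can be computed directly from the correlation matrix. The sketch below is an illustration using NumPy, not the software the authors used. Bartlett's statistic is χ² = −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO compares squared correlations with squared partial correlations taken from the inverse of R. The function returns only χ² and its degrees of freedom; a p-value could then be obtained from the chi-square survival function (for example `scipy.stats.chi2.sf`), which is left out here to keep the block NumPy-only. The simulated data are hypothetical.

```python
import numpy as np


def bartlett_sphericity(data):
    """Bartlett's test statistic and degrees of freedom for an (n x p) data matrix."""
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # log-determinant, numerically safer than log(det(R))
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return chi2, df


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)  # partial correlations, controlling for all other variables
    off = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


# Hypothetical data: six facet scores driven by one common factor, plus pure noise for contrast.
rng = np.random.default_rng(0)
f = rng.standard_normal((500, 1))
X = f + 0.5 * rng.standard_normal((500, 6))
noise = rng.standard_normal((500, 6))
print(kmo(X), bartlett_sphericity(X))
```

With strongly intercorrelated facet scores the KMO approaches 1 and χ² is large; with independent noise the KMO hovers around 0.5 and χ² stays small.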
Depending on what the three indicators (eigenvalues, scree plot and parallel analysis) suggested, the authors examined one-, two-, three-, four-, five- or six-factor solutions for the various subscales. In each instance, they retained the factor solution that seemed most sound theoretically and psychometrically.
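Parallel analysis (Horn's method) retains only the factors whose observed eigenvalues exceed those expected from random data of the same size. The following is a minimal NumPy sketch. The number of simulations (100), the use of the 95th percentile of the random eigenvalues and the standard normal simulated data are assumptions; the text does not say which variant the authors used.

```python
import numpy as np


def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
    """Horn's parallel analysis on an (n x p) data matrix.

    Returns the suggested number of factors plus the observed and random-data eigenvalues.
    """
    n, p = data.shape
    rng = np.random.default_rng(seed)
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        sim = rng.standard_normal((n, p))  # uncorrelated data of the same shape
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False)))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    # Count leading eigenvalues that beat chance, stopping at the first one that does not.
    n_factors = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        n_factors += 1
    return n_factors, observed, threshold


# Hypothetical data: eight facet scores, four driven by each of two independent factors.
rng = np.random.default_rng(1)
f = rng.standard_normal((500, 2))
X = np.hstack([f[:, [0]].repeat(4, axis=1), f[:, [1]].repeat(4, axis=1)])
X = X + 0.6 * rng.standard_normal((500, 8))
print(parallel_analysis(X)[0])
```

Parallel analysis is often preferred to the eigenvalue-greater-than-one rule because it adjusts for sample size, although, as the text notes, the final choice also weighed theoretical interpretability.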
After determining the number of factors to extract for each SAPI cluster, the authors performed maximum likelihood factor analyses on the facets of each cluster. The variance these factors explained ranged between 38% and 66%. The scales derived from these factor-analytic results yielded adequate Cronbach's alpha values, ranging from 0.74 to 0.95 (see Table 5).
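The "variance explained" figures follow from the factor loadings: for standardised variables, a factor's share of the total variance is its sum of squared loadings divided by the number of variables. The sketch below illustrates this arithmetic on a hypothetical loading matrix. It assumes an orthogonal solution; with oblique rotations the factors overlap and the per-factor sums cannot simply be added.

```python
def variance_explained(loadings):
    """Proportion of total variance per factor and overall.

    loadings: list of rows, one per standardised variable (facet), one column per factor.
    Assumes orthogonal factors, so the per-factor proportions can be summed.
    """
    n_vars = len(loadings)
    n_factors = len(loadings[0])
    per_factor = [
        sum(row[j] ** 2 for row in loadings) / n_vars for j in range(n_factors)
    ]
    return per_factor, sum(per_factor)


# Hypothetical two-factor loadings for four facets.
loadings = [[0.8, 0.0], [0.7, 0.1], [0.0, 0.6], [0.1, 0.9]]
per_factor, total = variance_explained(loadings)
print([round(v, 3) for v in per_factor], round(total, 3))
```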

Discussion
The overall aim of the SAPI project was to develop and validate an indigenous personality instrument that will measure personality in a valid, reliable, equivalent and bias-free way, irrespective of people's culture, race, language or socioeconomic background (see Nel et al., 2012; Valchev et al., 2011; Valchev et al., 2013).
This study focused specifically on developing an item pool to measure the personality facets, sub clusters and clusters that the researchers identified in the first stage of the SAPI project (Nel et al., 2012), and on determining which of these items assess the specific facets, sub clusters and clusters in a realistic and consistent way.
The discussion below focuses on the performance of the 2573 items the authors included in the item, validity and reliability analyses. The purpose of this process was to establish which items showed a normal distribution and loaded onto both the facet and the cluster they were intended to measure. In addition, the analysis aimed to establish the internal consistency of the items in each facet. Worthington and Whittaker (2006) suggest that, when developing instruments, it is vital to eliminate as many items as possible in the early stages of item analysis. This ensures that the factors (clusters, in this case) that one extracts subsequently are measured by meaningful items in the final instrument.

Psychometric performance of the items
A secondary objective of this study was to establish which items showed a normal or atypical distribution.
According to Rouse (2007), this method of analysis is important because one should identify major disparities in the parameters early in a study in order to disregard biased items before continuing with factor analysis. This may also be the ideal way to identify difficult and discriminative items (Ross, 2005).
The results of this study showed that the authors eliminated only 1.4% of the items because of atypical distributions. Several possible explanations for these results emerged.
For example, the items that related to 'laziness' and 'cleaning' (both from conscientiousness) showed high skewness and kurtosis. The data collection context may have caused these findings.
The authors first collected the data for Conscientiousness within the police services, using a sample of entry-level recruits.
Most participants responded to both statements in whichever direction showed them in a favourable light. The authors therefore eliminated these items because they may contain a socially desirable component that depends on the context of measurement.
Two other types of items, relating to 'proudness of values' and 'proudness of family' (both from relationship harmony), also showed atypical distributions. These items could be problematic because the facet they were meant to measure was Proud, yet statistical analysis showed that participants might have attributed this pride to two different things (values and family). As a result, the authors eliminated both items.
After this process, the authors conducted a validity analysis on the items (Costello & Osborne, 2005) to establish whether each item actually measured its intended cluster. As a result of this analysis, they eliminated 8.51% of the items because these items shared less than 5% of their variance with the total score of the cluster. This indicated that certain items did not measure the expected cluster and should either have been included in a different cluster or disregarded.
For example, the authors initially included an item that measures 'to do what you say' in the Conscientiousness cluster. Although this item appeared to measure its facet, it failed to measure the overall conscientiousness cluster within which the facet falls. The authors therefore considered retaining this item problematic, because the objective was to retain items that measure both their intended facet and their intended cluster. This item may have fitted better into another cluster, like emotional stability.
When the authors re-assessed the definition of emotional stability, the element of unpredictability seemed more relevant to this type of item. However, the emotional stability cluster also included some irrelevant items. Another example of an item that failed to measure its overall cluster was 'to wash hands repeatedly'. Although this type of item (which also assesses compulsive behaviour) seemed relevant to the overall emotional stability cluster, it failed to load onto it. This may be because the item assessed a specific element of psychopathology, and participants who display that type of behaviour may not be prepared to recognise it or may be in denial (Andersson et al., 2011; Mataix-Cols, Marks, Greist, Kobak, & Baer, 2002).
After this process, the authors inspected the item loadings to identify items that actually loaded onto the specific facet they were meant to measure. Only 0.66% of the items failed to load, which showed that these items measured the cluster but not their intended facet. Examples from the relationship harmony cluster include items that measured 'strong will' and 'to keep from getting involved in disagreements'. The latter item may be susceptible to socially desirable responding, which could explain why it did not load onto the intended facet. The former may be ambiguous because respondents could interpret the word 'strong' in a physical sense, although this was not its intention. The authors deleted both items from the item pool.
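The 5% criterion corresponds to a squared item-total correlation r² < 0.05, that is, |r| below roughly 0.22. A minimal sketch follows. It assumes a *corrected* item-total correlation (the item is removed from the total before correlating, so it cannot inflate its own correlation) and assumes that negatively keyed items have already been reverse-scored; the text does not specify either detail. Item names and responses are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def low_shared_variance(items, min_r2=0.05, corrected=True):
    """Flag items sharing less than min_r2 of their variance with the cluster total.

    items: dict item name -> responses (reverse-keyed items assumed already recoded).
    """
    names = list(items)
    n = len(items[names[0]])
    totals = [sum(items[k][i] for k in names) for i in range(n)]
    flagged = []
    for k in names:
        # Corrected total: leave the item itself out of the score it is compared with.
        rest = [t - x for t, x in zip(totals, items[k])] if corrected else totals
        if pearson(items[k], rest) ** 2 < min_r2:
            flagged.append(k)
    return flagged


# Hypothetical cluster: three coherent items and one item unrelated to the others.
base = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
items = {"a": base, "b": base, "c": base, "d": [3, 1, 5, 1, 3, 3, 1, 5, 1, 3]}
print(low_shared_variance(items))
```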
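One simple way to operationalise "loaded onto the facet it should have measured" is to check whether an item's largest absolute loading falls on its intended facet. The sketch below does exactly that; it is an illustrative rule, not necessarily the criterion the authors applied, and the item names, facet names and loadings are hypothetical.

```python
def misplaced_items(loadings, intended, facet_names):
    """Return items whose largest absolute loading is not on their intended facet.

    loadings: dict item -> list of loadings, one per facet, in facet_names order.
    intended: dict item -> name of the facet the item was written for.
    """
    flagged = []
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))  # index of strongest loading
        if facet_names[best] != intended[item]:
            flagged.append(item)
    return flagged


# Hypothetical loadings of three items on two facets.
facet_names = ["facet_A", "facet_B"]
loadings = {"i1": [0.7, 0.1], "i2": [0.2, 0.6], "i3": [0.5, 0.3]}
intended = {"i1": "facet_A", "i2": "facet_B", "i3": "facet_B"}
print(misplaced_items(loadings, intended, facet_names))
```

Item i3 was written for facet_B but loads most strongly on facet_A, so it is flagged for review or deletion.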
The next objective was to establish the internal consistency of items that measure a specific facet.
The authors eliminated only 33 (1.28%) of the items during this stage. Most facets showed acceptable internal consistency, meeting the guideline of α ≥ 0.70 that Nunnally and Bernstein (1994) set. For facets with lower internal consistency, the authors either eliminated the facet completely or removed the items that lowered its reliability. Facets they disregarded during this stage include gullible and serious (both from soft-heartedness), fair and discriminative (integrity), performance oriented and thorough (conscientiousness), and tolerant (relationship harmony). These facets either contained too few items to yield adequate reliability (like agreeable from soft-heartedness) or consisted of items with low internal consistency. The authors therefore considered these facets inappropriate for measuring the specific cluster.
The role of the separate quantitative studies (each of which measured one SAPI cluster) was to establish dimensions and their items (Wittes & Brittain, 1990). Of the 2573 items the authors initially developed, they retained 2268 after the quantitative studies. In other words, they eliminated 305 items (11.9%) and retained 88% of the original pool.
This created a challenge for constructing the SAPI, because the authors retained so many items. In the diverse South African context, constructing items is an extremely sensitive undertaking because psychological instruments must adhere to the guidelines set out in the Employment Equity Act (1998). Items must therefore measure the same facet in the same way across groups and be free of ambiguity, misperception and complexity (Ross, 2005).
This study identified only a small number of items that the authors disregarded because they did not meet these criteria. Most facets also showed acceptable internal consistency, which suggests low measurement error and indicates that the items measure their intended facets fairly well (Suen & McClellan, 2003).
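Removing "the items in these facets that contributed to the lower reliability" is commonly done with alpha-if-item-deleted statistics. The sketch below greedily drops the item whose removal raises alpha the most, stopping when no deletion helps or only two items remain. The greedy strategy, the two-item floor and the small tolerance that guards against floating-point noise are assumptions for illustration; the data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(cols):
    """Cronbach's alpha for a list of per-item response lists."""
    k = len(cols)
    totals = [sum(r) for r in zip(*cols)]
    return k / (k - 1) * (1 - sum(pvariance(c) for c in cols) / pvariance(totals))


def prune_for_alpha(items, tol=1e-12):
    """Greedily drop items whose deletion raises alpha.

    items: dict item name -> responses. Returns (dropped item names, final alpha).
    """
    items = dict(items)  # work on a copy
    dropped = []
    while len(items) > 2:
        current = cronbach_alpha(list(items.values()))
        alpha_if_deleted = {
            name: cronbach_alpha([c for other, c in items.items() if other != name])
            for name in items
        }
        worst = max(alpha_if_deleted, key=alpha_if_deleted.get)
        if alpha_if_deleted[worst] <= current + tol:  # no deletion improves alpha
            break
        dropped.append(worst)
        del items[worst]
    return dropped, cronbach_alpha(list(items.values()))


# Hypothetical facet: three consistent items and one unrelated item.
base = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
facet = {"a": base, "b": base, "c": base, "d": [3, 1, 5, 1, 3, 3, 1, 5, 1, 3]}
print(prune_for_alpha(facet))
```

Greedy deletion is easy to audit but can overfit a single sample, which is one reason content judgement should accompany it, as the authors did when deciding whether to drop a whole facet or only some of its items.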
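The attrition figures reported in this section can be checked with a few lines of arithmetic. All counts below are taken from the text; only the category labels are paraphrased.

```python
def attrition_summary(initial, removed):
    """Summarise item attrition; removed maps a reason to the number of items dropped."""
    total_removed = sum(removed.values())
    retained = initial - total_removed
    return {
        "removed": total_removed,
        "retained": retained,
        "removed_pct": round(100 * total_removed / initial, 1),
        "retained_pct": round(100 * retained / initial, 1),
        "shares_pct": {r: round(100 * n / initial, 2) for r, n in removed.items()},
    }


summary = attrition_summary(2573, {
    "atypical distribution": 36,
    "low shared variance with cluster total": 219,
    "did not load on intended facet": 17,
    "lowered facet reliability": 33,
})
print(summary)
```

The per-step shares (1.4%, 8.51%, 0.66% and 1.28%) add up to the 305 eliminated items, leaving 2268 items, or 88.1% of the original pool.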
Additional analyses examined the extent to which the (quantitative) factor-analytic results converged with the (qualitative) content analysis that Nel et al. (2012) reported in the previous phase of the project. According to Nel et al. (2012), Conscientiousness consisted of five sub clusters.
However, the facets distributed themselves around a single overall factor (or sub cluster). This was also the case with Integrity (originally two sub clusters), Openness (originally four sub clusters) and Facilitating (originally two sub clusters). It seems that these facets interrelate more strongly than was originally perceived. It is also possible that the items that measure the facets are too similar and are, therefore, unable to distinguish between the facets (Norris & Lecavalier, 2010).
Emotional stability included six sub clusters in the qualitative analysis. However, the quantitative analysis yielded only four factors (sub clusters). The four factors are neuroticism, dignified, ill-temperedness and emotional stability. Their current arrangement is similar to the original composition.
The authors retained the Neuroticism sub cluster with a composition similar to the one they identified in the qualitative stage. The remaining facets were distributed across the other three factors. Some of the elements that appeared to relate to emotional control, emotional compassion and self-esteem loaded onto the dignified factor (some of these facets initially clustered beneath the emotional control, emotional sensitivity and ego-strength sub clusters; see Nel et al., 2012).
The temperamental facets (or states) loaded onto the Ill-temperedness factor and the more trait-like facets onto the emotional stability factor. These facets initially belonged to the balance, courage, ego-strength and emotional control sub clusters. Because most of the facets that make up the new sub cluster (also labelled emotional stability) come from four different sub clusters found in the qualitative stage, the broad label seemed appropriate.
Extraversion and Intellect each consisted of four sub clusters. However, the authors extracted only three factors (sub clusters) for extraversion and two for Intellect. For extraversion, the factors that the authors extracted were gregariousness, introvertedness and candidness. The extroverted and introverted qualities (initially clustered together beneath the Sociability, positive emotionality and dominance sub clusters) appear to have split to form the gregariousness and introvertedness factors respectively. Only the Candidness factor retained the predominant facets of the initial Expressiveness sub cluster. For Intellect, the authors labelled the two sub clusters they extracted Aesthetics and Intellect. All the facets relevant to artistry and creativity appear to have loaded onto the Aesthetics sub cluster, whilst the remaining facets loaded onto the Intellect sub cluster.
In both versions of Relationship Harmony, the authors extracted only two factors, or sub clusters (Positive Relational Behaviour and Negative Relational Behaviour), although the qualitative stage had identified four sub clusters (Nel et al., 2012).
For the final cluster, soft-heartedness, the authors extracted two sub clusters from version 1 (detrimental behaviour and good-naturedness) and four sub clusters from version 2 (detrimental behaviour, active support, egoism and amiability).
This differed from the original composition that included six sub clusters. An examination of the differences between the original relationship harmony and soft-heartedness clusters and the current composition of these clusters suggests that the positive and negative facets clustered together in both clusters, thus leading to the extraction of fewer factors.
Therefore, the authors conclude that the overall personality structure that Nel et al. (2012) proposed differs to some extent from the personality structure this study found.
It is possible to suggest some reasons for these differences.
The first reason may relate to the nature of the facets the authors measured. The items asked participants to rate themselves on a 5-point scale by agreeing or disagreeing with statements. During the factor analysis it was evident, at the facet level, that the negative facets grouped together and the positive facets grouped together (specifically in the relationship harmony and soft-heartedness clusters).
Pettersson and Turkheimer (2010) also reported this tendency. They found that, when an overall personality instrument includes evaluative terms and the numbers of negative and positive statements are unbalanced, the negative and positive facets tend to split in exploratory factor analysis. Accordingly, Smith (2003) suggested that each facet should include an equal number of negative and positive items in order to derive more substantive data.
The second reason may relate to the various social desirability elements in the personality structure that Nel et al. (2012) identified. Both the relationship harmony and soft-heartedness clusters contain socially desirable elements (caring, loving and peacekeeping) and socially undesirable elements (abusive, provoking and argumentative). Integrity includes more desirable elements (loyal, truthful and trustworthy) than undesirable ones (like pretending). Participants may have been inclined to agree more with the desirable items than with the undesirable ones, which might have distorted the factor analysis of the clusters and sub clusters. Previous research indicates that social desirability may alter the accuracy of a derived personality structure (Bourgeois, Loss, Meyers, & LeUnes, 2003).
A third reason may be that some facets did not cluster correctly beneath the relevant clusters or showed double loadings on more than one sub cluster. Three of the facets, namely musical (intellect), religiosity (openness) and materialistic (openness), failed to load onto the cluster they were supposed to measure, which suggests that these facets may be better suited to another cluster. Facets with double loadings were more prevalent than facets with no loadings. This suggests that the composition of some facets may be unclear and that they need further refinement to distinguish them from one another. Most of the double loadings occurred in the emotional stability cluster, where the facets short-tempered, depressive and needy loaded onto more than one sub cluster. These facets relate to being easily angered, being sad most of the time and needing to be wanted or accepted (Nel et al., 2012).
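Balancing keying in practice involves two routine steps: recoding negatively keyed items on the 5-point scale before scores are combined, and checking that each facet has as many negatively as positively keyed items. The sketch below shows both. The facet names, item names and keying labels are hypothetical.

```python
def reverse_score(x, low=1, high=5):
    """Recode a response on a low..high scale so that high always means more of the trait."""
    return low + high - x


def keying_balance(facet_items):
    """For each facet, count positively and negatively keyed items and report balance.

    facet_items: dict facet -> list of (item_name, key) tuples, key being "+" or "-".
    Returns dict facet -> (n_positive, n_negative, is_balanced).
    """
    out = {}
    for facet, items in facet_items.items():
        pos = sum(1 for _, key in items if key == "+")
        neg = sum(1 for _, key in items if key == "-")
        out[facet] = (pos, neg, pos == neg)
    return out


# Hypothetical facets: facet_x leans positive, facet_y is balanced.
facets = {
    "facet_x": [("x1", "+"), ("x2", "+"), ("x3", "-")],
    "facet_y": [("y1", "+"), ("y2", "-"), ("y3", "+"), ("y4", "-")],
}
print(keying_balance(facets), [reverse_score(v) for v in [1, 2, 3, 4, 5]])
```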
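Both problems described above, facets with no salient loading and facets with double loadings, can be flagged mechanically from a facet-by-factor loading matrix. The sketch below counts salient loadings per facet. The salience cut-off of |λ| ≥ 0.30 is a common rule of thumb, assumed here because the text does not state the authors' threshold; facet names and loadings are hypothetical.

```python
def loading_pattern(loadings, salient=0.30):
    """Split facets into those with no salient loading and those with more than one.

    loadings: dict facet -> list of loadings, one per sub-cluster factor.
    """
    no_loading, double_loading = [], []
    for facet, row in loadings.items():
        n_salient = sum(1 for lam in row if abs(lam) >= salient)
        if n_salient == 0:
            no_loading.append(facet)
        elif n_salient > 1:
            double_loading.append(facet)
    return no_loading, double_loading


# Hypothetical loadings of three facets on three sub-cluster factors.
loadings = {
    "facet_a": [0.45, 0.38, 0.05],  # cross-loads on two factors
    "facet_b": [0.12, 0.20, 0.10],  # no salient loading
    "facet_c": [0.70, 0.10, 0.02],  # clean simple structure
}
print(loading_pattern(loadings))
```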
The development of items for these facets should focus on content distinction to avoid overlap across facets.

Limitations of the study
This study has several limitations. The first is that the authors conducted 11 different studies to assess the nine clusters, so one can make only limited inferences about the overall personality structure, the internal validity of that structure and the composition of its facets, sub clusters and clusters. The sheer number of items (2573) made a single study unrealistic and motivated the 11 separate studies.
Another limitation is the different administration modes the authors used in the 11 studies (which involved 11 different questionnaires). Participants completed some questionnaires in paper-and-pencil format and others via the Internet. The different modes may have elicited different response approaches, depending on which mode a particular participant felt more comfortable with. According to Davies, Foxcroft, Griessel and Tredoux (2005), computer anxiety may distort results if participants are not computer literate. In addition, using cross-sectional designs in the different studies limited the ability to make causal interpretations about the items' performance; a longitudinal design could have addressed this limitation.
The sampling method the authors used in most of the studies is another limitation. Using non-probability convenience sampling limited the inclusion of diverse groups in the studies. This further restricted the inferences the authors could make about the performance of the items.
Furthermore, this study drew mainly on student samples, which limits the possibility of drawing conclusions about maturational effects. However, changes in personality generally involve mean-level shifts rather than structural changes (see Allemand, Zimprich, & Hendriks, 2008). One can therefore expect the outcomes of the authors' analyses of scale properties to generalise to other age groups.
The final limitation is the use of English throughout the studies. Only 15.7% of the 6735 participants indicated that English was their home language, so participants may have misinterpreted items despite the procedures the authors used to reduce misinterpretation. These procedures included asking cultural and language experts to evaluate the items beforehand, to determine whether speakers of all 11 official languages could understand each English statement and to ensure that no ambiguous or dubious items were included in the questionnaires.