Interference between work and nonwork roles: The development of a new South African instrument

Orientation: The interference between work and personal life is a central issue in the 21st century as employees attempt to balance or integrate their involvement in multiple social roles.
Research purpose: The purpose of this study was to (1) develop new items for a more comprehensive work−nonwork interference instrument, (2) evaluate the newly developed items to retain those that accurately capture the different dimensions and (3) eliminate undesirable items from the different subscales in the instrument.
Motivation for the study: Although the interaction between work and personal life has received extensive attention in the work−family fields of research, various theoretical, empirical and measurement issues need to be addressed.
Research design, approach and method: A cross-sectional survey design was used to collect the data.
Main findings: Initially, 89 items were developed. During the pilot study among mineworkers (n = 245), 41 poor items were eliminated on the basis of descriptive statistics, inter-item correlations, item-total correlations and a qualitative investigation of items that were highly redundant in wording. Thereafter, the instrument (48 items) was administered to 366 support and academic personnel at a tertiary institution. Using Rasch analyses and item correlations, 18 additional items were eliminated, resulting in a 30-item instrument (15 items were retained to measure work−nonwork interference and 15 items to measure nonwork−work interference).
Practical/managerial implications: A major theoretical limitation to the measurement of work−family interference relates to the dimensionality and inconsistent measurement of the directionality of interference.
Contribution/value-add: With the development of this new instrument, several of the theoretical and measurement limitations voiced by previous researchers have been addressed, providing this instrument with distinct advantages over previous work−family instruments.


INTRODUCTION

Key focus of the study
A widely studied topic in Occupational Health Psychology is the interaction between work and family. Researchers have devoted considerable attention to examining the interrelationships between these domains (Eby, Casper, Lockwood, Bordeaux & Brinley, 2005; Lewis & Cooper, 2005) and valuable insights have been gained from different perspectives as well as in different disciplines. However, one important issue that has not yet been addressed adequately is that of measurement. Although the measurement of work−family relations has progressed substantially over the past decade, with a variety of work−family measuring instruments available in the international literature, important theoretical and measurement critiques are being raised against the existing instruments. These issues could pose serious problems for the interpretation of past research results and the future measurement of work−family interference (Bellavia & Frone, 2005; Frone, 2003; Geurts & Demerouti, 2003; Tetrick & Buffardi, 2006).

Background to the study
When considering the measurement of work−family interference, several theoretical and measurement issues are raised. Major theoretical limitations identified by previous researchers pertain to the directionality and/or dimensionality of work−family interference (e.g. the conceptualisation of unidirectional or bidirectional constructs in work−family relations) and the inconsistent use of terminology to explain the relations between work and family (Bellavia & Frone, 2005; Frone, 2003; Geurts & Demerouti, 2003; Tetrick & Buffardi, 2006). In addition, some of the issues pertaining to the measurement of work−family interference raised by previous researchers include the wording of items, the use of appropriate response anchors and scales, and the lack of comprehensive procedures for the development of scales (Bellavia & Frone, 2005; Frone, 2003; Netemeyer, Boles & McMurrian, 1996; Small & Riley, 1990; Tetrick & Buffardi, 2006). The use of specific items in the measurement of work−family interference is especially problematic, as various items in existing instruments are confounded with external variables, causes and consequences. Also, the number of items used to measure the different directions of interference (i.e. work−to−family and family−to−work interference) is inconsistent (e.g. measuring work−family conflict with five items and family−work conflict with two items). This becomes problematic for the elimination of poor items, because some dimensions (due to the unequal number of initial items) may end up with only one item measuring the dimension, resulting in reliability issues.
In addition to the above-mentioned limitations, there is another limitation that this study specifically sought to overcome. The majority of studies measure the conflict or interference between work and family (for overviews, see Byron, 2005; Eby et al., 2005; Mesmer-Magnus & Viswesvaran, 2005). However, review papers on work−family interference recommend that interference between work and other or additional nonwork dimensions or roles outside the work domain be acknowledged and measured (Bellavia & Frone, 2005; Frone, 2003; Geurts & Demerouti, 2003; Kirchmeyer, 1992; Tetrick & Buffardi, 2006). Even though the majority of researchers use 'family' synonymously with 'personal life' and other dimensions outside work, it is well established that individuals are involved in multiple social roles outside of work, including the spousal role, parental role, religious/spiritual role, leisure role, social role, and community role (Lingard & Francis, 2005; Plaisier et al., 2008). As the dynamics between work and these roles are very complex, researchers have suggested that work could interfere differently with these specific roles (Aryee, 1992; Day & Chamberlain, 2006; Doumas, Margolin & John, 2008; Holahan & Gilbert, 1979; Kirchmeyer, 1992; Kossek & Ozeki, 1998; Mallard & Lance, 1998; Simon, 1995, 1997).

Potential value added by the study
Although it has been proposed in the literature, few researchers have measured the conflict between work and more specific nonwork dimensions or roles (Aryee, 1992;Frone & Rice, 1987;Holahan & Gilbert, 1979;Small & Riley, 1990). The majority of these measure the conflict with specific family roles (e.g. spousal and parental roles) and one or two other nonwork roles (e.g. homemaker role, marital relationship role), but only in one direction (work→nonwork role conflict). The only scale measuring both directions of conflict between work and various other nonwork roles or dimensions is the multirole work−family conflict (WFC) scale of Premeaux, Adkins and Mossholder (2007), which measures the conflict in both directions between work and additional spouse, parental, homecare and leisure roles. However, the items of this instrument also confound with causes and consequences (e.g. 'Because I am often stressed from my marriage/relationship I have a hard time concentrating on my work').
It seems important to investigate the interference between work and different nonwork roles. Individuals employed in demanding professions are under severe pressure due to their active involvement in various social roles, amongst others the role of spouse/life partner, parent, family member and homemaker (Barnett & Baruch, 1985; Holahan & Gilbert, 1979; Lingard & Francis, 2005; Pietromonaco, Manis & Frohardt-Lane, 1986; Plaisier et al., 2008; Schultheiss, 2006; Simon, 1995; Small & Riley, 1990). Employed individuals are also confronted with simultaneous pressures from these different roles (e.g. work demands and family demands) and are often pressured by the different role players to engage in various activities (e.g. employees being pressured by their managers to engage in work activities, whilst also being pressured by their spouse and children to engage in family activities) (Greenhaus & Powell, 2003). For many individuals, having to integrate their involvement in multiple social roles is especially challenging, because they are occupied by many role relationships simultaneously (Pietromonaco et al., 1986), with each role requiring a specific amount of time, energy and psychological absorption (Small & Riley, 1990).

Measuring instruments for work-family interference
In the literature, the most widely used definition for the interference between work and family life is the one of Greenhaus and Beutell (1985), which states that work−family conflict presents a form of inter-role conflict in which role pressures from the work and family domains are mutually incompatible in some respects, and where conflict can occur in either direction (i.e. work−family conflict or family−work conflict). Consequently, researchers have expanded the concept of conflict by outlining the direction of conflict, the roles that are affected, and the nature of the conflict (see overviews by Allen, Herst, Bruck & Sutton, 2000;Byron, 2005;Mesmer-Magnus & Viswesvaran, 2005). There are various measuring instruments that attempt to address or measure this phenomenon. A comprehensive and extensive summary of instruments developed before 2000 can be found in Carlson, Kacmar and Williams (2000).
From the literature it is clear that various instruments are available to measure the conflict or interference between work and family. These instruments can be grouped into those that measure (1) only the interference from work to family, (2) both directions of interference, (3) the interference of the work domain with other nonwork dimensions or roles, (4) the interference of nonwork roles in the work domain and (5) both directions of interference between work and various nonwork dimensions or roles.

Theoretical limitations
One of the major theoretical limitations to the measurement of work−family interference relates to the dimensionality and inconsistent measurement of the directionality of interference. Earlier studies of work and family relations conceptualised the interaction or relation between the work and family domains as unidimensional, with global or general measures of WFC being employed (Bedeian, Burke & Moffett, 1988; Burke, 1988; Cooke & Rousseau, 1984; Kopelman, Greenhaus & Connolly, 1983; Thomas & Ganster, 1995). In other words, these studies conceptualised and measured the conflict between work and family as one dimension that did not distinguish between the directions of interference. They therefore ignored the possible conceptual distinction between the two directions of interference between work and family.
According to Greenhaus and Beutell (1985), these measures and studies failed to capture the dimensionality inherent in WFC and they suggested that work−family conflict presents a form of inter-role conflict in which role pressures from the work and family domains are mutually incompatible and conflict can occur in either direction (i.e. work being in conflict with family, or family being in conflict with work). This notion of two separate dimensions, in terms of which WFC is considered conceptually bidirectional in nature (i.e. work→family conflict and family→work conflict), was empirically supported by earlier researchers (Frone, Russell & Cooper, 1992; Gutek, Searle & Klepa, 1991). Since then, a growing body of research has documented the directional facet of work−family interaction, which validated the conceptual distinction between work−family and family−work interference/conflict (Carlson & Frone, 2003; Curbow, McDonell, Spratt, Griffen & Agnew, 2003; Geurts et al., 2005; Grzywacz & Marks, 2000; Mesmer-Magnus & Viswesvaran, 2005). Although various bidirectional instruments are available (Carlson & Frone, 2003; Curbow et al., 2003; Geurts et al., 2005; Grzywacz & Marks, 2000; Netemeyer et al., 1996), the majority of these almost exclusively measure the interference of work in the family domain.

Measurement limitations
One of the key measurement issues concerning the items of existing instruments is the development and use of items that confound with external variables, causes or consequences. According to Bellavia and Frone (2005), the measurement of work−family relations becomes problematic when items measure the relation between a cause or consequence and work−family relations simultaneously. In other words, assessing a construct in terms of its cause or consequence makes the data collection on those causes and consequences meaningless, as the cause and the consequence are already given in the item. For instance, when a researcher attempts to measure the interference from work to family, but includes a cause (e.g. work pressure) and a consequence (e.g. irritability) in the work−family conflict item (e.g. 'You are irritable at home because your work is demanding'), the relation with external variables (e.g. work pressure and irritability) will be inflated. This relation is therefore not meaningful, as the item is already correlating with measures of the cause and consequence.
An additional issue closely related to confounded items is the use and measurement of specific types of conflict (e.g. time-, strain- and behaviour-based conflict). These items are developed on the basis of the consequences of the conflict and do not necessarily measure conflict per se, but rather the time, strain and behaviour consequences of the conflict or interference (Bellavia & Frone, 2005; Frone, 2003). Take, for example, the time-based interference item 'Your work takes up time that you would have liked to spend with your spouse/friends/family'. In this item, work takes up available time and, as a result, the respondent does not have time to spend with family and friends. The item therefore describes a consequence that involves time, and it is clear that the item was based on that consequence (a time-based consequence) of the interference rather than on the interference itself.
With regard to the number of items used in existing instruments, two main issues are raised by previous researchers, namely the use of single-item measures and the inconsistent use of the number of items measuring the two directions of interference (Bellavia & Frone, 2005;Netemeyer et al., 1996;Small & Riley, 1990;Tetrick & Buffardi, 2006). Single-item measures pose the problem of random measurement error and may not adequately assess the domain of the construct (Nunnally, 1988). When items measuring the two directions of conflict are not parallel in construction (e.g. comparing five WFC items with two FWC items), the items do not really measure the dimensions or direction consistently.
Researchers have also raised the issue of response anchors: some studies measure the frequency of occurrence of work−family interference (e.g. a response anchor ranging from never to always) (Geurts et al., 2005; Grzywacz & Bass, 2003; Grzywacz & Marks, 2000), while other studies capture only the experience of work−family interference (Kirchmeyer, 1992; Netemeyer et al., 1996; Small & Riley, 1990; Stephens & Sommer, 1996). As the second type of response anchor (e.g. agree/disagree) does not provide information on the frequency of interference, this becomes problematic: a 'strongly agree' response might merely represent the individual's level of certainty that work−family conflict occurred. It is therefore impossible to estimate even the prevalence of the conflict that occurred. With frequency-based response anchors, however, researchers can assess the prevalence of work−family conflict with more confidence.
Finally, the lack of rigorous procedures for the development of scales for work−family instruments is raised as a concern (Tetrick & Buffardi, 2006). Although clear guidelines for scale development are given in the psychometric literature, few work−family scale development studies follow these guidelines. A variety of scale development studies regarding work−family interference are available in the literature (Curbow et al., 2003; Grzywacz & Marks, 2000; Kirchmeyer, 1992; Netemeyer et al., 1996; Premeaux et al., 2007). However, limited information is given on the procedures used for scale development, which could have provided strong evidence for the validity of these studies. Few work−family studies adhere to basic scale development procedures or thoroughly describe these procedures (Geurts et al., 2005; Mallard & Lance, 1998; Netemeyer et al., 1996).

Item evaluation and the Rasch model
A crucial feature of scale development is the number of items used in a scale to measure the specific construct(s) (DeVellis, 2003). In the psychometric literature, the number of items is closely related to the reliability of the scale, which is one of the most important indicators of a scale's quality (DeVellis, 2003;Foxcroft & Roodt, 2005). According to DeVellis (2003), researchers are constantly confronted with the matter of developing shorter scales with fewer items without influencing or compromising the reliability of the scale. Various researchers have indicated that shorter scales are more advisable, as they place less of a burden on the respondents (Netemeyer et al., 1996;Stephens & Sommer, 1996). Researchers therefore have the responsibility to develop instruments with the fewest possible items, but still measure the construct(s) adequately. Consequently, in order to obtain only the items that most adequately represent the construct(s), special attention is paid to item evaluation and item elimination during scale development.
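The trade-off between scale length and reliability described above can be illustrated with the Spearman-Brown prophecy formula from classical test theory. The following is only a minimal sketch and not a procedure reported in this study; the function name `spearman_brown` and the example reliability value are assumptions chosen for illustration.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict the reliability of a scale whose length is multiplied by
    `length_factor`, assuming the added or removed items are comparable
    to the existing ones (hypothetical helper, for illustration only)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


if __name__ == "__main__":
    # Assumed example: a scale with reliability 0.90 is cut to half its length.
    shortened = spearman_brown(0.90, 0.5)
    print(f"Predicted reliability after halving: {shortened:.3f}")
```

Under these assumptions, halving a scale lowers its predicted reliability, which is why item elimination has to weigh respondent burden against the reliability of the shortened scale.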
In the psychometric literature, the classical test theory (CTT) and item response theory (IRT) are recognised as fundamental theories for the development and analysis of standardised instruments. According to Allen and Yen (2002), CTT relies on minimal assumptions that can be interpreted with relative ease. CTT, however, is based on the fundamental assumption of an individual having a true score and an observed score, where the differences between the two scores are attributed to measurement error. The main criticism against this theory lies in the observed score, which relies on the content of the instrument (test), making it possible for individuals with similar trait levels to score differently depending on the item bias (Fan, 1998).
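The fundamental CTT assumption described above is commonly written as follows. The symbols are the conventional ones and are assumptions here, as the text does not introduce notation: $X$ is the observed score, $T$ the true score and $E$ the measurement error.

```latex
% Classical test theory: observed score = true score + measurement error
X = T + E
% Reliability as the proportion of observed-score variance due to true scores
\rho_{XX'} = \frac{\sigma_T^{2}}{\sigma_X^{2}}
```

Because $X$ is obtained from the particular set of items administered, the observed score depends on the content of the instrument, which is the criticism noted above.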
According to Tennant and Conaghan (2007), the Rasch measurement model has become the standard for modern psychometric evaluations of outcome scales. It is a unidimensional measurement model that is based on the assumption that the items summed together form a unidimensional scale. With the Rasch model, the operating characteristics of all the items are examined across the whole continuum of a latent trait, seeing that the model is based on latent trait theory (Hagquist, 2007). With the development of the new instrument in this study, the Rasch measurement model is applicable, and is used to evaluate all the items measuring the various dimensions of the work-nonwork interference instrument.
In contrast to the CTT, the fundamental assumption associated with the IRT is that the latent traits of individuals are independent of the content of an instrument, thereby enabling researchers to compare various individuals' latent traits sensibly even if different items are applied. According to Meads and Bentall (2008), IRT is a general statistical theory about item (question) and scale (questionnaire) performance and how that performance relates to the factor(s) measured by the items in the scale. In IRT there are different models of varying complexity. These models include one-, two- and three-parameter IRT models, with the simplest logistic latent trait IRT model being the Rasch one-parameter model (Rasch, 1960).
The basic criterion of invariance, which is a crucial feature of fundamental measurement, is reflected in the Rasch model (Bond & Fox, 2007). As invariance means that an instrument is required to work in the same way for all individuals, invariant functioning across any group of respondents is implied. According to Bond and Fox (2007), the Rasch model states that the probability that a person answers an item correctly is a logistic function of the person's ability minus the item difficulty. Within the framework of the Rasch model, person ability refers to the person's level of the construct being measured, whereas item difficulty refers to the intensity of the item rather than its difficulty in the usual sense. In addition, the Rasch model takes into account the different categories on the scales, where persons with the same ability (or level of the construct) will respond differently to items with different intensities. The model therefore indicates that the probability of a person selecting a certain point on a scale is a logistic function of that person's ability minus the sum of the item difficulty (intensity) and the difficulty of the threshold between the current scale category and the next category (Bond & Fox, 2007).
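The two statements of the Rasch model above can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions, not the software or estimation procedure used in this study: the function names are hypothetical, the polytomous part follows the rating scale formulation in which each step between adjacent categories has the logit ability minus item difficulty minus threshold, and all numeric values are invented.

```python
import math
from typing import List, Sequence


def rasch_probability(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: probability of endorsing (or correctly
    answering) an item, as a logistic function of ability - difficulty."""
    x = ability - difficulty
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    return math.exp(x) / (1.0 + math.exp(x))


def rating_scale_probabilities(
    ability: float, difficulty: float, thresholds: Sequence[float]
) -> List[float]:
    """Rating scale sketch: probabilities of categories 0..m, given m
    thresholds between adjacent categories (values are assumptions)."""
    # Cumulative sums of the step logits (ability - difficulty - threshold).
    logits = [0.0]
    for threshold in thresholds:
        logits.append(logits[-1] + (ability - difficulty - threshold))
    # Subtract the maximum before exponentiating to avoid overflow.
    largest = max(logits)
    weights = [math.exp(value - largest) for value in logits]
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    # Hypothetical person, item and thresholds for a four-category scale.
    probabilities = rating_scale_probabilities(0.5, -0.2, [-1.0, 0.0, 1.0])
    for category, probability in enumerate(probabilities):
        print(f"category {category}: {probability:.3f}")
```

With a single threshold fixed at zero, the rating scale sketch reduces to the dichotomous case, which is a quick way to check that the two functions agree.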
The use of Rasch analysis specifically for the development and analysis of questionnaires or instruments has recently increased in the field of psychology and psychiatry (Betemps & Baker, 2004;Cervellione, Lee & Bonanno, 2009;Merrell & Tymms, 2005;Pallant & Tennant, 2007;Prieto, Alonso & Lamarca, 2003). The use of Rasch analysis in assessing questionnaire scaling properties, as opposed to the use of classical test theory and factor analysis (Prieto et al., 2003;Wright, 1996), is preferred by several authors, given that factor analysis does not necessarily provide a conceptual linear assessment of the construct and may provide misleading evidence (Waugh & Chapman, 2005;Wright, 1999). According to Meads and Bentall (2008), the use of the Rasch model only informs whether items can be considered unidimensional; however, it is only once unidimensionality has been confirmed that it is justifiable to claim that the items measure one construct. Although the Rasch model has recently been used in some South African studies (De Bruin & Taylor, 2005;Kagee & De Bruin, 2007;Maree, Maree & Collins, 2008a;2008b;Mpofu et al., 2006;Potgieter, Davidowitz & Venter, 2008;Rothmann, 2010;Taylor, 2008), no studies could be found that use the Rasch model specifically for scale development within the field of work and family research.

Involvement in multiple social roles and the role identity theory
The notion of an individual's involvement in various social roles is closely related to the identity theory of Stryker (1968), which was originally built on the assumptions, definitions and propositions of the symbolic interactionism perspective (Mead, 1934). The basic concepts of the symbolic interactionism perspective were, however, redeveloped and refined in the development of role identity theory. According to identity theory, an individual's self-concept is the product of social interaction and is a multifaceted social construct that emerges from the roles occupied in society (Burke, 1980; McCall & Simmons, 1966; Stryker, 1968, 1980; Stryker & Serpe, 1992). For each of the roles occupied in society there are distinct role identities for individuals. For instance, a person's role identities may include being a mother and being a wife. According to Burke (1980) and Thoits (1991), role identities are self-conceptions and self-definitions that individuals apply to themselves based on the structural role positions they occupy in society, and these role identities provide meaning for them. The self-concept of an individual is therefore made up of a variety of role identities, which are the meanings that individuals attribute to themselves by occupying a particular position or being in particular role relationships (Burke, 1980; McCall & Simmons, 1966; Stryker, 1968; Wiley, 1991).
Role identity theory further states that identities are organised hierarchically in the self-concept on the basis of the saliency of the roles. In short, the salience hierarchy represents the probability that a particular identity will be evoked in particular situations, and it is closely related to the commitment of the individual to the various role identities (Burke, 1980; McCall & Simmons, 1966; Stryker, 1968, 1980; Stryker & Serpe, 1992). According to Stryker (1980), the greater the commitment, the more salient the identity and the more likely that the individual will choose behaviours confirming the particular identity in a particular setting. Consequently, the commitment to a particular role identity affects its saliency and therefore the likelihood of acting in a way that confirms the identity.
According to Wiley (1991), identity theory can also be used to explain the source of stress or conflict that individuals experience when occupying multiple social roles. Individuals occupy a variety of roles to which certain role identities are linked. The saliency that each individual attaches to these roles may vary. Individuals view certain roles as being more important than others, which will result in the individual choosing behaviours that will confirm the more salient identities. For example, individuals who attach high salience to their parental role will more likely choose to participate in activities that confirm the parent identity in the self-concept (e.g. spending time with their children) as opposed to participating in the activities of another, less salient role (e.g. leisure role). Wiley (1991) suggests that individuals may experience stress due to conflict between their actions confirming disparate identities. In other words, when individuals are faced with a choice between role behaviours that confirm identities of similar salience or commitment, conflict arises (e.g. individuals experience conflict when they have to participate in the activities of two salient roles simultaneously, such as spending time with their children and spending time with their husband). From the perspective of their role identity, individuals will experience interference between their work role and other nonwork roles when the roles that they occupy are similarly salient and/or when they lack the opportunities to participate in certain roles.
Construct measurement is of great importance when studying the interference between work and private life. However, it is clear that there are theoretical and measurement issues that need to be addressed. The purpose and contribution of this study was to develop an instrument, grounded in the conceptual literature and following rigorous scale-development procedures, to measure the interference between work and different nonwork roles in both directions. In achieving these objectives, this study sought to overcome the limitations regarding directionality, the use of confounded items, the unequal number of items used to measure the two directions, the narrow focus on the interference between work and specific dimensions, and the lack of rigorous scale development procedures.

Phase 1: Scale development and pilot study
In order to develop the new scale, a four-step procedure was followed:
• initial construct conceptualisation
• item generation and evaluation
• item development
• item refinement and item judgement
This procedure closely adhered to the procedures described in the psychometric and scale development literature (Boyar, Carr, Mosley & Carson, 2007; Carlson, Kacmar, Wayne & Grzywacz, 2006; DeVellis, 2003; Geurts et al., 2005; Kirchmeyer, 1992; Netemeyer et al., 1996) and is described in more detail below.

Initial construct conceptualisation
Prior to the development of the scale, it was important to define the construct to be measured. Drawing on the theoretical perspective of role identity theory and previous work−family definitions (Burke, 1980; Geurts et al., 2005; McCall & Simmons, 1966; Netemeyer et al., 1996; Stryker, 1968; Wiley, 1991), work−nonwork interference was defined as:
a process in which the involvement of an individual in one domain (or social role) interferes with the functioning or involvement in another domain (role), where the interference affects the way in which the individual's self-identity is influenced by external stimuli to such an extent that it results in an inadequate performance or behaviour in order to conform to one or more highly-salient identities/roles. (Burke, 1980; Geurts et al., 2005; McCall & Simmons, 1966; Netemeyer et al., 1996; Stryker, 1968; Wiley, 1991)
Based on this definition, work→nonwork interference (W−NWI) can be defined as the process through which the involvement in the work role interferes with functioning or involvement in roles in the nonwork domain, whereas nonwork→work interference (NW−WI) is the process through which the involvement in nonwork roles interferes with the functioning in the work role.
On the basis of the social roles outside of work that are mentioned most often in the literature (e.g. parental, spousal and domestic or homecare roles) and the unique or specific nonwork roles (e.g. religion/spirituality) mentioned in the qualitative study of Koekemoer and Mostert (in press), four social roles were measured in this study (i.e. parental, spousal, religion/spirituality and domestic roles). Therefore, depending on the direction of interference and the roles interfered with, individuals might experience W−NWI, including work−parent interference (WPI), work−spouse interference (WSI), work−religion/spirituality interference (WRI) and/or work−domestic interference (WDI), and/or NW−WI, including parent−work interference (PWI), spouse−work interference (SWI), religion/spirituality−work interference (RWI) and domestic−work interference (DWI).
The following definitions were developed or derived from the literature (Baruch & Barnett, 1986; Glaser, Evandrou & Tomassini, 2006; Pietromonaco et al., 1986) to describe the different roles included in the instrument:
• A worker is defined as a person who is currently working or employed and who is actively involved in paid work.
• A parent is defined as a person who is providing and/or caring for one or more child(ren) living at home and/or who is or are dependent on the person in some way.
• A spouse is defined as a person who is married or is living with a partner with whom he or she has a serious, committed and intimate relationship.
• Religion/spiritual role is defined as deriving a personal sense of meaning through religious or spiritual activities.
• Domestic role is defined as performing a variety of house chores or domestic activities in order to maintain or provide a well-kept household and/or to enhance the aesthetic appearance of the home environment.
During the process of evaluating the items, two work−family subject matter experts (i.e. researchers in the area of work and family) independently classified the items into the following categories using the above-mentioned criteria:
• items 100% correct or applicable for the new scale
• items mostly correct but that may require some changes in terms of the wording
• items of which some part of the item could be used
• time-, strain- and behaviour-based items
• items not applicable at all
• items for which some words may be used to construct new items.
The classifications done by these judges (researchers) were then brought together in order to discuss which items could be used in the process that followed. During this process of item evaluation, 77 items were discarded on the basis of the evaluation criteria and categorisation. The remaining items were then used in the item development process that followed.

Item development
During the item development phase, the remaining items from the initial item pool were re-evaluated and adapted in order to fit the different proposed definitions in the best possible way. Some items were also adapted in terms of wording to correspond with the selected frequency-based response format: 'How often does it happen that ...', with responses varying between 0 ('never'), 1 ('some of the time'), 2 ('most of the time') and 3 ('always'). The decision to use a frequency-based response format with no midpoint was based on suggestions from previous researchers in the work-life interaction field (Bellavia & Frone, 2005; Kirchmeyer, 1992). According to these researchers, frequency data and responses are less biased, and fixed-frequency response anchors can shape the respondents' answers and ensure a positive or negative standing on each question.
In addition to the re-adapted items, new items were written for the dimensions that were not present in the initial item pool (e.g. religion/spirituality items) and for scales from the initial item pool that did not have a sufficient number of items that could be used. Additional items were therefore developed in order to ensure that each dimension contained a representative set of items. According to Nunnally (1988), it is always advisable to have at least one-and-a-half to twice as many items as will appear in the final scale so as to have ample room to discard items that work poorly.

Item refinement and item judgement
Following the item development process, attention was paid to item refinement and judgement. During this step, a panel of 10 members from North West University were asked to judge the items. These members included personnel from the Department of Human Resource Management, as well as PhD students from this department in which the primary research area is organisational behaviour and employee well-being. The judges were provided with the construct definitions and were asked to categorise the items into the different scales. They were also asked to indicate which items were difficult or ambiguous. Based on their inputs and feedback, editorial changes were made to some items in the process of refining the items of the scale.
After the scale development process was completed, the next step was to evaluate the items of the new scale. This validation effort was done in two studies. Study 1 was a pilot study in

Pilot study
The main objective of the pilot study was to purify the measure by eliminating undesirable items. According to DeVellis (2003) it is always advisable to administer the items to a developmental sample and to undertake a pilot study in order to investigate the performance of the items before developing the final items for the scale. Also, in order to concentrate on the adequacy of the items, DeVellis (2003) suggests that the sample should be large enough to eliminate subject variance as a significant concern. Although a sample of 300 is generally regarded as adequate, DeVellis (2003) suggests that scales have been successfully developed with smaller samples.

Research design
A cross-sectional survey design was used to collect the data. Cross-sectional designs are used to observe a group of people at a particular point in time, for a short period such as a day or a few weeks (Du Plooy, 2002).

Participants and procedure
During the pilot study, the newly developed 89-item, multidimensional work−nonwork interference instrument was administered to employees working at a South African mine (n = 245), where a response rate of 49% was obtained. Fifty-five per cent of the respondents were female. Close on two-thirds (63.70%) of the respondents indicated Afrikaans or English as their home language, whilst 36% indicated African languages as their preferred language. The majority of the participants were either White (54.30%) or African (38.80%). The majority of the employees possessed a grade 12 certificate (33.50%) or a university degree (25.70%). Nearly 60% of the participants were between the ages of 20 and 39; 22.34% of the participants were between 40 and 49 and only 15.50% were older than 50.

Item refinement
In order to establish which items were the most desirable to retain for further analyses, a process of item elimination was followed. For this process, certain guidelines from the literature and previous studies were used (Curbow, Spratt, Ungaretti, McDonell & Breckler, 2006; DeVellis, 2003; Foxcroft & Roodt, 2005) to determine items that performed poorly. This process included the investigation of descriptive statistics (mean, standard deviation, variance and distribution), inter-item correlations and item-total correlations, and the qualitative investigation of items highly redundant in terms of wording.

Item elimination process:
To eliminate poor items, the following cut-off criteria were used:
• items with a mean closer to the centre of the range of possible scores are more desirable
• items with low standard deviations (< 1.00) are less desirable
• items with high variance are more desirable.
With regard to the inter-item correlations, more desirable items are those that have a moderate to strong correlation with all the other items. Inter-item correlations should be substantial and significant at the 0.05 level or better. The items that did not correlate well with other items were therefore discarded. Regarding item-total correlations, items with higher item-total correlations are more desirable than items with low values. Positive item-total correlations indicate that the items measure the same thing that is measured by the test. An item-total correlation near zero indicates that the item does not discriminate between high and low scores. After the above-mentioned criteria were taken into consideration, all the items were evaluated in a more qualitative manner, with attention being paid to item wording. This qualitative elimination technique formed a crucial part of the process of eliminating items in the pilot study, at which time various items were discarded and changed.
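The screening statistics described above can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function names and the toy responses are hypothetical, and correlating each item with the sum of the remaining items (a corrected item-total correlation) is an assumption, since the text does not state whether the item itself was included in the total. Only the standard deviation cut-off of 1.00 is taken from the criteria listed above.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sample_sd(xs):
    # Sample standard deviation (n - 1 in the denominator).
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def pearson(xs, ys):
    # Pearson product-moment correlation between two equally long lists.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def item_summary(responses, sd_cutoff=1.00):
    """responses: one row per respondent, one 0-3 score per item."""
    n_items = len(responses[0])
    summary = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # Total of the other items, so the item is not correlated with itself.
        rest = [sum(row) - row[j] for row in responses]
        sd = sample_sd(item)
        summary.append({
            "item": j + 1,
            "mean": mean(item),
            "sd": sd,
            "low_sd": sd < sd_cutoff,
            "item_total_r": pearson(item, rest),
        })
    return summary


# Hypothetical responses of six people to four items (not study data).
toy = [
    [0, 0, 1, 3],
    [1, 1, 1, 0],
    [2, 2, 3, 1],
    [3, 3, 2, 2],
    [1, 2, 1, 3],
    [0, 1, 0, 1],
]

summary = item_summary(toy)
for row in summary:
    print(row)
```

In this toy data the fourth item correlates negatively with the sum of the other three, which is the kind of pattern that would mark an item as a candidate for removal under the item-total criterion.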
After the item-elimination process was completed, only 48 items remained (24 W−NWI items and 24 NW−WI items). The same number of items was retained for all of the dimensions on the basis of recommendations in the literature on scale development (Carlson et al., 2006). As a result, the scale used in the item-evaluation study (Phase 2) included only six items for each dimension.

Phase 2: Item evaluation
Following the development of the scale and the pilot study, 48 items were retained to measure the various dimensions of work−nonwork interference. According to DeVellis (2003) shorter scales are better because they place less of a burden on the respondents, which, in the case of the present study, implied that further item elimination was needed. The main objective of the item evaluation study was to establish how the remaining items of the new scale were performing, which items could be eliminated and which items were most desirable to retain for further validation. The first part of the item elimination in the item evaluation study was based on the Rasch model (Rasch, 1960), whereas the second part paid attention to item correlations.

Research design
A cross-sectional survey design was used in the item evaluation study to collect the data and to attain the research objectives. Cross-sectional designs are used to observe a group of people at a particular point in time, for a short period such as a day or a few weeks (Du Plooy, 2002). The design is also used to assess inter-relationships among variables within a population and will therefore help to achieve the various specific objectives of this research (Struwig & Stead, 2001).

Participants and procedure
The study sample was obtained from employees working at a tertiary institution in the North West Province. Only married employees with children were selected to participate in the study (n = 366). This decision was based on the conceptual development of the instrument, which is restricted to specific nonwork role interference, including specific roles such as that of spouse and parent. Prior to the study, permission to undertake the study was requested from (and granted by) the university's Ethics Committee. In order to attain the specific sample, lists of all married employees with children were obtained from the various faculties and departments of the university. Using this information, the questionnaires were distributed personally to the selected employees with the help of fieldworkers. A letter was included with the questionnaires to explain the goal and importance of the study. Although the identities of the selected employees were known at first, the questionnaires were returned anonymously. The participants were also assured of the anonymity and confidentiality with which the information would be handled. The participants were given two to three weeks to complete the questionnaires, after which they were collected personally by the fieldworkers. Although most of the participants were White (80.35%), participants from the African (14.75%), Indian (3.00%) and Coloured (0.80%) groups were also included in the sample. Men (34.70%) as well as women (65.00%) were included in the study. The majority of the participants had postgraduate degrees (47.81%), whilst others had university degrees (12.57%), technical college diplomas (6.00%), technikon diplomas (8.20%) or grade 12 certificates (19.95%). In total, 26.77% of the participants worked as administrative assistants, while 25.68% worked in the administrative offices. 
The majority of the participants worked in academic faculties, including the faculties of health sciences (13.39%), natural sciences (11.46%), education (10.38%), engineering (9.58%), arts (6.83%), economic and management sciences (6.56%) and theology (2.70%). A number of participants worked as lecturers (9.84%), senior lecturers (11.46%), associate professors (6.56%) and professors (7.10%).

Statistical analysis
The Rasch analyses were carried out using the WINSTEPS program (Linacre, 2005). For the purpose of the analyses in this item-validation study, the assumption was made that the newly developed instrument should consist of different unidimensional dimensions, and each dimension was analysed separately using Rasch analyses. The overall dimensionality of the work−nonwork interference instrument was not evaluated in this study, as the main objective was first to evaluate the performance of the items in each dimension.
During the Rasch analyses, consideration was given to issues relating to reliability, item measures and item fit. Where a Cronbach's alpha is traditionally calculated to express the reliability of an instrument, the Rasch model provides two reliability estimates, namely the person reliability and the item reliability. The person reliability index measures the degree to which the scale can differentiate persons on the measured variables (by subtracting the average person measurement variance from the observed person variance) (Cervellione et al., 2009; Fox & Jones, 1998), whereas the item reliability index measures the degree to which the relative difficulties of items are differentiated along the measured variables (by dividing true item variances by observed item variances) (Cervellione et al., 2009; Fox & Jones, 1998). In other words, the item reliability index measures the extent to which the items in each dimension are able to discriminate between persons and items, as well as the ability to measure the same latent trait. With these reliability calculations, an adjustment for measurement errors is made by not using raw scores; this is a very important advantage of using the Rasch reliability in assessing the functioning of an instrument (Boone & Rogan, 2005). In both person and item reliabilities, values range from 0.00 to 1.00, where a value greater than or equal to 0.80 is considered acceptable (Fox & Jones, 1998). Similarly, a separation index is calculated that estimates the spread of persons, or items, on the measured variables. In order to indicate adequate separation for persons, items or both, the separation index should be at least 2.00 (Fox & Jones, 1998).
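The two guidelines quoted above (reliability of at least 0.80 and separation of at least 2.00) are linked by the usual Rasch relation between the indices: reliability is true variance divided by observed variance, and separation is the square root of reliability divided by one minus reliability. The sketch below illustrates that relation only. The function names and the variance figures are hypothetical, and it does not reproduce WINSTEPS output.

```python
import math


def rasch_reliability(observed_variance, mean_error_variance):
    # True variance is the observed variance with the average
    # measurement error variance removed (floored at zero).
    true_variance = max(observed_variance - mean_error_variance, 0.0)
    return true_variance / observed_variance


def separation_from_reliability(reliability):
    # Separation G = true SD / root mean square error,
    # which is equivalent to sqrt(R / (1 - R)) for R < 1.
    return math.sqrt(reliability / (1.0 - reliability))


# Hypothetical variances in squared logits (not study data).
reliability = rasch_reliability(observed_variance=2.5, mean_error_variance=0.5)
separation = separation_from_reliability(reliability)
print(round(reliability, 2), round(separation, 2))
```

Under this relation a reliability of 0.80 corresponds to a separation of 2.00, so the two cut-offs express the same requirement on different scales.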
In addition to reliability, Rasch analyses allow for a closer investigation of items (i.e. item measures and item fit). In the case of item measures, the intensity with which the items measure the latent trait is investigated, whereas item fit relates to how probable a person's response is. According to Boone and Rogan (2005), one of the most important aspects to consider when developing or analysing an instrument is to identify problematic items. Fit statistics are usually used to identify persons or items that behave idiosyncratically, that is, items that are answered in inconsistent and erratic ways for whatever reason. In the Rasch model, two chi-square-based fit statistics are used to report the item fit, namely infit and outfit. According to Linacre (2005), outfit statistics are more sensitive to responses in which the item difficulty and person ability differ drastically, and are more likely to indicate lucky guesses and careless mistakes, while infit statistics report smaller differences in the comprehension of the items. Infit statistics are usually used to identify problems with the measurement items. According to Bond and Fox (2007), reasonable item mean square ranges for infit and outfit for Likert survey data are between 0.60 and 1.40.
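To make the difference between infit and outfit concrete, the sketch below computes both mean squares for a single item under the dichotomous Rasch model. This is a deliberate simplification: the instrument uses four response categories, which calls for a polytomous model, and the person measures, item difficulty and responses here are hypothetical. Outfit is the unweighted mean of the squared standardised residuals, whereas infit weights each squared residual by the model variance of the response.

```python
import math


def rasch_probability(theta, b):
    # Probability of endorsing a dichotomous item of difficulty b
    # for a person with measure theta (both in logits).
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    squared_residuals = 0.0
    variances = 0.0
    squared_z = 0.0
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        variance = p * (1.0 - p)
        residual = x - p
        squared_residuals += residual ** 2
        variances += variance
        squared_z += residual ** 2 / variance
    infit = squared_residuals / variances   # information-weighted
    outfit = squared_z / len(responses)     # unweighted, outlier-sensitive
    return infit, outfit


# Hypothetical person measures and an item of difficulty 0 logits.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
consistent = [0, 0, 1, 1, 1]   # responses follow the expected ordering
erratic = [1, 0, 1, 1, 0]      # unexpected responses at both extremes

print(item_fit(consistent, abilities, 0.0))
print(item_fit(erratic, abilities, 0.0))
```

The erratic pattern places its unexpected responses on the persons furthest from the item, so its outfit rises more than its infit, in line with the description of outfit as the statistic that reacts to large gaps between item difficulty and person ability.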
The results of the Rasch analysis were used to eliminate undesirable items from the different dimensions in the instrument. Although the Rasch results did indicate problematic items, which were subsequently eliminated, further analyses were needed to attain fewer items that could adequately measure the interference between work and nonwork roles. In order to eliminate items that correlate exceptionally high with items from other dimensions, or exceptionally low with items from the same dimensions, item correlations were investigated.

RESULTS
In the results that follow, a summary of the reliability and separation indices, as well as of the measure and fit statistics for the items based on the Rasch analysis, is given for all the dimensions of the newly developed instrument. Thereafter, results pertaining to the investigation of item correlations are given.

Reliability and separation index
The person and item reliability and separation indices for the various dimensions are presented in Table 1.
Although the item separation and reliability indices were very good for the majority of dimensions overall, estimates for the work−domestic dimension were lower than those provided in
Regarding person average measures, the dimension with the highest average measure was the work−parent dimension (-1.41, SD = 1.93), whereas the lowest average measure was for the parent−work dimension (-2.97, SD = 1.65), indicating that, in the analyses of the work−parent dimension, the average person measure was higher than in the analyses of the parent−work dimension. Comparisons between these scores are possible, because all the scores are standardised on the same scale. Finally, the average item fit was acceptable for all items, indicating that overall there were no major problems with item fit for the dimensions. The average person fit was acceptable for all the respondents, indicating no unexpected answers, which means that, on average, the respondents did not underfit or overfit (guideline between 0.60 and 1.40; Bond & Fox, 2007).

Measure and fit statistics for items
The measure values and the infit and outfit mean squares are presented in Table 2 and Table 3. The measure values can be used to obtain (Bond & Fox, 2007)
Regarding the fit statistics of the work→nonwork interference items presented in Table 2, the results of the work−parent dimension indicated that Item 6 had the lowest intensity (lowest measure value), whereas Item 4 had the highest intensity (highest measure value). Also, Item 6 seemed to underfit (infit = 1.57), indicating the unpredictability of the item (it falls outside the item mean square range of 0.60 to 1.40). Fit statistics for the work−spouse dimension indicated that Item 5 had the lowest intensity and also seemed to underfit (infit = 1.55), whilst Item 6 had the highest intensity. Regarding the work−religion dimension, Item 6 had the lowest intensity and Item 5 the highest intensity, while the item with the lowest intensity of the work−domestic items was Item 1 and the item with the highest intensity was Item 4.
The fit statistics for the nonwork→work interference items are presented in Table 3.
From the fit statistics presented in Table 3 it is evident that Item 3 of the parent−work dimension had the lowest intensity, while Item 2 seemed to have the highest intensity and seemed to overfit (outfit = 0.59). Fit statistics for the spouse−work dimension indicated that Item 3 had the lowest intensity, while Item 2 had the highest intensity. For the religion−work items, Item 3 indicated underfit (infit = 1.40) and had the lowest intensity, while Item 5 had the highest intensity. Although Item 3 tends towards underfit, its infit value is in line with the guideline of 1.40. Regarding the domestic−work dimension, Item 3 had the lowest intensity, while Item 5 had the highest intensity.
On the basis of the above Rasch results, the following items should be eliminated from the work→nonwork interference dimensions: WPI item 6, WSI item 5, and WDI items 3 and 6. The only item that should be eliminated from the nonwork→work interference dimensions on the basis of the Rasch analyses is RWI item 3.
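The mean square guideline applied in this section (0.60 to 1.40; Bond & Fox, 2007) amounts to a simple range check, sketched below with the four fit values quoted in the text. The function name and labels are hypothetical. Note that the elimination decisions reported above do not follow from this check alone: applied strictly, it leaves RWI item 3 inside the range at the upper boundary and marks PWI item 2 as overfitting.

```python
LOWER, UPPER = 0.60, 1.40  # guideline range quoted from Bond & Fox (2007)


def classify_fit(mean_square, lower=LOWER, upper=UPPER):
    # Values above the range indicate underfit (unpredictable responses),
    # values below it indicate overfit (responses that are too predictable).
    if mean_square > upper:
        return "underfit"
    if mean_square < lower:
        return "overfit"
    return "acceptable"


# Fit values as quoted in the text for four items.
reported = {
    "WPI item 6 (infit)": 1.57,
    "WSI item 5 (infit)": 1.55,
    "PWI item 2 (outfit)": 0.59,
    "RWI item 3 (infit)": 1.40,
}

labels = {name: classify_fit(value) for name, value in reported.items()}
for name, label in labels.items():
    print(name, label)
```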

Item correlations
Subsequent to the Rasch analyses, the items were also evaluated and investigated using item correlations. To ensure that items discriminate well between dimensions, it is advisable to eliminate items with low correlations within dimensions and high correlations with items from other dimensions, and only retain items that correlate highly within dimensions (DeVellis, 2003). Item correlations for the W−NWI items and NW−WI items are provided in Table 4 and Table 5. For illustrative purposes, low correlations within dimensions are in bold and blocked together, while high correlations with items from other dimensions are in bold outside the blocks. As can be seen in Table 4 and Table 5, several items had low correlations within dimensions, including WRI item 6, PWI item 1, SWI item 1, SWI item 3 and RWI item 2. Various items also had high correlations with items from other dimensions, including WPI item 3 with WSI item 1 and WDI item 1, WDI item 4 with WPI item 4, WSI item 3 with WDI item 5, and WRI item 2 with WPI item 5. Regarding the NW−WI items, items that correlated highly between dimensions were PWI item 6 with SWI item 6, DWI item 1 with PWI items 1, 2 and 5, DWI item 5 with PWI items 2 and 5, and DWI item 6 with PWI items 2 and 5.
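The correlation screen described above can be sketched as follows. The text gives no numerical cut-offs, so the thresholds used here (a mean within-dimension correlation below 0.30 and any between-dimension correlation above 0.70) are assumptions chosen purely for illustration, as are the function names, the item labels and the toy responses.

```python
import math


def pearson(xs, ys):
    # Pearson product-moment correlation between two equally long lists.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def screen_items(scores, dimension_of, low_within=0.30, high_between=0.70):
    """Flag items that fit their own dimension poorly or another one too well.

    scores: item label -> list of responses (same respondents, same order)
    dimension_of: item label -> dimension label
    """
    result = {}
    for a in scores:
        within, between = [], []
        for b in scores:
            if a == b:
                continue
            r = pearson(scores[a], scores[b])
            if dimension_of[a] == dimension_of[b]:
                within.append(r)
            else:
                between.append((b, r))
        mean_within = sum(within) / len(within)
        result[a] = {
            "mean_within": mean_within,
            "low_within": mean_within < low_within,
            "high_between": [b for b, r in between if r > high_between],
        }
    return result


# Hypothetical responses of eight people; labels A/B are two toy dimensions.
scores = {
    "A1": [0, 1, 2, 3, 0, 1, 2, 3],
    "A2": [0, 1, 2, 3, 1, 1, 2, 2],
    "A3": [0, 3, 3, 0, 0, 3, 3, 0],
    "B1": [0, 1, 2, 3, 0, 1, 2, 3],
    "B2": [1, 0, 3, 2, 1, 0, 3, 2],
}
dimension_of = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B"}

flags = screen_items(scores, dimension_of)
for item, info in flags.items():
    print(item, info)
```

In the toy data, A3 is unrelated to the other A items and is flagged for low within-dimension correlation, while A1 duplicates B1 and is flagged for a high between-dimension correlation. In practice such flags would be a starting point for the qualitative review of item wording rather than an automatic decision.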

DISCUSSION
The interference between work and personal life is one of the central issues in the 21st century, as employees attempt to balance or integrate their involvement in multiple social roles (Lingard & Francis, 2005). Although the interaction between work and personal life has received extensive attention in the work−family fields of research, various theoretical, empirical and measurement issues need to be addressed (Bellavia & Frone, 2005; Frone, 2003; Tetrick & Buffardi, 2006). In an attempt to address the limitations pertaining to measuring work−nonwork interference, the purpose of this study was to develop a work−nonwork interference instrument with which to measure the interference between work and nonwork roles. With the development of this new instrument, several of the theoretical and measurement limitations voiced by previous researchers have been addressed, providing this instrument with distinct advantages over previous work−family measurements.
Firstly, researchers have raised theoretical issues regarding the dimensionality and directionality of previous work−family instruments, and regarding their narrow focus on interference between work and specific dimensions (Bellavia & Frone, 2005; Frone, 2003; Geurts & Demerouti, 2003; Tetrick & Buffardi, 2006). Grounded in the theoretical perspective of role identity theory (Burke, 1980; McCall & Simmons, 1966; Stryker, 1968), the newly developed W−NWI instrument measures the interference between work and different nonwork roles (i.e. parental role, spousal role, religion/spiritual role and domestic role). In contrast to previous work−family measurements, this instrument differentiates between various interference dimensions and is bidirectional in nature.
Secondly, various measurement issues have been expressed relating to item development and item use (i.e. items confounded with external variables, causes and consequences, measuring different types of conflict and the inconsistent use of the number of items measuring each direction). In an attempt to address these issues, close attention has been paid to the process of item development and item selection. Only items stated in general terms (e.g. items that did not include time-, strain- or behaviour-based types of interference) and items that were not confounded with external variables, causes or consequences have been included. In addition, and based on recommendations in the literature on scale development, the same number of items was retained for all interference dimensions (Carlson et al., 2006).
Thirdly, the issue of the use of certain response anchors and scales in work−family instruments has been addressed. As suggested by previous researchers, response anchors that do not provide additional information on the frequency of interference (e.g. agree/disagree) were avoided. Instead, a frequency-based response format scale was used. The decision to use a frequency-based response format scale where no midpoint is provided was based on suggestions made by Bellavia and Frone (2005) and Kirchmeyer (1992), who stated that frequency data and responses were less biased compared with scales not including the frequency. In addition, fixed-frequency response anchors can shape the respondents' answers and can ensure a positive or negative standing on each question.
Finally, in order to address the issue of rigorous scale development, the procedures described in the literature were closely adhered to (Boyar et al., 2007; Carlson et al., 2006; DeVellis, 2003; Geurts et al., 2005; Kirchmeyer, 1992; Netemeyer et al., 1996). Particular attention was paid to construct conceptualisation, item generation and evaluation, item development, and item refinement.
Careful attention was paid to the evaluation of items in order to identify items that accurately captured the different dimensions of interference and to discard items that were inefficient. In contrast to previous work−family scale development studies, items of the work−nonwork interference instrument have been evaluated and eliminated on the basis of the Rasch measurement model, a technique that has become the standard for modern psychometric evaluations of outcome scales (Tennant & Conaghan, 2007). As Rasch is a unidimensional measurement model that is based on the assumption that all items summed together form a unidimensional scale, the operating characteristics of all the items are examined across the whole continuum of a latent trait (Hagquist, 2007; Rasch, 1960).
From the Rasch analyses, two problematic aspects have emerged: problems relating to dimensions (namely person separation and reliability problems) and problems relating to specific items (e.g. problematic items). Although no item separation and reliability problems have been found in the majority of dimensions, on the basis of the person separation and reliability indices it has been concluded that some of the dimensions could not separate or discriminate adequately between persons (i.e. RWI, PWI, DWI, WRI and WPI), indicating possible sample-specific problems.
The RWI dimension (and to an extent the WRI dimension) indicates the poorest discrimination between persons. The inclusion of a religion/spiritual role item has been based largely on the qualitative study of Koekemoer and Mostert (in press), in which religion/spirituality has emerged as a very strong personal dimension in the life of South African employees. A possible explanation for the poor discrimination between persons in this dimension might be the homogeneity of the sample, as the majority of the participants were white people working in a tertiary institution that traditionally had a strong religious background. Administering the scale to a more diverse group might yield very different results, with fewer person separation and reliability problems.
In contrast to the person separation and reliability problems, item separation and reliability problems are found only in the WDI dimension, indicating item-specific problems rather than sample-specific problems. It seems that, although the WDI items were able to separate between persons (according to the Rasch criteria), the items were unable to discriminate, suggesting that the formulation of some items could be improved, or that the participants were unable to understand what was being measured. When considering the measure levels of the items more closely, it is clear that the majority of these items measure the dimension on the same level (i.e. WDI items 3, 2, 6, 5).
An advantage of using Rasch analyses is the ability to identify problematic items for elimination purposes (Boone & Rogan, 2005). On the basis of the Rasch analyses, five problematic items were identified and eliminated because of their infit and outfit statistics (viz. WPI item 6, WSI item 5, RWI item 3, WDI item 3 and WDI item 6).
Although WPI item 6 and WSI item 5 had the lowest measures, which indicate the high endorsement of the items by the participants, they seemed to underfit (based on infit statistics), pointing to the unpredictability of these items. In addition, a close investigation of the phrasing of the other WPI items and WSI items showed that all the items related to the quality of relationships with one's spouse or child(ren). However, both WPI item 6 and WSI item 5 related more to schedules or arrangements with one's children or spouse, and not necessarily to the quality of the relationships.
RWI item 3 also seemed to be problematic due to the low measure (indicating the high endorsement of the item) and underfit (indicating the unpredictability of the item). When considering the RWI items, all the items appeared to measure the current interference of religion/spirituality in one's work, whereas RWI item 3 rather measured the potential of religion/spirituality matters interfering with one's work. This item could be classified as a more emotional item that suggests potential or future interference, unlike the current interference measured with the other items, and was therefore eliminated.
Items were also eliminated by means of an investigation of item correlations. Regarding the W−NWI items, four items were eliminated due to high correlations between dimensions (i.e. WPI item 3 correlating highly with WSI item 1 and WDI item 1; WDI item 4 correlating highly with the majority of the WPI items; WSI item 3 correlating highly with all the WDI items; WRI item 2 correlating highly with several items in other dimensions). In the majority of these cases, the domestic items correlated very highly with the spouse and parent dimensions, indicating that the participants found it difficult to discriminate between interference in domestic activities and activities or obligations relating to their children or spouse. In addition, WRI item 6 was eliminated due to its low correlation with all the other WRI items. When considering the wording of this item, the item appears to indicate a more cognitive type of interference, whereas the other WRI items suggest a more physical interference in activities. Various NW−WI items were also eliminated because of low correlations within dimensions (i.e. PWI item 1, PWI item 6, SWI item 1, SWI item 3, RWI item 2) and high correlations with other dimensions (i.e. DWI item 1, DWI item 5 and DWI item 6 with PWI items 1, 2 and 5). It seems that, in the case of the DWI items, high correlations were found mostly with PWI items, which could indicate that some individuals were unable to distinguish certain domestic-related activities or interference from parental-related activities or interference. RWI item 2 seems to differ from all the other RWI items, as all the items appear to suggest religion/spiritual activities interfering with one's work, whereas Item 2 rather reflects interference or uneasiness in the relationships at work due to religion/spirituality. After the elimination process, a work−nonwork interference scale with 30 items was retained.
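The item counts reported in this section can be checked with a short tally. The sketch below only restates the eliminations named in the text, grouped by the reason given in this discussion. The variable names are hypothetical, and the starting count of 24 items per direction is taken from the pilot study results.

```python
# Items remaining per direction after the pilot study, as stated in the text.
initial = {"W-NWI": 24, "NW-WI": 24}

# Eliminated items, grouped as in the discussion above.
eliminated = {
    "W-NWI": {
        "Rasch fit": ["WPI 6", "WSI 5", "WDI 3", "WDI 6"],
        "high correlation between dimensions": ["WPI 3", "WDI 4", "WSI 3", "WRI 2"],
        "low correlation within dimension": ["WRI 6"],
    },
    "NW-WI": {
        "Rasch fit": ["RWI 3"],
        "low correlation within dimension": ["PWI 1", "PWI 6", "SWI 1", "SWI 3", "RWI 2"],
        "high correlation between dimensions": ["DWI 1", "DWI 5", "DWI 6"],
    },
}

retained = {}
for direction, reasons in eliminated.items():
    # Collect the distinct items dropped for any reason in this direction.
    dropped = {item for items in reasons.values() for item in items}
    retained[direction] = initial[direction] - len(dropped)

total_dropped = sum(initial.values()) - sum(retained.values())
print(retained, "dropped:", total_dropped, "kept:", sum(retained.values()))
```

The tally gives 18 eliminated items and 15 retained items in each direction, which matches the 30-item instrument reported above.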
Notwithstanding the valuable contributions and advantages of the newly developed W−NWI instrument, some limitations do exist. On the basis of the suggestions of previous researchers regarding the development of new instruments (DeVellis, 2003; Nunnally, 1988), a large number of items were initially developed and included in the pilot study. As the same dimensions were initially measured with various items (between eight and 12 items per dimension), some of the participants complained about the length of the questionnaire, the number of items measuring the same construct and the repetition of the items. This could have had an influence on how the participants responded to the items (e.g. responding randomly). This limitation was addressed in the evaluation study, for which fewer items were used and items measuring different dimensions were randomly combined with items from other questionnaires. The participants therefore could not easily identify the nature of specific items. The Rasch analyses made it possible to detect potential random responding (e.g. participants answering questions randomly without first considering them), but no such cases were identified.
Another limitation indicated by the Rasch analyses was the homogeneity of the sample used in this study: the participants displayed little diversity. This definitely influenced the results and the items that had to be eliminated. It is plausible that these items could have performed better in samples that were more diverse in terms of culture, background, etc. Although two samples were used in this study to evaluate the instrument, the samples were not very large, especially the sample used in the evaluation study. Larger samples could also have provided different results.
Despite the limitations of the study, recommendations can be made for future studies regarding the use of the newly developed W−NWI instrument. Firstly, in relation to the limitation of homogeneity, it is recommended that the instrument be administered to more diverse samples. In this study, the focus was only on item elimination, therefore no analyses were conducted to determine the psychometric properties. It is recommended that the validity of the instrument should be investigated further, firstly by analysing the internal psychometric properties of the instrument (i.e. construct validity, convergent validity, discriminant validity) and secondly by analysing the external validity (i.e. relationships with causes and consequences of work−nonwork interference). This instrument only measures the negative interference between dimensions and does not allow for possible positive influences or spillover between dimensions. According to Tetrick and Buffardi (2006), researchers are starting to place more emphasis on 'positive psychology' and there is a need to explore the ways in which work and family roles or nonwork roles can enhance one another. According to Carlson et al. (2006), the fundamental thinking behind the positive influences between work and personal life is that these domains provide individuals with resources or other benefits that may help them to perform better across the various domains in their lives. As few instruments are available that measure the positive interaction between work and family, possible future studies could include the development of an instrument measuring positive influences between work and other nonwork roles.