PRACTICALLY SIGNIFICANT RELATIONSHIPS BETWEEN TWO VARIABLES

It is shown how effect sizes can be used to establish whether relationships between two variables are practically significant (important). This is done for populations as well as for samples. Four cases are distinguished: both variables nominal, both variables dichotomous, one variable dichotomous and the other on an interval scale, and both variables on an interval scale. Examples are given to illustrate the use of the suggested effect sizes.

Two situations have to be distinguished: (1) when dealing with a population and (2) when a random sample is drawn from a population. Only in the second situation is the statistical significance of a relationship appropriate, since the test result obtained from the sample is used to establish whether two variables are related within the population (with a small probability of concluding this erroneously). In the first situation another way has to be found to determine whether the relationship is "practically significant". Here, as in the case where two population means are compared (cf. Steyn, 2000), an effect size, as a measure of practical significance, can be a useful aid. Also, such an effect size can be established from a sample in order to determine the importance of a statistically significant relationship.
Many different effect sizes exist and are discussed in the psychological literature (see Nickerson, 2000). While the reporting of effect sizes is encouraged by the American Psychological Association (APA) in its Publication Manual (4th edition; APA, 1994), Kirk (1996) noted, on the basis of a survey of four APA journals, that most of these measures are seldom if ever found in published reports.
The reporting of effect sizes has the added attraction, for some analysts, of facilitating the use of meta-analytic techniques (see Rosenthal, 1991). Different kinds of relationships exist, depending on the scales on which the two variables are measured. In this paper the following cases are dealt with: (1) both variables on a nominal scale; (2) both variables dichotomous; (3) one variable dichotomous, the other on an interval/ratio scale; (4) both variables on an interval/ratio scale.
In the following section, an overview is given of the population effect sizes for each of these cases. The second section deals with the estimation of effect sizes from random samples. Examples are given throughout both sections, and the last section contains a discussion and conclusions on how to apply practical significance.

POPULATION EFFECT SIZES OF RELATIONSHIPS
Both Variables On A Nominal Scale
Consider the following example:
Example 1: In order to study the relationship between temperament type and the grouping of faculty members and students at a tertiary education institution, the Myers-Briggs Type Indicator (MBTI) was administered to all the lecturers and students of an Economics and Management Faculty at a South African university (Rothmann et al., 2000a). Table 1 gives the numbers of lecturers, male students and female students within each of the four temperament types.
In general, let x and y be two nominal variables with r and c categories respectively, and let $f_{ij}$ be the frequency in the cell in the ith row and the jth column of the table. Also, let $f_{i+}$ denote the ith row's total frequency and $f_{+j}$ that of the jth column, and let N be the population size, i.e. the total frequency. Cohen (1988) suggested the following effect size to measure the relationship between x and y:

$$ w = \sqrt{\sum_{i=1}^{r}\sum_{j=1}^{c}\frac{\left(f_{ij}/N - f_{i+}f_{+j}/N^{2}\right)^{2}}{f_{i+}f_{+j}/N^{2}}} \qquad (1) $$

Note that $w = \sqrt{\chi^2/N}$, where $\chi^2$ is the usual Chi-square statistic for this two-way frequency table.
The following guidelines are given by Cohen (1988) for judging the importance of a relationship: w = 0,1: small effect; w = 0,3: medium effect; w = 0,5: large effect. Cohen justifies these guidelines for w by giving the equivalent values of the contingency coefficient and of Cramér's $\varphi$. In the following section examples of 2x2 contingency tables are given for each of these guidelines.
Example 1: (continued) Consider Table 1. To calculate the effect size w, the value of every cell in the contingency table is needed. The value of the cell in the ith row and jth column is the corresponding term of the sum in (1):

$$ \frac{\left(f_{ij}/N - f_{i+}f_{+j}/N^{2}\right)^{2}}{f_{i+}f_{+j}/N^{2}} $$

For the top-left cell (i.e. first row, first column) this uses $f_{11}$, $f_{1+}$ and $f_{+1}$ from Table 1, and each cell's value can be calculated in the same way, resulting in:

$$ w^2 = 0{,}0047 + 0{,}0052 + 0{,}0014 + 0{,}0183 + 0{,}0071 + 0{,}0003 + 0{,}0021 + 0{,}0014 + 0{,}0016 + 0{,}0001 + 0{,}0001 + 0{,}0002 = 0{,}0425 $$

and $w = \sqrt{0{,}0425} \approx 0{,}21$. This value of w indicates that the effect is small to medium. Therefore there is some indication of a relationship between temperament type and the grouping of faculty members and students into categories.
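The cell-by-cell calculation in (1) is easily automated. The following Python function is a minimal sketch (the name `cohens_w` is our own, hypothetical choice). Since the frequencies of Table 1 are not reproduced here, it is illustrated on the 2x2 population of Table 3(a) discussed in the next subsection, with cells of 55 where x and y agree and 45 elsewhere, for which w = 0,1.

```python
from math import sqrt


def cohens_w(table):
    """Effect size w of (1) for an r x c table of population frequencies f_ij."""
    n = sum(sum(row) for row in table)               # N, the total frequency
    row_tot = [sum(row) for row in table]            # f_i+
    col_tot = [sum(col) for col in zip(*table)]      # f_+j
    w2 = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            p0 = row_tot[i] * col_tot[j] / n**2      # f_i+ f_+j / N^2
            p1 = f / n                               # f_ij / N
            w2 += (p1 - p0) ** 2 / p0                # value of cell (i, j)
    return sqrt(w2)


# Table 3(a): cells where x and y agree hold 55, the other two cells 45
print(round(cohens_w([[55, 45], [45, 55]]), 3))      # 0.1
```

Because $w^2 = \chi^2/N$, the same value could also be obtained from any routine that returns the Chi-square statistic of the table.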

Both Variables Dichotomous
Consider first the following example:
Example 2: A survey of an organisation's 60 employees regarding their preferences for a new medical scheme resulted in the frequency table given in Table 2. Is there a relationship between gender and preference?
Cohen (1988) suggested as effect size the so-called phi coefficient $\varphi$, which is the special case of the effect size w when r = 2 and c = 2. However, a simpler formula for the calculation of $\varphi$ can be used when the frequencies in the 2x2 table are denoted by a, b, c and d in the following way:

                    y: Category 1    y: Category 2
  x: Category 1           a                b
  x: Category 2           c                d

$$ \varphi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} \qquad (2) $$

This effect size is negative when bc > ad, i.e. when the frequencies b and c are more abundant than the other two cell frequencies. Therefore, in contrast to the case where more than two categories (levels) occur on one or both variables, the direction of the relationship can also be determined.
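Formula (2) translates directly into code. The sketch below (the function name `phi` is our own) also shows how the sign of $\varphi$ reflects the direction of the relationship, using the 55/45 split of Table 3(a) and the same frequencies with the two diagonals swapped.

```python
from math import sqrt


def phi(a, b, c, d):
    """Phi coefficient (2) for the 2x2 frequency table [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


print(phi(55, 45, 45, 55))   # 0.1: cells where x and y agree dominate (positive)
print(phi(45, 55, 55, 45))   # -0.1: bc > ad, so the relationship is negative
```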
Since the phi coefficient is a special case of w in (1), the same guidelines can be used for this effect size (without taking the sign of $\varphi$ into consideration).

Example 2: (continued)
Applying (2) to the data in Table 2 gives a negative value of $\varphi$. Since $\varphi$ is almost 0,1 in absolute value, it can be considered a small effect. No practically important relationship exists, and the negative sign is therefore of little importance.
To get a feel for what "small", "medium" and "large" effects mean in terms of 2x2 tables, consider Table 3, in which a population of size 200 has been grouped.
For a 2x2 table to describe a positive relationship, the frequencies in the cells where x and y have the same value (i.e. both 1 or both 2) have to be larger than those of the remaining two cells. In Table 3(a) these frequencies are both 55, in contrast to the 45 of the other cells, resulting in an effect size $\varphi$ of 0,1. In Tables 3(b) and 3(c) these frequencies increase relative to those of the two remaining cells, and therefore the value of $\varphi$ also increases.
Analogous illustrations of negative relationships can be given by making the frequencies of the cells where x and y differ larger. In such cases the values of $\varphi$ will be negative.
The value of $\varphi$ is zero when the frequencies in the two rows (or the two columns) are equal, e.g. a = c and b = d; more generally, $\varphi = 0$ whenever ad = bc. Also keep in mind that the maximum absolute value of $\varphi$ for a 2x2 table is 1, which is attained when either b = c = 0 (giving $\varphi = 1$) or a = d = 0 (giving $\varphi = -1$).
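The tables behind these guidelines can be generated rather than looked up. As a sketch, assume (as in Table 3(a)) a population of 200 with all four margins equal to 100, diagonal cells x and off-diagonal cells 100 - x; then (2) reduces to $\varphi = (2x - 100)/100$, so that $x = 50(1 + \varphi)$. The helper `symmetric_table` is hypothetical, and Tables 3(b) and 3(c) need not have exactly this symmetric form. The last three lines check the zero and extreme cases just described.

```python
from math import sqrt


def phi(a, b, c, d):
    """Phi coefficient (2) for the 2x2 frequency table [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def symmetric_table(target_phi, n=200):
    """Hypothetical 2x2 table of size n with all margins n/2 and a given phi.

    Diagonal cells hold x = n(1 + phi)/4, off-diagonal cells n/2 - x.
    """
    x = n * (1 + target_phi) / 4
    return x, n / 2 - x, n / 2 - x, x


for target in (0.1, 0.3, 0.5):                  # small, medium, large
    a, b, c, d = symmetric_table(target)
    print(f"diagonal {a:.0f}, off-diagonal {b:.0f}, phi = {phi(a, b, c, d):.2f}")

print(phi(30, 70, 30, 70))    # equal rows: phi is 0
print(phi(60, 0, 0, 40))      # b = c = 0: phi is 1
print(phi(0, 60, 40, 0))      # a = d = 0: phi is -1
```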
One Variable Dichotomous, The Other On An Interval/Ratio Scale
Example 3: In the study described in Example 1 (Rothmann et al., 2000a), the means of the continuous personality type scores of the lecturers were compared with those of the students (see Table 4).
Here a dichotomous variable x can be considered an indicator of the membership of population members in two distinct groups or sub-populations (in Example 3, the lecturers and the students). The usual measure of the relationship between such a variable x and a variable y on an interval or ratio scale is the point-biserial correlation $r_{pb}$. It can be calculated by coding x as a variable with two distinct numerical values (e.g. 1 for the first population and 0 for the second) and obtaining the Pearson product-moment correlation coefficient between x and y. Take the effect size of the difference between two population means $\mu_1$ and $\mu_2$ to be (Cohen, 1988):

$$ \Delta = \frac{\mu_1 - \mu_2}{\sigma} \qquad (3) $$

where $\sigma$ is the common standard deviation of the two populations. The relationship between $r_{pb}$ and $\Delta$ is given by:

$$ r_{pb} = \frac{\Delta}{\sqrt{\Delta^2 + 1/(pq)}} \qquad (4) $$

with p the proportion of the population members belonging to the first population and $q = 1 - p$ the remaining proportion. Steyn (2000) suggested that, when dealing with populations with different standard deviations $\sigma_1$ and $\sigma_2$, the following effect size for a difference in population means should rather be used:

$$ \Delta_a = \frac{\mu_1 - \mu_2}{\sigma_{\max}}, \qquad \sigma_{\max} = \max(\sigma_1, \sigma_2) \qquad (5) $$

It can be shown that the same relationship as in (4) exists between $r_{pb}$ and the newly defined $\Delta_a$.
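Relation (4) can be checked numerically on a small hypothetical population (the scores below are invented purely for illustration). Coding x as 1 for the first group and 0 for the second, the Pearson correlation computed with population moments (divisor N) coincides with $\Delta/\sqrt{\Delta^2 + 1/(pq)}$ when both groups have the same standard deviation $\sigma$. The helper names `pearson` and `rpb_from_delta` are our own.

```python
from math import sqrt
from statistics import fmean, pstdev


def pearson(x, y):
    """Population Pearson correlation (divisor N in all moments)."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


def rpb_from_delta(delta, p):
    """Relation (4): r_pb = delta / sqrt(delta^2 + 1/(p q))."""
    q = 1 - p
    return delta / sqrt(delta**2 + 1 / (p * q))


group1 = [4, 6, 8]                  # hypothetical scores, x = 1
group2 = [2, 4, 6, 2, 4, 6]         # hypothetical scores, x = 0 (same sigma)
x = [1] * len(group1) + [0] * len(group2)
y = group1 + group2
p = len(group1) / len(y)            # proportion in the first population
delta = (fmean(group1) - fmean(group2)) / pstdev(group1)   # effect size (3)

print(round(pearson(x, y), 4), round(rpb_from_delta(delta, p), 4))   # 0.5 0.5
```

Here p = 1/3 rather than 1/2, so the check also covers unequal group sizes.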
Using this relationship with p = q = 1/2, the guideline values for $\Delta$ (Cohen, 1988) of 0,2 (small effect), 0,5 (medium effect) and 0,8 (large effect) transform to the values 0,1, 0,243 and 0,371 for $r_{pb}$. For convenience the following guideline values are therefore suggested for $r_{pb}$: small effect: 0,1; medium effect: 0,25; large effect: 0,4.
Example 3: (continued) From Table 4 the effect sizes in respect of the relationship between the personality type scores and sub-population membership can be calculated; they are given in Table 5.

Both Variables On An Interval/Ratio Scale
Let the variables x and y be bivariate normally distributed, and let z be x dichotomised at its median, taking as its two values the medians of the lower half and of the upper half of the x values. The correlation $\rho_{yz}$ of y with z and the correlation $\rho_{yx}$ of y with x are then related as follows (Cohen, 1988):

$$ \rho_{yz} = \sqrt{2/\pi}\,\rho_{yx} \approx 0{,}798\,\rho_{yx} \qquad (6) $$

From (6) it follows that the guideline values from the previous section for $r_{pb}$ (which were derived from those of $\Delta$) transform to the following rounded values in respect of $\rho$ (Cohen, 1988): small effect: 0,1; medium effect: 0,3; large effect: 0,5.
Example 4: Table 6 contains the Pearson correlations between academic performance and the continuous scores on the personality construct extraversion/introversion for a core group of students, per academic year and gender (i.e. students who passed all their subjects in the previous year and who were registered for all the prescribed subjects of the current year). Since the guidelines are somewhat arbitrary, the correlations 0,23 and 0,24 are viewed as medium effects, being nearer to 0,3 than to 0,1.
According to Cohen (1988), "… many of the correlation coefficients encountered in behavioural science are of this order of magnitude, and, indeed, this degree of relationship would be perceptible to the naked eye of a reasonably sensitive observer." Since 0,47 is near 0,5, it can be taken to be a large effect.
A correlation of this size falls around the upper end of the range of r's one encounters in fields such as differential, personality-social, personnel, educational, clinical and counselling psychology (Cohen, 1988).
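The chain of guideline values used in this section can be reproduced in a few lines. The sketch below assumes p = q = 1/2 in (4) and uses the standard median-split factor $\sqrt{2/\pi} \approx 0{,}798$ for a normally distributed x dichotomised at its median. The rule of labelling a correlation by the nearest guideline value mirrors the reasoning applied above to 0,23, 0,24 and 0,47; the names `rpb_from_delta`, `MEDIAN_SPLIT` and `nearest_guideline` are hypothetical.

```python
from math import sqrt, pi


def rpb_from_delta(delta, p=0.5):
    """Relation (4) with proportions p and q = 1 - p."""
    return delta / sqrt(delta**2 + 1 / (p * (1 - p)))


MEDIAN_SPLIT = sqrt(2 / pi)        # rho_yz / rho_yx for a median split, about 0.798
GUIDELINES_R = {"small": 0.1, "medium": 0.3, "large": 0.5}


def nearest_guideline(r):
    """Label a correlation by the nearest guideline value (sign ignored)."""
    return min(GUIDELINES_R, key=lambda label: abs(abs(r) - GUIDELINES_R[label]))


for delta in (0.2, 0.5, 0.8):       # Cohen's guidelines for Delta
    rpb = rpb_from_delta(delta)
    print(f"Delta {delta}: r_pb {rpb:.3f}, r {rpb / MEDIAN_SPLIT:.3f}")

for r in (0.23, 0.24, 0.47):
    print(r, nearest_guideline(r))  # medium, medium, large
```

Rounded to one decimal, the last column reproduces the guideline values 0,1, 0,3 and 0,5 for r.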

THE ESTIMATION OF EFFECT SIZES OF RELATIONSHIPS FROM SAMPLES
In the previous section we gave the appropriate effect sizes for establishing the importance of a relationship between two variables for a complete population. When dealing with a random sample of size n from such a population, the effect sizes can no longer be determined exactly, but can be estimated from the results of the sample. In this section these estimates are given together with their statistical properties as far as unbiasedness is concerned.

Two Categorical Variables
By using the cell frequencies of a contingency table in respect of a sample, w can be estimated by $\hat{w}$, using formula (1) with the sample size n in place of N. From Johnson et al. (1995, p.447) it follows that the expected value of $\hat{w}^2$ is approximately $w^2 + (r-1)(c-1)/n$. This means that $\hat{w}^2$ overestimates $w^2$ by approximately $(r-1)(c-1)/n$. Where n is large, this bias term can be neglected, and $\hat{w}$ is then virtually unbiased for w. Note that this approximation requires every expected cell frequency to be at least 5, which is the usual condition under which the Chi-square test on a contingency table is applicable.
Example 5: (Elifson et al., 1990, p.422). Interviews were conducted with 70 homosexual and 110 heterosexual males concerning their fear of contracting AIDS. Assume that these respondents were randomly chosen from some specified population. Table 7 gives a 3x2 contingency table of the results (with, in brackets, the expected frequencies under the assumption of no relationship). First, the hypothesis of no relationship was tested statistically; the result was highly significant.
Here $\hat{w}^2 = \chi^2/n = 44{,}48/180 = 0{,}247$, with approximate bias $(r-1)(c-1)/n = 2/180 = 0{,}011$, which is negligible. The estimated effect size is $\hat{w} = 0{,}497$, which indicates an important relationship between sexual orientation and fear of AIDS.
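The calculation for Example 5 needs only the Chi-square statistic, the sample size and the dimensions of the table. The sketch below (the helper name is hypothetical) returns $\hat{w}^2 = \chi^2/n$, its approximate upward bias $(r-1)(c-1)/n$, which follows from the mean $(r-1)(c-1) + nw^2$ of the noncentral Chi-square distribution, and $\hat{w}$ itself.

```python
from math import sqrt


def w_hat_from_chi2(chi2, n, r, c):
    """Return (w_hat^2, approximate bias of w_hat^2, w_hat) for an r x c sample table."""
    w2_hat = chi2 / n
    bias = (r - 1) * (c - 1) / n      # approximate overestimation of w^2
    return w2_hat, bias, sqrt(w2_hat)


# Example 5: chi-square 44,48 for a 3x2 table from n = 180 respondents
w2_hat, bias, w_hat = w_hat_from_chi2(44.48, 180, r=3, c=2)
print(round(w2_hat, 3), round(bias, 3), round(w_hat, 3))   # 0.247 0.011 0.497
```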

Two Dichotomous Variables
As in the previous section, the population effect size $\varphi$ can be estimated by $\hat{\varphi}$, using the cell frequencies from a contingency table of a sample in formula (2). Also, since $\hat{\varphi}$ is a special case of $\hat{w}$, this estimator of $\varphi$ is virtually unbiased for large n. Even for n = 20, Monte Carlo simulations (with 10 000 replications) showed a bias of only about 0,02 when data were generated from Table 3(c), where $\varphi = 0{,}5$ (see Steyn, 1999 for more details). For $\varphi = 0{,}3$ (as in Table 3(b)) the bias was even smaller.
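A small simulation of this kind can be sketched as follows. It is not Steyn's (1999) simulation: it assumes that Table 3(c) has the symmetric form with cell proportions 0,375, 0,125, 0,125 and 0,375 (which gives $\varphi = 0{,}5$), draws 10 000 multinomial samples of size 20, and discards the rare samples with an empty row or column, for which $\hat{\varphi}$ is undefined. The function names are hypothetical, and the printed bias will vary with the random seed.

```python
import random
from math import sqrt


def phi(a, b, c, d):
    """Phi coefficient (2); None when a margin is zero and phi is undefined."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return None if denom == 0 else (a * d - b * c) / sqrt(denom)


def simulated_phi_bias(probs, n, reps=10_000, seed=1):
    """Monte Carlo estimate of E(phi_hat) - phi for cell probabilities (a, b, c, d)."""
    rng = random.Random(seed)
    true_phi = phi(*probs)
    estimates = []
    for _ in range(reps):
        draws = rng.choices(range(4), weights=probs, k=n)   # one multinomial sample
        est = phi(*(draws.count(cell) for cell in range(4)))
        if est is not None:                                  # skip undefined cases
            estimates.append(est)
    return sum(estimates) / len(estimates) - true_phi


# Assumed symmetric version of Table 3(c), n = 20 as in the text
print(round(simulated_phi_bias((0.375, 0.125, 0.125, 0.375), n=20), 3))
```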
Example 6: (Larsen & Marx, 1981, p.337) Over the years, studies have sought to characterise nightmare sufferers. To investigate whether men fall into this pattern to the same extent as women, random samples of 160 men and 192 women were drawn, resulting in Table 8. With the null hypothesis that no relationship exists between nightmare pattern and gender, the usual Chi-square test yields no statistically significant result.
Hence any apparent relationship between nightmare pattern and gender can be attributed to chance, and a small effect size can therefore be assumed. For completeness' sake, the phi coefficient was also estimated in this case: $\hat{\varphi} = 0{,}033$, well below the guideline value of 0,1 for a small effect.

One Variable Dichotomous, The Other An Interval Scale
In order to estimate $r_{pb}$ from a random sample of the population, it suffices to estimate $\Delta_a$ in (5). Steyn (2000) suggested the estimator

$$ \hat{\Delta}_a = \frac{\bar{y}_1 - \bar{y}_2}{S_{\max}} $$

where $\bar{y}_1$ and $\bar{y}_2$ are the two sample means and $S_{\max}$ is the larger of the two sample standard deviations; $\hat{\Delta}_a$ slightly underestimates $\Delta_a$. The estimator of $r_{pb}$ then follows from (4):

$$ \hat{r}_{pb} = \frac{\hat{\Delta}_a}{\sqrt{\hat{\Delta}_a^2 + 1/(pq)}} $$

Example 7: Rothmann (1999) conducted a study to test a programme designed to improve participants' knowledge of facilitation. He randomly assigned half of a group of third-year volunteer students to an experimental group, who took the programme; the remainder of the volunteers served as a control group. Before the programme a facilitation test was administered to all 48 students, and after the intervention to 44 of them. The increases in scores between the pre- and post-tests are given in Table 9. Testing the null hypothesis of no difference between the experimental and control means resulted in a highly significant difference [t = 11,24; p < 0,0001].
Since the population studied can be viewed as the 48 volunteers from which the two groups were randomly chosen, the proportions p and q can both be taken as 1/2. First, $\Delta_a$ is estimated from the sample means and standard deviations in Table 9; substituting this estimate into (4) gives an estimated effect size $\hat{r}_{pb}$ of 0,80, which is very large and indicates an important relationship between group membership and increase in knowledge of facilitation. The programme was therefore highly successful.
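Both estimators are simple functions of the group summary statistics. The sketch below assumes that $S_{\max}$ is computed from the usual sample standard deviations (divisor n - 1), which the text does not specify, and the scores are hypothetical, not those of Table 9. The function `delta_from_rpb`, which inverts (4), is added only to show that with p = q = 1/2 an estimate of 0,80 for $r_{pb}$, as in Example 7, corresponds to $\hat{\Delta}_a = 8/3 \approx 2{,}67$.

```python
from math import sqrt
from statistics import fmean, stdev


def delta_a_hat(sample1, sample2):
    """Estimate of Delta_a: difference in sample means over the larger sample SD."""
    s_max = max(stdev(sample1), stdev(sample2))     # stdev uses divisor n - 1
    return (fmean(sample1) - fmean(sample2)) / s_max


def rpb_hat(delta_hat, p=0.5):
    """Estimate of r_pb from relation (4) with q = 1 - p."""
    return delta_hat / sqrt(delta_hat**2 + 1 / (p * (1 - p)))


def delta_from_rpb(rpb, p=0.5):
    """Inverse of (4): the Delta that gives a stated r_pb."""
    return rpb / sqrt(p * (1 - p) * (1 - rpb**2))


experimental = [6, 9, 5, 8, 7]     # hypothetical score increases
control = [3, 5, 4, 6, 2]
d = delta_a_hat(experimental, control)
print(round(d, 2), round(rpb_hat(d), 2))
print(round(delta_from_rpb(0.80), 2))  # 2.67
```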

Both Variables On An Interval Scale
The natural estimator of $\rho$ is the product-moment correlation coefficient r, based on a random sample from the population. According to Johnson et al. (1995, p.55), r is a biased estimator of $\rho$, with a bias that always lies between -0,2/n and 0,2/n. This means that for large samples r is virtually unbiased, but for smaller samples it underestimates $\rho$ whenever $\rho$ is positive and overestimates $\rho$ whenever $\rho$ is negative. Keeping this in mind, it is suggested that r be used as the estimate of the effect size.
Therefore, for small samples and a positive correlation the effect size estimator based on r will be conservative in the sense that a practically significant relationship will not always be detected in cases where it really exists. The opposite is true for negative correlations.
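The bias bounds quoted above can be illustrated with the standard first-order approximation $E(r) \approx \rho - \rho(1-\rho^2)/(2n)$. The sketch below (a hypothetical helper, using n = 112 from Example 8 merely as an illustration) evaluates this approximate bias over a grid of $\rho$ values: it is negative for positive $\rho$, positive for negative $\rho$, and its largest absolute value, attained near $\rho = 1/\sqrt{3}$, is about 0,19/n.

```python
def approx_bias_r(rho, n):
    """First-order approximation to E(r) - rho for a sample of size n."""
    return -rho * (1 - rho**2) / (2 * n)


n = 112
grid = [i / 1000 for i in range(-1000, 1001)]
worst = max(abs(approx_bias_r(rho, n)) for rho in grid)
print(round(worst * n, 3))                                      # 0.192
print(approx_bias_r(0.5, n) < 0, approx_bias_r(-0.5, n) > 0)    # True True
```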
Example 8: (Adapted from Bartholomew and Knott, 1999, p.69). Pearson correlation coefficients were obtained for six ability variables from a random sample of 112 individuals (see Table 10). With the exception of the correlation between reading comprehension and mazes, all the correlations are statistically significant at the 5% level, i.e. the null hypothesis of no correlation is rejected for each of these pairs. Clearly, not all of these correlations indicate important relationships. Viewed as estimates of effect sizes, the correlations between non-verbal intelligence on the one hand and block design, reading comprehension and vocabulary on the other hand, for example, indicate large effects.

DISCUSSION AND CONCLUSIONS
Measures of relationship such as the phi coefficient and the Pearson correlation coefficient are well known. However, their use as measures of effect size is less well known. In this paper it was shown how the effect sizes w and $r_{pb}$ also have their place in this regard. Apart from Steyn (1999, 2000), a clear distinction between population and sample cases of effect sizes is rarely made. The author has tried to make this distinction in the current paper and has illustrated it with numerous examples.
While effect sizes for relationships in a complete population are suggested for each of the four cases, their estimates from random samples are not always unbiased. Especially with small samples, biased estimates can occur, and care should be taken when drawing conclusions regarding the size of the effect.
While many other types of effect sizes exist (Nickerson, 2000; Cohen, 1988; Steyn, 2000), the focus in this paper was on effect sizes that arise from relationships. Effect sizes also exist for comparing several means in respect of one or more variables (Steyn, 1999); these are topics for further research.