Part 2 Chi-squared test

2.1 What is chi-squared?

The \(\chi ^2\) test is usually a preliminary step of a statistical investigation: it helps determine whether the pattern observed between two categorical variables in a sample could have arisen by luck. Among other uses, it can test whether two categorical variables are independent. Note that Fisher’s and Barnard’s exact tests are better, but computationally heavier; more on this later.

Explanation — independence

The test compares the number of observations expected in each cell if the variables were independent with the number actually observed. We will take a look at two different stories. The first story is to see whether gender and ethgrp (ethnic group) are independent, and the second story is to see whether the two categorical variables par_nerve (“how often does your spouse/partner get on your nerves?”) and spdemand2 (“how often does spouse/partner make too many demands?”) are independent.
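
For readers who want the comparison written out, here is the standard Pearson form. The symbols are introduced here for illustration and are not the original author's notation: \(O_{ij}\) is the observed count in row \(i\) and column \(j\) of an \(r \times c\) table, \(R_i\) and \(C_j\) are the row and column totals, and \(N\) is the grand total.

\[
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi ^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{df} = (r - 1)(c - 1).
\]

Here \(E_{ij}\) is the count expected under independence, so a large \(\chi ^2\) means the observed table is far from what independence would predict.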

2.2 The first story — tabulate and \(\chi ^2\) on `gender` and `ethgrp`

Let’s start with the command `tab ethgrp gender, chi2`, which gives this output:

. tab ethgrp gender, chi2

     race/ethnicity |
          recode (4 | gender of respondent
        categories) |      male     female |     Total
--------------------+----------------------+----------
              white |     1,104      1,299 |     2,403 
              black |       217        300 |       517 
hispanic, non-black |       174        193 |       367 
              other |        39         39 |        78 
--------------------+----------------------+----------
              Total |     1,534      1,831 |     3,365 

          Pearson chi2(3) =   3.9497   Pr = 0.267

In our world, we think that race/ethnicity and gender are independent (ethnicity does not depend on your gender and vice versa). Even if we know they are, let’s pretend we don’t know for the sake of an example (and because a given sample does not guarantee the right result, usually because of questionable sampling methods). To conduct a preliminary analysis of the independence hypothesis or the culture hypothesis for further research, we perform the \(\chi ^2\) test. For an exact test, use the command `tab ethgrp gender, exact`.
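
If you want to see where the Pearson statistic in the output comes from, the following is a minimal sketch that recomputes it from the tabulated counts. Python is used here only as a calculator outside Stata. The function name `pearson_chi2` is illustrative and not part of the original, and the counts are copied by hand from the table above.

```python
# Observed counts copied from the Stata output (rows: ethgrp, columns: male, female).
observed = {
    "white": (1104, 1299),
    "black": (217, 300),
    "hispanic, non-black": (174, 193),
    "other": (39, 39),
}


def pearson_chi2(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # Sum of (observed - expected)^2 / expected over every cell.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


stat, df, expected = pearson_chi2(observed)
print(f"Pearson chi2({df}) = {stat:.4f}")
```

The printed statistic should agree with Stata's Pearson chi2(3) = 3.9497 up to rounding. The `expected` list holds the counts the test compares against, for example roughly 1,095 white male respondents under independence versus the 1,104 observed.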

2.2.1 The rule of thumb

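The important number here is Pr = 0.267, the p-value, which determines whether you reject the null hypothesis that the two variables are independent of each other. In this case, we fail to reject the null hypothesis at the 5% (for example) significance level because 0.267 is higher than 0.05. In short, the data give us no evidence against independence; strictly speaking, this does not prove that the two variables are independent. If you know the math or already computed the statistic by hand, you can get the p-value with the Stata command `display chiprob(df,x)` or the short form `di chiprob(df,x)`, where df is the degrees of freedom and x is the \(\chi ^2\) value. Recent Stata documentation lists the same upper-tail probability as `chi2tail(df,x)`.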

Just to cross-check, Pearson chi2(3) = 3.9497 tells us that there are 3 degrees of freedom, which is (4 − 1) × (2 − 1) for a table with 4 rows and 2 columns, and that the \(\chi ^2\) value is 3.9497. Looking at the \(\chi ^2\) distribution table above, 3.9497 is between 2.366 and 4.11 in the row for 3 degrees of freedom. This puts the p-value somewhere between 0.25 and 0.5, which is way higher than the standard 0.1, 0.05, or 0.01 significance levels. Therefore, we fail to reject the null.
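
The same cross-check can be done numerically. The sketch below computes the upper-tail probability of a \(\chi ^2\) distribution for whole-number degrees of freedom, using only Python's standard library. The function name `chi2_upper_tail` is an illustrative assumption and is not a Stata or library function. It relies on the standard recurrence for the regularized upper incomplete gamma function.

```python
import math


def chi2_upper_tail(df, x):
    """P(X >= x) for a chi-squared variable X with integer df >= 1 and x >= 0."""
    if df < 1 or x < 0:
        raise ValueError("df must be a positive integer and x must be non-negative")
    z = x / 2.0
    if z == 0.0:
        return 1.0
    # Starting value: Q(1/2, z) = erfc(sqrt(z)) for odd df, Q(1, z) = exp(-z) for even df.
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


p = chi2_upper_tail(3, 3.9497)
print(f"Pr = {p:.3f}")
# Tail probabilities at the two bracketing table values for 3 degrees of freedom.
print(f"{chi2_upper_tail(3, 2.366):.3f} {chi2_upper_tail(3, 4.11):.3f}")
```

The first line of output should match the Pr = 0.267 reported by Stata, and the second line should come out close to 0.5 and 0.25, the two table columns that bracket it.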

2.2.2 Exact or \(\chi ^2\) test?

You should always use an exact test if it is computationally feasible. It depends on the hardware specs, but with 2019 consumer-grade technology (i3/i5/i7 or Ryzen CPUs), an exact test should be able to handle sample sizes in the 100,000s within a minute, if not a few seconds. Since exact and \(\chi ^2\) tests are just preliminary steps of a statistical analysis, hours or days of computation are most likely not worth it, so if the sample size is close to or higher than a million, or if the exact test takes more than a minute, use the \(\chi ^2\) test.

2.3 The second story — tabulate and \(\chi ^2\) on `par_nerve` and `spdemand2`.

Reviewing the questions, “how often does your spouse/partner get on your nerves?” and “how often does spouse/partner make too many demands?” seem likely to be dependent on each other, as well as correlated. The difference between the first story and this second story is that the categories in the second story have an order (they are ordinal). To clarify, “often” is a higher (discrete) value than “some of the time”, and so forth.

. tab par_nerve spdemand2, chi2

       how often does | how often does spouse/partner make too many
  partner get on your |                  demands?
              nerves? |     never  hardly ev  some of t      often |     Total
----------------------+--------------------------------------------+----------
                never |       117         61         20         13 |       211 
hardly ever or rarely |       269        364        139         40 |       812 
     some of the time |       155        345        281         98 |       879 
                often |        14         24         32         41 |       111 
----------------------+--------------------------------------------+----------
                Total |       555        794        472        192 |     2,013 

          Pearson chi2(9) = 300.3252   Pr = 0.000
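
To see why the statistic is so large here, it helps to put observed and expected counts side by side. The sketch below, in Python with counts copied by hand from the output above, recomputes the statistic and prints the diagonal cells, where both questions received the same answer. The variable names are illustrative and not from the original.

```python
# Counts copied from the Stata output (rows: par_nerve, columns: spdemand2).
labels = ["never", "hardly ever or rarely", "some of the time", "often"]
observed = [
    [117, 61, 20, 13],
    [269, 364, 139, 40],
    [155, 345, 281, 98],
    [14, 24, 32, 41],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Pearson statistic: sum of (observed - expected)^2 / expected over all cells.
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"Pearson chi2({df}) = {stat:.4f}")
for i, label in enumerate(labels):
    print(f"{label:>21}: observed {observed[i][i]:>3}, expected {expected[i][i]:6.1f}")
```

Every diagonal cell is observed more often than independence would predict; for example, “often” paired with “often” is observed 41 times against an expected count of about 10.6. Also note that Pr = 0.000 is a rounded display: the p-value is smaller than 0.0005, not exactly zero.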

2.3.1 Validity check

Are the categories within the variables mutually exclusive? More concretely, is “some of the time” distinct from “hardly ever or rarely” for both variables? Probably not. Person A’s definition of “hardly ever or rarely” could be twice a month, but for person B it could be twice a week, which may be person A’s “some of the time” instead. For this reason, survey design is important, and the NSHAP team has to do its due diligence by asking probing questions, adding supplemental details to the questions, or reformulating the answers entirely.