Data @ Reed

Chi-square tests

Chi-square tests are non-parametric analyses that compare the frequencies observed in a sample with the frequencies you would expect under some hypothesis. A chi-square goodness-of-fit test looks at one categorical variable, while a chi-square test of independence looks at the relationship between two categorical variables.

Chi-square test, goodness of fit

A chi-square goodness-of-fit test compares the observed frequencies of a categorical variable to the frequencies you expect. For this example, we will look at the 1988 extract of the NLSW (the National Longitudinal Survey of Young Women, sponsored by the U.S. Bureau of Labor Statistics) and use the csgof package for our analysis.

First, install the csgof package. In Stata, type

findit csgof

and click on "csgof from". This will take you to a second screen in Stata; click on "click here to install", install the package, and then return to the command line.
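
Before moving on, it can help to confirm that Stata can find the newly installed command. A minimal check, assuming csgof was installed as described above:

```stata
* Report where csgof is installed; Stata returns an error if the command is not found
which csgof
```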

Load the data, then open the data browser (type browse) to look it over.

sysuse nlsw88


Based on other information, you hypothesize that most of the people in this dataset are coded as "race = white" (75%), with a smaller share coded as "race = black" (20%) and an even smaller share coded as "race = other" (5%). Supply these expected percentages in that order:
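
Because the expected percentages are given as a plain list, it is worth checking how race is coded before running the test. The sketch below assumes the percentages are matched to the categories in ascending order of their numeric codes, which is what the example implies; consult the package's help file to confirm.

```stata
sysuse nlsw88, clear

* Observed frequencies and percentages, shown with value labels
tabulate race

* The same table with the underlying numeric codes instead of labels
tabulate race, nolabel
```

Comparing the two tables shows which numeric code belongs to which label, and the observed percentages give a first impression of how far the sample is from the hypothesized split.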

csgof race, expperc(75 20 5)

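The output lists the observed and expected frequencies for each category, followed by the chi-square statistic and its p-value. Here the test is statistically significant, which means the racial composition of the sample does not match the expected 75/20/5 split.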
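
To see where the test statistic comes from, the goodness-of-fit calculation can be reproduced with built-in Stata commands. This is a sketch rather than a description of how csgof works internally; the names obs, props, total_n, expected_i, and chi2 are illustrative, and the code assumes race has exactly three categories in the order white, black, other.

```stata
sysuse nlsw88, clear

* Store the observed counts for each category in a 3 x 1 matrix
tabulate race, matcell(obs)
scalar total_n = r(N)

* Hypothesized proportions, in the same order as the categories
matrix props = (0.75 \ 0.20 \ 0.05)

* Sum (observed - expected)^2 / expected over the three categories
scalar chi2 = 0
forvalues i = 1/3 {
    scalar expected_i = total_n * props[`i', 1]
    scalar chi2 = chi2 + (obs[`i', 1] - expected_i)^2 / expected_i
}

* Degrees of freedom = number of categories - 1 = 2
display "chi2(2) = " %9.3f chi2
display "p-value = " %9.4f chi2tail(2, chi2)
```

A large statistic with a small p-value indicates that the observed counts are far from the hypothesized ones.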

Chi-square test, independence

Also known as Pearson's chi-square test, the chi-square test of independence examines whether two categorical variables are related. In this example, we will look at the stock Stata dataset of 1978 automobile data and see if there is a relationship between a car's repair rating (rep78) and whether it was produced in the US or abroad (foreign).

sysuse auto

tab rep78 foreign, chi2
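
The chi-square approximation is commonly considered less reliable when expected cell counts are small (a frequent rule of thumb is fewer than 5), and some repair ratings in this dataset contain only a few cars. As a hedged sketch, the expected option prints the expected count under independence in each cell, and the stored results can be displayed afterwards:

```stata
sysuse auto, clear

* Show the expected frequency under independence beneath each observed frequency
tabulate rep78 foreign, chi2 expected

* tabulate leaves the test results in r(); list them, then display two of them
return list
display "Pearson chi2 = " r(chi2) ", p = " r(p)
```

If many expected counts turn out to be small, interpret the chi-square p-value with caution.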

If you wanted to see row percentages instead of frequencies, specify that in the options:

tab rep78 foreign, row nofreq chi2