Part 6 Ordered logit (ologit)
6.1 Goal of ologit
Ordered logit is used when the dependent variable is ordinal. For example, the attend variable is coded as follows:
0 = never
1 = less than once a year
2 = about once or twice a year
3 = several times a year
4 = about once a month
5 = every week
6 = several times a week
The key here is that there is a clear order from 0 to 6, where 0 is the minimum and 6 is the maximum. Another example is the variable rlthapy, on a scale of 1 to 7 from very unhappy to very happy. Essentially, the values are ranked, but we do not know whether the distance between 1 and 2 is the same as the distance between 2 and 3. You can also choose to convert the haml* (income brackets) variables to one ordinal variable.
6.2 Stata
Similar to logit, coefficients from the output of the ologit command cannot be interpreted directly, except for their signs. Let’s skip interpreting the marginal effects, since that is covered in the logit tutorial. Note that svy: is a prefix command that tells Stata to account for clusters, strata, and weights. Here is a truncated output for the command:
. svy: ologit attend age i.maritlst i.ethgrp i.social i.sptime, or
------------------------------------------------------------------------------------------------
                               |             Linearized
                        attend | Odds Ratio   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------------------------+----------------------------------------------------------------
                           age |    .998321   .0061196    -0.27    0.785     .9861047    1.010689
                               |
                      maritlst |
         living with a partner |   .3363577   .0803721    -4.56    0.000     .2081447    .5435475
                     separated |   .1905459   .2151926    -1.47    0.148      .019718    1.841351
                      divorced |   .9607897   .2991025    -0.13    0.898     .5141283    1.795499
                       widowed |   1.454997    .553336     0.99    0.329     .6778344    3.123205
                 never married |   1.689047   1.671342     0.53    0.599     .2314619    12.32548
                               |
                        ethgrp |
                         black |   1.416675   .1992523     2.48    0.017     1.068027    1.879136
           hispanic, non-black |   .4342512   .0993871    -3.64    0.001     .2742182    .6876792
...
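The svy: prefix used in the command above only works after the survey design has been declared with svyset. A minimal sketch, assuming hypothetical design variables named psu_id, stratum_id, and wt (the real names depend on the dataset and are not given in this tutorial):

```stata
* Declare the survey design once per session (variable names are hypothetical)
svyset psu_id [pweight = wt], strata(stratum_id)

* Show a summary of the design that Stata has stored
svydescribe

* Any later svy: command now uses these clusters, strata, and weights
svy: ologit attend age i.maritlst i.ethgrp i.social i.sptime, or
```

If svyset has not been run, Stata stops with an error instead of silently ignoring the survey design.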
Starting with the specification, attend is our \(y\) variable, while age, maritlst, ethgrp, social, and sptime are our \(x\) variables.
Notice that one category is missing from each of the categorical independent (\(x\)) variables. For the variable maritlst, “married” is missing, while for the variable sptime, “together” is missing. What is happening here is that the odds ratios are reported in comparison to the base categories “married” and “together”, respectively. A base category has an odds ratio of 1 by definition, so it is omitted from the output (there is an option to change this with base).
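One way to change the base category is Stata's factor-variable notation, which accepts an ib prefix in place of i. The sketch below is only illustrative: the level number 3 is an assumption, so check the actual numeric coding of maritlst before picking a level.

```stata
* Show the numeric codes behind the value labels of maritlst
tabulate maritlst, nolabel

* Use level 3 (hypothetical choice) as the base instead of the default lowest level
svy: ologit attend age ib3.maritlst i.ethgrp i.social i.sptime, or

* Alternatively, use the highest-numbered level as the base
svy: ologit attend age ib(last).maritlst i.ethgrp i.social i.sptime, or
```

Changing the base does not change the fitted model, only which category each odds ratio is compared against.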
To interpret the odds ratios, let’s compare them to relative risk (risk ratio). First, to explain relative risk: a relative risk of 2.1 means the probability is 110% higher than the baseline (or control). Likewise, a relative risk of 0.4 means the probability is 60% lower than the baseline. The odds ratio is a bit more complicated. Formally, the relation of the odds ratio to relative risk is
\(OR \approx \frac{(R_C-1) \times RR}{R_C \times RR - 1}\)
where \(OR\) is the odds ratio, \(RR\) is the relative risk, and \(R_C\) is the baseline risk (the probability of the outcome in the baseline group). Or, if you already know what odds are, it is
\(OR = \frac{odds\ in\ the\ comparison\ group}{odds\ in\ the\ baseline\ group}\)
where the odds in each group are the probability of a positive answer (answering 1) divided by the probability of a negative answer (answering 0) on a certain choice, conditional on the independent variables. As a rule of thumb, a relative risk of 3 is approximately an odds ratio of 4 when the baseline risk is around 10%, though this relation does not scale linearly. An odds ratio higher than 1 increases the probability, and vice versa.
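To make the relation between relative risk and odds ratio concrete, here is a small arithmetic sketch using Stata scalars. The baseline risks of 0.10 and 0.25 are assumed values chosen for illustration, not estimates from the survey.

```stata
* Baseline risk (assumed for illustration) and relative risk
scalar rc = 0.10
scalar rr = 3

* Odds ratio implied by the formula OR = ((R_C - 1) * RR) / (R_C * RR - 1)
scalar or_implied = ((rc - 1) * rr) / (rc * rr - 1)
display "odds ratio at baseline 0.10 = " or_implied

* Same relative risk at a higher baseline risk
scalar rc = 0.25
scalar or_implied = ((rc - 1) * rr) / (rc * rr - 1)
display "odds ratio at baseline 0.25 = " or_implied
```

The first result is about 3.86 and the second is exactly 9, which shows how the same relative risk maps to very different odds ratios as the baseline risk grows.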
To interpret the “/cutX” coefficients, take a look at the ASCII art/picture at: https://www.stata.com/support/faqs/statistics/cut-points
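As a rough guide to what the cut points do, the ordered logit model can be written in terms of cumulative probabilities. The symbols \(\kappa_j\) (cut point \(j\)) and \(x\beta\) (the linear combination of the independent variables) are notation introduced here for illustration:

```latex
\[
\Pr(y \le j \mid x) = \frac{1}{1 + \exp\left(-(\kappa_j - x\beta)\right)},
\qquad
\Pr(y = j \mid x) = \Pr(y \le j \mid x) - \Pr(y \le j-1 \mid x)
\]
```

In this form, each cut point is the threshold on the underlying scale between two adjacent categories of attend, so seven categories give six cut points.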
The odds ratios are not statistically significant if the confidence interval includes 1, whereas for OLS coefficients the corresponding value is 0. As with OLS, however, a P>|t| value lower than 0.05 is statistically significant.
An odds ratio higher than 1 means that there are higher odds of “upgrading” attend by one level. To clarify, “upgrade” here means, for example, going from “about once or twice a year” to “several times a year” for the attend variable.
6.3 Graphing
. predict y_hat*
. graph hbox y_hat1, over(social)
The asterisk in y_hat* means “all”; in this case, predict will create y_hat1 to y_hat7, one for each of the seven categories of attend. Each of these y_hat* variables holds the predicted probability of a different \(y\) choice. For example, you can run graph hbox y_hat1, over(social) and see the probability of never attending decrease as respondents socialize more. However, graph hbox y_hat7, over(social) tells the opposite story.
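The same predictions can be made with one explicit variable name per category, which makes it easier to remember which variable belongs to which answer. This is only a sketch: the names p0 to p6 and p_total are made up here, and it assumes the ologit model above is the most recent estimation.

```stata
* Predicted probabilities, one new variable per category of attend (0 to 6)
predict p0 p1 p2 p3 p4 p5 p6, pr

* Each respondent's seven probabilities should sum to (approximately) 1
egen double p_total = rowtotal(p0 p1 p2 p3 p4 p5 p6), missing
summarize p_total

* Compare the lowest and highest categories side by side across social
graph hbox p0 p6, over(social)
```

Here p0 corresponds to y_hat1 (never) and p6 to y_hat7 (several times a week), so the two sets of boxes in the last graph should move in opposite directions.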
6.4 Validity
Ordered logit is another form of regression analysis, which means that there are validity hurdles. The output of the command svy: tab happy shows that most respondents are “pretty happy” or “very happy”, but the first basic concern is the adjectives “pretty” and “very”, while the second basic concern is the word “happy” itself in this survey. Especially because the answers are not quantifiable, framing the research question, the assumptions, and the modelling require extra care.