Assignment #1
- PPol 604
- Due: Thursday, 17 January 2013
Type up your answers. Give proper credit to those you work with and/or the text(s).
Solve the following problems. Show all of your work, but keep your answers concise. Highlight your (final) answer to distinguish it from your other numbers and text.
Include a copy of your input (e.g. do file) or output (e.g. log file) when it is an appropriate way to show your work. However, do not include unnecessary output (i.e. no data dumps), and format any output so that it is easily readable. An appropriate time to include output is when you put your results in a table: if your results are wrong, the grader cannot see how you reached your conclusions (and so cannot give partial credit) unless you provide some output.
Explanations should be both statistical and substantive: explain so that a statistical layperson can understand, and so that a statistical analyst will see your erudition.
- {65} As a comparison with ordered logit, we will first do this problem with logit. To solve this problem, use a subset of the 1996 GSS dataset here. Researchers have posited that age, education, gender, religion, socioeconomic status, and class influence whether a person attends church regularly. Your measure of church attendance is an ordinal one. To run logit, simplify that variable: Create a dummy dependent variable indicating if a respondent is a regular churchgoer or not (in this case, a regular churchgoer attends church once a month or more). (Hint: the codebook command might be useful, e.g. codebook attend.) Create true dummy variables for whether the respondent is female, Protestant, or Catholic (let the other categories be combined into the baseline category). Treat the other variables as interval level. Create a table to show results.
a. Before estimating the regression, hypothesize the expected signs of the coefficients (and why).
b. Use logit to test all of the theories simultaneously (i.e. see which variables are significant while controlling for the others). If there are any insignificant independent variables, see if you can remove them (through a joint significance test), and run the econometric analysis again, but present both sets of results. You do not need to interpret yet. (Use the simpler specification in the analysis that follows. This is where you will start to interpret.)
c. Discuss qualitatively how each independent variable affects church attendance.
d. Calculate the probability and confidence interval of regular attendance for a baseline case of male, non-Catholic, non-Protestant, 45 years old, 12 years education, working class, sei=49 (you may not need all of these if you have dropped any variables).
e. Calculate changes in probability for various changes in independent variables (dummy variable moves from 0 to 1, age moves from 45 to 61, education moves from 12 to 16, working class to middle class, sei moves from 49 to 68). Interpret and explain the results. Use a table.
f. Plot the relationship between education and (the probability of) regular church attendance for average men and women (where average is a 45-year-old, middle class Protestant with sei = 49).
g. Discuss the fit of the model generally using chi2, pseudo-R2, percentage predicted correctly, and proportional reduction of error.
h. Test for misspecification (functional form or omitted variables) and show your results.
i. Test for multicollinearity and show your results.
j. Now, use the original ordinal dependent variable to fit an ordered logit model with all of the variables. If there are any insignificant independent variables, see if you can remove them (through a joint significance test), and run the econometric analysis again, but present both sets of results. (Use the simpler specification in what follows.)
k. Discuss whether there are any qualitative differences between the (binary) logit and ordered logit results.
l. Calculate the predicted probabilities and confidence intervals of different levels of attendance for a baseline case of male, non-Catholic, non-Protestant, 45 years old, 12 years education, working class, sei=49 (you may not need all of these if you have dropped any variables).
m. Calculate changes in probabilities for changing from the baseline case to a female (with all else remaining the same). Interpret and explain the results. Use a table.
n. Plot the relationship between education and (the probability of) attending church more than once a week for average men and women (where average is a 45-year-old, middle class Protestant with sei = 49). What happens when you generate a graph like this for a category in the middle (e.g. the median response: attending several times a year)?
o. Discuss the fit of the model generally using chi2, pseudo-R2, percentage predicted correctly, and proportional reduction of error.
p. Test for misspecification (functional form or omitted variables) and show your results.
q. Test for multicollinearity and show your results.
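The arithmetic behind the probability and fit questions above (parts d, e, g, and o) can be sketched outside of Stata. The short Python example below is only an illustration under stated assumptions: the linear predictor, the coefficient on female, the outcomes, and the predicted probabilities are all made-up numbers, not estimates from the GSS data, and the function names are hypothetical. It shows how a logit linear predictor becomes a probability, how a change in probability is computed, and how the percentage predicted correctly and the proportional reduction of error compare the model with a naive guess of the modal category.

```python
import math


def inv_logit(xb):
    """Convert a linear predictor (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


def fit_measures(y, p_hat, cutoff=0.5):
    """Return (share predicted correctly, proportional reduction of error).

    y     -- observed 0/1 outcomes
    p_hat -- predicted probabilities from the model
    """
    n = len(y)
    predicted = [1 if p >= cutoff else 0 for p in p_hat]
    model_errors = sum(obs != pred for obs, pred in zip(y, predicted))
    # The naive (null) model always guesses the modal category.
    ones = sum(y)
    null_errors = n - max(ones, n - ones)
    share_correct = (n - model_errors) / n
    # PRE is undefined when the naive model makes no errors (null_errors == 0).
    pre = (null_errors - model_errors) / null_errors
    return share_correct, pre


# Hypothetical values, NOT estimates from the GSS data.
xb_baseline = -1.0   # assumed linear predictor for the baseline case
b_female = 0.5       # assumed logit coefficient on the female dummy

p_male = inv_logit(xb_baseline)
p_female = inv_logit(xb_baseline + b_female)
change_in_probability = p_female - p_male

# Hypothetical outcomes and predicted probabilities for ten respondents.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p_hat = [0.8, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.2, 0.4, 0.1]
share_correct, pre = fit_measures(y, p_hat)

print(round(p_male, 3), round(p_female, 3), round(change_in_probability, 3))
print(share_correct, round(pre, 3))
```

The sketch does not produce confidence intervals, which also require the estimated covariance matrix of the coefficients; the Stata output remains the appropriate source for those.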
- {35} [roughly from Golder 2006] Use the data set here. This is a subset of the 1992 National Election Study (cleaned up a little), which includes Bush, Clinton, and Perot voters. (Use summarize, tab, and codebook to examine the variables.)
a. Estimate a multinomial logit model where voters are choosing (vote) between Bush (0), Clinton (1), and Perot (2). Make Bush the base category. The explanatory variables should be conservative, economyworse, education, union, income, and black. Qualitatively discuss (i.e. coefficient signs and statistical significance) which variables affect which vote choices.
b. How can you compare Clinton and Perot voters? Make that comparison, and qualitatively discuss which variables affect the vote choice between Clinton and Perot.
c. Estimate a binary logit model on only the Bush and Clinton voters, with the same explanatory variables as in (a). How and why do the results differ from, or remain the same as, those in (a)?
d. Interpret the relative risk ratio on black for all three vote comparisons.
e. Graphically present the effect of economyworse on (the probability of) voting for the three candidates. To do so, set conservative to 4, the other variables to their medians, and obtain the predicted probability of voting for each candidate at each level of economyworse. (economyworse is derived from the question: "Would you say that over the past year the nation's economy has gotten better, stayed about the same, or gotten worse?" It is coded 1: much better, 2: somewhat better, 3: stayed the same, 4: somewhat worse, 5: much worse.) Then create three lines (one for each candidate) on a graph with economyworse on the x-axis and predicted probabilities of voting for each candidate on the y-axis. Interpret the graph. [Hint: marginsplot will not do this for all three lines simultaneously, but 3 margins commands will generate the information you need (which is 3 probabilities at each level of economyworse).]
f. Discuss the fit of the model generally using chi2, pseudo-R2, percentage predicted correctly, and proportional reduction of error.
g. Test for multicollinearity (using vif) and show your results.
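The relationships among the three vote comparisons (parts b, d, and e) follow from the structure of the multinomial logit model. The Python sketch below illustrates that structure with made-up coefficients; none of the numbers are estimates from the NES data, and the function and variable names are hypothetical. With Bush as the base category, the coefficient for the Clinton versus Perot comparison is the difference between the two sets of coefficients, a relative risk ratio is the exponentiated coefficient, and the three predicted probabilities at any covariate profile sum to one.

```python
import math


def mlogit_probs(xb_by_outcome):
    """Predicted probabilities from linear predictors.

    The base outcome must be included with a linear predictor of 0.
    """
    exps = {name: math.exp(xb) for name, xb in xb_by_outcome.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical coefficients on black, each relative to Bush (the base category).
b_clinton = 2.0
b_perot = -0.5

rrr_clinton_vs_bush = math.exp(b_clinton)
rrr_perot_vs_bush = math.exp(b_perot)

# Clinton versus Perot: difference of the two base-category coefficients.
b_clinton_vs_perot = b_clinton - b_perot
rrr_clinton_vs_perot = math.exp(b_clinton_vs_perot)

# Hypothetical constants and economyworse coefficients (Bush is fixed at 0).
cons = {"Bush": 0.0, "Clinton": -1.5, "Perot": -2.0}
b_econ = {"Bush": 0.0, "Clinton": 0.6, "Perot": 0.4}

# One set of three probabilities at each level of economyworse (1 to 5).
curves = {
    level: mlogit_probs({c: cons[c] + b_econ[c] * level for c in cons})
    for level in range(1, 6)
}

for level, probs in curves.items():
    print(level, {c: round(p, 3) for c, p in probs.items()})
```

The difference of coefficients gives only the point estimate for the Clinton versus Perot comparison; its standard error depends on the covariance between the two coefficients, so significance has to come from the estimation software (for example, by re-estimating with a different base category).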