Assignment #10
- PPol 604
- Due: Thursday, 28 March 2013
Type up your answers. Give proper credit to those you work with and/or the text(s).
Solve the following problems. Show all of your work, but keep your answers concise.
Highlight your (final) answer to distinguish it from your other numbers and text.
Include a copy of your input (e.g. do file) or output (e.g. log file) when it is an
appropriate way to show your work. However, do not include unnecessary output
(i.e. no data dumps), and format any output so that it is easily readable.
An appropriate time to include output is when you put your results in a table: if your
results are wrong, the grader has no idea how you came to your conclusions (and so
cannot give partial credit) unless you provide some output. Explanation includes both
statistical and substantive explanation (explain so that a statistical layperson can
understand it, and so that a statistical analyst will see your erudition).
- {50} [from Box-Steffensmeier 1996 via Jones 2008]
Does a war chest deter (or delay) a challenger from entering? Data on this question from the U.S. House elections in 1990
are found here.
te denotes the ending time of a reporting period or the time at which a "high quality"
challenger enters a House race against an incumbent. (There can be multiple
observations per incumbent: caseid denotes an incumbent identification code.) Time is
measured in weeks. The minimum is 1, denoting 1 week; the maximum is 90, denoting
90 weeks. cut_hi = 1 indicates that a high quality challenger entered at te.
iv denotes the prior vote the incumbent received (scaled between 0 and 1; the minimum
value is .5 (denoting 50 percent) and the maximum value is 1 (denoting 100 percent)).
ec denotes the incumbent's war chest, that is, the amount of money the incumbent has in
reserve to use at his or her discretion (scaled in millions of dollars). The minimum value
is .00069, which corresponds to $690; the maximum value is 1.688, which corresponds to
$1,688,000.
south is a dummy variable denoting whether or not the incumbent is in a Southern
state (1 denotes South, 0 denotes non-South).
dem is a dummy variable denoting whether or not the incumbent is a Democrat
(0 = Republican).
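Because there can be several records per incumbent, the data have to be declared as multiple-record survival data before the st commands used in the questions will run. The lines below are a minimal sketch, assuming the data set is already loaded in memory and that the variable names match the descriptions above. Using stdescribe as a check is a suggestion, not part of the assignment.

```stata
* Sketch only: assumes the war chest data are already in memory.
* te     = end of each record, in weeks
* caseid = incumbent identification code (groups records by incumbent)
* cut_hi = 1 when a high quality challenger enters at te
stset te, id(caseid) failure(cut_hi == 1)

* Summarize records per incumbent, time at risk, and number of failures.
stdescribe
```

With id() specified, Stata treats each record as starting where the incumbent's previous record ended, which is what lets covariates change from one reporting period to the next.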
a. How many cases are right-censored? What does right-censoring mean in the context of
this research problem?
b. Which covariates vary over time? Do they have any problems with reverse causality (i.e. rate
dependence)?
c. Run a Cox model using the covariates provided. Interpret the results. Create
a graph of the (smoothed) hazard function and interpret it.
d. Run a linktest. This tests whether there is a specification error,
including a violation of the proportional hazards assumption. (If the coefficient
on _hatsq is statistically different from 0, it indicates a problem.) What do you
conclude?
e. Test the proportional hazards assumption by:
- interacting the variables with time (tvc, etc.)
- testing the Schoenfeld residuals (estat phtest)
- graphically for party (stphplot and stcoxkm)
In each case, what do you conclude?
f. Test for functional form by plotting martingale residuals against war chest and vote. What do you conclude?
g. Test goodness of fit by:
- using Cox-Snell residuals
- calculating the concordance (estat concordance)
In each case, what do you conclude (or fail to conclude)?
h. Test for outliers using dfbetas, likelihood displacement, and LMAX. What do you conclude?
i. Estimate the generalized gamma model for the war chest data. Use the same covariates as the Cox model. Among the distributions nested in the generalized gamma model,
which, if any, provides the best fit to the data? Can we rule out any?
j. Using the AIC/BIC criteria, which model fits best among the generalized gamma, Gompertz, log-logistic, log-normal,
Weibull, and exponential distributions? How do your conclusions here compare to
your answer in i?
k. Plot the estimated hazard rate from your preferred model in question j. Describe the
main features of the hazard function as displayed in this graph. How does it differ from
the hazard function in part c?
l. Assess the substantive significance of the variables in your preferred parametric model by calculating the median survival time (using the margins command after streg). First, set all variables to their medians [margins, at((median) w x y z)] and calculate the median survival time. Then move one variable with a negative coefficient to its 95th percentile while holding the others at their medians, and recalculate the median survival time [margins, at((median) x y z (p95) w)]. Move that variable back to its median, then do the same for the next variable with a negative coefficient. If a variable has a positive coefficient, move it to its 5th percentile instead [margins, at((median) w x y (p5) z)]. (Why?) In this application, what is wrong with this approach?
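For the AIC/BIC comparison in part j, one way to avoid retyping the same model six times is to loop over the distributions. This is a sketch under stated assumptions, not the required approach: it assumes the data are already stset and that the covariates are the four described above (iv, ec, south, dem). The distribution names follow recent Stata releases; older releases spell the generalized gamma option differently, so check help streg for your version.

```stata
* Sketch only: assumes the data are loaded and already stset.
* Fit each parametric family with the same covariates and report
* the log likelihood, AIC, and BIC after each fit.
foreach d in ggamma gompertz loglogistic lognormal weibull exponential {
    quietly streg iv ec south dem, distribution(`d')
    display as text "Distribution: `d'"
    estat ic
}
```

Lower AIC and BIC values indicate a better trade-off between fit and the number of parameters. Note that the BIC penalty depends on the N that estat ic reports, which deserves a second look when there are multiple records per incumbent.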
- Work on your poster.