Assignment #10

PPol 604
Due: Thursday, 28 March 2013

Type up your answers. Give proper credit to those you work with and/or the text(s).

Solve the following problems. Show all of your work, but keep your answers concise. Highlight your (final) answer to distinguish it from your other numbers and text. Include a copy of your input (e.g. do file) or output (e.g. log file) when it is an appropriate way to show your work. However, do not include unnecessary output (i.e. no data dumps), and format any output so that it is easily readable. An appropriate time to include output is when you put your results in a table: if your results are wrong, the grader cannot see how you reached your conclusions (and so cannot give partial credit) unless you provide some output. Explanations should be both statistical and substantive: explain so that a statistical layperson can understand, and so that a statistical analyst will see your erudition.

  1. {50} [from Box-Steffensmeier 1996 via Jones 2008] Does a war chest deter (or delay) a challenger from entering? Data on this question from the U.S. House elections in 1990 are found here. te denotes the ending time of a reporting period or the time at which a "high quality" challenger enters a House race against an incumbent. (There can be multiple observations per incumbent: caseid denotes an incumbent identification code.) Time is measured in weeks: the minimum is 1 (1 week) and the maximum is 90 (90 weeks). If cut_hi = 1, a high quality challenger entered at te. iv denotes the prior vote the incumbent received (scaled between 0 and 1; the minimum value is .5 (denoting 50 percent) and the maximum value is 1 (denoting 100 percent)). ec denotes the incumbent's war chest, that is, the amount of money the incumbent has in reserve to use at his or her discretion (scaled in millions of dollars). The minimum value is .00069, which corresponds to $690; the maximum value is 1.688, which corresponds to $1,688,000. south is a dummy variable denoting whether or not the incumbent is in a Southern state (1 denotes South, 0 denotes non-South). dem is a dummy variable denoting whether or not the incumbent is a Democrat (1 = Democrat, 0 = Republican).
    a. How many cases are right-censored? What does right-censoring mean in the context of this research problem?
    b. Which covariates vary over time? Do they have any problems with reverse causality (i.e. rate dependence)?
    c. Run a Cox model using the covariates provided. Interpret the results. Create a graph of the (smoothed) hazard function and interpret it.
    d. Run a linktest. This tests for specification error, including violations of the proportional hazards assumption. (If the coefficient on _hatsq is statistically different from 0, it indicates a problem.) What do you conclude?
    e. Test the proportional hazards assumption by: In each case, what do you conclude?
    f. Test for functional form by plotting martingale residuals against war chest and vote. What do you conclude?
    g. Test goodness of fit by: In each case, what do you conclude (or fail to conclude)?
    h. Test for outliers using dfbetas, likelihood displacement, and LMAX. What do you conclude?
    i. Estimate the generalized gamma model for the war chest data. Use the same covariates as the Cox model. Among the distributions nested in the generalized gamma model, which, if any, provides the best fit to the data? Can we rule out any?
    j. Using the AIC/BIC criteria, which model fits best among the generalized gamma, Gompertz, log-logistic, log-normal, Weibull, and exponential distributions? How do your conclusions here compare to your answer in i?
    k. Plot the estimated hazard rate from your preferred model in question j. Describe the main features of the hazard function as displayed in this graph. How does it differ from the hazard function in part c?
    l. Assess the substantive significance of the variables in your preferred parametric model by calculating the median survival time (using the margins command after streg). Set all variables to the median [margins, at((median) w x y z)] and calculate the median survival time. Then move one variable with a negative coefficient to the 95th percentile, while holding the others at the median and calculate the median survival time [margins, at((median) x y z (p95) w)]. Move that variable back to the median, and move another negative variable and recalculate. If a variable has a positive coefficient, move it to the 5th percentile [margins, at((median) w x y (p5) z)]. (Why?) In this application, what is wrong with this approach?
  2. Work on your poster.
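
For part j of problem 1, the comparison rests on two standard formulas: AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(N), where lnL is the maximized log-likelihood, k is the number of estimated parameters, and N is the sample size. In Stata, estat ic reports both after estimation. The short Python sketch below only illustrates the arithmetic. Every number in it (the log-likelihoods, the parameter counts, and N) is a made-up placeholder, not a result from the war chest data, and the choice of N (records versus incumbents) is an assumption you would need to justify.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    """Bayesian information criterion: -2 lnL + k ln(N)."""
    return -2.0 * log_lik + k * math.log(n)


# Hypothetical placeholders, NOT estimates from the assignment data.
# k counts every estimated parameter: coefficients, constant, and any
# ancillary (shape/scale) parameters of the distribution.
fits = {
    "exponential": {"log_lik": -120.0, "k": 5},
    "weibull": {"log_lik": -115.0, "k": 6},
    "generalized gamma": {"log_lik": -114.5, "k": 7},
}
N_OBS = 400  # hypothetical sample size

# Build a {model: (AIC, BIC)} table.
table = {
    name: (aic(f["log_lik"], f["k"]), bic(f["log_lik"], f["k"], N_OBS))
    for name, f in fits.items()
}

# Smaller is better for both criteria.
best_aic = min(table, key=lambda m: table[m][0])
best_bic = min(table, key=lambda m: table[m][1])

for name, (a, b) in table.items():
    print(f"{name:18s} AIC = {a:8.3f}  BIC = {b:8.3f}")
print("Lowest AIC:", best_aic)
print("Lowest BIC:", best_bic)
```

Because BIC penalizes each extra parameter by ln(N) rather than 2, it can prefer a simpler distribution than AIC does once N exceeds about 7.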
