Assignment #1

PPol 604
Due: Thursday, 17 January 2013

Type up your answers. Give proper credit to anyone you work with and to any text(s) you use.

Solve the following problems. Show all of your work, but keep your answers concise. Highlight your (final) answer to distinguish it from your other numbers and text. Include a copy of your input (e.g. do file) or output (e.g. log file) when it is an appropriate way to show your work. However, do not include unnecessary output (i.e. no data dumps), and format any output so that it is easily readable. An appropriate time to include output is when you put your results in a table: if your results are wrong, the grader cannot tell how you reached your conclusions (and so cannot give partial credit) unless you provide some output. Explanations should be both statistical and substantive (explain so that a statistical layperson can understand, and so that a statistical analyst will see your erudition).

  1. {65} As a comparison with ordered logit, we will first do this problem with logit. To solve this problem, use a subset of the 1996 GSS dataset here. Researchers have posited that age, education, gender, religion, socioeconomic status, and class influence whether a person attends church regularly. Your measure of church attendance is an ordinal one. To run logit, simplify that variable: Create a dummy dependent variable indicating if a respondent is a regular churchgoer or not (in this case, a regular churchgoer attends church once a month or more). (Hint: the codebook command might be useful, e.g. codebook attend.) Create true dummy variables for whether the respondent is female, Protestant, or Catholic (let the other categories be combined into the baseline category). Treat the other variables as interval level. Create a table to show results.
    a. Before estimating the regression, hypothesize the expected signs of the coefficients (and why).
    b. Use logit to test all of the theories simultaneously (i.e. see which variables are significant while controlling for the others). If there are any insignificant independent variables, see if you can remove them (through a joint significance test), and run the econometric analysis again, but present both sets of results. You do not need to interpret yet.
    (Use the simpler specification in the analysis that follows. This is where you will start to interpret.)
    c. Discuss qualitatively how each independent variable affects church attendance.
    d. Calculate the probability and confidence interval of regular attendance for a baseline case of male, non-Catholic, non-Protestant, 45 years old, 12 years education, working class, sei=49 (you may not need all of these if you have dropped any variables).
    e. Calculate changes in probability for various changes in independent variables (dummy variable moves from 0 to 1, age moves from 45 to 61, education moves from 12 to 16, working class to middle class, sei moves from 49 to 68). Interpret and explain the results. Use a table.
    f. Plot the relationship between education and (the probability of) regular church attendance for average men and women (where average is a 45-year-old, middle class Protestant with sei = 49).
    g. Discuss the fit of the model generally using chi2, pseudo-R2, percentage predicted correctly, and proportional reduction of error.
    h. Test for misspecification (functional form or omitted variables) and show your results.
    i. Test for multicollinearity and show your results.
    j. Now, use the original ordinal dependent variable to fit an ordered logit model with all of the variables. If there are any insignificant independent variables, see if you can remove them (through a joint significance test), and run the econometric analysis again, but present both sets of results. (Use the simpler specification in what follows.)
    k. Discuss if there are any qualitative differences in the (binary) logit and ordered logit results.
    l. Calculate the predicted probabilities and confidence intervals of different levels of attendance for a baseline case of male, non-Catholic, non-Protestant, 45 years old, 12 years education, working class, sei=49 (you may not need all of these if you have dropped any variables).
    m. Calculate changes in probabilities for changing from the baseline case to a female (with all else remaining the same). Interpret and explain the results. Use a table.
    n. Plot the relationship between education and (the probability of) attending church more than once a week for average men and women (where average is a 45-year-old, middle class Protestant with sei = 49). What happens when you generate a graph like this for a category in the middle (e.g. the median response: attending several times a year)?
    o. Discuss the fit of the model generally using chi2, pseudo-R2, percentage predicted correctly, and proportional reduction of error.
    p. Test for misspecification (functional form or omitted variables) and show your results.
    q. Test for multicollinearity and show your results.
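
As a reminder of the arithmetic behind parts (d) and (e), here is a minimal sketch of how a predicted probability and a change in probability follow from logit coefficients. It is written in Python purely for illustration and is not a substitute for your Stata work. The coefficient values and the variable names (`age`, `female`, `_cons`) are made up and are not estimates from the GSS data. The sketch does not compute confidence intervals.

```python
import math

def logit_probability(coefs, values):
    """Return Pr(y = 1) = 1 / (1 + exp(-xb)) for one covariate profile."""
    xb = coefs["_cons"] + sum(coefs[name] * values[name] for name in values)
    return 1.0 / (1.0 + math.exp(-xb))

# Hypothetical coefficients, NOT estimates from the assignment data.
coefs = {"_cons": -1.0, "age": 0.02, "female": 0.5}

# Baseline profile and the same profile with the dummy moved from 0 to 1.
baseline = {"age": 45, "female": 0}
changed = dict(baseline, female=1)

p0 = logit_probability(coefs, baseline)
p1 = logit_probability(coefs, changed)
first_difference = p1 - p0  # change in probability, all else held at baseline

print(f"baseline: {p0:.3f}  female: {p1:.3f}  change: {first_difference:.3f}")
```

Because the logit curve is nonlinear, the same coefficient implies a different change in probability at a different baseline, which is why the baseline case is spelled out in the problem.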
  2. {35} [roughly from Golder 2006] Use the data set here. This is a subset of the 1992 National Election Study (cleaned up a little), which includes Bush, Clinton, and Perot voters. (Use summarize, tab, and codebook to examine the variables.)
    a. Estimate a multinomial logit model where voters are choosing (vote) between Bush (0), Clinton (1), and Perot (2). Make Bush the base category. The explanatory variables should be conservative, economyworse, education, union, income, and black. Qualitatively discuss (i.e. coefficient signs and statistical significance) which variables affect which vote choices.
    b. How can you compare Clinton and Perot voters? Make that comparison, and qualitatively discuss which variables affect the vote choice between Clinton and Perot.
    c. Estimate a binary logit model on only Bush and Clinton voters, with the same explanatory variables as in (a). How and why do the results differ from, or remain the same as, those in (a)?
    d. Interpret the relative risk ratio on black for all three vote comparisons.
    e. Graphically present the effect of economyworse on (the probability of) voting for the three candidates. To do so, set conservative to 4, the other variables to their medians, and obtain the predicted probability of voting for each candidate at each level of economyworse. (economyworse is derived from the question: "Would you say that over the past year the nation's economy has gotten better, stayed about the same, or gotten worse?" It is coded 1: much better, 2: somewhat better, 3: stayed the same, 4: somewhat worse, 5: much worse.) Then create three lines (one for each candidate) on a graph with economyworse on the x-axis and predicted probabilities of voting for each candidate on the y-axis. Interpret the graph. [Hint: marginsplot will not do this for all three lines simultaneously, but 3 margins commands will generate the information you need (which is 3 probabilities at each level of economyworse).]
    f. Discuss the fit of the model generally using chi2, pseudo-R2, percentage predicted correctly, and proportional reduction of error.
    g. Test for multicollinearity (using vif) and show your results.
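
The fit questions in both problems ask for percentage predicted correctly and proportional reduction of error. The sketch below shows one common way to compute them, in Python purely for illustration. It assumes a 0.5 cutoff for a binary outcome and uses "always predict the modal category" as the naive benchmark. The data are made up, and the function names are hypothetical. If the course text defines these measures differently, follow the text.

```python
from collections import Counter

def percent_correct(observed, predicted):
    """Share of cases where the predicted category equals the observed one."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    hits = sum(1 for o, p in zip(observed, predicted) if o == p)
    return hits / len(observed)

def proportional_reduction_of_error(observed, predicted):
    """(errors of naive modal guess - errors of model) / errors of naive modal guess."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    modal_count = Counter(observed).most_common(1)[0][1]
    errors_null = len(observed) - modal_count
    if errors_null == 0:
        raise ValueError("outcome does not vary, so PRE is undefined")
    errors_model = sum(1 for o, p in zip(observed, predicted) if o != p)
    return (errors_null - errors_model) / errors_null

# Made-up outcomes and predicted probabilities, not assignment data.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
probs = [0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.1, 0.4, 0.9, 0.2]
predicted = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 cutoff

pcp = percent_correct(observed, predicted)
pre = proportional_reduction_of_error(observed, predicted)
print(f"percent correct: {pcp:.2f}  PRE: {pre:.2f}")
```

For ordered or multinomial models, the same functions apply if `predicted` holds the category with the highest predicted probability for each case. A PRE near zero means the model classifies little better than guessing the modal category, even when the percentage predicted correctly looks high.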
