Assignment #11

Political Science 328

This assignment is due before 1:30 pm on Tuesday, April 9, both in hard copy in the Political Science department dropbox (outside 745 KMBL) AND uploaded to Learning Suite. Turn in each of the four parts separately: electronically on Learning Suite and on paper (as four separate documents) in the dropbox. Remember that no late assignments will be accepted.

Type your answers in a regular font (e.g., Times Roman 12). (As noted later, Stata .do files and .log files should be formatted in Courier 8.)

This assignment is divided into four parts. You must submit your answers to each part separately, as we will have a different TA grade each part. Make sure that your name and section number, as well as the problem set and part number (e.g., Assignment 11, Part 1), are clearly listed on each part. Students who fail to do so may be penalized on the assignment.

If necessary, re-read the section in the syllabus on group work in Academic Honesty and Plagiarism (here) to make sure you are giving proper credit to those you work with and/or the text(s) you use for each problem. As a reminder, you are in violation of this course's policies as well as the Honor Code if you are sharing electronic portions of your assignment with other people. That includes emailing other people code (even snippets of code), .do files, Word files, or anything else related to a problem set. Your assignment must represent your own work. Please work together: We encourage you to do so! But remember that when working together you should produce your own independent work product.

Solve the following problems. Show all of your work, but keep your answers concise. Include a copy of your input and output: your .do file and your .log file for Stata. However, do not include unnecessary output (i.e., no data dumps), and format any output so that it is easily readable. Convert Stata input and output (.do files and .log files, respectively) to Courier 8 with single-spacing. Each explanation should be both statistical and substantive: explain so that a statistical layperson can understand it, and so that a statistical analyst will see your erudition. Highlight your answer.

{1 point} Take the Quantitative Thinking Survey. It will take 15 minutes or so. It is a 20-question multiple-guess exam. The highest scoring student will get a box of donuts (or some equivalent treat of your choosing). Here is how you access the survey:
Go here.
Fill in the following information in the blanks:
Department: POLI
Course: 328
Section: 2019 [regardless of what section you are in]
NetID: [your netID]
and start the survey by clicking on the arrow in the lower right.

State in Part 1 whether you took the exam. [1 point if you took it, 0 points if you did not take it, -10 if you say you took it but did not]

Remember that you also need to turn in the time survey for Part 4.
{4} Do Exercise 12.9 (p. 463) in Stock & Watson. (Think of this as a Testing Center problem.)
{32} Workplace smoking bans may encourage smokers to quit. A data file (smokeban.xlsx, described in smokeban.pdf) is found on this part of the assignment on Learning Suite. The variables are described within the data set. It consists of a sample of U.S. (indoor) workers from 1991-1993. (Smoking bans were introduced in the 1990s.) The educational variables refer to the highest level of education. Thus, an individual with a college degree is coded as 1 for colgrad and 0 for hsgrad. Report the results of (a) and (b) in a table similar to Problem 2 of Assignment 10, where you report the coefficient and standard error for smokeban, and not for other variables (but you do report whether those variables are included or not, and you report model statistics). Use a second table (provided on Learning Suite) for parts (e)-(h).
(a) Estimate a probit model with smoker as the dependent variable and smokeban as a regressor. How does a workplace smoking ban affect smoking? Is smokeban statistically significant?
(b) Estimate a probit model with smoker as the dependent variable and the following regressors: smokeban, female, age, age², hsdrop, hsgrad, colsome, colgrad, black, and hispanic. How does a workplace smoking ban affect smoking? Is smokeban statistically significant? Compare the estimated effect of a smoking ban from this regression with your answer from (a). Suggest a reason, based on the substance of this regression (was there some sort of problem with one of the regressions?), explaining the change in the estimated effect of a smoking ban between (a) and (b).
(c) Test the hypothesis that the probability of smoking does not depend on the level of education in the probit model of (b). Does the probability of smoking increase or decrease with the level of education?
(d) Discuss the fit of the two models generally using chi², pseudo-R², percentage correctly predicted, and proportional reduction of error.
(e) Mr. A is white, non-Hispanic, 20 years old, and a high school dropout. Using the probit regression from (b), and assuming that Mr. A is not subject to a workplace smoking ban, calculate the probability that Mr. A smokes. Carry out the calculation again assuming that he is subject to a workplace smoking ban. What is the effect of the smoking ban on the probability of smoking?
(f) Repeat (e) for Ms. B, a female, black, 40-year-old college graduate.
(g) Repeat (e) and (f) using a logit model, using the same independent variables used in (b).
(h) Repeat (e) and (f) using a linear probability model, using the same independent variables used in (b).
(i) Based on the answers to (e)-(h), fill in the table provided on Learning Suite. Note that the table uses full variable names, something you should use in professional reports. Do the probit, logit, and linear probability model results differ? If they do, which results make more sense? Are the estimated effects large in a real-world sense?
(j) Test for multicollinearity (using vif) in part (b) and show your results.
(k) Test for misspecification (functional form or omitted variables, using linktest) in part (b) and show your results.
(l) Test for outliers (using deviance residuals). What are the characteristics of the person(s) that fit the model worst?
(m) Test for influential observations (using dbeta) in part (g). What are the characteristics of the person(s) that are most influential in the model?
(n) Discuss the strengths and weaknesses of this model: in addition to the diagnostics performed above, discuss all aspects of internal and external validity.
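As a reminder of the mechanics behind the predicted-probability and model-fit questions in the smoking problem, here is a minimal sketch in Python (standard library only; it does not replace the required Stata work). It shows how a linear index is turned into a probability under a probit or logit link, how the effect of a ban is the difference between two predicted probabilities, and one common definition of proportional reduction of error. All function names are hypothetical, and the coefficients are made-up placeholders, NOT estimates from smokeban.xlsx.

```python
import math

def probit_link(index):
    """Standard normal CDF: converts a probit index into a probability."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))

def logit_link(index):
    """Logistic CDF: converts a logit index into a probability."""
    return 1.0 / (1.0 + math.exp(-index))

def linear_index(coefs, profile):
    """Constant plus the sum of coefficient * value over the profile."""
    return coefs["_cons"] + sum(coefs[name] * value
                                for name, value in profile.items())

def ban_effect(coefs, profile, link):
    """Change in predicted probability when smokeban goes from 0 to 1."""
    p_without = link(linear_index(coefs, {**profile, "smokeban": 0}))
    p_with = link(linear_index(coefs, {**profile, "smokeban": 1}))
    return p_with - p_without

def proportional_reduction_of_error(pct_correct, modal_share):
    """One common definition: (PCP - modal share) / (1 - modal share)."""
    return (pct_correct - modal_share) / (1.0 - modal_share)

# PLACEHOLDER numbers chosen only to show the arithmetic; they are
# assumptions, not results from any regression on the assignment data.
placeholder_coefs = {"_cons": -0.5, "smokeban": -0.2, "age": 0.01}
placeholder_person = {"age": 20}

# The same placeholder numbers are reused for both links purely to show
# the mechanics; a real probit and a real logit have different estimates.
print(ban_effect(placeholder_coefs, placeholder_person, probit_link))
print(ban_effect(placeholder_coefs, placeholder_person, logit_link))
print(proportional_reduction_of_error(0.80, 0.75))
```

In a linear probability model the link is the identity, so the effect of the ban is simply the smokeban coefficient for every person. Under probit and logit the effect depends on the person's other characteristics, which is why the calculations for Mr. A and Ms. B can differ.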
{36} [adapted from Alvarez and Brehm 1995] Practice Final Exam Problem: Attitudes on Abortion
You have just been hired as an analyst for the National Pro-Life Association to determine what factors influence attitudes toward abortion. Relevant data (gss96.dta) are found on this part of the assignment on Learning Suite. You believe that age, race, sex, education, political views, religion, and religious intensity (measured by church attendance) might matter. Furthermore, to understand how broad public support is for abortion rights, you will examine two different attitudes toward abortion, abpoor and abdefect. Your task is to advise the organizational executive about what factors affect these attitudes toward abortion. Explain why you have chosen those factors [hypotheses], and the influence that those factors have on abortion attitudes. Include a substantive effects table. Suggest where the organization should focus its resources to recruit new members.

In addition to your analysis and interpretation, what are the strengths and weaknesses of this approach? What would you do (instead) to investigate this issue?

You do not need a professional report, but you should still interpret and present your results well (e.g., in a table), and explain what you did and why. You should still write up the results in a page or less, not including tables and graphs. Remember to include an appendix showing your work. (In other words, treat this like a problem on the final exam that does not require a professional memo. This means that you can use statistical jargon in your main answer. But you must also explain your results so that they can be understood by a layperson.)
{27} [from Bartels 1991] A data file (spending86.txt) is found on this part of the assignment on Learning Suite. This dataset contains information on 250 non-freshman incumbents in the 1986 election who had major-party challengers in the 1986 and 1984 elections (excluding a couple of outliers). Note that the spending variables are logged to reflect diminishing returns to spending. Investigate how incumbent and challenger spending affect the percentage of the vote that the incumbent receives. Put your results in a table for parts (b), (e), (f), and (h).
(a) Before running a regression, hypothesize how party (democrat), challenger quality (hq, = 1 if the challenger has held elected office), previous vote (vote_1), (logged) incumbent spending (lispend), and (logged) challenger spending (lcspend) will affect the two-party vote percentage, vote, and explain why. (Two-party vote percentage is the percentage of the vote the incumbent received among the votes received by the Democratic and Republican candidates. It is meant to control for third-party candidates.)
(b) Run an OLS regression using the variables in part (a). (Do not worry about endogeneity yet.) Compare your hypotheses from part (a) with the results. If there are any differences, why?
(c) Give two reasons why incumbent spending might be endogenous. Think about this conceptually as well as statistically. If a variable is endogenous, it usually means there is an internal validity problem.
(d) The variable (logged) incumbent spending in the previous election cycle (lispend1) is proposed as an instrument for incumbent spending in the current election cycle. Which reason(s) for endogeneity might this fix? Why?
(e) Conduct TSLS (sometimes called 2SLS) "by hand," i.e., run the two stages yourself in Stata, without using the Stata command ivregress (or commands like it).
(f) Conduct TSLS using the Stata command ivregress 2sls. Compare the results to the results in parts (b) and (e). Interpret and explain (statistically and substantively) your findings. Compare your hypotheses from part (a) with the results. If there are any differences, why?
(g) Test whether incumbent spending is endogenous. Use whatever tests you can for instrument validity. If you cannot include a test, explain why.
(h) There is a theory that challenger quality does not directly affect the vote, but only affects it through incumbent spending. Thus, challenger quality could be used as an instrument and not included as a control variable. Using this specification, test whether incumbent spending is endogenous. Use whatever tests you can for instrument validity. If you cannot include a test, explain why. What do you conclude about this theory?
(i) What other instruments could you suggest for incumbent spending? Would these overcome both types of endogeneity discussed in part (c)?
(j) Why might challenger spending be endogenous? What instruments could you suggest for challenger spending?
(k) Justify or criticize the races that are excluded from the data set: open seats (where there is no incumbent), unopposed incumbents (in either 1984 or 1986), first-year incumbents, and the outliers (who were financing national campaigns).
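To make the logic of TSLS "by hand" concrete, the following is a minimal sketch in Python (standard library only) on simulated data. It is not the Stata work the assignment requires. It assumes one endogenous regressor, one instrument, and no control variables; in the actual problem the controls would appear in both stages. All names and the data-generating numbers are hypothetical and unrelated to spending86.txt.

```python
import random

def mean(values):
    return sum(values) / len(values)

def ols_simple(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def tsls_by_hand(y, x, z):
    """Stage 1: regress x on z. Stage 2: regress y on the fitted x."""
    a0, a1 = ols_simple(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    _, beta = ols_simple(x_hat, y)
    return beta

def simulate(n=2000, seed=0):
    """Simulated data: u drives both x and y, so OLS of y on x is biased.
    The true effect of x on y is 2.0 (an assumption of this sketch)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)          # instrument, unrelated to u
        u = rng.gauss(0, 1)           # unobserved confounder
        xi = 0.8 * zi + u + rng.gauss(0, 1)
        yi = 1.0 + 2.0 * xi - 1.5 * u + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return y, x, z

y, x, z = simulate()
_, beta_ols = ols_simple(x, y)
beta_tsls = tsls_by_hand(y, x, z)
print("OLS slope:", beta_ols)
print("TSLS slope:", beta_tsls)
```

One caution on this design: the second-stage regression reproduces the TSLS coefficient, but the standard errors it reports are not the correct TSLS standard errors, because they are based on residuals computed with the fitted values rather than the actual regressor. This is one reason a by-hand estimate and ivregress 2sls can agree on coefficients but differ on standard errors.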

{1} Complete the Time Spent Survey. State your survey completion code at the top of your Part 4 packet (next to your name, section, etc.).