Assignment #8

PPol 603
Due: Thursday, 1 November 2012

Type up your answers. Give proper credit to those you work with and/or the text(s).

Solve the following problems. Show all of your work, but keep your answers concise. Highlight your final answers to distinguish them from your other numbers and text. Include a copy of your input (e.g., a do file) or output (e.g., a log file) when it is an appropriate way to show your work. However, do not include unnecessary output (i.e., no data dumps), and format any output so that it is easily readable. An appropriate time to include output is when you put your results in a table: if your results are wrong, graders cannot tell how you reached your conclusions (and so cannot give partial credit) unless you provide some output. Explanation includes both statistical and substantive explanation: explain so that a statistical layperson can understand it, and so that a statistical analyst will see your erudition.

[from Stock 2007] What are the root causes of terrorism? Poverty? Repressive political regimes? Religious or ethnic conflicts arising from heterogeneous populations? In problems 1-3, you will examine some empirical evidence on the cross-country sources of terrorism. To do this problem set, you will need to create (generate) some new variables that are functions of the variables in the data set, found here. The variables in the data set are defined below:

Variable        Definition
ftmpop          Number of fatalities from terrorist incidents in the country, 1998-2004, per million population (U.S. State Department)
gdppc           GDP per capita in the country (World Bank)
lackpf          Index of the lack of political freedoms (Freedom House), 1-7 scale, 7 = extremely limited political freedoms
ethnic          Index of ethnic fractionalization (0 to 1 scale, 0 = no fractionalization)
religion        Index of religious fractionalization (0 to 1 scale, 0 = no fractionalization)
mideast,        = 1 if the country is in the indicated region, = 0 otherwise
latinam,
easteurope,
africa,
eastasia
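
The derived variables used in problems 1-3 are simple transformations of the variables in the table. The assignment mentions do files (Stata); the sketch below uses plain Python instead and is only an illustration under stated assumptions: each country is a dict keyed by the variable names above, a missing gdppc is stored as None, and the median that defines higdppc is taken over the problem 3 estimation sample (nonzero ftmpop, non-missing gdppc), which is one reading of the note in problem 3a. The function name `prepare`, the flag `require_gdppc`, and the interaction names `hi_x_lackpf` and `hi_x_lackpf2` are hypothetical.

```python
import math
import statistics


def prepare(rows, require_gdppc=True):
    """Return a new list of country records with derived variables added.

    Assumes each row is a dict with keys such as "ftmpop", "gdppc", "lackpf",
    and that a missing gdppc is stored as None. Input rows are not modified.
    """
    # Drop countries with ftmpop == 0 (ln(0) is undefined); for problem 3,
    # also drop countries with missing gdppc.
    sample = [
        dict(r)
        for r in rows
        if r["ftmpop"] > 0 and (r["gdppc"] is not None or not require_gdppc)
    ]

    for r in sample:
        r["lnftmpop"] = math.log(r["ftmpop"])  # natural log
        r["lngdppc"] = math.log(r["gdppc"]) if r["gdppc"] is not None else None
        r["lackpf2"] = r["lackpf"] ** 2  # squared term for the quadratic models

    if require_gdppc:
        # Assumption: the median is computed over this estimation sample.
        med = statistics.median(r["gdppc"] for r in sample)
        for r in sample:
            r["higdppc"] = 1 if r["gdppc"] >= med else 0
            r["hi_x_lackpf"] = r["higdppc"] * r["lackpf"]    # higdppc x lackpf
            r["hi_x_lackpf2"] = r["higdppc"] * r["lackpf2"]  # higdppc x lackpf^2
    return sample
```

Under these assumptions, `prepare(rows)` gives the problem 3 sample, while `prepare(rows, require_gdppc=False)` keeps countries with missing gdppc and so matches the problem 2 restriction (nonzero ftmpop only).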

  1. {15 points} Preliminary data analysis:
    a. Produce the scatterplot of ftmpop vs. gdppc. (This graph looks better if you exclude observations where gdppc is missing.)
    b. Generate the variables lnftmpop = ln(ftmpop) and lngdppc = ln(gdppc). Produce the scatterplot of lnftmpop vs. lngdppc. (This graph also looks better if you exclude observations where gdppc is missing.)
    c. Produce the scatterplot of lnftmpop vs. lackpf.
    d. Using the scatterplots from a. and b., would you suggest using the variables (i) ftmpop and gdppc or (ii) lnftmpop and lngdppc in a linear regression model?
    e. Using the scatterplot from c., does the relation between lnftmpop and lackpf appear to be linear or nonlinear? If nonlinear, what sort of nonlinear curve might you want to explore (briefly explain)?
  2. {25} Possible nonlinear models:
    a. Estimate the regressions in Table 1, which you can download here, and fill in the empty entries. Note: for these regressions, use only the countries that have nonzero values of ftmpop (ln(0) is undefined).
    b. Using regression (2), test the null hypothesis (at the 5% significance level) that the coefficients on the "other regional dummies" are all zero, against the alternative hypothesis that at least one is nonzero. What is the number of restrictions q in your test? What do R² and adjusted R² tell you about the regional dummy variables?
    c. Using regression (1), is there evidence that the relationship between lnftmpop and lackpf is nonlinear?
    d. Using regression (1), estimate the effect on lnftmpop of changing from lackpf = 7 (extremely limited political freedoms) to lackpf = 5 (some political freedoms), holding constant the values of the other regressors in regression (1). Also calculate the effect on lnftmpop of changing from lackpf = 5 to lackpf = 3, and from lackpf = 3 to lackpf = 1, holding constant the values of the other regressors in regression (1).
    e. Using regression (1), plot the estimated relationship between lnftmpop and lackpf. At approximately what value of lackpf is this relationship maximized? Confirm this by calculating the maximum from the coefficients. In words, briefly describe the relationship you found.
  3. {30} Possible interaction models:
    a. Create a new binary variable higdppc, which equals one if gdppc is greater than or equal to the median in the data set, and which equals zero otherwise. Estimate the regressions in Table 2, which you can download here, and fill in the empty entries. Note: for all calculations, only use the countries that have nonzero values of ftmpop and non-missing values of gdppc.
    b. Regression (3) produces two regression lines, one for higdppc = 0 and one for higdppc = 1. Produce a scatterplot of lnftmpop vs. lackpf, showing the two regression lines. This can be done either by producing two scatterplots, one for higdppc = 0 and one for higdppc = 1, or by combining the two scatterplots into a single graph. Use regression (3) to write out the estimated regression lines for the two groups (in slope-intercept form).
    c. Is the difference between the slopes of the two regression lines in the scatterplot statistically significantly different from zero at the 5% significance level? Explain.
    d. In a sentence or two, interpret the signs of the coefficients in regression (3) and the scatterplot; that is, explain in everyday terms the findings shown in that scatterplot.
    e. Using regression (4), test the hypothesis that the coefficients on higdppc×lackpf and higdppc×lackpf² are zero, against the hypothesis that one or the other (or both) is nonzero. State in words what the hypothesis is that you are testing.
    f. Using regression (4), test the hypothesis that the coefficients on lackpf² and higdppc×lackpf² are zero, against the hypothesis that one or the other (or both) is nonzero. State in words what the hypothesis is that you are testing.
  4. {30} Case Study: Sex Discrimination in Employment [From Roberts 1979 via Ramsey and Schafer 2002]
    "Did a bank discriminatorily pay higher starting salaries to men than to women?" The data set "lists data on employees from one job category (skilled, entry-level clerical) of a bank that was sued for sex discrimination." There were "32 male and 61 female employees, hired between 1965 and 1975. The measurements are of annual salary at time of hire, salary as of March 1977, sex, seniority (months since first hired), age (months), education (years), and work experience prior to employment with the bank (months)."
    "Did the females receive lower starting salaries than similarly qualified and similarly experienced males? After accounting for measures of performance, did females receive smaller pay increases than males?" As an analyst for the U.S. Equal Employment Opportunity Commission (EEOC), write a professional report that presents your analysis and gives your opinion regarding sex discrimination. Be sure to describe the assumptions and methodologies used to arrive at your findings. In addition to your analysis and interpretation, what are the strengths and weaknesses of this approach? What would you do (instead) to investigate this issue? Assume that your readers have only a vague familiarity with statistics (i.e., they are laypersons). The data set can be found here.
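
Several parts of problems 2 and 3 reduce to arithmetic on estimated coefficients. The sketch below is a minimal plain-Python illustration under these assumptions: regression (1) contains lackpf and lackpf² (plus other regressors held constant), so moving lackpf from x0 to x1 changes lnftmpop by b1(x1 − x0) + b2(x1² − x0²), and the turning point is at −b1/(2·b2); regression (3) contains only higdppc, lackpf, and higdppc×lackpf; and the F statistic is the homoskedasticity-only version computed from the restricted and unrestricted R², which in general differs from the heteroskedasticity-robust F that regression software reports when robust standard errors are used. All function names and the numbers in the usage lines are hypothetical, not estimates from Table 1 or Table 2.

```python
def quad_effect(b1, b2, x0, x1):
    """Predicted change in y when x moves from x0 to x1, for
    y = b0 + b1*x + b2*x**2 + (other regressors held constant)."""
    return b1 * (x1 - x0) + b2 * (x1 ** 2 - x0 ** 2)


def quad_turning_point(b1, b2):
    """Value of x where b1*x + b2*x**2 peaks (b2 < 0) or bottoms out (b2 > 0)."""
    if b2 == 0:
        raise ValueError("no turning point: the squared term's coefficient is zero")
    return -b1 / (2 * b2)


def group_lines(b0, b1, b2, b3):
    """Intercept and slope for each group in y = b0 + b1*D + b2*x + b3*(D*x),
    where D is a 0/1 dummy such as higdppc. Returns {D: (intercept, slope)}."""
    return {0: (b0, b2), 1: (b0 + b1, b2 + b3)}


def homoskedastic_f(r2_u, r2_r, q, n, k_u):
    """Homoskedasticity-only F statistic for q restrictions.

    r2_u, r2_r: R-squared of the unrestricted and restricted regressions;
    n: number of observations; k_u: number of regressors in the unrestricted
    model, not counting the intercept."""
    return ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k_u - 1))


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    b1, b2 = 1.0, -0.1
    for x0, x1 in [(7, 5), (5, 3), (3, 1)]:
        change = quad_effect(b1, b2, x0, x1)
        print(f"lackpf {x0} -> {x1}: change in lnftmpop = {change:.2f}")
    print("turning point:", quad_turning_point(b1, b2))
    print("lines by group:", group_lines(1.0, 0.5, 0.25, -0.5))
```

A negative coefficient on the squared term is what makes the turning point a maximum rather than a minimum; with b2 > 0 the same formula locates a minimum.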
