Assignment #8
- PPol 603
- Due: Thursday, 1 November 2012
Type up your answers. Give proper credit to those you work with and/or the text(s).
Solve the following problems. Show all of your work, but keep your answers concise.
Highlight your (final) answer to distinguish it from your other numbers and text.
Include a copy of your input (e.g., do file) or output (e.g., log file) when it is an
appropriate way to show your work. However, do not include unnecessary output
(i.e., no data dumps), and format any output so that it is easily readable.
An appropriate time to include output is when you put your results in a table: if your
results are wrong, graders cannot see how you reached your conclusions (and so cannot
give partial credit) unless you provide some output. Your explanations should be both
statistical and substantive: explain so that a statistical layperson can understand,
and so that a statistical analyst will see your erudition.
[from Stock 2007] What are the root causes of terrorism? Poverty? Repressive political regimes?
Religious or ethnic conflicts arising from heterogeneous populations? In problems 1-3, you will
take a look at some empirical evidence on cross-country sources of terrorism. Variables in the
data set are defined below. Note that to do this problem set, you will need
to create (generate) some new variables, which are functions of the variables in
the data set, found here.
The variables include:
Variable | Definition
ftmpop | Number of fatalities from terrorist incidents in the country, 1998-2004, per million population (U.S. State Department)
gdppc | GDP per capita in the country (World Bank)
lackpf | Index of the lack of political freedoms (Freedom House), 1-7 scale, 7 = extremely limited political freedoms
ethnic | Index of ethnic fractionalization (0 to 1 scale, 0 = no fractionalization)
religion | Index of religious fractionalization (0 to 1 scale, 0 = no fractionalization)
mideast, latinam, easteurope, africa, eastasia | = 1 if the country is in the indicated region, = 0 otherwise
- {15 points} Preliminary data analysis:
a. Produce the scatterplot of ftmpop vs. gdppc. (This graph looks better if you exclude
observations where gdppc is missing.)
b. Generate the variables lnftmpop = ln(ftmpop) and lngdppc = ln(gdppc).
Produce the scatterplot of lnftmpop vs. lngdppc. (This graph also looks better if you exclude
observations where gdppc is missing.)
c. Produce the scatterplot of lnftmpop vs. lackpf.
d. Using the scatterplots from a. and b., would you suggest using the variables
(i) ftmpop and gdppc or (ii) lnftmpop and lngdppc
for modeling using linear regression?
e. Using the scatterplot from c., does the relation between lnftmpop and lackpf appear to be
linear or nonlinear? If nonlinear, what sort of nonlinear curve might you want to explore
(briefly explain)?
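The log transformation in part b. can be sketched as follows. Because ln(0) is undefined, any country with ftmpop = 0 (or a missing gdppc) has no log value, which is consistent with the later instruction to use only countries with nonzero ftmpop. This is a minimal Python sketch, not the course's expected Stata workflow; the helper `safe_log` and the three example rows are hypothetical and are not taken from the data set.

```python
import math

def safe_log(value):
    """Return ln(value), or None when value is missing or not positive."""
    if value is None or value <= 0:
        return None
    return math.log(value)

# Hypothetical rows (country label, ftmpop, gdppc); NOT from the assignment data.
rows = [
    ("A", 0.0, 1200.0),    # zero fatalities: lnftmpop is undefined
    ("B", 2.5, None),      # missing gdppc: lngdppc is undefined
    ("C", 0.4, 30000.0),   # both logs are defined
]

for name, ftmpop, gdppc in rows:
    lnftmpop = safe_log(ftmpop)
    lngdppc = safe_log(gdppc)
    print(name, lnftmpop, lngdppc)
```

Returning `None` rather than raising an error mirrors how statistical packages typically mark such observations as missing, so they drop out of scatterplots and regressions automatically.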
- {25 points} Possible nonlinear models:
a. Estimate the regressions in Table 1, which you can download
here, and fill in the empty entries.
Note: for these regressions, only use the countries
that have nonzero values of ftmpop.
b. Using regression (2), test the null hypothesis (at the 5% significance level) that the
coefficients on the "other regional dummies" all are zero, against the alternative
hypothesis that at least one is nonzero. What is the number of restrictions q in your test?
What do R^2 and adjusted R^2 tell you about the regional dummy variables?
c. Using regression (1), is there evidence that the relationship between lnftmpop and
lackpf is nonlinear?
d. Using regression (1), estimate the effect on lnftmpop of changing from lackpf = 7
(extremely limited political freedoms) to lackpf = 5 (some political freedoms), holding
constant the values of the other regressors in regression (1). Also calculate the effect on
lnftmpop of changing from lackpf = 5 to lackpf = 3, and from lackpf = 3
to lackpf = 1, holding
constant the values of the other regressors in regression (1).
e. Using regression (1), plot the estimated relationship between lnftmpop and lackpf. At approximately what value of lackpf is this relationship maximized? Confirm this by calculating
the maximum from the coefficients. In words, briefly describe the relationship you found.
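The arithmetic behind parts d. and e. can be sketched as below, assuming regression (1) enters lackpf as a quadratic (a lackpf term plus a squared term), which Table 1 should confirm. With the other regressors held constant, the change in the predicted value depends only on those two coefficients, and the curve has zero slope at -b1/(2*b2). The function names and the coefficient values are placeholders for illustration; they are not estimates from the data.

```python
def quadratic_effect(b1, b2, x_from, x_to):
    """Change in the prediction when x moves from x_from to x_to in a model
    containing b1*x + b2*x**2, with all other regressors held constant."""
    return b1 * (x_to - x_from) + b2 * (x_to**2 - x_from**2)

def turning_point(b1, b2):
    """Value of x where b1*x + b2*x**2 has zero slope (a maximum if b2 < 0)."""
    return -b1 / (2.0 * b2)

# Placeholder coefficients for illustration only; NOT estimates from Table 1.
b1, b2 = 2.0, -0.25

print(quadratic_effect(b1, b2, 7, 5))  # 2.0
print(turning_point(b1, b2))           # 4.0
```

Note that in a quadratic model the effect of a two-point change differs depending on the starting value, which is why the problem asks about 7 to 5, 5 to 3, and 3 to 1 separately.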
- {30 points} Possible interaction models:
a. Create a new binary variable higdppc, which equals one if gdppc is greater than
or equal to
the median in the data set, and which equals zero otherwise. Estimate the
regressions in Table 2, which you can download
here, and fill in the empty entries.
Note: for all calculations, only use the countries that have
nonzero values of ftmpop and non-missing values of gdppc.
b. Regression (3) produces two regression lines, one for higdppc = 0 and one for higdppc = 1.
Produce a scatterplot of lnftmpop vs. lackpf, showing the two regression lines. This can be
done either by producing two scatterplots, one for higdppc = 0 and one for higdppc = 1, or by
combining the two scatterplots into a single graph. Use regression (3) to write out the
estimated regression lines for the two groups (in slope-intercept form).
c. Is the difference between the two slopes plotted in the scatterplot
statistically significantly different from zero at the 5% significance level? Explain.
d. In a sentence or two, interpret the sign of the coefficients in regression (3) and the
scatterplot; that is, explain in everyday terms the findings shown in that
scatterplot.
e. Using regression (4), test the hypothesis that the coefficients on
higdppc×lackpf and
higdppc×lackpf^2 are zero, against the hypothesis that one or the other (or both) is
nonzero. State in words what the hypothesis is that you are testing.
f. Using regression (4), test the hypothesis that the coefficients on lackpf^2 and
higdppc×lackpf^2 are zero, against the hypothesis that one or the other (or both) is nonzero. State in
words what the hypothesis is that you are testing.
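Two mechanical steps in this problem can be sketched as follows: building the median-split indicator from part a., and reading the two regression lines of part b. off an interaction model. The sketch assumes regression (3) has the form lnftmpop = b0 + b1*lackpf + b2*higdppc + b3*(higdppc*lackpf), which Table 2 should confirm. The function names, example values, and coefficients are hypothetical, and the median here is taken over whatever values are passed in, so the sample restriction from the note in part a. has to be applied beforehand.

```python
import statistics

def high_indicator(values):
    """Return 1 where value >= median of the non-missing values, else 0.
    Missing values (None) stay missing."""
    median = statistics.median(v for v in values if v is not None)
    return [None if v is None else int(v >= median) for v in values]

def group_lines(b0, b_x, b_d, b_dx):
    """(intercept, slope) for each group in
    y = b0 + b_x*x + b_d*d + b_dx*(d*x), where d is a 0/1 indicator."""
    return {0: (b0, b_x), 1: (b0 + b_d, b_x + b_dx)}

# Hypothetical gdppc values; NOT from the assignment data.
print(high_indicator([1.0, None, 3.0, 2.0]))  # [0, None, 1, 1]

# Placeholder coefficients for illustration only; NOT estimates from Table 2.
print(group_lines(1.0, 0.5, -2.0, 0.25))      # {0: (1.0, 0.5), 1: (-1.0, 0.75)}
```

The difference between the two slopes is exactly the coefficient on the interaction term, so a test of whether the slopes differ is a test on that single coefficient.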
- {30 points} Case Study: Sex Discrimination in Employment [From Roberts 1979 via Ramsey and
Schafer 2002]
"Did a bank discriminatorily pay higher starting salaries to men than to women?" The data set
"lists data on employees from one job category (skilled, entry-level clerical) of a bank that
was sued for sex discrimination." There were "32 male and 61 female employees, hired
between 1965 and 1975. The measurements are of annual salary at time of hire, salary as of
March 1977, sex, seniority (months since first hired), age (months), education (years), and
work experience prior to employment with the bank (months)."
"Did the females receive lower starting salaries than similarly qualified and similarly
experienced males? After accounting for measures of performance, did females receive
smaller pay increases than males?" As an analyst for the U.S. Equal Employment Opportunity
Commission (EEOC), write a professional report that presents your analysis and gives your opinion
regarding sex discrimination. Be sure to describe the assumptions and methodologies used to
arrive at your findings. In addition to your analysis and interpretation, what are the
strengths and weaknesses of this approach? What would you do (instead) to investigate this
issue? Assume that your readers have only a vague familiarity with statistics (i.e., they
are laypersons). The data set can be found here.