Assignment #4
- PPol 603
- Due: Thursday, 27 September 2012, 9:30 a.m.
Type up your answers.
Read the section in the syllabus on Academic Honesty and Plagiarism
to make sure you are giving proper credit to those you work with and/or the text(s).
Solve the following problems. Show all of your work, but keep your answers concise.
Highlight your (final) answer
to distinguish it from your other numbers and text. Include a copy of your input
(e.g. do file) or output (e.g. log file),
when it is an appropriate way to show your work.
However, do not include unnecessary output (i.e. no data dumps), and format any output
so that it is easily readable.
An appropriate time to include output is when you put your results
in a table: if your results are wrong, graders cannot see how you reached your
conclusions (and so cannot give partial credit) unless you provide some output.
Explanation includes both statistical and substantive explanation (explain so that a
statistical layperson can understand it, and so that a
statistical analyst will see your erudition).
- {30 points} [from Stock 2006]
This problem gives you an opportunity to do some calculations on the relation
between smoking and lung cancer, using a (very) small sample of five countries.
The purpose of this exercise is to illustrate the mechanics of ordinary least squares
(OLS) regression. First you will calculate the regression “by hand” using formulas
from class and the textbook, then (in the next problems) you will use Stata to confirm the calculation.
For the “by hand” calculations, you may relive history and use long multiplication,
long division, and tables of square roots; or you may use an electronic
calculator or a spreadsheet.
The data are summarized in the following table. The variables are per capita
cigarette consumption in 1930 (the independent variable, “X”) and the death
rate from lung cancer in 1950 (the dependent variable, “Y”). The cancer rates
are shown for a later time period because it takes time for lung cancer to develop and
be diagnosed.
Observation # | Country | Cigarettes consumed per capita in 1930 (X) | Lung cancer deaths per million people in 1950 (Y)
1 | Switzerland | 530 | 250
2 | Finland | 1115 | 350
3 | Great Britain | 1145 | 465
4 | Canada | 510 | 150
5 | Denmark | 380 | 165
Source: Edward R. Tufte, Data Analysis for Politics and Policy, Table 3.3.
Use a calculator, a spreadsheet (using no more than SUM and AVERAGE commands),
or “by hand” methods to compute the following;
refer to the textbook for the necessary formulas. Various formulas from the textbook
are also compiled here. (Note:
if you use a spreadsheet, attach a printout.)
a. The sample means of X and Y.
b. The standard deviations of X and Y.
c. The correlation coefficient, r, between X and Y.
d. b1, the OLS estimated slope coefficient from the regression
Yi = β0 + β1Xi + ui.
e. b0, the OLS estimated intercept term from the same regression.
f. Yi-hat, i = 1,…, n, the predicted values for each country from the regression.
g. ui-hat, the OLS residual for each country.
h. The R2.
i. The SER.
j. Graph the scatterplot of the five
data points and the regression line. Be sure to label the axes, the data points,
the residuals, and the slope and intercept of the regression line. (It is OK to write in
some of these by hand.)
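As a rough illustration of the mechanics in parts (a) through (i), the sketch below applies the standard textbook formulas using only sums and averages. It is written in Python rather than a spreadsheet, the data are hypothetical toy numbers (not the table above), and all variable names are illustrative assumptions. Your own work should use the assignment data and the formulas from class.

```python
from math import sqrt

# Hypothetical toy data (NOT the assignment's table), chosen so the
# arithmetic is easy to check by hand.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# (a) Sample means
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross-products
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# (b) Sample standard deviations (divide by n - 1)
s_x = sqrt(sxx / (n - 1))
s_y = sqrt(syy / (n - 1))

# (c) Correlation coefficient
r = sxy / sqrt(sxx * syy)

# (d), (e) OLS slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# (f), (g) Predicted values and residuals
y_hat = [b0 + b1 * xi for xi in x]
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]

# (h), (i) R-squared and standard error of the regression (divide by n - 2)
ssr = sum(u ** 2 for u in u_hat)
r2 = 1 - ssr / syy
ser = sqrt(ssr / (n - 2))

print(f"means: {x_bar:.3f}, {y_bar:.3f}")
print(f"std devs: {s_x:.3f}, {s_y:.3f}; r = {r:.3f}")
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"R2 = {r2:.3f}, SER = {ser:.3f}")
```

Two quick sanity checks apply to any data set: the residuals should sum to (approximately) zero, and in a regression with a single regressor the R2 should equal the square of r.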
- {20} [from Stock and Watson 2007] Input the data from the previous problem into Stata.
a. Calculate the same statistics as above using Stata. Present your results in a table.
b. Explain what the coefficient values, b0 and b1, mean.
c. Explain what Yi-hat and ui-hat are (perhaps using a specific
country as an example).
d. Interpret the SER (including its units) and R2 (including its units).
e. Will the regression give reliable predictions for a country that consumes 2000 cigarettes per capita? Why or why not?
f. Do you think that the distribution of cancer deaths is normal? Do you think that it is plausible that the distribution of errors is normal? (Consider the distribution of lung cancer deaths in the U.S. here. Is that normally distributed? Would it be normal internationally?)
g. Compute and interpret the estimated change in deaths for a country which reduces its cigarette consumption by 500 cigarettes per capita.
h. Are the three assumptions in Key Concept 4.3 satisfied? Explain (each one).
- {15} Do Problem E4.4 in Stock and Watson.
- {35} Case Study: Sampling Scallops [from Barnett 1995 and McClave, Benson
and Sincich 1998] "The US Fisheries and Wildlife Service requires that in
any given 'harvest,' the average meat per scallop weigh at least 1/36 of a pound. The
requirement is aimed at protecting baby scallops, though less to guarantee them happy
childhoods than to preserve enough adult scallops so that the species does not
disappear.
"The vessel arrived at a Massachusetts port with 11,000 bags of scallops, from
which the harbormaster randomly selected 18 bags for weighing. From each such bag,
his agents took a large scoopful of scallops; then, to estimate the bag's average
meat per scallop, they divided the total weight of meat in the scoopful by the
number of scallops it contained. Based on the 18 statistics thus generated, the
harbormaster estimated that each of the ship's scallops possessed on average 1/39
of a pound of meat (that is, they were about seven percent lighter than the
minimum requirement). Viewing this outcome as conclusive evidence that the
weight standard had been violated, federal authorities at once confiscated 95
percent of the catch (which they then sold in an auction). The fishing
voyage was thus transformed into a financial catastrophe for its participants.
"The ship's owner was as displeased with the US government as Captain Ahab had been
with Moby Dick. He declared that the vessel had fully complied with the weight
standard and saw lunacy in the assertion that sampling 18 bags out of 11,000
could yield a reliable estimate of the mean weight of all the ship's scallops.
He filed a lawsuit against the government and arranged for a Boston law firm
to represent him."
The law firm would like you to evaluate whether the ship’s owner has cause
to file a lawsuit against the federal government. Included below are the actual
scallop weight measurements for each of the 18 sampled bags. For ease of
understanding, each number is expressed as a multiple of 1/36 of a pound,
the minimum permissible average weight per scallop. Consequently, numbers
below one indicate individual bags that do not meet the standard:
0.93 0.88 0.85 0.91 0.91 0.84 0.90 0.98 0.88
0.89 0.98 0.87 0.91 0.92 0.99 1.14 1.06 0.93
Among the questions you should answer are:
- Can a reliable estimate of the mean weight of all the scallops be
obtained from a sample size of 18? If not, how big a sample would give a reliable estimate?
- Are there any flaws in the government’s decision rule to
confiscate a scallop catch if the mean weight of the scallops is less than 1/36 of a pound?
- Is there another procedure for determining whether a
ship is in violation of the minimum weight restriction? Apply your procedure to the data, and draw a conclusion about the ship in question.
Assume that the distribution of scallop weight in the scallop catch is normal. (What
happens if it is not?)
Do not worry about whether the government tried to pick up smaller scallops in the bags.
The government is not smart enough to pull it off—consider each bag/scoop to be
randomly drawn.
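One possible alternative procedure, sketched here only as an illustration, is a one-sided t-test of whether the mean weight falls below the standard, paired with an upper confidence bound on the mean. The sample values below are hypothetical (not the 18 bags above), the variable names are illustrative, and the critical value 2.132 is the one-sided 5 percent point of a t distribution with 4 degrees of freedom, taken from a standard t table. Which hypothesis should carry the burden of proof, and what significance level is appropriate, are part of what you are asked to discuss.

```python
from math import sqrt

# Hypothetical bag averages, as multiples of the 1/36 lb standard.
# These are NOT the 18 measurements given in the assignment.
sample = [0.90, 0.95, 1.00, 0.85, 0.90]
mu0 = 1.0        # the standard, in the same units
t_crit = 2.132   # one-sided 5% critical value, t with n - 1 = 4 df (from a t table)

n = len(sample)
mean = sum(sample) / n
s = sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))  # sample std dev
se = s / sqrt(n)                                          # standard error of the mean

# H0: true mean >= standard; H1: true mean < standard
t_stat = (mean - mu0) / se
reject = t_stat < -t_crit

# One-sided 95% upper confidence bound on the true mean
upper_bound = mean + t_crit * se

print(f"mean = {mean:.3f}, s = {s:.4f}, se = {se:.4f}")
print(f"t = {t_stat:.3f}, reject H0 at 5%: {reject}")
print(f"95% upper bound on mean: {upper_bound:.3f}")
```

With this framing, the catch is judged in violation only when the sample mean is far enough below the standard that sampling error alone is an unlikely explanation, which is a different rule from comparing the sample mean with 1/36 of a pound directly.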
Prepare a professional document that presents the results of your analysis
and gives your opinion regarding the case. Be sure to describe the assumptions and
methodologies used to arrive at your findings. Assume that your readers have only a
vague familiarity with statistics (i.e., they are laypersons).