Assignment #4

PPol 603
Due: Thursday, 27 September 2012, 9:30 a.m.

Type up your answers. Read the section on Academic Honesty and Plagiarism in the syllabus to make sure you give proper credit to anyone you work with and to any text(s) you use.

Solve the following problems. Show all of your work, but keep your answers concise. Highlight your final answer to distinguish it from your other numbers and text. Include a copy of your input (e.g., a do-file) or output (e.g., a log file) when that is an appropriate way to show your work. However, do not include unnecessary output (i.e., no data dumps), and format any output so that it is easy to read. For example, include output when you put your results in a table: if your results are wrong, graders cannot see how you reached your conclusions (and so cannot give partial credit) unless you provide some output. Explanations should be both statistical and substantive: write so that a statistical layperson can understand them and a statistical analyst will see your erudition.

  1. {30 points} [from Stock 2006] This problem gives you an opportunity to do some calculations on the relation between smoking and lung cancer, using a (very) small sample of five countries. The purpose of this exercise is to illustrate the mechanics of ordinary least squares (OLS) regression. First you will calculate the regression “by hand” using formulas from class and the textbook, then (in the next problems) you will use Stata to confirm the calculation. For the “by hand” calculations, you may relive history and use long multiplication, long division, and tables of square roots; or you may use an electronic calculator or a spreadsheet. The data are summarized in the following table. The variables are per capita cigarette consumption in 1930 (the independent variable, “X”) and the death rate from lung cancer in 1950 (the dependent variable, “Y”). The cancer rates are shown for a later time period because it takes time for lung cancer to develop and be diagnosed.

    Obs.  Country        Cigarettes consumed      Lung cancer deaths per
                         per capita in 1930 (X)   million people in 1950 (Y)
    1     Switzerland     530                     250
    2     Finland        1115                     350
    3     Great Britain  1145                     465
    4     Canada          510                     150
    5     Denmark         380                     165
    Source: Edward R. Tufte, Data Analysis for Politics and Policy, Table 3.3.
    Use a calculator, a spreadsheet (using no more than SUM and AVERAGE commands), or “by hand” methods to compute the following; refer to the textbook for the necessary formulas. Various formulas from the textbook are also compiled here. (Note: if you use a spreadsheet, attach a printout.)
    a. The sample means of X and Y.
    b. The standard deviations of X and Y.
    c. The correlation coefficient, r, between X and Y.
    d. $b_1$, the OLS estimated slope coefficient from the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$.
    e. $b_0$, the OLS estimated intercept term from the same regression.
    f. $\hat{Y}_i$, $i = 1, \dots, n$, the predicted value for each country from the regression.
    g. $\hat{u}_i$, the OLS residual for each country.
    h. The $R^2$.
    i. The SER.
    j. Graph the scatterplot of the five data points and the regression line. Be sure to label the axes, the data points, the residuals, and the slope and intercept of the regression line. (It is OK to write in some of these by hand.)
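To check the hand calculations in Problem 1, the textbook formulas can be sketched in Python (used here only as a stand-in for a calculator or spreadsheet; the course itself uses Stata). This is a minimal sketch assuming the usual textbook conventions: sample standard deviations use the divisor n - 1 and the SER uses the divisor n - 2. All variable names are illustrative.

```python
import math

# Problem 1 data: (country, cigarettes per capita in 1930 = X, lung cancer deaths per million in 1950 = Y)
data = [
    ("Switzerland", 530, 250),
    ("Finland", 1115, 350),
    ("Great Britain", 1145, 465),
    ("Canada", 510, 150),
    ("Denmark", 380, 165),
]
countries = [row[0] for row in data]
x = [float(row[1]) for row in data]
y = [float(row[2]) for row in data]
n = len(data)

# (a) Sample means
x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviations from the means, and the sums of squares / cross-products built from them
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]
sxx = sum(d * d for d in dx)
syy = sum(d * d for d in dy)  # total sum of squares (TSS) of Y
sxy = sum(a * b for a, b in zip(dx, dy))

# (b) Sample standard deviations (divisor n - 1)
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))

# (c) Correlation: sample covariance divided by the product of the standard deviations
r = (sxy / (n - 1)) / (s_x * s_y)

# (d) OLS slope and (e) OLS intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# (f) Predicted values and (g) residuals
y_hat = [b0 + b1 * xi for xi in x]
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]

# (h) R^2 = 1 - SSR / TSS
ssr = sum(u * u for u in u_hat)
r_squared = 1 - ssr / syy

# (i) SER: divisor n - 2 because two coefficients (b0 and b1) were estimated
ser = math.sqrt(ssr / (n - 2))

print(f"mean X = {x_bar:.2f}, mean Y = {y_bar:.2f}")
print(f"s_X = {s_x:.2f}, s_Y = {s_y:.2f}, r = {r:.4f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.2f}, R^2 = {r_squared:.4f}, SER = {ser:.2f}")
for c, yh, u in zip(countries, y_hat, u_hat):
    print(f"{c:<14} Y-hat = {yh:7.2f}  u-hat = {u:7.2f}")
```

Two quick sanity checks on any set of answers: the OLS residuals should sum to zero (up to rounding), and in a regression with a single regressor the $R^2$ equals the square of the correlation coefficient $r$.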
  2. {20} [from Stock and Watson 2007] Enter the data from the previous problem into Stata.
    a. Calculate the same statistics as above using Stata. Present your results in a table.
    b. Explain what the coefficient values, $b_0$ and $b_1$, mean.
    c. Explain what $\hat{Y}_i$ and $\hat{u}_i$ are (perhaps using a specific country as an example).
    d. Interpret the SER (including its units) and $R^2$ (including its units).
    e. Will the regression give reliable predictions for a country that consumes 2000 cigarettes per capita? Why or why not?
    f. Do you think that the distribution of cancer deaths is normal? Do you think that it is plausible that the distribution of errors is normal? (Consider the distribution of lung cancer deaths in the U.S. Is it normally distributed? Would it be normal internationally?)
    g. Compute and interpret the estimated change in deaths for a country that reduces its cigarette consumption by 500 cigarettes per capita.
    h. Are the three assumptions in Key Concept 4.3 satisfied? Explain your answer for each one.
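Parts (e) and (g) of Problem 2 come down to arithmetic on the fitted line: the predicted change in Y is $b_1$ times the change in X, and a prediction at an X value outside the observed range is an extrapolation. A minimal Python sketch of both checks follows (the assignment itself asks for Stata); the helper name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Return (b0, b1) for the simple OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Cigarettes per capita in 1930 (X) and lung cancer deaths per million in 1950 (Y)
x = [530, 1115, 1145, 510, 380]
y = [250, 350, 465, 150, 165]
b0, b1 = ols_fit(x, y)

# (g) A change of -500 cigarettes per capita changes predicted deaths by b1 * (-500)
delta_x = -500
delta_y_hat = b1 * delta_x
print(f"Predicted change: {delta_y_hat:.1f} deaths per million")

# (e) Is X = 2000 inside the range of the sample? If not, the prediction is an extrapolation.
x_new = 2000
in_range = min(x) <= x_new <= max(x)
print(f"Prediction at X = {x_new}: {b0 + b1 * x_new:.1f} (within sample range: {in_range})")
```

The range check only flags that 2000 lies beyond the largest observed value (1145); it cannot tell you whether the linear relationship continues to hold out there.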
  3. {15} Do Problem E4.4 in Stock and Watson.
  4. {35} Case Study: Sampling Scallops [from Barnett 1995 and McClave, Benson and Sincich 1998] "The US Fisheries and Wildlife Service requires that in any given 'harvest,' the average meat per scallop be at least 1/36 of a pound. The requirement is aimed at protecting baby scallops, though less to guarantee them happy childhoods than to preserve enough adult scallops so that the species does not disappear.
    "The vessel arrived at a Massachusetts port with 11,000 bags of scallops, from which the harbormaster randomly selected 18 bags for weighing. From each such bag, his agents took a large scoopful of scallops; then, to estimate the bag's average meat per scallop, they divided the total weight of meat in the scoopful by the number of scallops it contained. Based on the 18 statistics thus generated, the harbormaster estimated that each of the ship's scallops possessed on average 1/39 of a pound of meat (that is, they were about seven percent lighter than the minimum requirement). Viewing this outcome as conclusive evidence that the weight standard had been violated, federal authorities at once confiscated 95 percent of the catch (which they then sold in an auction). The fishing voyage was thus transformed into a financial catastrophe for its participants.
    "The ship's owner was as displeased with the US government as Captain Ahab had been with Moby Dick. He declared that the vessel had fully complied with the weight standard and saw lunacy in the assertion that sampling 18 bags out of 11,000 could yield a reliable estimate of the mean weight of all the ship's scallops. He filed a lawsuit against the government and arranged for a Boston law firm to represent him."
    The law firm would like you to evaluate whether the ship’s owner has cause to file a lawsuit against the federal government. Included below are the actual scallop weight measurements for each of the 18 sampled bags. For ease of understanding, each number is expressed as a multiple of 1/36 of a pound, the minimum permissible average weight per scallop. Consequently, numbers below one indicate individual bags that do not meet the standard:
            0.93    0.88    0.85    0.91    0.91    0.84    0.90    0.98    0.88
            0.89    0.98    0.87    0.91    0.92    0.99    1.14    1.06    0.93
    Among the questions you should answer are: Assume that the distribution of scallop weight in the catch is normal. (What happens if it is not?) Do not worry about whether the government tried to pick out the smaller scallops in the bags; the government is not smart enough to pull that off. Consider each bag and scoop to be randomly drawn.
    Prepare a professional document that presents the results of your analysis and gives your opinion regarding the case. Be sure to describe the assumptions and methodologies used to arrive at your findings. Assume that your readers have only a vague familiarity with statistics (i.e., they are laypersons).
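One possible starting point for the scallop analysis is a one-sample t-test of the mean bag weight against the standard of 1.00 (in units of 1/36 lb), together with a 95% confidence interval. The sketch below assumes the normality the problem tells you to assume and treats the 18 bags as a simple random sample. The critical values 1.740 (one-sided, 5%) and 2.110 (two-sided, 5%) are entered by hand from a standard t table for 17 degrees of freedom, because Python's standard library has no t distribution; all variable names are illustrative.

```python
import math
import statistics

# The 18 sampled bag averages, each a multiple of 1/36 lb (1.00 = the legal minimum)
weights = [
    0.93, 0.88, 0.85, 0.91, 0.91, 0.84, 0.90, 0.98, 0.88,
    0.89, 0.98, 0.87, 0.91, 0.92, 0.99, 1.14, 1.06, 0.93,
]
standard = 1.0  # the legal minimum, in the same units

n = len(weights)
mean = statistics.mean(weights)
sd = statistics.stdev(weights)  # sample standard deviation (divisor n - 1)
se = sd / math.sqrt(n)          # standard error of the sample mean

# t statistic for H0: true mean >= 1.0 against H1: true mean < 1.0
t_stat = (mean - standard) / se

# Assumption: critical values for 17 degrees of freedom, copied from a t table
T_CRIT_ONE_SIDED_5PCT = 1.740
T_CRIT_TWO_SIDED_5PCT = 2.110

reject = t_stat < -T_CRIT_ONE_SIDED_5PCT

# 95% confidence interval for the true mean weight per scallop
ci_low = mean - T_CRIT_TWO_SIDED_5PCT * se
ci_high = mean + T_CRIT_TWO_SIDED_5PCT * se

# The harbormaster's estimate of 1/39 lb, expressed in units of 1/36 lb
harbormaster = 36 / 39

print(f"n = {n}, mean = {mean:.4f}, SD = {sd:.4f}, SE = {se:.4f}")
print(f"t = {t_stat:.2f}; reject H0 at the 5% level (one-sided): {reject}")
print(f"95% CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
print(f"Harbormaster's 1/39 lb in units of 1/36 lb: {harbormaster:.3f}")
```

If the normality assumption is dropped, the t critical values above are no longer exact; with only 18 observations the central limit theorem offers only a rough approximation, which is worth discussing in the memo.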
