Assignment #7

PPol 603
Due: Thursday, 25 October 2012

Type up your answers. Give proper credit to anyone you work with and to any text(s) you use.

Solve the following problems. Show all of your work, but keep your answers concise. Highlight your final answer to distinguish it from your other numbers and text. Include a copy of your input (e.g. do file) or output (e.g. log file) when it is an appropriate way to show your work. However, do not include unnecessary output (i.e. no data dumps), and format any output so that it is easily readable. An appropriate time to include output is when you put your results in a table: if your results are wrong, graders cannot see how you reached your conclusions, and so cannot give partial credit, unless you provide some output. Explanations should be both statistical and substantive: explain so that a statistical layperson can understand, and so that a statistical analyst will see your erudition.

  1. {15 points} Use the growth data set found in Assignment 6. Include Malta.
    a. Run a graph matrix using the variables in regression (3): growth, tradeshr, school60, capstock60, revc, and civil. Does there appear to be a non-linear relationship between growth and any other variables? Which ones? Does there appear to be any strong correlation (i.e. multicollinearity) between any independent variables?
    b. Run a simple regression of growth on tradeshr including Malta, then excluding Malta.
    c. Create a dummy variable for Malta which equals 1 for Malta and 0 for all other countries. Regress growth on tradeshr and the Malta dummy variable. Compare the regression results to those in b. Which R² (or adjusted R²) values are you allowed to compare? What does the coefficient on the Malta dummy variable represent? What does the p-value of that coefficient represent? [Note: This is called "dummying out" an observation.]
    d. Regress growth on tradeshr, school60, capstock60, revc, civil, and the Malta dummy variable. Would you consider Malta an outlier?
  2. {15} [Wooldridge 2013] The data set found here contains information and career statistics for 269 players in the National Basketball Association (NBA).
    a. Estimate a model relating points-per-game (points) to years in the league (exper), age, and years played in college (coll). Include a quadratic in exper (but not the other variables).
    b. Holding college years and age fixed, what is the effect on points of moving from 2 to 3 years of experience? 9 to 10 years of experience? 16 to 17 years? At what value of experience does the next year of experience actually reduce points-per-game? Does this make sense?
    c. Interpret and explain the coefficient on coll. [Note: At the time the data were collected, NBA players could be drafted before finishing their college careers and even directly out of high school.]
    d. Add a quadratic in age to the equation. Is it needed? What does this appear to imply about the effects of age, once experience and education are controlled for?
  3. {15} [Wooldridge 2013] Use the data found here, which come from a midsize research university. The variables are described in the data set.
    a. Produce a scatterplot of sat vs. hsize. Does the relationship look positive, negative, quadratic, or something else?
    b. Estimate the model sat = β₀ + β₁·hsize, and present the results in equation form.
    c. Conduct an omitted variable test on the model in part b. Interpret the results.
    d. Estimate the model sat = β₀ + β₁·hsize + β₂·hsize², and write the results in equation form.
    e. Conduct an omitted variable test on the model in part d. Interpret the results.
    f. Using the estimated equation in part d., what is the "optimal" high school size? (That is, what high school size maximizes sat?)
    g. Is this analysis representative of the academic performance of all high school seniors? Explain.
  4. {25} [Wooldridge 2013] The data set found here contains information on net financial wealth (nettfa), age of the survey respondent (age), annual family income (inc), family size (fsize), and participation in certain pension plans for people in the United States. The wealth and income variables are both recorded in thousands of dollars. For this question, use only the data for single-person households (so fsize=1).
    a. What is the youngest age of people in this sample? How many people are at that age?
    b. Estimate a model relating net financial wealth to income, age, and age squared. Report and interpret the results.
    c. Graph the relationship between predicted nettfa and age (for the observed range of age) setting inc = 30 (roughly, the average). Describe what you see.
    d. Check for multicollinearity using vif. What do you find?
    e. Center age at 40 (roughly, the average) by creating a new variable (e.g. gen age40 = age - 40). Re-estimate b. using the centered age variable (for both the linear and squared term). What changes from b.? What stays the same?
    f. Check for multicollinearity in the model estimated in e. using vif. What do you find? [Note: This is a standard way to alleviate multicollinearity with polynomials.]
    g. Check to see whether including a quadratic in inc is necessary.
    h. Use hettest to see if there is any heteroskedasticity left in the model estimated in e. Explain what you find.
    i. Use dfbeta on the model estimated in e. to see if there are any influential outliers. Find these influential outliers on a graph matrix of wealth, income and age. Drop the outliers and re-run the model from part e. How do your results change?
  5. {30} Research Project Outline:
    Turn in a combination of initial results displayed in tables and figures, including model diagnostics, along with some bullet points interpreting your results. Some suggestions on writing a research paper (by Stock and Watson) can be found here.
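
Note on quadratic specifications (questions 2, 3, and 4): as a reminder, and only as a sketch in generic notation, the one-unit effect and the turning point of a fitted quadratic follow from simple algebra. The symbols y and x below are placeholders assumed for illustration, not variable names from the data sets, and all other regressors are held fixed.

```latex
\begin{align*}
\hat{y} &= \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2 \\
\hat{y}(x+1) - \hat{y}(x) &= \hat\beta_1 + \hat\beta_2\,(2x + 1) \\
\frac{d\hat{y}}{dx} &= \hat\beta_1 + 2\hat\beta_2 x = 0
  \quad\Longrightarrow\quad x^{*} = -\frac{\hat\beta_1}{2\hat\beta_2}
\end{align*}
```

The turning point x* is a maximum when the coefficient on the squared term is negative and a minimum when it is positive.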
