## Spring 2010

## Econometric Project #4

## Estimating Models with Limited Dependent Variables

### Due 6am, Wednesday, March 31

This project asks you to analyze two different datasets, each suited to one or more limited-dependent-variable models. For the second exercise, you are not only to perform some statistical analysis but also to write up the results in a paper-like format.

### Project Teams

Project teams for this assignment are below. You should make contact with your partner as soon as possible to arrange a work schedule.

| Partner | Partner |
| --- | --- |
| Skye Aaron | Luis López |
| Raphael Deem | David Krueger |
| Andrew Dubay | Tyrone Lee |
| Tian Jiang | Ethan Knudson |
| Robert Kahn | Li Zha |
| Kelsey Lucas | Thomas Verghese |
| Cori Savaiano | Nina Showell |
| Justin Stewart | Trey Sands |
| Erik Swanson | Suraj Pant |

### Exercise 1: Boy-Baby Bias and Fertility Decisions

The cover story in a recent issue (March 6, 2010) of the *Economist* described an international dearth of female children. In some countries, where people want or are compelled to have small families and where the economic incentives favor male offspring, the ratio of male to female births is very large. This exercise uses data from the 1980 U.S. Census to investigate fertility patterns among pre-1980 parents.

The dataset fertility.dta was obtained by Stock and Watson from Joshua Angrist and William Evans, the authors of "Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size," *American Economic Review* 88(3), June 1998, 450-477. (Reading the paper is not likely to be very helpful in this exercise, as you have only a small subset of their variables and are answering a somewhat different question.) The dataset contains information about 254,654 women who were between the ages of 21 and 35 and who had two or more children at the time of the Census.

#### a. Exploring the dataset

Inspect the data visually and calculate relevant summary statistics for the important variables. Because some of the variables have small integer values, the tabulate command (probably for one variable at a time) may provide additional information that summarize does not. It is often said that the "natural" ratio is 105 live boy births for every 100 live girl births (with more boys to offset higher mortality rates). Do the U.S. data for first children and for second children conform to the natural ratio?
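As a rough illustration of the comparison with the natural ratio, the sketch below converts "105 boys per 100 girls" into an implied share of boy births and computes a normal-approximation z-statistic for a sample share against it. The function names and the example counts are hypothetical placeholders, not values from fertility.dta, and a one-sample proportion test is only one of several reasonable ways to make the comparison.

```python
import math


def boy_share_from_ratio(boys_per_100_girls: float) -> float:
    """Convert 'boys per 100 girls' into the share of all births that are boys."""
    return boys_per_100_girls / (boys_per_100_girls + 100.0)


def z_stat_for_share(n_boys: int, n_total: int, p0: float) -> float:
    """z-statistic for H0: the true boy share equals p0 (normal approximation)."""
    p_hat = n_boys / n_total
    se = math.sqrt(p0 * (1.0 - p0) / n_total)  # standard error under H0
    return (p_hat - p0) / se


natural_share = boy_share_from_ratio(105.0)  # 105 / 205, about 0.512

# Placeholder counts for illustration only; NOT taken from fertility.dta.
example_boys, example_total = 5150, 10000
z = z_stat_for_share(example_boys, example_total, natural_share)

print(f"implied natural boy share: {natural_share:.4f}")
print(f"z-statistic for the placeholder sample: {z:.2f}")
```

With a sample as large as this dataset, even small departures from the natural share can be statistically significant, so it may be worth reporting the size of the difference as well as the test result.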

#### b. Testing for boy-baby and both-sex bias in fertility decisions

Abortion was less common in the United States before 1980 than it is in many countries now, but access to convenient birth control became easy in the 1960s. It might be expected, then, that the primary impact of a preference for boy babies would be on fertility: families that had only girls would continue to reproduce more often than families in which one of the first children was a boy. A related hypothesis is that parents want to have at least one child of each sex, so they will have additional children if the first two are of the same sex. Use the dataset to test these hypotheses using both OLS and whatever alternative estimator(s) you think might be appropriate. You should decide whether to include the mother's age in 1980 and the dummy variables for race in your equation as controls. (Make an argument for or against doing so.) Write up a short summary of your results to accompany the table(s). In your summary, interpret your key coefficient estimates: how much does changing the key *X* variable(s) by an appropriate unit affect the probability of having more children?
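The interpretation question differs by estimator. In a linear probability model estimated by OLS, the coefficient on a dummy variable is itself the change in probability. In a logit or probit, the coefficient shifts an index, and the implied change in probability depends on where the other variables put that index. The sketch below shows the arithmetic under that assumption; the coefficient values and function names are hypothetical, chosen only to make the calculation concrete.

```python
import math


def logistic_cdf(z: float) -> float:
    """Logit link: probability implied by index value z."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z: float) -> float:
    """Probit link: standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_change(cdf, baseline_index: float, coef: float) -> float:
    """Change in probability when a dummy switches from 0 to 1,
    holding the rest of the index fixed at baseline_index."""
    return cdf(baseline_index + coef) - cdf(baseline_index)


# Hypothetical values for illustration; not estimates from fertility.dta.
coef_on_dummy = 0.25
for baseline in (-0.5, 1.5):
    d_logit = prob_change(logistic_cdf, baseline, coef_on_dummy)
    d_probit = prob_change(normal_cdf, baseline, coef_on_dummy)
    print(f"baseline index {baseline:+.1f}: "
          f"logit change {d_logit:.3f}, probit change {d_probit:.3f}")
```

Because the change varies with the baseline, a write-up usually states the characteristics at which the effect is evaluated, or reports an effect averaged over the sample.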

#### c. Racial differences in bias

Sex bias in preferences about offspring is often attributed to cultural traditions. To the extent that individuals of different races have different cultures, this suggests that boy-baby and both-sex biases might be different for whites (the omitted racial group), blacks, Hispanics, and the "other" races category. Test to see whether this is true. Again, write up a short summary of your results with appropriate tables and interpretations of coefficients.
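One possible way to frame such a test, offered only as a sketch, is to interact the bias indicator with the race dummies. All symbols here are assumptions introduced for illustration: $D_i$ is whichever bias indicator is being studied, $R_{gi}$ is a dummy for racial group $g$ (whites omitted), $\mathbf{x}_i$ holds any other controls, and $F$ is the identity for a linear probability model or a CDF for logit or probit.

$$
\Pr(\text{more children}_i = 1) = F\Big(\beta_0 + \beta_1 D_i + \sum_{g} \gamma_g R_{gi} + \sum_{g} \delta_g \,(D_i \times R_{gi}) + \mathbf{x}_i'\boldsymbol{\theta}\Big)
$$

Under this setup, the hypothesis that the bias is the same across groups corresponds to $H_0\colon \delta_g = 0$ for every group $g$, which can be examined with a joint test on the interaction coefficients.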

### Exercise 2: Allocation of Time to Extramarital Affairs

In a novel exercise applying the economic theory of time allocation, Ray Fair developed and estimated a model of the amount of time devoted to extra-marital affairs (published in *Journal of Political Economy*, February 1978). The theoretical and data sections of Fair's paper are available at the link. Although the original paper is available in the library and on JSTOR, please do not read the remainder of the paper until you have done your analysis. There are two reasons for this: (1) I don't want this to be just a replication exercise, and (2) there are several alternative methods that could be used to analyze this problem and I don't want you to be biased toward the one that Fair chose.

One of Fair's two datasets is in the Wooldridge textbook data file **affairs.dta**. You are to use these data to perform statistical analyses of the propensity to have affairs, using both OLS and alternative estimators that you think are appropriate, then write up the results as though you were writing the remainder of Fair's paper. This should include four sections:

1. **A complete description of the statistical analysis.** Explain the regressions that you performed and why you chose these particular methods.
2. **A description of your regression results.** Your description should include both verbal explanation (emphasizing what is important) and tables.
3. **A discussion of the implied effects of each explanatory variable on the dependent variable(s).** Regression results are often difficult to grasp, even for sophisticated readers. To illustrate your results, you are to include an analysis showing the predicted outcome for an individual with particular characteristics (values of the explanatory variables) of your choosing. Depending on your method, the predicted outcome could be one or more of: the probability of having an affair, the number of affairs, the probability that the number of affairs is in various ranges, etc. Then use your estimated model to explain how a change in each of the explanatory variables would affect this outcome (e.g., an increase in age from 30 to 40, ceteris paribus, would increase the probability of having an affair by 10 percentage points). These changes and their effects should be explained in simple language that someone with no economics or mathematics background would understand.
4. **Conclusion.** The concluding section should summarize the results of your work and what is important about them.

The sections of Fair's paper that cover these points total a little over 3 journal pages, but he does not present as much as you will on item 3 above. Yours should probably be 2-5 pages, including tables. As always, I'd like the Stata outputs you used in an appendix (or a separate file) so that I can see exactly what you did and replicate if necessary.
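The predicted-outcome illustration requested in the third section can be computed by hand once a model is estimated. The sketch below assumes, purely for illustration, a count model with an exponential mean (as in a Poisson regression); this is one option among several and not a recommendation of method. The coefficient values and variable names are hypothetical placeholders, not estimates or names from affairs.dta.

```python
import math


def expected_count(coefs: dict, person: dict) -> float:
    """Expected count under an exponential-mean model: exp(const + sum of b*x)."""
    index = coefs["const"] + sum(coefs[name] * value for name, value in person.items())
    return math.exp(index)


def prob_at_least_one(mean: float) -> float:
    """Under a Poisson distribution, P(count >= 1) = 1 - exp(-mean)."""
    return 1.0 - math.exp(-mean)


# Hypothetical coefficients and characteristics, for illustration only.
coefs = {"const": 0.0, "age": -0.01, "years_married": 0.05}
person = {"age": 30, "years_married": 5}
older_person = dict(person, age=40)  # same individual, age raised from 30 to 40

p_base = prob_at_least_one(expected_count(coefs, person))
p_older = prob_at_least_one(expected_count(coefs, older_person))

# Report the ceteris paribus effect in percentage points.
print(f"baseline probability: {p_base:.3f}")
print(f"probability at age 40: {p_older:.3f}")
print(f"change: {100 * (p_older - p_base):.1f} percentage points")
```

The same pattern applies to other estimators: fix one individual's characteristics, compute the predicted outcome, change one characteristic, and report the difference in plain language.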