Econometric Project #3: Analyzing Panel Data on Airfares
Due 6am, Thursday, March 25
The dataset airfare.dta contains data for airfares, number of passengers, distance, and the market share of the largest carrier for each of the top 1149 city-pair markets within the contiguous 48 states for the fourth quarters of 1997, 1998, 1999, and 2000. The data are from the Domestic Airline Fares Consumer Report published by the U.S. Department of Transportation. The city pairs in the sample account for about 75 percent of total within-48-state passenger trips. This dataset is taken from the collection published with the Wooldridge text, and the assignment is a greatly expanded version of one of his empirical exercises.
Project teams for this assignment are below. You should make contact with your partner as soon as possible to arrange a work schedule, particularly given that this project spans the spring-break interval and people's spring-break plans will vary.
|Skye Aaron||Trey Sands|
|Raphael Deem||Suraj Pant|
|Andrew Dubay||Kelsey Lucas|
|Tian Jiang||Luis López|
|Robert Kahn||Tyrone Lee|
|David Krueger||Ethan Knudson|
|Cori Savaiano||Thomas Verghese|
|Justin Stewart||Erik Swanson|
|Nina Showell||Li Zha|
Exercise 1: Distance and Airfares
a. Exploring the data
Use the data viewer to examine the dataset. Does the panel seem balanced for all variables or are there missing values? (The downloadable command mvpatterns is useful for determining the pattern of missing data.) The variables for the cities are labeled origin and destination. Is each city-pair included twice or just once? Is this what you would want? Calculate summary statistics for the major variables in the dataset for each year. (If you sort by year, you can use the by year: prefix to automate this.) Is there anything unexpected here?
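The exploration steps above might be sketched in Stata roughly as follows. This is only a starting point: the variable names id, year, fare, and dist are assumptions (only passen, bmktshr, origin, and destination are named in this handout), so check the actual names with describe first. The built-in misstable command is shown as one alternative to the downloadable mvpatterns.

```stata
* Sketch only: id, year, fare, and dist are ASSUMED variable names.
* Run -describe- and adjust the names before relying on this.
use airfare.dta, clear
describe

* Declare the panel structure and show the pattern of years observed per route.
xtset id year
xtdescribe

* Built-in alternative to the user-written mvpatterns command.
misstable patterns fare dist passen bmktshr

* Summary statistics computed separately for each year.
bysort year: summarize fare dist passen bmktshr
```

The bysort prefix sorts by year and then applies the by-group, so no separate sort step is needed.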
b. Using OLS to estimate the airfare/distance relationship
One of the principal determinants of a route's fare is likely to be the cost of flying the route. Think about the costs of a particular flight: flight-crew wages, fuel, airplane user cost, landing rights, ground-crew and ticket-staff wages, gate rental, overhead costs, etc. Which components of cost are likely related to the flight's distance and which are not? Based on these considerations, does a linear specification seem appropriate? How about a log-log specification? Do you think, based on theory, that quadratic terms might be appropriate in either the linear or log-log models?
Explore the airfare/distance relationship by estimating linear and log-log specifications. Try adding quadratic terms to both the linear specification and the log-log specification (adding the square of the log) to see if they are important. What is the shape of the quadratic relationship? Is it what you expected? Is the function increasing or decreasing at all distances in the sample or does the estimated function have a maximum or minimum within the sample range? What are your conclusions about quadratic terms? Which is your preferred overall specification and why?
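One possible way to set up the four specifications and locate the turning point of each quadratic is sketched below. It assumes the fare and distance variables are named fare and dist, and it creates hypothetical new variables ln_fare and ln_dist so as not to collide with any log variables already in the dataset. Factor-variable notation (c.x##c.x) requires Stata 11 or later. For a quadratic b1*x + b2*x^2, the maximum or minimum occurs at x* = -b1/(2*b2).

```stata
* Sketch only: fare and dist are ASSUMED variable names.
use airfare.dta, clear

* Hypothetical new names, chosen to avoid clashing with existing variables.
generate ln_fare = ln(fare)
generate ln_dist = ln(dist)

* Linear specification without and with a squared distance term.
regress fare dist
regress fare c.dist##c.dist
* Turning point of the quadratic in levels: dist* = -b1 / (2*b2).
nlcom -_b[dist] / (2 * _b[c.dist#c.dist])

* Log-log specification without and with the square of log distance.
regress ln_fare ln_dist
regress ln_fare c.ln_dist##c.ln_dist
* Turning point in logs, converted back to distance units with exp().
nlcom exp(-_b[ln_dist] / (2 * _b[c.ln_dist#c.ln_dist]))

* Compare both turning points with the range of distances in the sample.
summarize dist
```

Using nlcom rather than computing the ratio by hand also gives a delta-method standard error for the turning point.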
c. Time dummies and an airfare price index
Why might the relationship between fares and distance change over time? What kinds of variables would time dummies be proxying for? Using time dummies in a price regression with other qualitative variables (such as the hedonic house-price functions that we have discussed occasionally in class) is a good way to estimate a quality-adjusted price index. Re-estimate your preferred specification with time dummies. Does this improve the fit? Does it change your conclusions about the airfare/distance relationship? Are airfares going up or down during this period? By how much? Is your measure of price change in dollars or in percent?
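As a reminder of how the time-dummy coefficients are read, here is a worked formula for the case in which the dependent variable is the log of the fare. The notation (year dummies D98, D99, D00 with 1997 as the base year, coefficients delta) is illustrative only and is not taken from the dataset.

```latex
\ln(\mathit{fare}_{it}) = \beta_0 + \beta_1 \ln(\mathit{dist}_i)
  + \delta_{98}\, D98_t + \delta_{99}\, D99_t + \delta_{00}\, D00_t + u_{it}

% Implied change in fares between the base year (1997) and year t,
% holding distance fixed:
\%\Delta \mathit{fare}_{1997 \to t} = 100\,\bigl[\exp(\hat{\delta}_t) - 1\bigr]
  \approx 100\,\hat{\delta}_t \quad \text{when } \hat{\delta}_t \text{ is small}
```

If the dependent variable is instead the fare in levels, each year-dummy coefficient is a change in the units of the fare variable relative to the base year rather than a proportional change.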
d. Examining some particular fares
Choose two city-pairs involving Portland, perhaps ones that you and your partner fly frequently. For each of these pairs, examine the residuals from the regression you preferred in part (b), including time dummies if appropriate. Is that route more or less expensive than predicted by its distance? By how much (and is your measure in percentage or absolute dollars)? Does this surprise you?
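A minimal sketch of pulling residuals for Portland routes follows. It assumes that origin and destination are string variables containing the city name in some form, and that fare, dist, and year are the relevant variable names; none of this is confirmed by the handout, so adjust after inspecting the data. The regression shown is a placeholder for whatever specification you preferred in part (b).

```stata
* Sketch only: fare, dist, year are ASSUMED names; origin and destination
* are ASSUMED to be string variables that contain the city name.
use airfare.dta, clear
generate ln_fare = ln(fare)
generate ln_dist = ln(dist)

* Placeholder specification: substitute your preferred model from part (b).
regress ln_fare ln_dist i.year

* Residual = actual minus predicted value of the dependent variable.
predict resid_hat, residuals

* List every route-year that involves a city whose name contains PORTLAND.
* Note that this match may also pick up Portland, ME if it is in the sample.
list origin destination year fare resid_hat ///
    if strpos(upper(origin), "PORTLAND") | strpos(upper(destination), "PORTLAND")
```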
e. Fixed-effects and random-effects estimation
What problems arise when you attempt to estimate the fare/distance relationship with fixed effects? Do these problems also arise with random effects? What assumptions are necessary in order for the random-effects model to be appropriate? Are these assumptions likely to be justified or violated in this particular case? Estimate your preferred model (including time dummies if appropriate) using random effects and compare the results with those of the earlier parts. How much of the variation in the error term is due to the unit component of the error and how much is due to the idiosyncratic error? Is it still your preferred model or does random-effects estimation suggest a different choice?
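The fixed-effects and random-effects estimates could be obtained along the following lines. As before, id, year, fare, and dist are assumed variable names and the simple log-log model stands in for your preferred specification. After xtreg with the re option, Stata stores the two error standard deviations as e(sigma_u) and e(sigma_e), so the variance share can be computed directly.

```stata
* Sketch only: id, year, fare, and dist are ASSUMED variable names.
use airfare.dta, clear
generate ln_fare = ln(fare)
generate ln_dist = ln(dist)
xtset id year

* Fixed effects: look closely at how Stata treats ln_dist in the output.
xtreg ln_fare ln_dist i.year, fe

* Random effects: the output footer reports sigma_u, sigma_e, and rho.
xtreg ln_fare ln_dist i.year, re

* Share of the total error variance attributed to the unit component.
display "variance share of u_i: " e(sigma_u)^2 / (e(sigma_u)^2 + e(sigma_e)^2)
display "rho reported by xtreg: " e(rho)
```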
f. An alternative fixed-effects estimator
Each city-pair consists of two cities, the origin and destination cities. Thus a true fixed-effects estimator has a dummy variable for each route. An alternative in this case would be to construct dummy variables for each city, setting the dummy equal to one if the city is either the origin or destination city. Each observation will have two dummies with values of one, so this isn't exactly a fixed-effects model. But it is the proper model if the unit-specific variation in fares is city-specific rather than route-specific. Why can the effect of distance be estimated in this model more easily than in the route-based fixed-effects model?
The Stata do-file city_dum_gen.do will create this set of dummies in your dataset. Download the do-file, then open a Stata do-file editor window using the button near the middle of the button bar. Once in the do-file editor, use the open button at the top left to retrieve the city_dum_gen.do file from your hard drive, then press the Execute (do) button at the far right of the do-file-editor button bar to run the do-file. You should see a bunch of new variables of the form d_CITY__ST in your dataset. Be sure that you check a few observations from the top, middle, and bottom of your dataset to make sure that the variables were created properly. So that you don't run into the dummy-variable trap, delete the dummy for Portland, OR using the Stata drop command: drop d_PORTLAND__OR. (It's a double-underscore between city and state.)
Add these city dummies to your preferred OLS regression specification (and, if different, your preferred random-effects specification), estimating by OLS (including time dummies if appropriate). Is the estimated airfare/distance relationship substantially different from the earlier OLS estimates without city dummies? Are the city dummies jointly statistically significant? What does it mean for a particular city to have a large positive (or negative) coefficient on its dummy variable? Examine the estimated coefficients on the dummies. Do you see a pattern? Are there characteristics that the cities with positive (negative) dummy coefficients have in common? If you see a pattern, can you think of reasons why this might be true?
Use t-tests on dummy variables to test whether fares from/to Portland (OR) are the same as fares from/to Seattle, Los Angeles, San Francisco, Sacramento, and Spokane, controlling for distance. What do you find?
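The city-dummy regression and the associated tests might look something like the sketch below. It assumes city_dum_gen.do is in the working directory, that fare, dist, and year are the variable names, and that no other variables in the dataset begin with d_. The name d_SEATTLE__WA is a guess based on the d_CITY__ST pattern; use describe d_* to find the names actually created.

```stata
* Sketch only: fare, dist, and year are ASSUMED variable names.
use airfare.dta, clear
generate ln_fare = ln(fare)
generate ln_dist = ln(dist)

* Create the city dummies, then drop Portland, OR so it becomes the base city.
do city_dum_gen.do
drop d_PORTLAND__OR

* See which dummy names were actually created.
describe d_*

* Placeholder specification plus all remaining city dummies.
regress ln_fare ln_dist i.year d_*

* Joint test that all city-dummy coefficients are zero.
testparm d_*

* d_SEATTLE__WA is a HYPOTHETICAL name following the d_CITY__ST pattern.
* With Portland omitted, this tests Seattle against Portland.
test d_SEATTLE__WA
```

Because Portland is the omitted city, the t-statistic reported in the regression table for each remaining city dummy is already a test of that city against Portland; the test command reports the equivalent F-statistic.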
Exercise 2: Effects of Market Concentration
In addition to costs, one of the factors that is likely to influence airfares is the amount of competition on the route. More concentrated routes would be expected to have higher fares than routes on which many airlines compete, because airlines with more monopoly power can mark up fares further above costs. We measure concentration here by the market share of the largest carrier on each route, which is the variable bmktshr in the dataset. A higher value of bmktshr is associated with greater market concentration.
a. Estimating the effects of market concentration
Add market concentration to your preferred specification using OLS, full fixed effects, and random effects. What do you conclude about the effects of market concentration on airfares? Is this result robust across the three estimation methods? Does including the concentration measure change the estimated relationship between airfare and distance?
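To compare the three estimators side by side, one option is to store each set of results and tabulate them together, as in this sketch. The variable names id, year, fare, and dist are assumptions, the stored-result names ols, fe, and re are arbitrary labels, and the simple log-log model again stands in for your preferred specification.

```stata
* Sketch only: id, year, fare, and dist are ASSUMED variable names.
use airfare.dta, clear
generate ln_fare = ln(fare)
generate ln_dist = ln(dist)
xtset id year

* Pooled OLS with market concentration added.
regress ln_fare ln_dist bmktshr i.year
estimates store ols

* Full (route-level) fixed effects.
xtreg ln_fare ln_dist bmktshr i.year, fe
estimates store fe

* Random effects.
xtreg ln_fare ln_dist bmktshr i.year, re
estimates store re

* Coefficients and standard errors from all three models in one table.
estimates table ols fe re, b(%9.4f) se(%9.4f)
```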
b. Is Portland expensive or cheap when market concentration is considered?
Re-examine the residuals from your two city-pairs that you used in Exercise 1 (d) using the OLS residuals when market concentration is included. Does market concentration help explain why these fares were unusually high or low when you controlled only for distance? Now re-run the alternative fixed-effects estimator from Exercise 1 (f) with the city dummies and re-do the comparison of Portland against the other West Coast cities. Does market concentration help explain differences between Portland and these cities?
Exercise 3: Effects of Market Size
Full planes are cheaper (per passenger) than empty ones, so costs, and therefore fares, might be lower on heavily traveled routes. The variable passen is the average number of passengers per day flying the route.
a. Estimating the effect of market size
Add the variable passen to your regressions. Should it be entered in log form? Does it need a quadratic term? What effect does market size seem to have on fares? What is the estimated effect? Does it correspond with what you expected? How, if at all, and why, does including market size affect the relationship between airfares and the variables (distance, market concentration, year, and perhaps the city dummies) that you previously entered in the regression? Is this result consistent across OLS, fixed effects, random effects, and the alternative fixed-effects estimators?
b. What are we estimating?
Consider the equation you estimated in part (a). Is passen likely to be exogenous? What effects will this have on your estimates? Are you estimating a supply curve, a demand curve, or something else? How might we deal with this difficulty?