Econometric Project #6
Forecasting and analysis of macroeconomic time series
Due 6am, Thursday, April 22
This project asks you to download data for several macroeconomic variables for a country of your choice, then analyze the dynamics of the variables using a vector autoregression.
Project teams for this assignment are below. You should make contact with your partner as soon as possible to arrange a work schedule.
|Skye Aaron||Kelsey Lucas|
|Raphael Deem||Trey Sands|
|Andrew Dubay||Robert Kahn|
|Tian Jiang||Li Zha|
|Cori Savaiano||Suraj Pant|
|David Krueger||Erik Swanson|
|Ethan Knudson||Thomas Verghese|
|Tyrone Lee||Justin Stewart|
|Luis López||Nina Showell|
Exercise 1: Getting your data
a. Downloading your data
The publication International Financial Statistics (published by the International Monetary Fund) is one of the leading sources of macroeconomic data for a vast number of countries. The electronic version of the IFS is available through the Reed library Web site under databases. <Link to list> To download your data, follow roughly these steps:
- Follow the link from the library list to the IFS site.
- Set your time period by clicking on the blue Change button next to the Retrieval Period on the left. Set the frequency to monthly and the time period to begin in January 1974.
- Click Country Tables from the list (the upper one) on the left of the screen. Select your country from the box at the right. For best results, you should probably select one of the more advanced countries that has fairly complete and reliable data. Try not to select a country that others have taken, though you probably won't know for sure what everyone else is doing. If you initially select a country and then find that it is difficult to work with, you can change.
- Next you need to check the boxes next to the data series that you want to download. At the minimum, you should select a short-term interest rate (Treasury bill rate is ideal) from the Interest Rates section and the consumer price index, industrial production, and the unemployment rate from the Prices, Production, Labor section. You may add more variables to these four if you wish. For simplicity, select the seasonally adjusted version of series if they are available.
- When you have selected the series that you want, click the blue + button on the right at the top of the list to add them to your download list. (You can add more by selecting them and clicking + again; the process is cumulative.)
- Click the Retrieve button to begin the download of all selected series. For easy conversion into Stata, I use the Excel file format option. Type a name into the Description box so that you can come back to this download if you need to. (Include your own name in the description because Reed has a single account: all recent downloads at Reed will show up on the list.) When you are set, click the Retrieve button again.
- A list of recently created files will appear, including yours. Click on your file name to download your file.
- Open the downloaded file in Excel. Check to make sure that you don't have large blocks of missing data. It is not unusual for some variables to be unavailable ("n.a.") for earlier years. That probably means that the data series changed since 1974 and that data for the early sample years are not available in a form that is comparable with current data. You are probably OK if you have complete data for a variable since about 1990. If there are data gaps since 1990 or if you are generally unsatisfied with data availability, then go back to the IFS and look for a substitute series. For example, the number unemployed could substitute for the unemployment rate (you might want to download the population series and divide by that to eliminate the trend associated with population growth), another short-term interest rate (commercial paper) could substitute for the Treasury bill rate, or another price index could be used in place of the CPI. If you have questions or problems at this point, email me right away with detailed information about what you can and cannot get.
b. Cleaning your data
Now that you have your dataset in Excel format, you need to clean it up and bring it into Stata. The downloaded IFS spreadsheet is organized with the series in rows rather than columns and has lots of gobbledygook in the first 6 columns. Check to make sure that the units of measurement make sense and that all series pertain to your intended country, then you can probably delete columns A through F.
Next you want to transpose the data matrix so that the variables are in columns and the months in rows. Select the block of cells that contain data (I would include the dates for safety) and copy them to the clipboard. Then find an empty block of spreadsheet cells with two empty columns to the left of it and one empty row above it (perhaps column C below your data or on another sheet). Put the cursor in the upper left corner of this empty block of cells (perhaps cell C12?) and find the command "Paste Transpose" in the menus. (It will be in different places depending on your version of Excel.) Now your data should be in columns rather than rows.
IFS puts "n.a." in cells that contain no data. Stata will interpret this as a (text) value, so you must get rid of it. Use the Excel "find and replace" feature to replace "n.a." with nothing.
There are two more things that you'll want to do before transferring your data to Stata. First, add Stata-valid variable names in the empty row above the data. Second, add year and month variables in the two empty columns to the left of the data.
Here's an easy way to create the year and month variables:
- Type the first year number into the January cell and copy it to the other 11 months.
- Then go to the January cell of the second year, type the equal sign, click the cursor on the cell 12 rows above that contains the first year number, then type + 1 and enter. The second-year-January cell should now contain the proper year---one more than the January-first-year cell.
- Now go to the month variable for January of the first year and type 1. Drop down to February and type the equal sign, click the cursor on the January cell one cell above, then type + 1 and enter. This cell should now contain the number 2: the cell immediately above plus one.
- Copy the February cell into the 10 cells directly below for March through December. They should now be 1 through 12 corresponding to the months.
- Now go to the January-second-year cell for the month variable, type =, click on the January-first-year cell 12 cells above, and hit enter. This should now be a one (= to the cell 12 above).
- Now, finally, select the year and month cells for January-second-year, copy, and paste them into all the remaining cells of these two columns.
- Check to make sure, but you should now have accurate year and month numbers for the entire dataset without having to type them in repeatedly.
- Once you have the year and month variables and are sure they are accurate, you can delete the IFS date column from the spreadsheet.
Just to check, your spreadsheet should now have the variables year, month, and all of the ones you downloaded in columns, with variable names above each column of data and any missing data shown as empty cells.
c. Transferring the data to Stata
Now you should be ready for the easiest step of all: moving the data from Excel to Stata.
- Open an empty Stata dataset and click the "edit" button (the one that looks like a spreadsheet with a pencil) on the button bar to open the (empty) data spreadsheet.
- Go to your Excel data spreadsheet. Select the block of cells containing all of the data, including the columns with the month and year variables and the row of variable names at the top, and copy it to the clipboard.
- Now go to the Stata data editor, click the top left cell, and paste. Stata will ask you whether your pasted data have variable names in the top row: tell the truth.
- At this point, you should have all your data in Stata and almost ready to use.
- Are there any Stata data cells that are displayed in red? If so, this variable was read as text rather than as numbers, perhaps because of a stray "n.a." in the spreadsheet. You'll need to fix this before you do any analysis.
- Are any missing-data cells shown with a single "." in them? That is correct and means that Stata has understood that these are missing values.
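If a variable did come in as text, one way to repair it inside Stata (rather than going back to Excel) is sketched below. The variable name cpi is only a placeholder for whichever of your series shows up in red, and the sketch assumes the stray text is exactly "n.a.".

```stata
* Do-file sketch. cpi is a hypothetical name for a series that was read as text.
describe cpi                                    // a storage type of str# confirms the problem

* Show the rows whose entries are neither blank nor a valid number
list year month cpi if missing(real(cpi)) & trim(cpi) != ""

* Blank out the stray "n.a." entries, then convert the variable to numeric
replace cpi = "" if trim(cpi) == "n.a."
destring cpi, replace
```

Blanking the entries first is deliberate: the ignore() option of destring strips individual characters, so ignore("n.a.") would also delete the decimal points from your valid numbers.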
If all is well, then your dataset is ready.
d. Setting up Stata to do the analysis
In order for Stata properly to understand the time-series structure of your dataset, you should create a new variable "date" that incorporates the month/year information into a single variable.
- First, create the date variable with gen date=ym(year, month), where month and year are the variables containing this information and ym is a built-in Stata function designed for exactly this purpose.
- Next, set the format of this variable to display properly by typing format date %tm with %tm being the Stata format for monthly dates.
- Finally, type tsset date to inform Stata that the data are time series and that the date variable is the appropriate time indicator.
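Put together in a do-file, the three steps above look like this. The sketch assumes your pasted dataset contains numeric variables named year and month, as described in part (b).

```stata
* Do-file sketch: build a monthly date variable and declare the time-series structure
gen date = ym(year, month)    // number of months since January 1960
format date %tm               // display as 1974m1, 1974m2, ...
tsset date                    // declare date as the time variable
```

The output of tsset reports the range of dates and notes any gaps, which is a quick check that no months are missing or duplicated.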
Now you should be ready to proceed with the time-series econometric analysis.
Exercise 2: Assessing the stationarity of your variables
a. Transformations and descriptive statistics
For variables such as CPI, industrial production, and employment/unemployment numbers (not rates), it is conventional to take logs before analyzing them. That is because the differences of the logged variables are (approximately) growth rates, which are both more interesting and more plausibly stationary than the differences of levels. One typically does not take logs of interest rates or other variables that are already pure percentages (such as the unemployment rate). Take the logs of any variables that seem appropriate.
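As a rough illustration, the transformations might be coded as follows. The names cpi and ip are placeholders for whatever you called your price index and industrial production series, and the data are assumed to have been tsset already so that the D. (difference) operator works.

```stata
* Do-file sketch with hypothetical variable names cpi and ip
gen lcpi = ln(cpi)      // log of the consumer price index
gen lip  = ln(ip)       // log of industrial production

gen infl = D.lcpi       // monthly inflation: first difference of log CPI
gen dlip = D.lip        // monthly growth rate of industrial production
```

Creating the differences as named variables is optional, since D.lcpi can be typed directly in most time-series commands, but explicit names can make later VAR and IRF output easier to read.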
In addition to the means and standard deviations that are printed out by the summarize command, with time-series data we are usually interested in the autocorrelations of the series. The Stata command corrgram will calculate and graph the autocorrelations ("correlogram") of a series. Highly persistent series, which may have a root equal to or near unity, have autocorrelations that decay very slowly to zero. Calculate the correlograms of each of your variables and of the first difference of each variable. For which ones does the correlogram suggest that nonstationarity may be a concern?
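A sketch of the commands for one series is shown below. The variable names (tbill, unemp, lcpi, lip) are hypothetical, and the choice of 24 lags is only an assumption that two years of monthly autocorrelations is enough to see how quickly they decay.

```stata
* Do-file sketch with hypothetical variable names; data must be tsset
summarize tbill unemp lcpi lip

corrgram lip, lags(24)      // autocorrelations of the level of log IP
corrgram D.lip, lags(24)    // autocorrelations of its first difference

ac lip, lags(24)            // the same autocorrelations as a graph with confidence bands
```

Repeat the corrgram lines for each of your variables.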
b. Testing the series for stationarity
Determine the order of integration for each of your series using the augmented Dickey-Fuller test (dfuller in Stata) and the Dickey-Fuller GLS (dfgls) test. Explain your conclusions, being careful to remember what the null and alternative hypotheses for these tests are.
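The tests might be run along the following lines for a single series. The variable name lip and the 12 lags are assumptions, not recommendations. Whether to include a trend depends on the series, and the lag length should be chosen with some care.

```stata
* Do-file sketch with hypothetical variable lip (log industrial production)
* Null hypothesis in every test below: the series has a unit root

dfuller lip, trend lags(12) regress    // level, allowing for a deterministic trend
dfuller D.lip, lags(12)                // first difference, constant only

dfgls lip, maxlag(12)                  // DF-GLS on the level (includes a trend by default)
dfgls D.lip, maxlag(12) notrend        // DF-GLS on the first difference without a trend
```

If you cannot reject the null for the level but can reject it for the first difference, that pattern is consistent with a series that is integrated of order one.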
Exercise 3: Examining causality using VARs
a. Interest rates and industrial production
Short-term interest rates are often an instrument of monetary policy, so if monetary policy affects output we would expect the interest rate to have a causal influence on industrial production. However, industrial production may also influence short-term interest rates either through its effect on credit markets in general or by causing changes in monetary policy. Estimate a bivariate VAR (Stata command var) with the short-term interest rate and the growth rate (difference of log) of industrial production. Use an appropriate criterion to choose your lag length, keeping in mind that with monthly data a 12th lag may be important even if not all intervening lags are. After completing your estimation, use the vargranger command to perform Granger causality tests using your VAR. What do you conclude?
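The sequence of commands could look something like the sketch below. The names tbill and dlip are hypothetical (the interest rate and the differenced log of industrial production), and the lag specification shown is a placeholder for whatever your chosen criterion suggests.

```stata
* Do-file sketch with hypothetical variable names tbill and dlip
varsoc tbill dlip, maxlag(12)     // information criteria (AIC, HQIC, SBIC) for lags 1 to 12

var tbill dlip, lags(1/12)        // VAR with lags 1 through 12
* A specification such as lags(1/3 12) keeps a seasonal 12th lag without all intervening lags

vargranger                        // Granger causality Wald tests for each equation
```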
b. Causality in a larger VAR
Now examine the bivariate causality relationships in a VAR that includes price inflation (change in log of price index) and the unemployment rate along with the interest rate and industrial-production growth. Why is this test of the causal relationship between the interest rate and IP growth different (in theory) from the one in part (a)? Are your results different? What do you find for the bivariate causal relationships among the other pairs of variables in the larger system?
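Extending the estimation to the four-variable system is mostly a matter of adding variables to the list. In this sketch infl and unemp are hypothetical names for inflation and the unemployment rate, and the lag length is again a placeholder.

```stata
* Do-file sketch with hypothetical variable names
varsoc tbill dlip infl unemp, maxlag(12)    // reconsider the lag length for the larger system

var tbill dlip infl unemp, lags(1/12)
vargranger    // one block of tests per equation, including an ALL row for joint exclusion
```

In the vargranger table, each row tests whether the lags of the "excluded" variable can be dropped from the listed equation, now holding constant the lags of the other variables in the system.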
Exercise 4: Impulse-response functions
a. Establishing an identification ordering
In order to interpret the error terms of the VAR equations as structural shocks to the variables of the system, we must make identifying assumptions about which variables affect which other variables within the one-month time unit. Based on economic theory, common sense, and perhaps, cautiously, your Granger causality results (keeping in mind that Granger causality cannot detect contemporaneous causality), propose an ordering for contemporaneous causality among your four variables.
b. Computing and evaluating IRFs
Based on the ordering proposed above, calculate the impulse-response functions for each shock on the full set of variables. Run the horizon of the IRFs out far enough to examine long-run effects (perhaps 10 years or even more). Which shocks seem to have statistically significant effects on which variables in the short run or the long run? Interpret these results. Do they make macroeconomic sense?
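A minimal sketch of the IRF commands follows. The ordering in the order() option is purely illustrative and is not a suggested answer to part (a). The variable names, the IRF name order1, and the file name myirfs are all hypothetical.

```stata
* Do-file sketch with hypothetical names; substitute your own ordering
var tbill dlip infl unemp, lags(1/12)

* Orthogonalized IRFs out to 120 months, saved in the file myirfs.irf
irf create order1, step(120) set(myirfs, replace) order(dlip unemp infl tbill)

* Responses of every variable to an interest-rate shock, with confidence bands
irf graph oirf, irf(order1) impulse(tbill) response(dlip unemp infl tbill)

* The same numbers in table form for one impulse-response pair
irf table oirf, irf(order1) impulse(tbill) response(dlip)
```

Change the impulse() variable to look at each of the other shocks in turn.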
c. Examining alternative identification assumptions
One can never be fully confident in an identification ordering in a VAR. Consider one or two plausible alternatives to the ordering that you proposed in part (a) and recompute the IRFs for these alternatives. Are your results robust or are they sensitive to the ordering? What do you conclude about the relationships among your variables?
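One way to compare orderings is to store each set of IRFs under its own name and overlay them, as sketched here. The two orderings, the IRF names ip_first and rate_first, and the file name altorder are hypothetical and chosen only to show the mechanics.

```stata
* Do-file sketch with hypothetical names and purely illustrative orderings
var tbill dlip infl unemp, lags(1/12)

irf create ip_first,   step(120) set(altorder, replace) order(dlip unemp infl tbill)
irf create rate_first, step(120) order(tbill dlip unemp infl)

* Overlay the response of IP growth to an interest-rate shock under the two orderings
irf ograph (ip_first tbill dlip oirf) (rate_first tbill dlip oirf)
```

If the two lines in the overlaid graph are close to one another, that particular response is not very sensitive to the ordering.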
Exercise 5: Forecasting with a VAR
a. Forecasting the Great Recession
Most of the industrial world entered a severe recession in about 2008, from which it is still attempting to recover. Could your VAR have forecast this recession as a natural evolution of the path that the economy was on before it hit, or was this recession due to shocks that could not have been predicted at the end of 2007? Use the fcast command to generate forecasts for the variables of your preferred VAR system starting in January 2008. Compare your forecasts to the actual behavior of the variables from January 2008 to the end of your sample. How well does your VAR forecast? Interpret your results.
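The exercise might be set up roughly as follows: estimate the VAR on data through December 2007 only, then forecast dynamically from there. Variable names, the lag length, the prefix f_, and the 24-month horizon are all assumptions. Adjust the horizon to the number of months between January 2008 and the end of your sample.

```stata
* Do-file sketch with hypothetical names
* Estimate using only observations before January 2008
var tbill dlip infl unemp if date < tm(2008m1), lags(1/12)

* Dynamic forecasts beginning in the first month after the estimation sample
fcast compute f_, step(24)

* Plot the forecasts and their confidence bands against the observed values
fcast graph f_dlip f_unemp f_infl f_tbill, observed
```

fcast compute stores the forecasts in new variables that carry the chosen prefix (for example f_dlip), along with standard errors and confidence bounds.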
b. Forecasting the recovery
Economists and politicians, among others, are keenly interested in knowing how the economy will evolve during 2010 to 2015. Use your preferred VAR specification to make forecasts starting at the end of your sample and going out at least to 2015. Describe your results, focusing in some detail on the predicted paths of industrial output, unemployment, and inflation.
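A possible outline for the out-of-sample forecast is given below. The 72-month horizon is an assumption: choose a number that carries you from your last observation through 2015. The tsappend step adds empty future months so the forecasts have rows to occupy. It may not be strictly necessary, but it should do no harm. Variable names and the prefix r_ are hypothetical.

```stata
* Do-file sketch with hypothetical names
tsappend, add(72)                         // add 72 empty months after the last observation

var tbill dlip infl unemp, lags(1/12)     // full-sample VAR; the empty months are skipped
fcast compute r_, step(72)                // forecast 72 months past the estimation sample

fcast graph r_dlip r_unemp r_infl         // forecast paths with confidence bands
```

Because dlip and infl are growth rates in this sketch, the forecasts are of monthly growth and inflation rather than of the levels of output and prices.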
Exercise 6: Just for fun
What is the pattern in the following sequence?
8, 5, 4, 9, 1, 7, 6, 10, 3, 2