Data @ Reed

Reshaping datasets

Depending on the goal of your analysis or visualization, you may need your data in “long” format or in “wide” format. Both formats are illustrated below, using the 1990, 2000, and 2010 US Census data.

Example of long format:

     STATE_NAME  YEAR    POPULATION 

     CA          1990     29,760,021     
     CA          2000     33,871,648
     CA          2010     38,802,500
     OR          1990      2,842,321
     OR          2000      3,421,399
     OR          2010      3,831,074
    

Example of wide format:

     STATE_NAME  POP1990     POP2000     POP2010

     CA          29,760,021  33,871,648  38,802,500
     OR           2,842,321   3,421,399   3,831,074

Stata can convert data back and forth between these two formats with its reshape command. A generalized example appears in the Stata documentation:

http://www.stata.com/manuals13/dreshape.pdf

If you started with the wide format example dataset and you wanted to convert from wide format to long format, you would use the following code:

reshape long POP, i(STATE_NAME) j(YEAR)

That code tells Stata to reshape the data to long format. The resulting dataset contains STATE_NAME (the i() variable, which identifies each group of observations), POP (the stub, which holds the values previously spread across the yearly columns), and a new variable YEAR (the j() variable, formed from the information in the original variable names). Note that Stata reads the variable names themselves (e.g. POP1990, POP2000): it strips the stub POP and uses the remaining digits as the values of YEAR.
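To try this without loading a file, the wide example can be typed in directly. The following is a minimal sketch, assuming the populations are stored as numbers without the thousands separators shown in the table; the storage types (str2, long) and the final list command are illustrative choices, not part of the original example.

```stata
* Enter the wide example dataset by hand (values copied from the table above).
* long is used because these populations are too large to store exactly as float.
clear
input str2 STATE_NAME long POP1990 long POP2000 long POP2010
"CA" 29760021 33871648 38802500
"OR"  2842321  3421399  3831074
end

* Wide to long: POP is the stub, and the numeric suffix becomes YEAR.
reshape long POP, i(STATE_NAME) j(YEAR)

* Inspect the result: one row per state and year.
list, sepby(STATE_NAME)
```

After the reshape, the dataset should have six observations (two states times three years) and the variables STATE_NAME, YEAR, and POP.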

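If you started with the long format example dataset and wanted to convert from long format to wide format, you would use the following code. The stub you give to reshape must match the name of the variable that holds the values, so if that variable is called POPULATION, as in the long example above, first rename it to POP (rename POPULATION POP) to get the columns POP1990, POP2000, and POP2010: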

reshape wide POP, i(STATE_NAME) j(YEAR)
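The reverse direction can be tried the same way. This is a sketch under the same assumptions as before (numeric populations entered by hand, illustrative storage types). The isid check is an optional addition that confirms each STATE_NAME and YEAR pair occurs only once, which reshape wide requires.

```stata
* Enter the long example dataset by hand (values copied from the table above).
clear
input str2 STATE_NAME int YEAR long POPULATION
"CA" 1990 29760021
"CA" 2000 33871648
"CA" 2010 38802500
"OR" 1990  2842321
"OR" 2000  3421399
"OR" 2010  3831074
end

* Optional check: each state-year combination must be unique.
isid STATE_NAME YEAR

* Rename the value variable so the stub matches, then go long to wide.
rename POPULATION POP
reshape wide POP, i(STATE_NAME) j(YEAR)

* Inspect the result: one row per state, one POP column per year.
list
```

The result should have one observation per state, with the variables STATE_NAME, POP1990, POP2000, and POP2010.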