Data @ Reed

Sorting, dropping, keeping data

These examples use the (long) blood pressure dataset, which you can load using the command sysuse bplong.

Sorting data

For some analyses, you may want to examine your data by groups. For the blood pressure dataset, for example, I would like to visualize the distribution of blood pressure values by age group. Here I sort the data by age group first, and then Stata builds one histogram per group. (Strictly speaking, the by() option of hist does not require sorted data; sorting matters for the by prefix used further down.)

sysuse bplong

sort agegrp

hist bp, by(agegrp)

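Once the data are sorted, I could also run a regression of blood pressure (bp) on the variables of interest (sex, when) separately for each age group (agegrp). The by prefix requires the data to be sorted by the grouping variable: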

by agegrp: reg bp sex when
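
If you would rather not sort as a separate step, the bysort prefix sorts and runs the command by group in one line. A minimal sketch, assuming a fresh copy of the bplong data (the clear option discards whatever is currently in memory):

```stata
* Load a fresh copy of the example data, replacing anything in memory
sysuse bplong, clear

* bysort first sorts by agegrp, then runs the regression within each age group
bysort agegrp: regress bp sex when
```

The output should match what sort agegrp followed by by agegrp: reg bp sex when produces.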

Dropping data

Perhaps you are only interested in a certain subset of your data. You could analyze the data using conditional (if) statements and keep your dataset intact. This example looks at the effect of sex and of when the measurement was taken on blood pressure, within age group 1 only:

reg bp sex when if agegrp == 1 
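
The same if condition works with most Stata commands, so you can check a subset before analyzing it. A small sketch, again assuming the bplong data, that counts and summarizes age group 1 without changing the dataset:

```stata
* Load a fresh copy of the example data
sysuse bplong, clear

* How many observations fall in age group 1?
count if agegrp == 1

* Summarize blood pressure for that subset only
summarize bp if agegrp == 1

* The full dataset is still in memory: this counts all observations
count
```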

However, if you are only interested in working with a subset of your data for a set of analyses, you could drop all other data. You can do this piece by piece:

drop if agegrp == 2

drop if agegrp == 3

... or combine those statements using a pipe ("|", meaning "OR"; on most US keyboards it is above the Return key). The bplong dataset has three age groups, so dropping groups 2 and 3 leaves only group 1:

drop if agegrp == 2 | agegrp == 3
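
When the list of values gets long, chaining "|" conditions becomes tedious. One alternative, shown here as a sketch rather than the only way to do it, is Stata's inlist() function, which is true when the first argument equals any of the values that follow:

```stata
* Load a fresh copy of the example data
sysuse bplong, clear

* Drop every observation whose age group is 2 or 3
drop if inlist(agegrp, 2, 3)

* Check which age groups remain
tabulate agegrp
```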

... or use a 'does not equal' statement to save yourself time and keystrokes.

drop if agegrp != 1
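
Keep in mind that drop removes the observations from the data in memory. If you only need the subset for a few commands, one option is to wrap the drop in preserve and restore. A minimal sketch, assuming the bplong data:

```stata
* Load a fresh copy of the example data
sysuse bplong, clear

* Take a snapshot of the full dataset
preserve

* Work with age group 1 only
drop if agegrp != 1
regress bp sex when

* Bring back the full dataset, with all age groups
restore
tabulate agegrp
```

Inside a do-file, Stata restores the preserved data automatically when the do-file ends, so the explicit restore matters most when you want the full data back partway through.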

Keeping data

Sometimes rather than telling Stata what to drop, it is faster to specify what to keep. This works as you would expect (the opposite of drop). So, if I wanted to keep all cases within age group 1:

keep if agegrp == 1
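
To confirm that keep did what you intended, you can tabulate the grouping variable and let Stata check the condition for you. A short sketch, assuming the bplong data; assert stops with an error if any remaining observation fails the condition:

```stata
* Load a fresh copy of the example data
sysuse bplong, clear

* Keep only age group 1
keep if agegrp == 1

* Only one age group should appear in the table
tabulate agegrp

* Stops with an error if any observation is not in age group 1
assert agegrp == 1
```

Note that keep if agegrp == 1 and drop if agegrp != 1 leave you with the same observations. Both also remove any observations where agegrp is missing, because a missing value is not equal to 1.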