Stata Help

Remove cases or variables from datasets

Sometimes, especially when you acquire a dataset from a third party or run new analyses on old data, extraneous cases or variables create unnecessary clutter. Stata makes it very easy to drop such clutter. This is also useful if you accidentally create a variable you don't want. In either case, the basic command is simply drop.

From there, you specify precisely what you want Stata to drop. For example, if I were using an old dataset with three conditions and the third condition was not relevant to my latest batch of analyses, I might tell Stata to drop if condition>2. When you do this, the Results window will display, in green, the number of cases dropped. If no case meets the criterion you specified, Stata simply drops 0 cases.
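As a minimal sketch of the conditional drop described above, the following builds a small made-up dataset (nine cases and a variable named condition, both invented for illustration) and then removes the third condition.

```stata
* Build a hypothetical dataset: 9 cases cycling through conditions 1, 2, 3
clear
set obs 9
generate condition = mod(_n - 1, 3) + 1

* Delete every case whose condition is greater than 2
drop if condition > 2

* Confirm how many cases remain
count
```

With this toy data, Stata should report that 3 observations were deleted, leaving 6. Keep in mind that Stata treats missing numeric values as larger than any number, so a test like condition>2 also matches cases where condition is missing.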

To drop entire variables, simply type drop [variable name].

If you have a giant dataset and only want to keep one or two variables, it is a lot faster to use the opposite command, keep. In this case you tell Stata to keep [list of variables], and Stata will retain only the specified variables and drop everything else.
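The two approaches to removing variables can be sketched side by side. The variable names here (id, age, score, notes) are invented for the example and are not from any real dataset.

```stata
* Build a hypothetical dataset with four variables
clear
set obs 5
generate id = _n
generate age = 20 + _n
generate score = 10 * _n
generate notes = "n/a"

* Remove a single unwanted variable
drop notes

* Or go the other way: name what to retain; everything else is dropped
keep id score

* List the variables that are left
describe
```

After these commands only id and score remain. Which command to use is mostly a matter of which list is shorter to type.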

The example above paired drop with if and a greater-than comparison. Both drop and keep accept if with less-than (<) and greater-than (>) comparisons. In addition, they can use the double equal sign ==, which tells Stata to drop or keep cases where a certain variable is equal to the specified value (a single = is used for assignment, not comparison). For example, drop if condition==3 would also cause Stata to delete every case that belonged to the third condition.
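A short sketch of the equality test, again using an invented nine-case dataset with a variable named condition. It shows drop if and its mirror image, keep if.

```stata
* Hypothetical dataset: 9 cases cycling through conditions 1, 2, 3
clear
set obs 9
generate condition = mod(_n - 1, 3) + 1

* Delete every case in the third condition
drop if condition == 3

* keep if works the same way in reverse: retain only the first condition
keep if condition == 1

count
```

In this toy data, three cases (those in condition 1) are left at the end.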

In addition to things like if, by, and ==, you can use * as a wildcard. Thus, if you had five variables belonging to the sq scale and you no longer wanted any of them, you could tell Stata to drop sq*, which would cause Stata to drop every variable whose name begins with sq.
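The wildcard can be illustrated with a made-up dataset containing five sq items plus an unrelated variable, sqft, that happens to share the prefix. All names here are assumptions for the sake of the example.

```stata
* Hypothetical dataset: an id, five scale items sq1-sq5, and an unrelated sqft
clear
set obs 3
generate id = _n
forvalues i = 1/5 {
    generate sq`i' = _n + `i'
}
generate sqft = 100

* Drop every variable whose name begins with sq
drop sq*

describe
```

Only id survives, because sqft also matches the pattern. It may be worth checking which variables a pattern matches (for example with describe sq*) before dropping them.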

Note: This command is not reversible. Once you have dropped a variable, or kept only a certain set of variables, the rest is gone. Before you begin to drop or add variables, make sure you have saved a complete copy with all the data in it. Then save your working copy under a new name like "altered" or "dropped" so that you don't have to reimport your data or variables if you make a mistake.
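One way to follow this advice is sketched below. The file names complete.dta and dropped.dta and the variables id and extra are placeholders, and the files are written to the current working directory. The second half uses Stata's preserve and restore commands, which are not covered above but offer a temporary safety net within a session.

```stata
* Hypothetical dataset
clear
set obs 4
generate id = _n
generate extra = 0

* Save a complete copy before changing anything
save "complete.dta", replace

* Make the changes, then save under a different name
drop extra
save "dropped.dta", replace

* If something went wrong, reload the complete copy
use "complete.dta", clear

* Alternative for quick experiments: snapshot the data in memory, then roll back
preserve
drop if id > 2
count
restore
```

After restore, the data in memory is back to the full four cases with both variables.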