Stata Help

Basic Data Entry in Stata

Neither Stata nor I recommend doing your data entry in Stata. Stata requires you to press Enter after entering or changing each cell, which makes entering data line by line a relatively painful activity. You are usually much better off doing the initial data entry in a spreadsheet program (such as Excel or Numbers), moving the data into Stata, and then cleaning it up there. That said, if your dataset is fairly small, you can certainly enter it directly in Stata.
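A minimal sketch of the spreadsheet-first workflow, assuming the spreadsheet has been saved as a CSV file. The file name entry_demo.csv and the variables id, name, and score are hypothetical. To keep the sketch self-contained, it writes the CSV itself with export delimited and then reads it back with import delimited; for an Excel workbook, a command such as import excel using "yourfile.xlsx", firstrow clear (file name hypothetical) plays the same role.

```stata
* Stand-in for rows typed into a spreadsheet (hypothetical data)
clear
input id str10 name score
1 "Ana" 88
2 "Ben" 92
end

* Save as CSV, as a spreadsheet program could (hypothetical file name)
export delimited using "entry_demo.csv", replace

* Bring the CSV into Stata, replacing the data in memory
* varnames(1) reads variable names from the first row
import delimited using "entry_demo.csv", clear varnames(1)

describe

* Clean up the demo file; the imported data stay in memory
erase "entry_demo.csv"
```

Once the data are in Stata, the cleaning steps (fixing types, compressing) happen there.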

Data entry in Stata is fairly unintuitive. To bring up an empty spreadsheet, type edit in the Command window; this opens the Data Editor. You can also create a variable first and then change its values. For example, if you type generate newvar=1 and then edit, the Data Editor shows a column for the variable newvar whose cells you can fill in or overwrite with your data. Finally, you can enter data with the input command, which reads values typed directly into the Command window or a do-file.
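A minimal sketch of entering a few rows with input, so the entry can be saved in a do-file and rerun. The variable names and values are hypothetical. String variables need a declared type (here str10); numeric variables without a declared type get the default numeric type. The interactive edit command is left commented out because it opens a window and is not meant for batch runs.

```stata
* Start with an empty dataset
clear

* input reads the rows that follow, up to the line "end"
* str10 declares name as a string of up to 10 characters
input id str10 name age
1 "Ana" 34
2 "Ben" 27
3 "Chen" 45
end

* Show what was entered
list

* Interactive alternative: open the Data Editor and type into the cells
* edit
```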

If you start typing data into the Data Editor without specifying a storage type, Stata will guess one. Stata stores data in two broad ways, 1) string and 2) numeric, and numeric data has several subtypes.
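A small sketch of how Stata picks a storage type when you do not name one, assuming the default setting set type float is in effect. A numeric expression produces a float variable, and a string expression produces a str# type just long enough for the longest value. The variable names x and s are hypothetical.

```stata
clear
set obs 3

* Numeric value, no type given: Stata uses the default type (float)
generate x = 1

* String value, no type given: Stata sizes the string to fit ("abc" -> str3)
generate s = "abc"

* The "storage type" column shows float and str3
describe
```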

Stata stores columns containing nonnumeric characters as string variables. Sometimes when you import your data from another program, such as one of the spreadsheet programs mentioned above, a variable that should be numeric is imported as a string because of stray characters such as leading or trailing spaces. A variable stored as a string has a type beginning with str (for example, str10) in the Variables window. Stata's destring command converts such variables to numeric.
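A minimal sketch of repairing a number that arrived as text, using Stata's destring command. The variable price and its values are hypothetical, and the stray spaces stand in for whatever an import left behind. strtrim() removes leading and trailing blanks (it assumes Stata 14 or later; older versions use trim()), after which every value is a plain number and destring can convert the column.

```stata
clear

* A numeric column stored as text, with stray spaces
input str6 price
"12"
" 15"
"20 "
end

* Storage type starts with str
describe price

* Strip leading and trailing blanks, then convert to numeric in place
replace price = strtrim(price)
destring price, replace

* Storage type is now numeric
describe price
```

If some values contain genuinely nonnumeric text, destring stops with an error rather than silently losing data, which is a useful check.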

Numeric variables have five storage types: byte, int, long, float, and double. The first three hold whole numbers only: byte covers -127 to 100, int covers -32,767 to 32,740, and long covers -2,147,483,647 to 2,147,483,620. float and double also hold fractional values: float reaches about ±1.7 × 10^38 with roughly 7 significant digits of precision, and double reaches about ±8.99 × 10^307 with roughly 16. The five types use 1, 2, 4, 4, and 8 bytes per value, respectively, and by default generate creates float variables. Stata also stores dates and times as numbers counted from a base point in 1960 (daily dates, for example, count days since 1 January 1960) and offers several display formats for reading them.

Often when you import data, Stata gives a variable a larger storage type than it needs, which bloats the dataset. It is therefore a good idea to run the compress command, which demotes each variable to the smallest storage type that still holds its values exactly.
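A short sketch of the storage limits and of compress at work. The c() values queried below are Stata's built-in constants for the integer limits. The variables small and medium are hypothetical and deliberately stored as double (8 bytes per value) even though their whole-number values would fit in byte and int; compress then demotes them.

```stata
* Built-in upper limits of the integer types
display c(maxbyte)     // 100
display c(maxint)      // 32740
display c(maxlong)     // 2147483620

clear
set obs 5

* Oversized storage on purpose: whole numbers held as double
generate double small  = _n          // 1 to 5, fits in byte
generate double medium = _n * 1000   // 1000 to 5000, fits in int
describe

* compress picks the smallest type that loses no information
compress
describe
```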
