A first glance at your data
Once you have loaded your data in Stata, you need to get a sense of your dataset before you move forward with analysis or visualization. (I invite you to re-read that sentence for emphasis.)
Before you do anything else, look at your data.
In the data browser, numeric variables appear in black and string (text/non-numeric) variables appear in red. Any data shown in blue has been labeled, meaning that Stata "sees" the underlying (usually numeric) data while you, as the user, see the more human-friendly labels. (Example: a variable school_year may have the values 1, 2, 3, 4 but be labeled "first-year", "sophomore", "junior", "senior".)
Look at your data and make sure that the values make sense to you. Are there extra variables or cases because of a data import issue? Are your data formatted as you would expect? (Note: the command that opens the data browser is "browse"; to save keystrokes, you can type "br" instead of the full word.) Once you have given your data a once-over, you may want to look at some basic summary statistics.
All of the examples below use the built-in "cancer" dataset. Load this dataset with the command sysuse cancer.
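A minimal sketch of loading the example data and opening the data browser. The clear option and the if/in qualifiers are not mentioned above; they are standard Stata syntax added here for illustration.

```stata
* Load the built-in cancer dataset; "clear" drops any data already in memory
sysuse cancer, clear

* Open the data browser on the whole dataset
browse

* Browse selected variables for the first 10 observations only
browse age drug died in 1/10

* Browse only the observations that meet a condition
browse if died == 1
```

Restricting the browser to a few variables or rows can make import problems easier to spot in a wide or long dataset.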
summarize returns the number of observations, mean, standard deviation, and minimum and maximum values for either one variable or the whole dataset. Specify the "detail" option for percentiles, variance, skewness, and kurtosis.
summarize age, detail
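A few more ways to call summarize, sketched under the assumption that the cancer dataset is loaded. The display line relies on the results that summarize stores in r(); that step is an addition for illustration and is not part of the original walkthrough.

```stata
sysuse cancer, clear

* Summary statistics for every variable in the dataset
summarize

* Summary statistics for selected variables only
summarize age studytime

* summarize stores its results in r(), so they can be reused afterwards
summarize age
display "Mean age: " r(mean) "  (N = " r(N) ")"
```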
describe returns the variable name, storage type, display format, and label information (value labels and variable labels). Can be used on one variable or the entire dataset.
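The describe command can be run on the whole dataset or on a list of variables. A short sketch, assuming the cancer dataset; the short option is an addition not mentioned above.

```stata
sysuse cancer, clear

* Describe every variable in the dataset
describe

* Describe selected variables only
describe age drug

* Dataset-level information only (observations, variables, sort order)
describe, short
```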
inspect returns an extremely rough histogram of the data (recommendation: use a separate command, hist, for a clearer graph) as well as the number of observations and how many of them are integers, non-integers, and missing values. Can be used on one variable or the entire dataset.
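A sketch comparing the rough text histogram from inspect with a proper graph, assuming the cancer dataset. The frequency option is one of several possible y-axis scales and is chosen here only for illustration.

```stata
sysuse cancer, clear

* Text-based overview: tiny histogram plus integer/non-integer/missing counts
inspect age

* The same overview for every variable in the dataset
inspect

* A real histogram; "hist" is an accepted abbreviation of "histogram"
histogram age, frequency
```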
tabulate [variable] returns each value of the variable, the frequency of that value, and the percentage of the dataset represented by that value.
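A one-way frequency table can be sketched as follows, assuming the cancer dataset. The nolabel and missing options are additions for illustration. Whether drug carries value labels depends on the version of the dataset shipped with your Stata; nolabel is harmless either way.

```stata
sysuse cancer, clear

* Frequency and percentage of each value of drug
tabulate drug

* Show the underlying numeric codes instead of any value labels
tabulate drug, nolabel

* Include missing values as their own row in the table
tabulate drug, missing
```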
tabulate [variable1] [variable2]
a cross-tabulation of your data can be useful for seeing how two variables are related or how observations are spread across categories. Example:
tab died drug
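By default a cross-tabulation shows only counts. As a hedged sketch, the row and column options (not mentioned above) add percentages, which are often easier to compare across groups of different sizes.

```stata
sysuse cancer, clear

* Counts plus column percentages: within each drug group, the share who died
tabulate died drug, column

* Counts plus both row and column percentages
tabulate died drug, row column
```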
codebook reports, for each variable, the data type, range, units, number of unique values, and number of missing values, plus either the mean, standard deviation, and percentiles (for continuous variables) or a tabulation of values (for categorical variables). Stata builds this report from the data in memory, so no separate codebook file is required; a well-curated dataset with variable and value labels simply produces a more informative report. Can be used on one variable or the entire dataset.
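The codebook command can be sketched as follows, assuming the cancer dataset. The compact option is an addition not mentioned above; it prints one line per variable.

```stata
sysuse cancer, clear

* Detailed report for a single variable
codebook age

* Detailed report for every variable in the dataset
codebook

* One-line-per-variable overview
codebook, compact
```

The compact form can be a quick first check on a dataset with many variables before reading the full report.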