
Data Frames

Data frames are S objects (data structures) that combine features of matrices and lists: a data frame is a list of variables, all containing the same number of observations. Typically, the variables are measurements on a collection of cases, so that each row of the data frame holds the measurements for one case (subject), and each column holds the measurements for all subjects on a single variable.

Creating Data Frames
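
As a minimal sketch, a data frame can be built from vectors of equal length with the data.frame function. The column names, row labels, and values below are hypothetical, and the code is written so that it should run in R, a dialect of S; details may differ in other S implementations.

```r
# Hypothetical measurements: three subjects (rows), two variables (columns).
X <- data.frame(A = c(1.2, 3.4, 5.6),
                B = c(10, 20, 30),
                row.names = c("s1", "s2", "s3"))

dim(X)     # number of rows (cases) and columns (variables)
names(X)   # the column (variable) names
```

Each argument becomes one column, so all of the vectors must supply the same number of observations.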

Subscripting

A data frame may be subscripted as if it were a matrix object:
	X["row",]  # select the row labeled "row"
	X[,"col"]  # select the column labeled "col"
	X[2,]      # select row 2
	X[,3]      # select column 3
	X[2,3]     # select the element in row 2, column 3.
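
The subscripting forms above can be tried on a small data frame. This is only a sketch: the data, the row labels "s1" to "s3", and the result names are made up for illustration, and the code assumes R's behaviour, which may differ in detail from other S implementations.

```r
# Hypothetical data frame with labelled rows and three numeric columns.
X <- data.frame(A = c(1.2, 3.4, 5.6),
                B = c(10, 20, 30),
                C = c(100, 200, 300),
                row.names = c("s1", "s2", "s3"))

one.row  <- X["s2", ]   # the row labeled "s2" (still a data frame)
one.col  <- X[, "A"]    # the column labeled "A" (a numeric vector)
row.two  <- X[2, ]      # row 2, the same row as X["s2", ]
col.3    <- X[, 3]      # column 3, the same column as X[, "C"]
element  <- X[2, 3]     # the single value in row 2, column 3
```

Selecting a row keeps the data frame structure because the columns may be of different types, while selecting a single column returns the underlying vector.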

Attaching Data Frames

The attach command makes the columns of a data frame directly accessible by column name, which is often more convenient than subscripting. Suppose the data frame X has column names "A" and "B":
	attach(X)
	plot(A,B)
vs.
	plot(X[,"A"],X[,"B"])
Note: if the .Data directory contains a variable named "A", then a reference to "A" will select that variable, rather than the desired column of the data frame, unless the search order is modified. To detach a data frame, use the search function to find the position of the data frame in the search list, and then use the detach function to remove it from the search list.
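
The attach, search, and detach steps described above might look as follows. This is a sketch under assumptions: the data are hypothetical, and the use of match to locate the position and the pos argument of detach follow R; the exact arguments may differ in other S implementations.

```r
# Hypothetical data frame with columns "A" and "B".
X <- data.frame(A = c(1, 2, 3), B = c(2, 4, 6))

attach(X)                     # put X on the search list
total <- sum(A * B)           # A and B are now found by name

pos <- match("X", search())   # position of X in the search list
detach(pos = pos)             # remove X from the search list
```

Detaching as soon as the columns are no longer needed keeps the search list short and reduces the chance that a column name is confused with another variable of the same name.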

Use in Statistical Modelling

Many S functions are set up to take advantage of the data frame format. If X is a data frame with columns "A" and "B", then
	plot(X)
invokes the plot method for data frames rather than the default plot, and
	X.lm <- lm(A ~ B, data=X)
causes S to fit a "linear model", i.e., a regression line of A on B, using the columns of X named in the formula.
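
As an illustration of the lm call, the sketch below fits a line to a small made-up data set and extracts the fitted coefficients. The values are hypothetical, and coef is used here as one common way to inspect the fit.

```r
# Hypothetical data: A is the response, B is the predictor.
X <- data.frame(A = c(5.1, 7.9, 11.2, 13.8, 17.0),
                B = c(1, 2, 3, 4, 5))

X.lm <- lm(A ~ B, data = X)   # regress A on B, taking both from X
b <- coef(X.lm)               # intercept and slope of the fitted line
```

Because the data argument names the data frame, the formula can refer to the columns A and B directly, without attaching X first.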

Albyn Jones
Tue Jun 25 11:03:47 PDT 1996