mutate()
Another common data-wrangling task is creating a new variable with the function mutate(). When you create a new variable, you supply a name for the new column and an expression that calculates its values.
Continuing with the penguins data from palmerpenguins, the code below adds a new column that expresses each penguin's body mass in kilograms:
penguins %>%
  mutate(body_mass_kg = body_mass_g / 1000)
## # A tibble: 344 × 9
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 334 more rows, and 3 more variables: sex <fct>, year <int>,
## # body_mass_kg <dbl>
The syntax for mutating a column follows the pattern mutate(new_column_name = expression), where expression computes the new values, usually from existing columns. In the example above, new_column_name is body_mass_kg and expression is body_mass_g / 1000, so every value in body_mass_g is divided by 1000.
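As a sketch of the syntax, a single mutate() call can take several name = expression pairs, and a later expression can use a column created earlier in the same call. The example below assumes dplyr is installed and uses a small hypothetical tibble, penguins_small, built from the first three rows printed above rather than the full penguins data; the body_mass_lb column and the conversion factor of 2.20462 lb per kg are illustrative additions.

```r
library(dplyr)

# Hypothetical stand-in for penguins: the first three rows printed above
penguins_small <- tibble(
  bill_length_mm = c(39.1, 39.5, 40.3),
  body_mass_g    = c(3750L, 3800L, 3250L)
)

# Two name = expression pairs in one call; body_mass_lb (illustrative)
# is computed from body_mass_kg, which was created just before it
penguins_small <- penguins_small %>%
  mutate(
    body_mass_kg = body_mass_g / 1000,
    body_mass_lb = body_mass_kg * 2.20462
  )

penguins_small$body_mass_kg
# [1] 3.75 3.80 3.25
```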
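If the name you supply already exists, mutate() overwrites that column instead of adding a new one. Suppose you discovered that every flipper measurement was 4 mm shorter than the true length; you could use mutate() to correct the column: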
penguins %>%
  mutate(flipper_length_mm = flipper_length_mm + 4)
## # A tibble: 344 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <dbl> <int>
## 1 Adelie Torgersen 39.1 18.7 185 3750
## 2 Adelie Torgersen 39.5 17.4 190 3800
## 3 Adelie Torgersen 40.3 18 199 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 197 3450
## 6 Adelie Torgersen 39.3 20.6 194 3650
## 7 Adelie Torgersen 38.9 17.8 185 3625
## 8 Adelie Torgersen 39.2 19.6 199 4675
## 9 Adelie Torgersen 34.1 18.1 197 3475
## 10 Adelie Torgersen 42 20.2 194 4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
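Because flipper_length_mm already exists, mutate() replaced it rather than adding a column, which is why the tibble above still has 8 columns. The column type also changed from <int> to <dbl>: 4 is a double in R, so adding it promotes the integers. The sketch below checks both points on a hypothetical three-row tibble, flippers, built from the lengths printed above, assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical tibble holding three flipper lengths from the output above
flippers <- tibble(flipper_length_mm = c(181L, 186L, 195L))

# Reusing an existing column name replaces that column
adjusted <- flippers %>%
  mutate(flipper_length_mm = flipper_length_mm + 4)

ncol(adjusted)                     # still 1 column
adjusted$flipper_length_mm         # [1] 185 190 199
class(adjusted$flipper_length_mm)  # "numeric": integer + double gives double
```

Writing `+ 4L` instead would keep the column as an integer.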
You can also combine mutate() with other functions. The code below uses group_by() so that sum() is computed separately for each island: every row receives the total body mass of all penguins on its island.
penguins %>%
  group_by(island) %>%
  mutate(island_penguin_mass = sum(body_mass_g, na.rm = TRUE))
## # A tibble: 344 × 9
## # Groups: island [3]
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 334 more rows, and 3 more variables: sex <fct>, year <int>,
## # island_penguin_mass <int>
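Because of group_by(), sum() runs once per island, and mutate() keeps every row, so each penguin's row carries the total for its island (the tibble above still has 344 rows). The sketch below shows this on hypothetical toy data: the island names are real, but the masses are made up. It assumes dplyr is installed and adds ungroup() so that later steps are not silently grouped. The NA mass is dropped by na.rm = TRUE; without it, the Dream total would be NA.

```r
library(dplyr)

# Hypothetical toy data: real island names, made-up masses
toy <- tibble(
  island      = c("Biscoe", "Biscoe", "Dream", "Dream", "Dream"),
  body_mass_g = c(4000L, 5000L, 3500L, NA, 3700L)
)

island_totals <- toy %>%
  group_by(island) %>%
  # one total per island, repeated on every row of that island
  mutate(island_penguin_mass = sum(body_mass_g, na.rm = TRUE)) %>%
  ungroup()  # remove grouping so later operations use all rows

island_totals$island_penguin_mass
# [1] 9000 9000 7200 7200 7200
```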
You can also give R rules for assigning values to a new variable. For example, the code below uses the case_when() function to sort penguins into flipper-length categories relative to the dataset's mean flipper length (about 201 mm). You can think of case_when() as a multi-branch if statement: for each observation (row), it checks the conditions in order and, for the first one that flipper_length_mm satisfies (greater than, equal to, or less than 201 mm), puts the matching category ("long", "average", or "short") in the new column.
penguins %>%
  mutate(flipper_category =
           case_when(flipper_length_mm > 201 ~ "long",
                     flipper_length_mm == 201 ~ "average",
                     flipper_length_mm < 201 ~ "short"))
## # A tibble: 344 × 9
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 334 more rows, and 3 more variables: sex <fct>, year <int>,
## # flipper_category <chr>
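The rows shown above all fall below 201 mm, but two details of case_when() matter for the full data. Conditions are evaluated in order and the first match wins, and a row with a missing flipper length (like row 4) matches none of the conditions, so its category is NA. The sketch below, assuming dplyr is installed, runs the same conditions on a plain vector: the first values come from the output above, while 201 and 210 are hypothetical values added to exercise the other branches. A final `TRUE ~ ...` branch works like a trailing else; recent dplyr versions (1.1.0 and later) also offer a `.default` argument for the same purpose.

```r
library(dplyr)

# 181, 186, NA come from the output above; 201 and 210 are hypothetical
flipper_length_mm <- c(181L, 186L, NA, 201L, 210L)

# Comparisons with NA return NA, so the missing row matches no branch
without_fallback <- case_when(
  flipper_length_mm > 201  ~ "long",
  flipper_length_mm == 201 ~ "average",
  flipper_length_mm < 201  ~ "short"
)

# A last TRUE ~ ... branch catches anything not matched earlier
with_fallback <- case_when(
  flipper_length_mm > 201  ~ "long",
  flipper_length_mm == 201 ~ "average",
  flipper_length_mm < 201  ~ "short",
  TRUE                     ~ "unknown"
)

without_fallback  # "short" "short" NA "average" "long"
with_fallback     # "short" "short" "unknown" "average" "long"
```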