Introduction
Maintaining data frame consistency within base R can be difficult: functions such as lapply() and apply() return lists, vectors, or matrices rather than data frames. The purrr package [1] from the tidyverse solves this problem with its map_df() function. However, we can achieve similar results, and expand upon them, with base R functions alone. This post walks through two methods.
Method 1: Use lapply(), data.frame(), and do.call()
To replicate purrr’s map_df(), we use three functions: lapply() to apply the function to each element of the data; data.frame() to convert the output to a data frame; and do.call() to pass every element of the resulting list to data.frame() as arguments in a single call.
apply_df <- function(x, f, ...) {
  # 1. Apply the function.
  ## Use other inputs if necessary.
  apply_it <- lapply(x, f, ...)
  # 2. Combine the elements into a data frame.
  output <- do.call(data.frame, apply_it)
  # 3. Return the result.
  output
}
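To see what do.call() contributes, it may help to look at the intermediate objects. The following is a minimal sketch rather than part of apply_df() itself; the names col_means, by_hand, and via_do_call are made up for illustration.

```r
# lapply() over a data frame visits each column and returns a named list.
col_means <- lapply(mtcars, mean)
str(col_means[1:2])

# Writing the data.frame() call by hand for two of the columns...
by_hand <- data.frame(mpg = col_means$mpg, cyl = col_means$cyl)

# ...should match what do.call() builds from the same list elements:
# each named element is passed to data.frame() as a named argument.
via_do_call <- do.call(data.frame, col_means[c("mpg", "cyl")])

print(by_hand)
print(via_do_call)
```

Because do.call() builds the call from the list, the same code works for any number of columns.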
Let’s test it out!
The case of a single function
apply_df(mtcars, mean)
## mpg cyl disp hp drat wt qsec vs am
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625
## gear carb
## 1 3.6875 2.8125
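The "..." argument forwards extra inputs to the applied function. As a hedged illustration, the sketch below redefines apply_df() so that it runs on its own and uses the built-in airquality data set, which is assumed here only because some of its columns contain missing values; with_na and without_na are illustrative names.

```r
# Same definition as above, repeated so this snippet is self-contained.
apply_df <- function(x, f, ...) {
  apply_it <- lapply(x, f, ...)
  do.call(data.frame, apply_it)
}

# Without na.rm, columns that contain NA values yield NA means.
with_na <- apply_df(airquality, mean)

# na.rm = TRUE travels through "..." to mean() for every column.
without_na <- apply_df(airquality, mean, na.rm = TRUE)

print(with_na)
print(without_na)
```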
The case of two functions
What if we combined two functions inside of apply_df()? With purrr’s map_df(), we would obtain two rows without row names to identify which function produced each row:
# install.packages('purrr') # Install beforehand.
# Let's create a mean-SD function.
msd <- function(x) {
  c(mean = mean(x), sd = sd(x))
}
# Then let's use map_df() and msd() on mtcars.
purrr::map_df(mtcars, msd)
## # A tibble: 2 x 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 20.1 6.19 231. 147. 3.60 3.22 17.8 0.438 0.406 3.69 2.81
## 2 6.03 1.79 124. 68.6 0.535 0.978 1.79 0.504 0.499 0.738 1.62
With our new apply_df() function, the row names are maintained:
apply_df(mtcars, msd)
## mpg cyl disp hp drat wt qsec
## mean 20.090625 6.187500 230.7219 146.68750 3.5965625 3.2172500 17.848750
## sd 6.026948 1.785922 123.9387 68.56287 0.5346787 0.9784574 1.786943
## vs am gear carb
## mean 0.4375000 0.4062500 3.6875000 2.8125
## sd 0.5040161 0.4989909 0.7378041 1.6152
Method 2: Simply use as.data.frame() and apply()
What if we wanted to apply the function row-wise? Let’s rewrite our function so that the user can choose, through the m argument, whether to apply the function row-wise (m = 1) or column-wise (m = 2).
apply_df2 <- function(x, m, f, ...) {
  # 1. Apply the function.
  ## "m" is the margin: 1 for rows, 2 for columns.
  ## Use other inputs if necessary.
  apply_it <- apply(x, m, f, ...)
  # 2. Convert to data frame.
  ## When f returns several values, each row (m = 1) or
  ## column (m = 2) of x becomes a column of the output.
  output <- as.data.frame(apply_it)
  # 3. Return the result.
  output
}
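One caveat worth keeping in mind: apply() converts its input to a matrix first, so a data frame that mixes numeric and non-numeric columns is coerced to character before the function runs. A possible workaround, sketched here under the assumption that only the numeric columns are of interest, is to filter the columns beforehand. The helper name apply_df2_numeric is hypothetical, and the built-in iris data set is used only because it contains a factor column.

```r
msd <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

# Hypothetical variant of apply_df2() that keeps only numeric columns.
apply_df2_numeric <- function(x, m, f, ...) {
  # Flag the numeric columns; a factor such as iris$Species is dropped.
  is_num <- vapply(x, is.numeric, logical(1))
  apply_it <- apply(x[, is_num, drop = FALSE], m, f, ...)
  as.data.frame(apply_it)
}

iris_stats <- apply_df2_numeric(iris, 2, msd)
print(iris_stats)
```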
Let’s test it out! We’ll output only the first three columns of each result for the following examples:
apply_df2(mtcars, 1, msd)[, 1:3]
## Mazda RX4 Mazda RX4 Wag Datsun 710
## mean 29.90727 29.98136 23.59818
## sd 53.53888 53.51210 38.86999
apply_df2(mtcars, 2, msd)[, 1:3]
## mpg cyl disp
## mean 20.090625 6.187500 230.7219
## sd 6.026948 1.785922 123.9387
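With m = 1, the result has one column per row of the input. If one row per car is preferred instead, the matrix returned by apply() can be transposed with t() before the conversion. This is a small sketch of that variation, not a change to apply_df2(); row_stats is an illustrative name.

```r
msd <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

# apply() with margin 1 returns a 2 x 32 matrix (statistics by car);
# t() flips it to 32 x 2 so that each car occupies one row.
row_stats <- as.data.frame(t(apply(mtcars, 1, msd)))

head(row_stats, 3)
```

The row names still carry the car names, and the columns are named after the statistics returned by msd().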
Conclusion
So, in conclusion, we can maintain data-frame consistency in two ways:
1. Combine lapply(), data.frame(), and do.call(); or
2. Simply use as.data.frame() over our apply() call.
The second method is useful for row-wise outputs. Try these functions out and expand upon them!
[1] purrr.tidyverse.org/