Mapping Techniques to Maintain Data Frame Consistency in Base R

Introduction

Keeping function output in data frame form can be difficult in base R. The purrr package [1] from the tidyverse solves this problem with its map_df() function. However, we can achieve similar results, and extend them, with base R functions alone. This post covers two methods.

Method 1: Use lapply(), data.frame(), and do.call()

To replicate purrr’s map_df(), we use three functions: lapply() to apply the function to each column of the data; data.frame() to turn the results into a data frame; and do.call() to call data.frame() once, with every element of the resulting list passed as a separate argument.
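To make the role of do.call() concrete, here is a minimal sketch. The list `parts` is a made-up stand-in for the output of lapply(); the point is only that do.call() builds a single call to data.frame() from the list elements.

```r
# A small named list standing in for the output of lapply().
parts <- list(a = 1:2, b = c(0.5, 1.5))

# do.call() makes one call, using the list elements as named arguments...
via_do_call <- do.call(data.frame, parts)

# ...so it should match the same call spelled out by hand.
by_hand <- data.frame(a = 1:2, b = c(0.5, 1.5))

all.equal(via_do_call, by_hand)
```

Because the list names become argument names, they also become the column names of the result.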

apply_df <- function(x, f, ...) {
  
  # 1. Apply f to each element of x.
  ## Extra arguments in ... are passed on to f.
  apply_it <- lapply(x, f, ...)
  
  # 2. Combine the elements into a data frame.
  output   <- do.call(data.frame, apply_it)
  
  # 3. Return the result.
  output
  
}
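One caveat worth knowing: data.frame() repairs non-syntactic column names by default (check.names = TRUE). The sketch below uses a made-up one-column data frame to show the effect, and one possible workaround that appends check.names = FALSE to the argument list. This is an illustration, not part of apply_df() as defined above.

```r
# A made-up data frame whose column name contains spaces.
x <- data.frame(`miles per gallon` = c(21, 22.8), check.names = FALSE)

# Default behaviour: data.frame() rewrites the name.
res_default <- do.call(data.frame, lapply(x, mean))
names(res_default)  # "miles.per.gallon"

# Possible workaround: add check.names = FALSE to the list of arguments.
res_kept <- do.call(data.frame, c(lapply(x, mean), check.names = FALSE))
names(res_kept)     # "miles per gallon"
```

For data such as mtcars, whose column names are already syntactic, both versions give the same names.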

Let’s test it out!

The case of a single function

apply_df(mtcars, mean)
##        mpg    cyl     disp       hp     drat      wt     qsec     vs      am
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625
##     gear   carb
## 1 3.6875 2.8125
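The `...` argument lets us forward extra arguments to the applied function. As a small sketch, the block below uses a condensed copy of apply_df() and the built-in airquality data, whose Ozone column contains missing values, to show na.rm = TRUE reaching mean().

```r
# Condensed copy of apply_df() from above, so this block runs on its own.
apply_df <- function(x, f, ...) {
  do.call(data.frame, lapply(x, f, ...))
}

# airquality ships with R; Ozone and Solar.R contain missing values.
with_na    <- apply_df(airquality, mean)
without_na <- apply_df(airquality, mean, na.rm = TRUE)

with_na$Ozone     # NA, because mean() propagates missing values
without_na$Ozone  # a number, because na.rm = TRUE was passed on to mean()
```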

The case of two functions

What if we applied a function that combines two functions, such as the mean and the standard deviation? With purrr’s map_df(), we would obtain two rows, but no row names to identify which statistic each row holds:

# install.packages('purrr') # Install beforehand.

# Let's create a mean-SD function.
msd <- function(x) {
 
  c(mean = mean(x), sd = sd(x)) 
  
}

# Then let's use map_df() and msd() on mtcars.
purrr::map_df(mtcars, msd)
## # A tibble: 2 x 11
##     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 20.1   6.19  231. 147.  3.60  3.22  17.8  0.438 0.406 3.69   2.81
## 2  6.03  1.79  124.  68.6 0.535 0.978  1.79 0.504 0.499 0.738  1.62

With our new apply_df() function, the rownames are maintained:

apply_df(mtcars, msd)
##            mpg      cyl     disp        hp      drat        wt      qsec
## mean 20.090625 6.187500 230.7219 146.68750 3.5965625 3.2172500 17.848750
## sd    6.026948 1.785922 123.9387  68.56287 0.5346787 0.9784574  1.786943
##             vs        am      gear   carb
## mean 0.4375000 0.4062500 3.6875000 2.8125
## sd   0.5040161 0.4989909 0.7378041 1.6152
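Keeping the row names means a statistic can be looked up by name. The sketch below also shows one way to move the row names into an ordinary column if that is preferred; the column name `stat` is a hypothetical choice, and apply_df() and msd() are condensed copies of the definitions above.

```r
# Condensed copies of the functions defined earlier.
apply_df <- function(x, f, ...) do.call(data.frame, lapply(x, f, ...))
msd      <- function(x) c(mean = mean(x), sd = sd(x))

res <- apply_df(mtcars, msd)

# Look a statistic up by row and column name rather than by position.
res["sd", "mpg"]

# Optionally move the row names into a regular column called `stat`.
# row.names = NULL resets the row names to the integer sequence.
res_col <- data.frame(stat = rownames(res), res, row.names = NULL)
res_col[, 1:3]
```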

Method 2: Simply use as.data.frame() and apply()

What if we wanted to apply the function row-wise? Let’s rewrite our function so that the user can choose whether to apply it row-wise or column-wise through the margin argument m (1 for rows, 2 for columns).

apply_df2 <- function(x, m, f, ...) {
  
  # 1. Apply f over rows (m = 1) or columns (m = 2).
  ## Extra arguments in ... are passed on to f.
  apply_it <- apply(x, m, f, ...) 
  
  # 2. Convert to data frame.
  ## This conversion depends on the "m" input.
  output   <- as.data.frame(apply_it)
  
  # 3. Return the result.
  output
  
}
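A note on inputs: apply() converts a data frame to a matrix with as.matrix() before doing any work, so a data frame with mixed column types becomes a character matrix. mtcars is entirely numeric, so it is unaffected. The sketch below uses the built-in iris data to show the issue and one possible workaround, selecting the numeric columns first.

```r
msd <- function(x) c(mean = mean(x), sd = sd(x))

# iris has four numeric columns plus the factor column Species, so the
# matrix that apply() would work on holds character strings, not numbers.
is.character(as.matrix(iris))

# One workaround: keep only the numeric columns before calling apply().
numeric_cols <- vapply(iris, is.numeric, logical(1))
iris_stats   <- as.data.frame(apply(iris[numeric_cols], 2, msd))
iris_stats
```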

Let’s test it out! We’ll output only the first three columns of each result:

apply_df2(mtcars, 1, msd)[, 1:3]
##      Mazda RX4 Mazda RX4 Wag Datsun 710
## mean  29.90727      29.98136   23.59818
## sd    53.53888      53.51210   38.86999
apply_df2(mtcars, 2, msd)[, 1:3]
##            mpg      cyl     disp
## mean 20.090625 6.187500 230.7219
## sd    6.026948 1.785922 123.9387
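Note that the row-wise result above has one column per car, because apply() places each function result in its own column. If one row per car is preferred, a possible variation is to transpose before converting, as in this sketch:

```r
msd <- function(x) c(mean = mean(x), sd = sd(x))

# With margin 1, apply() returns one column per input row.
wide <- as.data.frame(apply(mtcars, 1, msd))

# Transposing first gives one row per car and one column per statistic.
long <- as.data.frame(t(apply(mtcars, 1, msd)))

dim(wide)  # 2 rows, 32 columns
dim(long)  # 32 rows, 2 columns
head(long, 3)
```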

Conclusion

We can keep function output in data frame form in two ways:

  1. Combine lapply(), data.frame(), and do.call(); or

  2. Simply use as.data.frame() over our apply() call.

The second method also supports row-wise application. Try these functions out and expand upon them!


  1. purrr.tidyverse.org/