mop(): a New Way to sweep()

Introduction

This blog post compares sweep() with mop(), a function I've created. I argue that mop() is preferable because it is more concise: you pass the summary statistic function itself rather than precomputed values.

The Old Way: sweep()

The function sweep() [1] lets you transform data using a summary statistic, for example by dividing each element by its column's mean. A problem arises, however: you must supply the summary statistic values yourself through the STATS argument. So, to divide each element by its column's average, we would have to write something along the lines of the following:

sweep(mtcars, 2, sapply(mtcars, mean), `/`)
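To make the two steps visible, here is a minimal sketch on a made-up matrix. The names toy, col_means, and scaled are illustrative only; the numbers are chosen so the arithmetic is easy to check by hand.

```r
# Toy matrix (hypothetical data): columns are (1, 3), (2, 6), (10, 30).
toy <- matrix(c(1, 3, 2, 6, 10, 30), nrow = 2)

# Step 1: compute the summary statistic for each column yourself.
col_means <- colMeans(toy)   # 2, 4, 20

# Step 2: hand those values to sweep() through STATS.
scaled <- sweep(toy, 2, col_means, `/`)
scaled
##      [,1] [,2] [,3]
## [1,]  0.5  0.5  0.5
## [2,]  1.5  1.5  1.5
```

The statistic has to exist as a vector before sweep() can use it, which is the extra step that makes the call verbose.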

Because this approach is quite verbose, I wrote a function back in 2018 called mop() [2] to handle this issue.

The New Way: mop()

Essentially, mop() is a wrapper for sweep(x, MARGIN, apply(...), FUN). It is useful, for example, for indexing variables by their means, so that the magnitude of each value relative to its average is known.
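As a quick illustration of indexing by the mean, here is a sketch on a made-up vector (x and indexed are hypothetical names used only for this example):

```r
# Hypothetical values, used only to illustrate indexing by the mean.
x <- c(2, 4, 6)

# Dividing by the mean expresses each value relative to the average:
# 1 means "exactly average", 0.5 means "half the average", and so on.
indexed <- x / mean(x)
indexed
## [1] 0.5 1.0 1.5
```

The same calculation, repeated for every row or column of a 2D object, is the use case described above.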

The four required arguments are x, m, s, and f: the collection (e.g. a matrix), the margin (1 for row-wise or 2 for column-wise), the summary statistic function, and the binary operator function, respectively. A fifth, optional argument, ..., is passed on to sweep(). The output is typically a matrix or data frame, depending on the inputs and functions being passed.

mop <- function(x, m, s, f, ...) {

  # 1. Check inputs.
  f <- match.fun(f)  # binary operator, e.g. `/`
  s <- match.fun(s)  # summary statistic, e.g. mean

  # x must have exactly two dimensions (e.g. a matrix or data frame).
  if (length(dim(x)) != 2L) {
    stop("x must be 2D, e.g. a matrix or data frame.")
  }

  # m must be a single value: 1 (rows) or 2 (columns).
  if (length(m) != 1L || !m %in% 1:2) {
    stop("The m (margin) input must be either 1 (row-wise) or 2 (column-wise).")
  }

  # 2. Sweep out the summary statistic.
  ## apply() computes the statistic along the chosen margin.
  summ_stats <- apply(x, m, s)

  output <- sweep(x, m, summ_stats, f, ...)

  # 3. Output should be 2D.
  ## Its class should be the same as that of x.
  output

}
head(mop(mtcars, 2, mean, `/`))
##                         mpg       cyl      disp        hp      drat        wt
## Mazda RX4         1.0452636 0.9696970 0.6934756 0.7498935 1.0843688 0.8143601
## Mazda RX4 Wag     1.0452636 0.9696970 0.6934756 0.7498935 1.0843688 0.8936203
## Datsun 710        1.1348577 0.6464646 0.4680961 0.6340009 1.0704666 0.7211128
## Hornet 4 Drive    1.0651734 0.9696970 1.1182295 0.7498935 0.8563733 0.9993006
## Hornet Sportabout 0.9307824 1.2929293 1.5603202 1.1930124 0.8758363 1.0692361
## Valiant           0.9009177 0.9696970 0.9752001 0.7158074 0.7673994 1.0754526
##                        qsec       vs       am      gear      carb
## Mazda RX4         0.9221934 0.000000 2.461538 1.0847458 1.4222222
## Mazda RX4 Wag     0.9535682 0.000000 2.461538 1.0847458 1.4222222
## Datsun 710        1.0426500 2.285714 2.461538 1.0847458 0.3555556
## Hornet 4 Drive    1.0891519 2.285714 0.000000 0.8135593 0.3555556
## Hornet Sportabout 0.9535682 0.000000 0.000000 0.8135593 0.7111111
## Valiant           1.1328524 2.285714 0.000000 0.8135593 0.3555556
# Equivalent to: head(sweep(mtcars, 2, apply(mtcars, 2, mean), `/`))
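
The same call pattern works with other summary functions, operators, and margins. The sketch below is only an illustration: it uses a condensed one-line stand-in for mop() (without the input checks) so that it runs on its own, and the names centered, toy, and scaled_rows are made up for the example.

```r
# Condensed stand-in for mop() so this snippet is self-contained;
# the input checks from the full definition are omitted here.
mop <- function(x, m, s, f, ...) {
  sweep(x, m, apply(x, m, s), match.fun(f), ...)
}

# Column-wise: center each column of mtcars on its median.
centered <- mop(mtcars, 2, median, `-`)
median(centered$mpg)   # 0, since the column median was subtracted

# Row-wise: scale each row of a toy matrix by its maximum.
# Rows are (1, 2, 4) and (2, 4, 8), so both become (0.25, 0.5, 1).
toy <- matrix(c(1, 2, 2, 4, 4, 8), nrow = 2)
scaled_rows <- mop(toy, 1, max, `/`)
scaled_rows
```

Only the summary function, the operator, and the margin change between calls; the statistic itself is never computed by hand.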

Conclusion

mop() offers a more concise alternative to sweep() by having the user pass a summary statistic function rather than explicit summary statistic values. As such, mop() should be preferred over sweep() whenever the statistic is computed from the data being swept.

You can find this function and other functionals in my package, afp (Applied Functional Programming) [3].