New Package and Book: dm, Statistical Data Management Tools for R

Introduction

A couple of years ago, I started writing a package on GitHub that was inspired by the data management functionality in other statistical software such as Stata and SPSS. I got distracted by life, especially work, and practically stopped developing the package in 2019. This year, however, I finally sat down and finished it, R documentation and all: the end result is dm.

You can read the documentation for this package as a GitBook online.

Installation

Currently, this package is only available on GitHub, so please use devtools to install it.

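# Install devtools only if it is not already installed.
if (!requireNamespace('devtools', quietly = TRUE)) {
  install.packages('devtools')
}

# Install dm from GitHub, then load it.
devtools::install_github('robertschnitman/dm')
library(dm)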

Examples

recode()

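The recode() function offers a way to recode variables that differs from SPSS's Recode method: each value listed in the second argument is replaced by the corresponding value in the third argument. In the example below, 0 becomes 2 and 1 becomes 3.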

mtcars$am
##  [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
recode(mtcars$am, 0:1, 2:3)
##  [1] 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 2 2 2 2 2 3 3 3 3 3 3 3
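To make the mapping concrete, here is a minimal base-R sketch of the same idea built on match(). It is not dm's source code: the name recode_sketch() is hypothetical, and the sketch assumes that values not listed in `from` should be left unchanged, which may or may not match how dm handles them.

```r
# Hypothetical illustration of value-to-value recoding; not dm's implementation.
recode_sketch <- function(x, from, to) {
  stopifnot(length(from) == length(to))  # each old value needs exactly one new value
  idx <- match(x, from)                  # position of each element of x in `from`, NA if absent
  out <- x
  hit <- !is.na(idx)
  out[hit] <- to[idx[hit]]               # replace matched values; unmatched ones stay as-is
  out
}

recode_sketch(mtcars$am, 0:1, 2:3)
```

Calling recode_sketch(mtcars$am, 0:1, 2:3) reproduces the output shown above.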

numNA()

Inspired by Stata's missing() function, numNA() counts the total number of missing values in a dataset, rowNA() counts the missing values in each row, and colNA() counts the missing values in each column.

numNA(airquality) # Total number of missing values.
## [1] 44
rowNA(airquality) # Number of missing values by row.
##   [1] 0 0 0 0 2 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 0 0 0 0 1 1 1 1 1 1
##  [38] 0 1 0 0 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0
##  [75] 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 0 0 0 1 0 0 0 0
## [112] 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 1 0 0 0
colNA(airquality) # Number of missing values by column.
##   Ozone Solar.R    Wind    Temp   Month     Day 
##      37       7       0       0       0       0
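For comparison, base R can produce the same counts by combining is.na() with sum(), rowSums(), and colSums(). This is only a sketch for cross-checking the results; it is not necessarily how dm computes them, and the return types may differ (rowSums() and colSums() return doubles, for example).

```r
# Base-R counterparts of numNA(), rowNA(), and colNA(), shown for comparison only.
missing_matrix <- is.na(airquality)  # logical matrix: TRUE where a value is NA

total_na <- sum(missing_matrix)      # total number of missing values
row_na   <- rowSums(missing_matrix)  # number of missing values in each row
col_na   <- colSums(missing_matrix)  # number of missing values in each column

total_na
col_na
```

Here total_na is 44, and col_na gives the same per-column counts as colNA() above (37 for Ozone, 7 for Solar.R, 0 elsewhere).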

load_libraries()

The load_libraries() function checks whether each package in a set is installed: packages that are missing are installed and then loaded, and packages that are already installed are simply loaded.

load_libraries(c('tidyverse', 'ggformula', 'abind'))
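The install-if-missing-then-load logic described above can be sketched in base R as follows. The name load_packages_sketch() is hypothetical and this is not dm's source code; it assumes packages are installed from the default repository (e.g. CRAN) and uses requireNamespace() to test whether each package is usable before attaching it with require().

```r
# Hypothetical sketch of "install if missing, then load"; not dm's implementation.
load_packages_sketch <- function(pkgs) {
  # TRUE if the package's namespace can be loaded, i.e. it is installed and usable.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

  # Install only the missing packages (assumes a repository/mirror is configured).
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    install.packages(to_install)
  }

  # Attach every package; require() returns TRUE on success, FALSE otherwise.
  attached <- vapply(pkgs, require, logical(1),
                     character.only = TRUE, quietly = TRUE)
  invisible(attached)
}

# Usage mirroring the example above:
# load_packages_sketch(c('tidyverse', 'ggformula', 'abind'))
```

Returning the named logical vector invisibly lets a script check which packages attached successfully without cluttering the console.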

References

Kent State University. SPSS Recode. https://libguides.library.kent.edu/SPSS/RecodeVariables

Stata. missing(). https://www.stata.com/manuals13/m-5missing.pdf