New Package: stringops, String-Processing Tools and Synonyms for R

Introduction

When creating syntax, one has to ask themselves about the naming scheme: should I make the functions short for typing efficiency, or long for increased readability? Ruby has the former benefit, but sometimes the methods can be difficult to remember (e.g. is it len or length? Is it swapcase or swap_case?), as there isn’t a consistent naming scheme–however, some functions have synonyms to help those from other programming languages learn Ruby faster (e.g. reduce and inject do the same thing). On the other hand, the stringr library has a consisent naming scheme for its functions, but does not have synonyms, so you are forced to learn the stringr way. Thirdly, and perhaps tagentially, R does not have concatenation operator (only functions) like in Ruby and BASIC, which is odd, as many situations require concatenation; so using the paste/paste0() functions can make code less readable. As such, I am introducing a new package to take these considerations into account: stringops, a work-in-progress library consisting of tools for processing strings in R.

What this package brings are (1) a consistent naming-scheme for functions, (2) synonyms for said functions, and (3) a concatenation operator. The first item benefits users of all skill levels, as it makes certain functions easier to remember while making use of RStudio’s predictive text. The second item is useful when one tires of typing string_cull(), for example, and wishes to use a shorthand to simplify the code (in this case, the shorthand would be cull()). The third item’s benefit is more readable code by avoiding the function syntax of paste/paste0(). Ultimately, these items will hopefully make processing strings in R more fun for the user!

The following sections detail the installation procedure and a sample of functions from this package. The documentation for this library is at https://rs-stringops.netlify.app/.

Installation

This package currently is only available on GitHub–there are no plans to submit this package to CRAN at this time. As such, please use the devtools library to install stringops.

# install.packages('devtools')

devtools::install_github('robertschnitman/stringops')
library(stringops)

%&%

As inspired by AutoIt, the %&% (concatenation) operator joins two strings together. This operator facilitates readable code by avoiding the function syntax of the paste/paste0().

'a' %&% 'b'
## [1] "ab"
"Car: " %&% rownames(mtcars)
##  [1] "Car: Mazda RX4"           "Car: Mazda RX4 Wag"      
##  [3] "Car: Datsun 710"          "Car: Hornet 4 Drive"     
##  [5] "Car: Hornet Sportabout"   "Car: Valiant"            
##  [7] "Car: Duster 360"          "Car: Merc 240D"          
##  [9] "Car: Merc 230"            "Car: Merc 280"           
## [11] "Car: Merc 280C"           "Car: Merc 450SE"         
## [13] "Car: Merc 450SL"          "Car: Merc 450SLC"        
## [15] "Car: Cadillac Fleetwood"  "Car: Lincoln Continental"
## [17] "Car: Chrysler Imperial"   "Car: Fiat 128"           
## [19] "Car: Honda Civic"         "Car: Toyota Corolla"     
## [21] "Car: Toyota Corona"       "Car: Dodge Challenger"   
## [23] "Car: AMC Javelin"         "Car: Camaro Z28"         
## [25] "Car: Pontiac Firebird"    "Car: Fiat X1-9"          
## [27] "Car: Porsche 914-2"       "Car: Lotus Europa"       
## [29] "Car: Ford Pantera L"      "Car: Ferrari Dino"       
## [31] "Car: Maserati Bora"       "Car: Volvo 142E"

string_cull()

As inspired by Ruby, the function string_cull() culls (or extracts) pattern matches from strings–if a pattern is not found, then NA is returned. The synonyms for this function are s_cull(), and cull(). To get all matches, you can set get_all = TRUE.

# extract beginning "M" from each element.
string_cull(rownames(mtcars), '^M')
##  [1] "M" "M" NA  NA  NA  NA  NA  "M" "M" "M" "M" "M" "M" "M" NA  NA  NA  NA  NA 
## [20] NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  "M" NA
# extract beginning "M" and any "a".
# By default, multiple matches are separated by a comma.
string_cull(rownames(mtcars), "^M|a", get_all = TRUE, collapse = ',')
##  [1] "M,a,a"   "M,a,a,a" "a"       NA        "a"       "a,a"     NA       
##  [8] "M"       "M"       "M"       "M"       "M"       "M"       "M"      
## [15] "a,a"     "a"       "a"       "a"       "a"       "a,a"     "a,a"    
## [22] "a"       "a"       "a,a"     "a"       "a"       NA        "a"      
## [29] "a,a"     "a"       "M,a,a,a" NA

string_spot()

The function string_spot() acts similar to grep(string, x, value = TRUE): it subsets a vector to found pattern matches, returning the full element. The synonyms for this function are s_spot(), and spot(). Optional inputs get passed to grep().

# Get elements that start with an "M".
string_spot(rownames(mtcars), '^M')
##  [1] "Mazda RX4"     "Mazda RX4 Wag" "Merc 240D"     "Merc 230"     
##  [5] "Merc 280"      "Merc 280C"     "Merc 450SE"    "Merc 450SL"   
##  [9] "Merc 450SLC"   "Maserati Bora"
# Get elements that do NOT start with an "M".
string_spot(rownames(mtcars), '^M', invert = TRUE)
##  [1] "Datsun 710"          "Hornet 4 Drive"      "Hornet Sportabout"  
##  [4] "Valiant"             "Duster 360"          "Cadillac Fleetwood" 
##  [7] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
## [10] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
## [13] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
## [16] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
## [19] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
## [22] "Volvo 142E"

Other variations of string_spot() are

  • string_spoti(), which returns the indices of matches (optional inputs get passed to grep());

  • string_spotl(), which returns TRUE if a match is found and FALSE otherwise (optional inputs get passed to grepl()); and

  • string_spotm(), which returns the element in full if a match is found and NA otherwise (optional inputs get pased to grepl().

The three functions above have the s_*()/*() synonym pattern, as well.

# Get indices of matches.
string_spoti(rownames(mtcars), '^M')
##  [1]  1  2  8  9 10 11 12 13 14 31
# Detect whether a match is found.
string_spotl(rownames(mtcars), '^M')
##  [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
# Get matching element in full.
string_spotm(rownames(mtcars), '^M')
##  [1] "Mazda RX4"     "Mazda RX4 Wag" NA              NA             
##  [5] NA              NA              NA              "Merc 240D"    
##  [9] "Merc 230"      "Merc 280"      "Merc 280C"     "Merc 450SE"   
## [13] "Merc 450SL"    "Merc 450SLC"   NA              NA             
## [17] NA              NA              NA              NA             
## [21] NA              NA              NA              NA             
## [25] NA              NA              NA              NA             
## [29] NA              NA              "Maserati Bora" NA

Conclusion

Hopefully, this package’s API, synonyms, and concatenation operator make processing strings in R more fun for you! You can find source code for these functions and more at the Github repository, stringops.