Data Analysis

New Book: A Short Introduction to Applied Statistical Programming in R

I have a new book in progress called A Short Introduction to Applied Statistical Programming in R, which can be viewed online as a Gitbook or as a PDF. [EDIT 2020-04-01: I will primarily focus on the Gitbook version, as I am running into some typesetting issues with the PDF at the moment.] [EDIT 2020-04-02: The Gitbook version is fairly complete, and I do not foresee many major updates to it unless they are requested or I think of anything else significant to add.]

ABSTRACT: The Narrator and the Noise

Preface: This blog post is simply the summary of The Narrator and the Noise. Please read the full version at either of the following locations.
Gitbook version: https://rs-ddlc.netlify.com/
PDF version: https://github.com/robertschnitman/RS_Reports/blob/master/DDLC/DDLC.pdf
Abstract: The focus of this study is to determine the existence and extent of statistical bias towards any of the Doki Doki Literature Club characters with respect to the points distribution of the Poem Minigame.

Web Mining bankrate.com

Introduction: The purpose of this blog post is to demonstrate how to web mine bankrate.com with the R programming language, focusing primarily on extracting and graphing the APY and minimum deposits for 1-year, 3-year, and 5-year CD rates.
Setup: Before the analysis, some necessary libraries will be loaded: first, tidyverse and magrittr for their data management functions; second, flextable for table formatting; third, rvest for web mining; and fourth, plotly to display interactive graphs.

Data Science and Beyblades

Introduction: Beyblade has proven itself to be a strong-running franchise, spanning several TV series and toys. In the shows and on the boxes of said toys, there is an emphasis on the attributes of the beyblades: the Attack, Defense, and Stamina of each component that makes up the beyblade. While the validity of these statistics can be questioned, one cannot help but wonder about the relationship between these three traits among the beyblades.

Mapping Techniques to Maintain Data Frame Consistency in Base R

Introduction: Maintaining data frame consistency within base R can be difficult. The purrr library from the tidyverse solves this problem with its map_df() function. However, we can achieve similar results and expand upon them with base R functions. To do so, two methods will be used.
Method 1: Use lapply(), data.frame(), and do.call()
To replicate purrr’s map_df(), we use three functions: lapply() to apply the function to some data; data.
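As a minimal sketch of how the three functions named in Method 1 can fit together (the helper summarize_group(), the use of split(), and the choice of rbind as the combining function are illustrative assumptions, not taken from the post), one might write:

```r
# Hypothetical per-group function: returns a one-row data frame per group.
summarize_group <- function(d) {
  data.frame(cyl = d$cyl[1], n = nrow(d), mean_mpg = mean(d$mpg))
}

# Split the built-in mtcars data into a list of data frames, one per cylinder count.
groups <- split(mtcars, mtcars$cyl)

# lapply() applies the function to each list element;
# do.call() passes the resulting list of data frames to rbind() as arguments,
# stacking them into a single data frame.
result <- do.call(rbind, lapply(groups, summarize_group))

# Drop the row names that rbind() derives from the list names.
rownames(result) <- NULL

print(result)
```

Because every element returned by lapply() is already a data frame with the same columns, the combined output stays a data frame rather than collapsing into a matrix or list.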

Presenting the Sachse, TX January 2020 Special Election Pre-runoff Results with R

Preface: The contents of this blog post originate from the PDF version (https://github.com/robertschnitman/RS_Reports/blob/master/Polls/Sachse/sachse2020.pdf) and its GitBook equivalent (https://rs-sachse2020.netlify.com/). EDIT 2020-03-27: This post has been updated to use flextable instead of kableExtra to produce a cleaner table for HTML. The PDF and Gitbook reports still use kableExtra, as it is better suited to PDFs.
Introduction: The purpose of this document is to demonstrate the utility of using the R programming language in reporting polls by walking through the process via the software itself.

mop(): a New Way to sweep()

Introduction: This blog post compares sweep() with a function I’ve created called mop(). I argue that the latter is preferable, as it is more concise.
The Old Way: sweep()
The function sweep() allows one to process data based on a summary statistic function (for example, dividing each element by a column’s mean). A problem, however, arises: you are required to explicitly state the summary statistic value in the STATS input.
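A minimal sketch of the sweep() pattern described above, assuming three numeric columns of the built-in mtcars data as the input (the column choice and variable names are illustrative, not taken from the post). Note that the column means have to be computed separately and handed to STATS:

```r
# Illustrative input: a numeric matrix built from three mtcars columns.
x <- as.matrix(mtcars[, c("mpg", "wt", "hp")])

# The summary statistic must be computed explicitly beforehand.
col_means <- colMeans(x)

# MARGIN = 2 sweeps over columns; each element is divided by its column's mean.
scaled <- sweep(x, MARGIN = 2, STATS = col_means, FUN = "/")

# After scaling, every column has a mean of 1.
print(colMeans(scaled))
```

The separate colMeans() step is the verbosity the post refers to: the statistic and the operation are specified in two different places.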

Scatter-text Plots with Base R

To make scatter plots with text as points in Base R, we simply need to use plot(), set the scatter points to be white, and then plot the text with text().
# Trick R into not displaying points.
with(mtcars, plot(wt ~ mpg, pch = 1, col = 'white', xlab = 'MPG', ylab = 'Weight', main = 'Weight vs. MPG'))
# Plot the labels on the graph.
with(mtcars, text(mpg, wt, row.
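The two steps can also be wrapped into a small reusable function. This is only a sketch: the function name plot_text_scatter(), its arguments, the cex value, and the use of row names as labels are assumptions made for illustration.

```r
# Hypothetical helper: draw a scatter plot whose points are text labels.
plot_text_scatter <- function(x, y, labels, ...) {
  # Draw the axes and frame, but make the points white so they are not visible
  # against a white background.
  plot(x, y, col = "white", ...)
  # Place each label at its (x, y) coordinate.
  text(x, y, labels = labels, cex = 0.7)
  # Return the labels invisibly so the call can be checked or chained.
  invisible(labels)
}

# Example usage with the built-in mtcars data, labelling points by row name.
with(mtcars, plot_text_scatter(mpg, wt, rownames(mtcars),
                               xlab = "MPG", ylab = "Weight",
                               main = "Weight vs. MPG"))
```

A common alternative in base R is plot(x, y, type = "n"), which sets up the axes without drawing any points and does not depend on the background color.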

Using Residuals Percent in OLS Diagnostics

Many students (myself included) were taught to analyze the raw residuals when diagnosing regression models, but not to express them as a percent. The benefit of the latter is that we can assess the relative magnitude of error from our regression model. To display the residuals as a percent (henceforth Residuals, %), let’s first load some necessary libraries.
libs <- c('tidyverse', 'magrittr', 'ggthemes', 'gridExtra')
# For each library, check if it is installed.
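A rough sketch of the idea in base R, assuming that Residuals, % means each residual divided by the observed value of the dependent variable and multiplied by 100. The excerpt does not state the formula, so this definition, the example model, and the variable names are all assumptions:

```r
# Illustrative OLS model on the built-in mtcars data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Raw residuals: observed minus fitted values.
raw_resid <- residuals(fit)

# Assumed definition: residual as a percentage of the observed value.
resid_pct <- raw_resid / mtcars$mpg * 100

# Collect everything in one data frame for inspection.
diagnostics <- data.frame(
  actual       = mtcars$mpg,
  fitted       = fitted(fit),
  residual     = raw_resid,
  residual_pct = resid_pct
)

# Show the observations with the largest relative errors first.
print(head(diagnostics[order(-abs(diagnostics$residual_pct)), ]))
```

Under this definition, a residual of 2 on an observed value of 10 (20%) stands out far more than the same residual on an observed value of 30 (about 6.7%), which is the relative view that raw residuals alone do not give.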

Welcome!

Hello, my name is Robert Schnitman, and welcome to my site! This site describes me and the services I provide as an independent contractor, and it includes a blog where I record data analysis and R programming ideas. Please check out the “About”, “Curriculum Vitae”, and “Services” pages for more information! Thank you for your time!