Package 'rcamisc' reference manual

Title:	rcamisc: Plot, wrangle and stay sane
Description:	Miscellaneous helpers for plotting and staying sane.
Authors:	Ruben Arslan [aut, cre]
Maintainer:	Ruben C. Arslan <[email protected]>
License:	MIT + file LICENSE
Version:	0.3.0
Built:	2025-02-12 03:40:55 UTC
Source:	https://github.com/rubenarslan/rcamisc

aggregates two variables from two sources into one

Description

Takes two variables with different missing values and returns one variable in which the values of the second variable are substituted wherever the first is missing.

Usage

aggregate2sources(
  df,
  new_var = NULL,
  var1 = NULL,
  var2 = NULL,
  remove_old_variables = TRUE
)

Arguments

`df`	data.frame or variable
`new_var`	new variable name
`var1`	first source. Assumed to be new_var.x (default suffixes after merging)
`var2`	second source. Assumed to be new_var.y (default suffixes after merging)
`remove_old_variables`	whether to drop var1 and var2 from the resulting df. Defaults to TRUE, i.e. they are not kept.

Examples

cars$dist.x = cars$dist
cars$dist.y = cars$dist
cars$dist.y[2:5] = NA
cars$dist.x[10:15] = NA # sprinkle missings
cars$dist = NULL # remove old variable
cars = aggregate2sources(cars, 'dist')
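
The substitution described above amounts to a coalesce: wherever the first source is missing, take the value from the second. A minimal base-R sketch, assuming the default `.x`/`.y` suffixes left behind by merge(); `coalesce_sources()` and `coalesce_columns()` are hypothetical helpers for illustration, not functions exported by rcamisc:

```r
# Hypothetical helper (not part of rcamisc): fill NAs in `first` from `second`.
coalesce_sources <- function(first, second) {
  stopifnot(length(first) == length(second))
  missing_first <- is.na(first)
  first[missing_first] <- second[missing_first]
  first  # values missing in both sources stay NA
}

# Hypothetical data-frame wrapper mirroring the documented defaults:
# var1 = new_var.x and var2 = new_var.y.
coalesce_columns <- function(df, new_var,
                             var1 = paste0(new_var, ".x"),
                             var2 = paste0(new_var, ".y"),
                             remove_old_variables = TRUE) {
  df[[new_var]] <- coalesce_sources(df[[var1]], df[[var2]])
  if (remove_old_variables) {
    df <- df[setdiff(names(df), c(var1, var2))]  # drop the two sources
  }
  df
}

d <- cars
d$dist.x <- d$dist
d$dist.y <- d$dist
d$dist.y[2:5] <- NA
d$dist.x[10:15] <- NA   # sprinkle missings in both sources
d$dist <- NULL
d <- coalesce_columns(d, "dist")
```

Because the two sources are never missing in the same rows here, the original `dist` column is fully recovered.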

Am I going mad?

Description

It's easy to attach packages that overwrite (mask) functions from other packages. dplyr in particular conflicts with many functions from the base packages, MASS and plyr. Because some of these conflicts produce no error message, only incorrect behaviour, this function exists: don't trust your faulty memory, just check whether dplyr's (or any other package's) functions are 'on top' if you so desire.

Usage

amigoingmad(package = "dplyr", fix = TRUE, iteration = 0)

Arguments

`package`	the package you want to be on top (loaded last), defaults to dplyr
`fix`	defaults to TRUE. Detaches the desired package (without unloading it) and attaches it again. Won't work for base packages and can't override functions that you defined yourself.
`iteration`	for internal use only, if set to 0 the function will call itself to check that it worked, if set to 1, it won't.

Examples

amigoingmad(fix = FALSE, package = 'rcamisc')
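
Whether a package is 'on top' comes down to the order of the search path: an unqualified call uses the first attached environment that defines the name. A minimal base-R sketch of that check (the detach-and-reattach fix is not shown); `masking_package()` and `masked_functions()` are hypothetical helpers built on `utils::find()`, not part of rcamisc:

```r
# Hypothetical helper: which environment on the search path supplies `fun_name`?
# utils::find() lists matches in search-path order, so the first hit is the
# one used when the name is called without a `pkg::` prefix.
masking_package <- function(fun_name) {
  hits <- utils::find(fun_name, mode = "function")
  if (length(hits) == 0) NA_character_ else hits[1]
}

# Hypothetical helper: functions of attached package `pkg` that are shadowed
# by something earlier on the search path (another package or your own
# global definitions).
masked_functions <- function(pkg) {
  env_name <- paste0("package:", pkg)
  env <- as.environment(env_name)
  funs <- Filter(function(f) is.function(get(f, envir = env)), ls(env))
  funs[vapply(funs, function(f) !identical(masking_package(f), env_name),
              logical(1))]
}

masking_package("mean")  # "package:base" in a fresh session
```

A function you define yourself lives in the global environment, which always comes first on the search path; re-attaching a package with library() only places it directly after the global environment, which is why the `fix` argument cannot override your own definitions.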

build a bibliography bibtex file from your lockfile

Description

renv helps you maintain consistent package versions for a project. To give package authors due credit in a form academics understand, it helps to be able to generate citations for the packages your project uses.

Usage

bibliography(
  overwrite_bib = FALSE,
  silent = FALSE,
  cite_only_directly_called = TRUE,
  lockfile_path = "renv.lock",
  bibliography_path = "bibliography.bibtex",
  cite_renv = !cite_only_directly_called
)

Arguments

`overwrite_bib`	whether to overwrite an existing bibtex file of the same name
`silent`	defaults to FALSE, in which case a nocite string to use in your document header is printed with cat(); set to TRUE to suppress it
`cite_only_directly_called`	whether to cite only the packages you called yourself (default) or also their dependencies
`lockfile_path`	path to the renv lockfile to use
`bibliography_path`	path to the bibtex file to generate
`cite_renv`	whether to cite renv even if it's not loaded explicitly, defaults to the reverse of cite_only_directly_called
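
Citations for installed packages are available from `utils::citation()`, and `utils::toBibtex()` turns them into BibTeX. A minimal sketch of the bibliography step, assuming you already have a character vector of package names (for example the names under `Packages` in renv.lock, which could be read with `jsonlite::read_json()`); `package_bibtex()`, `nocite_header()` and the `R-<pkg>` key scheme are hypothetical choices, not necessarily what rcamisc does:

```r
# Hypothetical helper: one BibTeX entry per package, keyed "R-<pkg>".
package_bibtex <- function(pkgs) {
  unlist(lapply(pkgs, function(p) {
    # keep only the first entry if a package's CITATION lists several
    entry <- as.character(utils::toBibtex(utils::citation(p)[1]))
    # replace the (usually empty) key in the "@Type{key," line
    entry[1] <- sub("\\{[^,]*,$", sprintf("{R-%s,", p), entry[1])
    c(entry, "")  # blank line between entries
  }))
}

# Hypothetical helper: a YAML `nocite` line that cites every package key
# without mentioning the packages in the text.
nocite_header <- function(pkgs) {
  paste0("nocite: '", paste0("@R-", pkgs, collapse = ", "), "'")
}

pkgs <- c("stats", "utils")
bib <- package_bibtex(pkgs)
writeLines(bib, tempfile(fileext = ".bib"))  # e.g. "bibliography.bibtex" in a project
cat(nocite_header(pkgs), "\n")
```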

iterate adding ribbons to a ggplot2 plot at varying confidence levels to shade by confidence. Horribly inefficient, because smooth stat is computed every time, but flexible.

Description

Iteratively adds ribbons to a ggplot2 plot at varying confidence levels to shade the smooth by confidence. This is horribly inefficient, because the smooth stat is recomputed for every level, but flexible.

Usage

geom_shady_smooth(
  mapping = NULL,
  data = NULL,
  stat = "smooth",
  method = "auto",
  formula = y ~ x,
  se = TRUE,
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  levels = c(0.6, 0.8, 0.95),
  base_alpha = 1,
  fill_gradient = NULL,
  fill = "black",
  ...
)

Arguments

`mapping`	Set of aesthetic mappings created by `aes()` or `aes_()`. If specified and `inherit.aes = TRUE` (the default), it is combined with the default mapping at the top level of the plot. You must supply `mapping` if there is no plot mapping.
`data`	The data to be displayed in this layer. There are three options: If `NULL`, the default, the data is inherited from the plot data as specified in the call to `ggplot()`. A `data.frame`, or other object, will override the plot data. All objects will be fortified to produce a data frame. See `fortify()` for which variables will be created. A `function` will be called with a single argument, the plot data. The return value must be a `data.frame`, and will be used as the layer data. A `function` can be created from a `formula` (e.g. `~ head(.x, 10)`).
`stat`	defaults to smooth
`method`	Smoothing method (function) to use, accepts either `NULL` or a character vector, e.g. `"lm"`, `"glm"`, `"gam"`, `"loess"` or a function, e.g. `MASS::rlm` or `mgcv::gam`, `stats::lm`, or `stats::loess`. `"auto"` is also accepted for backwards compatibility. It is equivalent to `NULL`. For `method = NULL` the smoothing method is chosen based on the size of the largest group (across all panels). `stats::loess()` is used for less than 1,000 observations; otherwise `mgcv::gam()` is used with `formula = y ~ s(x, bs = "cs")` with `method = "REML"`. Somewhat anecdotally, `loess` gives a better appearance, but is $O(N^{2})$ in memory, so does not work for larger datasets. If you have fewer than 1,000 observations but want to use the same `gam()` model that `method = NULL` would use, then set `method = "gam", formula = y ~ s(x, bs = "cs")`.
`formula`	Formula to use in smoothing function, e.g. `y ~ x`, `y ~ poly(x, 2)`, `y ~ log(x)`. Defaults to `y ~ x` here. If set to `NULL`, `method = NULL` implies `formula = y ~ x` when there are fewer than 1,000 observations and `formula = y ~ s(x, bs = "cs")` otherwise.
`se`	Display confidence interval around smooth? (`TRUE` by default, see `levels` to control.)
`position`	Position adjustment, either as a string, or the result of a call to a position adjustment function.
`na.rm`	If `FALSE`, the default, missing values are removed with a warning. If `TRUE`, missing values are silently removed.
`show.legend`	logical. Should this layer be included in the legends? `NA`, the default, includes if any aesthetics are mapped. `FALSE` never includes, and `TRUE` always includes. It can also be a named logical vector to finely select the aesthetics to display.
`inherit.aes`	If `FALSE`, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. `borders()`.
`levels`	the confidence levels that are supposed to be displayed, defaults to 0.6, 0.8, 0.95
`base_alpha`	base transparency, which is divided by length(levels) to give the alpha of each ribbon
`fill_gradient`	a vector of colors at least as long as levels, used to color each ribbon differently
`fill`	a single color for the ribbon
`...`	everything else is passed to and documented in `ggplot2::geom_smooth()`

Examples

data(beavers)
plot = ggplot2::ggplot(beaver1, ggplot2::aes(time, temp))
plot + geom_shady_smooth() + ggplot2::facet_wrap(~ day)
plot + geom_shady_smooth(fill = 'blue', levels = seq(0.05,0.95,0.1))
plot + geom_shady_smooth(size = 0.1, fill = '#49afcd', levels = seq(0.1,0.8,0.01))
plot + geom_shady_smooth(fill_gradient = c('red', 'orange', 'yellow'), base_alpha = 3)
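
The shading works by stacking one translucent ribbon per confidence level, each from its own interval computation. A minimal sketch of the numbers behind those ribbons, assuming a plain linear fit via `stats::predict()` (geom_shady_smooth itself goes through the smooth stat, which uses loess by default for small data) and assuming each ribbon gets `base_alpha / length(levels)`, per the argument description above; `shady_ribbons()` is a hypothetical helper:

```r
# Hypothetical helper: one confidence band per level from a linear fit.
shady_ribbons <- function(x, y, levels = c(0.6, 0.8, 0.95), base_alpha = 1) {
  fit <- stats::lm(y ~ x)  # stand-in for the smooth stat
  grid <- data.frame(x = seq(min(x), max(x), length.out = 50))
  bands <- lapply(levels, function(lvl) {
    # the interval is recomputed for every level, as in the package
    ci <- stats::predict(fit, newdata = grid, interval = "confidence", level = lvl)
    data.frame(x = grid$x, fit = ci[, "fit"], ymin = ci[, "lwr"], ymax = ci[, "upr"],
               level = lvl, alpha = base_alpha / length(levels))
  })
  do.call(rbind, bands)
}

ribbons <- shady_ribbons(beaver1$time, beaver1$temp)
```

Each level's rows could then be drawn with its own `ggplot2::geom_ribbon(ggplot2::aes(x, ymin = ymin, ymax = ymax), alpha = ...)` layer; narrower intervals sit under every wider ribbon, so the shading darkens towards the fitted line.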

missingness patterns

Description

This function shows how common each missingness pattern is. It emulates misschk in Stata.

It excludes variables that don't have any missing values, so as not to clutter the output (disable this using omit_complete). It sorts variables by number of missing values, so that the usual suspects show up at the front. It also displays the number of missing values accounted for by each pattern.

Usage

missingness_patterns(
  df,
  min_freq = ifelse(relative, 1/nrow(df), 1),
  long_pattern = FALSE,
  print_legend = ifelse(long_pattern, FALSE, TRUE),
  show_culprit = TRUE,
  relative = FALSE,
  omit_complete = TRUE
)

Arguments

`df`	dataset
`min_freq`	show only patterns that occur at least this often. Defaults to 1 observation.
`long_pattern`	if FALSE (the default), patterns show only column indices, for space and legibility reasons.
`print_legend`	prints a legend for the column indices, defaults to FALSE if long_pattern is set
`show_culprit`	defaults to TRUE. In case a missingness pattern boils down to one variable, it will be shown here.
`relative`	defaults to FALSE. If TRUE, percentages are shown (relative to the total before patterns rarer than min_freq are excluded).
`omit_complete`	defaults to TRUE. Columns that don't have any missings are excluded.

Examples

data(ChickWeight)
ChickWeight[1:2,c('weight','Chick')] = NA
ChickWeight[3:5,'Diet'] = NA
names(ChickWeight); nrow(ChickWeight)
missingness_patterns(ChickWeight)
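
The core of such a pattern table is to turn each row's missingness into a key and count the keys. A minimal base-R sketch covering only the omission of complete variables and the sorting by number of missing values (no `min_freq`, `relative` or legend); `count_missing_patterns()` is a hypothetical helper and its output format differs from missingness_patterns():

```r
# Hypothetical helper: count how often each row-wise missingness pattern occurs.
count_missing_patterns <- function(df, omit_complete = TRUE) {
  miss <- is.na(df)  # logical matrix, one column per variable
  if (omit_complete) {
    miss <- miss[, colSums(miss) > 0, drop = FALSE]  # drop complete variables
  }
  # most-missing variables first, so they get the lowest column indices
  miss <- miss[, order(colSums(miss), decreasing = TRUE), drop = FALSE]
  # key each row by the indices of its missing columns, e.g. "2_3"
  patterns <- apply(miss, 1, function(row) {
    if (any(row)) paste(which(row), collapse = "_") else "none"
  })
  sort(table(patterns), decreasing = TRUE)
}

cw <- as.data.frame(ChickWeight)
cw[1:2, c("weight", "Chick")] <- NA
cw[3:5, "Diet"] <- NA
count_missing_patterns(cw)
```

Here Diet (3 missing values) becomes index 1, and the two rows missing both weight and Chick share the pattern "2_3".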

multi trait multi method matrix

Description

Renders a multitrait-multimethod matrix (MTMM) using ggplot2. This function splits the variable names in a correlation matrix or a data.frame: the first part is used as the trait, the second as the method. Correlations are displayed as text, with the font size corresponding to their absolute size. You can optionally supply a data frame of reliabilities to show in the diagonal.

Usage

mtmm(variables = NULL, reliabilities = NULL, split_regex = "_", cors = NULL)
mtmm(variables = NULL, reliabilities = NULL, split_regex = "_", cors = NULL)

Arguments

`variables`	data frame of variables that are supposed to be correlated
`reliabilities`	data frame of reliabilities: column 1: scale, column 2: reliability coefficient
`split_regex`	regular expression to separate trait (construct) and method in the variable name, splits on '_' by default
`cors`	you can also supply a (named) correlation matrix

Examples

data.mtmm = data.frame(
`Ach_self_report` = rnorm(200), `Pow_self_report` = rnorm(200), `Aff_self_report`= rnorm(200),
`Ach_peer_report` = rnorm(200),`Pow_peer_report`= rnorm(200),`Aff_peer_report` = rnorm(200),
`Ach_diary` = rnorm(200), `Pow_diary` = rnorm(200),`Aff_diary` = rnorm(200))
reliabilities = data.frame(scale = names(data.mtmm), rel = stats::runif(length(names(data.mtmm))))
mtmm(data.mtmm, reliabilities = reliabilities)
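
Splitting the names and classifying each correlation is the bookkeeping behind an MTMM display. A minimal base-R sketch that produces a long table of correlations flagged as same-trait and/or same-method, assuming the name is split at the first match of `split_regex` (so `Ach_self_report` becomes trait `Ach`, method `self_report`); both helpers are hypothetical and do no plotting:

```r
# Hypothetical helper: split names at the first match of `split_regex`.
split_trait_method <- function(vars, split_regex = "_") {
  pos <- regexpr(split_regex, vars)
  stopifnot(all(pos > 0))  # every name must contain the separator
  start <- as.integer(pos)
  len <- attr(pos, "match.length")
  data.frame(var = vars,
             trait = substr(vars, 1, start - 1),
             method = substr(vars, start + len, nchar(vars)),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: long table of unique correlations, flagged by whether
# the two variables share a trait and/or a method.
mtmm_long <- function(df, split_regex = "_") {
  r <- stats::cor(df)
  parts <- split_trait_method(colnames(r), split_regex)
  pairs <- which(upper.tri(r), arr.ind = TRUE)  # each pair once, no diagonal
  data.frame(var1 = parts$var[pairs[, 1]],
             var2 = parts$var[pairs[, 2]],
             r = r[pairs],
             same_trait = parts$trait[pairs[, 1]] == parts$trait[pairs[, 2]],
             same_method = parts$method[pairs[, 1]] == parts$method[pairs[, 2]],
             stringsAsFactors = FALSE)
}
```

Rows with `same_trait` but not `same_method` are the monotrait-heteromethod (convergent validity) correlations; a plot would then place each row by trait and method.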

Waffle plot

Description

Pass in a variable and get a waffle plot. Useful to display simple counts or, if the variable takes several different values, as a square pie chart. If the variable is so long that the individual squares become hard to see, consider letting each square represent hundreds, thousands, etc.

Usage

qplot_waffle(
  x,
  shape = 15,
  rows = NULL,
  cols = NULL,
  drop_shadow_h = -0.3,
  drop_shadow_v = 0.3
)

Arguments

`x`	a variable with not too many unique values
`shape`	defaults to a filled square
`rows`	defaults to the rounded up square root of the number of values
`cols`	defaults to the rounded down square root of the number of values
`drop_shadow_h`	horizontal offset of the drop shadow, tinker with this to get a proper shadow effect
`drop_shadow_v`	vertical offset of the drop shadow

Details

To avoid the Hermann grid illusion, don't use dark colours.

Examples

qplot_waffle(rep(1:2,each=5))
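
The documented defaults place the values on a grid of `ceiling(sqrt(n))` rows by `floor(sqrt(n))` columns. A minimal base-R sketch of that layout step; `waffle_grid()` is hypothetical, sorting the values so equal values form contiguous blocks is an assumption, and because rows times columns can fall short of n (n = 7 gives 3 x 2 = 6 cells), the sketch adds columns when needed:

```r
# Hypothetical helper: assign each value a (row, col) cell of a waffle grid.
waffle_grid <- function(x, rows = NULL, cols = NULL) {
  n <- length(x)
  if (is.null(rows)) rows <- ceiling(sqrt(n))
  if (is.null(cols)) cols <- floor(sqrt(n))
  # rows * cols can be smaller than n; widen the grid if so
  if (rows * cols < n) cols <- ceiling(n / rows)
  x <- sort(x)  # equal values end up next to each other
  idx <- seq_len(n) - 1
  data.frame(row = idx %% rows + 1,   # fill column by column
             col = idx %/% rows + 1,
             value = x)
}

g <- waffle_grid(rep(1:2, each = 5))
```

The result could be plotted with `ggplot2::ggplot(g, ggplot2::aes(col, row, colour = factor(value))) + ggplot2::geom_point(shape = 15)`; a drop shadow can be imitated by first drawing the same points in grey, shifted by small offsets such as `drop_shadow_h` and `drop_shadow_v`.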

Waffle plot (text)

Description

Usage

qplot_waffle_text(
  x,
  symbol = fontawesome_square,
  rows = NULL,
  cols = NULL,
  drop_shadow_h = -0.9,
  drop_shadow_v = 0.9,
  font_family = "FontAwesome",
  font_face = "Regular",
  font_size = round(140/sqrt(length(x)))
)

Arguments

`x`	a variable with not too many unique values
`symbol`	pass a unicode symbol from FontAwesome here. Defaults to a square with rounded edges
`rows`	defaults to the rounded up square root of the number of values
`cols`	defaults to the rounded down square root of the number of values
`drop_shadow_h`	horizontal offset of the drop shadow, tinker with this to get a proper shadow effect
`drop_shadow_v`	vertical offset of the drop shadow
`font_family`	defaults to FontAwesome
`font_face`	defaults to Regular
`font_size`	defaults to round(140/sqrt(length(x)))

Details

This function is like qplot_waffle, but it allows you to specify custom symbols from FontAwesome. Copy and paste them from here: http://fontawesome.io/cheatsheet

To avoid the Hermann grid illusion, don't use dark colours.

Examples

## Not run: 
qplot_waffle_text(rep(1:2,each=30), rows = 5)

## End(Not run)
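
The default `font_size` shrinks the symbols as the number of values grows. Worked out for the example above (60 values) and two other lengths; `default_font_size()` is just a hypothetical name for the default expression in the Usage section:

```r
# Default from the Usage section: font_size = round(140/sqrt(length(x)))
default_font_size <- function(x) round(140 / sqrt(length(x)))

default_font_size(rep(1:2, each = 30))  # 140 / sqrt(60)   ~ 18.07, rounds to 18
default_font_size(1:10)                 # 140 / sqrt(10)   ~ 44.27, rounds to 44
default_font_size(1:1000)               # 140 / sqrt(1000) ~ 4.43,  rounds to 4
```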

Waffle plot (tile)

Description

Usage

qplot_waffle_tile(x, rows = NULL, cols = NULL)

Arguments

`x`	a variable with not too many unique values
`rows`	defaults to the rounded up square root of the number of values
`cols`	defaults to the rounded down square root of the number of values

Details

This function requires (and allows) the least tinkering, but it does not draw drop shadows. To avoid the Hermann grid illusion, don't use dark colours.

Adapted from http://shinyapps.stat.ubc.ca/r-graph-catalog/, which adapted it from http://www.techques.com/question/17-17842/How-to-make-waffle-charts-in-R, which in turn adapted it from http://ux.stackexchange.com/a/46543/56341

Examples

qplot_waffle_tile(rep(1:2,each=500))

render an rmarkdown file in background using RStudio Jobs

Description

if you want to

Usage

render_job(input, params = NULL, output_file = NULL)

Arguments

`input`	.Rmd document to be knitted
`params`	params to pass to the .Rmd
`output_file`	name of the output_file (and the job)

Examples

## Not run: 
   render_job("document.Rmd", list(dataset = "df1"), "summary_df1.html")

## End(Not run)
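
RStudio's Jobs pane runs an R script in a separate session, so rendering in the background boils down to writing a small script that calls `rmarkdown::render()` and handing it to `rstudioapi::jobRunScript()`. A minimal sketch under that assumption (not necessarily how render_job is implemented); `write_render_script()` and `launch_render_job()` are hypothetical helpers, and `deparse1()` needs R >= 4.0:

```r
# Hypothetical helper: write an R script that renders `input` with rmarkdown.
write_render_script <- function(input, params = NULL, output_file = NULL) {
  script <- tempfile(fileext = ".R")
  writeLines(c(
    # named render_params rather than params: rmarkdown::render() can refuse
    # to run when a `params` object already exists where it knits
    sprintf("render_params <- %s", deparse1(params)),
    sprintf("rmarkdown::render(%s, params = render_params, output_file = %s)",
            deparse1(input), deparse1(output_file))
  ), script)
  script
}

# Hypothetical helper: hand the script to RStudio's Jobs pane.
launch_render_job <- function(input, params = NULL, output_file = NULL) {
  if (!requireNamespace("rstudioapi", quietly = TRUE) || !rstudioapi::isAvailable()) {
    stop("RStudio Jobs are only available when running inside RStudio")
  }
  script <- write_render_script(input, params, output_file)
  rstudioapi::jobRunScript(script,
                           name = if (is.null(output_file)) basename(input) else output_file,
                           workingDir = getwd())  # so relative paths still resolve
}
```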

repeat last non-NA value

Description

Repeats the last non-NA value, which is also known as carrying the last observation forward (or backward). It's faster than zoo::na.locf and other alternatives (see http://rpubs.com/rubenarslan/repeat_last_na_locf). By specifying maxgap, you can choose not to bridge overly long gaps. By specifying forward = FALSE, you can carry observations backward instead.

Usage

repeat_last(x, forward = TRUE, maxgap = Inf, na.rm = FALSE)

Arguments

`x`	vector to be repeated
`forward`	whether to carry the last observation forward (TRUE, the default) or backward (FALSE)
`maxgap`	bridge only gaps of up to this many consecutive NAs (defaults to Inf)
`na.rm`	whether to omit NAs at the beginning (defaults to FALSE)

Examples

x = c(NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,2,3,4,NA,NA,NA,NA,NA,5, NA)
data.frame(x,
   repeat_last(x),
   repeat_last(x, forward = FALSE),
   repeat_last(x, maxgap = 5),
check.names = FALSE)
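
A minimal base-R sketch of the same idea using run lengths, written for clarity rather than speed (it is not the package's implementation and makes no performance claim); `locf_sketch()` is hypothetical and ignores `na.rm`. In this sketch a run of NAs longer than `maxgap` is left entirely missing, and leading NAs stay NA because there is nothing to carry:

```r
# Hypothetical helper: last observation carried forward (or backward).
locf_sketch <- function(x, forward = TRUE, maxgap = Inf) {
  if (!forward) x <- rev(x)             # carrying backward = forward on reversed x
  runs <- rle(is.na(x))                 # alternating runs of missing / observed
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  for (k in which(runs$values)) {       # each run of NAs
    if (starts[k] > 1 && runs$lengths[k] <= maxgap) {
      x[starts[k]:ends[k]] <- x[starts[k] - 1]  # last observed value before the gap
    }
  }
  if (!forward) x <- rev(x)
  x
}

x <- c(NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,2,3,4,NA,NA,NA,NA,NA,5, NA)
data.frame(x, locf_sketch(x), locf_sketch(x, maxgap = 5), check.names = FALSE)
```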

take only nonmissing

Description

This function takes a subset of a dataset, omitting all cases with missing values in the variables specified in 'keep' and then omitting all variables that still have missing values. Good for seeing how large your dataset for a certain analysis will be and which covariates are 'free' in terms of sample size.

Usage

take_nonmissing(df, keep = c())

Arguments

`df`	dataset
`keep`	variables in which cases must not be missing; defaults to an empty vector

Examples

data(ChickWeight)
ChickWeight[1:2,c('weight','Chick')] = NA
ChickWeight[3:4,'Diet'] = NA
names(ChickWeight); nrow(ChickWeight)
ChickWeight2 = take_nonmissing(ChickWeight, keep = c('weight'))
names(ChickWeight2); nrow(ChickWeight2)
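
The two steps described above (drop cases with missing values in `keep`, then drop every variable that still has missing values) can be sketched in base R as follows; `take_nonmissing_sketch()` is a hypothetical stand-in, not the package's code:

```r
# Hypothetical helper mirroring the two steps in the description.
take_nonmissing_sketch <- function(df, keep = character()) {
  if (length(keep) > 0) {
    # step 1: keep only rows that are complete in the `keep` variables
    df <- df[stats::complete.cases(df[, keep, drop = FALSE]), , drop = FALSE]
  }
  # step 2: drop variables that still have any missing values
  df[, colSums(is.na(df)) == 0, drop = FALSE]
}

cw <- as.data.frame(ChickWeight)
cw[1:2, c("weight", "Chick")] <- NA
cw[3:4, "Diet"] <- NA
res <- take_nonmissing_sketch(cw, keep = "weight")
names(res); nrow(res)
```

Dropping the two rows with missing weight also removes the missing Chick values, so Chick survives while Diet is dropped.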

Open in Excel

Description

Simple helper, so I don't complain so much about the sluggishness of RStudio's View

Usage

view_in_excel(x)

Arguments

`x`	a dataframe to open in Excel

Examples

## Not run: 
view_in_excel(Titanic)

## End(Not run)
