Title: | rcamisc: Plot, wrangle and stay sane |
---|---|
Description: | Miscellaneous helpers for plotting and staying sane. |
Authors: | Ruben Arslan [aut, cre] |
Maintainer: | Ruben C. Arslan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2024-11-14 05:31:09 UTC |
Source: | https://github.com/rubenarslan/rcamisc |
Takes two variables with different missings and gives one variable with values of the second variable substituted where the first had missings.
aggregate2sources( df, new_var = NULL, var1 = NULL, var2 = NULL, remove_old_variables = TRUE )
df |
data.frame or variable |
new_var |
new variable name |
var1 |
first source. Assumed to be new_var.x (default suffixes after merging) |
var2 |
second source. Assumed to be new_var.y (default suffixes after merging) |
remove_old_variables |
Defaults to not keeping var1 and var2 in the resulting df. |
cars$dist.x = cars$dist
cars$dist.y = cars$dist
cars$dist.y[2:5] = NA
cars$dist.x[10:15] = NA # sprinkle missings
cars$dist = NULL # remove old variable
cars = aggregate2sources(cars, 'dist')
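The substitution described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation; `coalesce_two` is a hypothetical helper name that does not exist in rcamisc.

```r
# Hypothetical helper (not part of rcamisc): take values from `first`
# and fall back to `second` wherever `first` is missing.
coalesce_two <- function(first, second) {
  stopifnot(length(first) == length(second))
  ifelse(is.na(first), second, first)
}

dist_x <- c(2, 10, NA, 22)  # first source, one gap
dist_y <- c(2, NA, 4, NA)   # second source, different gaps
dist <- coalesce_two(dist_x, dist_y)
print(dist)
```

A value stays missing only where both sources are missing.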
It's easy to attach packages that overwrite functions from other packages. Especially dplyr has a lot of conflicts with base packages, MASS and plyr. Because some of these conflicts do not always lead to error messages, sometimes just incorrect behaviour, this function exists. Don't trust your faulty memory, just check whether dplyr's (or any other package's) functions are 'on top' if you so desire.
amigoingmad(package = "dplyr", fix = TRUE, iteration = 0)
package |
the package you want to be on top (loaded last), defaults to dplyr |
fix |
defaults to true. Detaches the desired package (without unloading) and loads it again. Won't work for base packages and can't overwrite functions that you defined yourself. |
iteration |
for internal use only, if set to 0 the function will call itself to check that it worked, if set to 1, it won't. |
amigoingmad(fix = FALSE, package = 'rcamisc')
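To see by hand which attached package currently "wins" for a given function name, base R's `utils::find()` lists the search-path entries that define it, in search order. The sketch below is a rough manual check under that assumption; `on_top` is a hypothetical helper and says nothing about how `amigoingmad()` works internally.

```r
# Hypothetical helper: report the first search-path entry that
# provides a function called `fun_name` (NA if none does).
on_top <- function(fun_name) {
  hits <- utils::find(fun_name, mode = "function")
  if (length(hits) == 0) NA_character_ else hits[1]
}

on_top("median")  # the entry that an unqualified median() call resolves to
on_top("filter")  # changes once dplyr is attached after stats
```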
Renv helps you maintain consistent package versions for a project. To be able to give due credit in a way that academics understand, it's helpful to be able to generate citations.
bibliography( overwrite_bib = FALSE, silent = FALSE, cite_only_directly_called = TRUE, lockfile_path = "renv.lock", bibliography_path = "bibliography.bibtex", cite_renv = !cite_only_directly_called )
overwrite_bib |
whether to overwrite an existing bibtex file of the same name |
silent |
defaults to false. whether to cat out a nocite string to use in your header |
cite_only_directly_called |
whether to cite only the packages you called yourself (default) or also their dependencies |
lockfile_path |
path to the renv lock file to use |
bibliography_path |
path to the bibtex file to generate |
cite_renv |
whether to cite renv even if it's not loaded explicitly, defaults to the reverse of cite_only_directly_called |
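Generating BibTeX entries for a set of packages can be sketched with base R's `utils::citation()` and `utils::toBibtex()`. The sketch assumes you already have the package names in a character vector; it does not read `renv.lock`, and `packages_to_bibtex` is a hypothetical helper, not a function from rcamisc.

```r
# Hypothetical helper: collect BibTeX entries for a vector of package names.
packages_to_bibtex <- function(packages) {
  entries <- lapply(packages, function(pkg) {
    as.character(utils::toBibtex(utils::citation(pkg)))
  })
  unlist(entries)
}

bib <- packages_to_bibtex(c("stats", "utils"))

# Write to a temporary file here; a real project would pick its own path.
bib_path <- tempfile(fileext = ".bibtex")
writeLines(bib, bib_path)
```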
Iteratively adds ribbons to a ggplot2 plot at varying confidence levels, so that the shading reflects confidence. Horribly inefficient, because the smooth stat is recomputed for every level, but flexible.
geom_shady_smooth( mapping = NULL, data = NULL, stat = "smooth", method = "auto", formula = y ~ x, se = TRUE, position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, levels = c(0.6, 0.8, 0.95), base_alpha = 1, fill_gradient = NULL, fill = "black", ... )
mapping |
Set of aesthetic mappings created by aes(). |
data |
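The data to be displayed in this layer. There are three options: If NULL, the default, the data is inherited from the plot data. A data.frame will override the plot data. A function will be called with the plot data as its single argument. |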
stat |
defaults to smooth |
method |
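Smoothing method (function) to use, accepts either a character string, e.g. "lm", "glm", "gam" or "loess", or a function, e.g. stats::lm. If you have fewer than 1,000 observations but want to use the same gam() model that is used for larger datasets, set method = "gam", formula = y ~ s(x, bs = "cs"). |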
formula |
Formula to use in smoothing function, e.g. y ~ x (the default). |
se |
Display confidence interval around smooth? (TRUE by default.) |
position |
Position adjustment, either as a string, or the result of a call to a position adjustment function. |
na.rm |
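If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed. |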
show.legend |
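logical. Should this layer be included in the legends? NA, the default, includes the layer if any aesthetics are mapped; FALSE never includes it, and TRUE always includes it. |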
inherit.aes |
If FALSE, overrides the default aesthetics, rather than combining with them. |
levels |
the confidence levels that are supposed to be displayed, defaults to 0.6, 0.8, 0.95 |
base_alpha |
base alpha (opacity) of the ribbons; it is divided by length(levels) |
fill_gradient |
a vector of colors that has at least the same length as levels. Color each ribbon differently |
fill |
a single color for the ribbon |
... |
everything else is passed to and documented in |
data(beavers)
plot = ggplot2::ggplot(beaver1, ggplot2::aes(time, temp))
plot + geom_shady_smooth() + ggplot2::facet_wrap(~ day)
plot + geom_shady_smooth(fill = 'blue', levels = seq(0.05,0.95,0.1))
plot + geom_shady_smooth(size = 0.1, fill = '#49afcd', levels = seq(0.1,0.8,0.01))
plot + geom_shady_smooth(fill_gradient = c('red', 'orange', 'yellow'), base_alpha = 3)
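The idea of one ribbon per confidence level can be illustrated without ggplot2 by computing the interval bounds directly. The sketch below assumes a plain linear model as the smoother (the geom itself defaults to `method = "auto"`) and recomputes the interval once per level, which mirrors the inefficiency noted in the description. The names `conf_levels`, `grid` and `bands` are illustrative only.

```r
conf_levels <- c(0.6, 0.8, 0.95)
base_alpha <- 1

# Assumption: a simple linear fit stands in for the smoother.
fit <- stats::lm(temp ~ time, data = datasets::beaver1)
grid <- data.frame(
  time = seq(min(datasets::beaver1$time), max(datasets::beaver1$time),
             length.out = 50)
)

# One set of ribbon bounds per confidence level; each ribbon gets
# alpha = base_alpha / length(conf_levels), so overlaps look darker.
bands <- do.call(rbind, lapply(conf_levels, function(level) {
  ci <- stats::predict(fit, newdata = grid,
                       interval = "confidence", level = level)
  data.frame(time = grid$time, level = level,
             lwr = ci[, "lwr"], upr = ci[, "upr"],
             alpha = base_alpha / length(conf_levels))
}))

head(bands)
```

Stacking semi-transparent ribbons this way makes the narrow, low-confidence band the darkest region, because every wider band also covers it.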
Shows how common the possible missingness patterns are. Emulates misschk in Stata.
Excludes any variables that don't have any missings, so as not to clutter output. Disable using omit_complete.
Sorts variables by number of missings, so that the usual suspects show up at the front.
Displays the number of missings accounted for by each pattern.
missingness_patterns( df, min_freq = ifelse(relative, 1/nrow(df), 1), long_pattern = FALSE, print_legend = ifelse(long_pattern, FALSE, TRUE), show_culprit = TRUE, relative = FALSE, omit_complete = TRUE )
df |
dataset |
min_freq |
show only patterns that occur at least this often. Defaults to 1 observation. |
long_pattern |
by default (FALSE) only shows column indices for space and legibility reasons. |
print_legend |
prints a legend for the column indices, defaults to FALSE if long_pattern is set |
show_culprit |
defaults to TRUE. In case a missingness pattern boils down to one variable, it will be shown here. |
relative |
defaults to FALSE. If true, percentages are shown (relative to total before excluding minimum frequency). |
omit_complete |
defaults to TRUE. Columns that don't have any missings are excluded. |
data(ChickWeight)
ChickWeight[1:2,c('weight','Chick')] = NA
ChickWeight[3:5,'Diet'] = NA
names(ChickWeight); nrow(ChickWeight)
missingness_patterns(ChickWeight)
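Counting missingness patterns can be sketched in base R as follows. This is a simplified illustration, not the package's code: `count_missingness_patterns` is a hypothetical helper, and its output format is not meant to match that of `missingness_patterns()`.

```r
# Hypothetical helper: tabulate which combinations of variables are
# missing together, ignoring columns that have no missings at all.
count_missingness_patterns <- function(df) {
  miss <- is.na(df)
  miss <- miss[, colSums(miss) > 0, drop = FALSE]     # omit complete columns
  miss <- miss[, order(-colSums(miss)), drop = FALSE] # most missings first
  pattern <- apply(miss, 1, function(row) {
    paste(colnames(miss)[row], collapse = " + ")
  })
  pattern[pattern == ""] <- "(complete)"
  sort(table(pattern), decreasing = TRUE)
}

toy <- data.frame(
  a = c(1, NA, 3, NA, NA),
  b = c(NA, NA, 3, 4, NA),
  c = 1:5
)
patterns <- count_missingness_patterns(toy)
print(patterns)
```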
Renders a multitrait-multimethod matrix (MTMM) using ggplot2. This function will split the variable names in a correlation matrix, or a data.frame. The first part will be used as the trait, the second as the method. Correlations are displayed as text, with the font size corresponding to their absolute size. You can optionally supply a data frame of reliabilities to show in the diagonal.
mtmm(variables = NULL, reliabilities = NULL, split_regex = "_", cors = NULL)
variables |
data frame of variables that are supposed to be correlated |
reliabilities |
data frame of reliabilities: column 1: scale, column 2: rel. coefficient |
split_regex |
regular expression to separate construct and method from the variable name, splits on '_' by default |
cors |
you can also supply a (named) correlation matrix |
data.mtmm = data.frame(
  `Ach_self_report` = rnorm(200), `Pow_self_report` = rnorm(200), `Aff_self_report` = rnorm(200),
  `Ach_peer_report` = rnorm(200), `Pow_peer_report` = rnorm(200), `Aff_peer_report` = rnorm(200),
  `Ach_diary` = rnorm(200), `Pow_diary` = rnorm(200), `Aff_diary` = rnorm(200))
reliabilities = data.frame(scale = names(data.mtmm), rel = stats::runif(length(names(data.mtmm))))
mtmm(data.mtmm, reliabilities = reliabilities)
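How a variable name might be divided into trait and method can be sketched with base R regular expressions. The sketch assumes the split happens at the first match of `split_regex`, so that `Ach_self_report` yields the trait `Ach` and the method `self_report`; whether `mtmm()` splits exactly this way is an assumption, and `split_trait_method` is a hypothetical helper.

```r
# Hypothetical helper: split each name at the FIRST match of split_regex.
split_trait_method <- function(var_names, split_regex = "_") {
  pos <- regexpr(split_regex, var_names)
  len <- attr(pos, "match.length")
  pos <- as.integer(pos)
  data.frame(
    variable = var_names,
    trait = substr(var_names, 1, pos - 1),
    method = substr(var_names, pos + len, nchar(var_names)),
    stringsAsFactors = FALSE
  )
}

parts <- split_trait_method(c("Ach_self_report", "Pow_diary"))
print(parts)
```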
Pass in a variable and get a waffle plot. Useful for displaying simple counts or, if the variable has different values, a square pie chart. If the variable has a length that makes the individual squares hard to see, consider showing hundreds, thousands etc.
qplot_waffle( x, shape = 15, rows = NULL, cols = NULL, drop_shadow_h = -0.3, drop_shadow_v = 0.3 )
x |
a variable with not too many unique values |
shape |
defaults to a filled square |
rows |
defaults to the rounded up square root of the number of values |
cols |
defaults to the rounded down square root of the number of values |
drop_shadow_h |
horizontal offset of the drop shadow, tinker with this to get a proper shadow effect |
drop_shadow_v |
vertical offset of the drop shadow |
To avoid the Hermann grid illusion, don't use dark colours.
qplot_waffle(rep(1:2,each=5))
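The layout behind a waffle plot amounts to assigning each value a cell in a grid. The sketch below shows one way to do that in base R. The row default follows the description above (rounded-up square root), while the column default used here, `ceiling(n / rows)`, is an assumption chosen so that every value always fits; `waffle_grid` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: assign each element of x to a (row, col) cell.
waffle_grid <- function(x, rows = NULL, cols = NULL) {
  n <- length(x)
  if (is.null(rows)) rows <- ceiling(sqrt(n))
  if (is.null(cols)) cols <- ceiling(n / rows)  # assumption, see lead-in
  stopifnot(rows * cols >= n)
  cells <- expand.grid(row = seq_len(rows), col = seq_len(cols))[seq_len(n), ]
  cells$value <- sort(x)  # keep equal values next to each other
  cells
}

cells <- waffle_grid(rep(1:2, each = 5))
print(cells)
```

Plotting `cells` with one square per row, coloured by `value`, gives the square-pie effect.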
Pass in a variable and get a waffle plot. Useful for displaying simple counts or, if the variable has different values, a square pie chart. If the variable has a length that makes the individual squares hard to see, consider showing hundreds, thousands etc.
qplot_waffle_text( x, symbol = fontawesome_square, rows = NULL, cols = NULL, drop_shadow_h = -0.9, drop_shadow_v = 0.9, font_family = "FontAwesome", font_face = "Regular", font_size = round(140/sqrt(length(x))) )
x |
a variable with not too many unique values |
symbol |
pass a unicode symbol from FontAwesome here. Defaults to a square with rounded edges |
rows |
defaults to the rounded up square root of the number of values |
cols |
defaults to the rounded down square root of the number of values |
drop_shadow_h |
horizontal offset of the drop shadow, tinker with this to get a proper shadow effect |
drop_shadow_v |
vertical offset of the drop shadow |
font_family |
defaults to FontAwesome |
font_face |
defaults to Regular |
font_size |
defaults to round(140/sqrt(length(x))) |
This function is like qplot_waffle, but it allows you to specify custom symbols from FontAwesome. Copy and paste them from here: http://fontawesome.io/cheatsheet
To avoid the Hermann grid illusion, don't use dark colours.
## Not run:
qplot_waffle_text(rep(1:2,each=30), rows = 5)
## End(Not run)
Pass in a variable and get a waffle plot. Useful for displaying simple counts or, if the variable has different values, a square pie chart. If the variable has a length that makes the individual squares hard to see, consider showing hundreds, thousands etc.
qplot_waffle_tile(x, rows = NULL, cols = NULL)
x |
a variable with not too many unique values |
rows |
defaults to the rounded up square root of the number of values |
cols |
defaults to the rounded down square root of the number of values |
This function allows and requires the least tinkering, but also does not drop shadows. To avoid the Hermann grid illusion, don't use dark colours.
adapted from http://shinyapps.stat.ubc.ca/r-graph-catalog/ who adapted it from http://www.techques.com/question/17-17842/How-to-make-waffle-charts-in-R who adapted it from http://ux.stackexchange.com/a/46543/56341
qplot_waffle_tile(rep(1:2,each=500))
if you want to
render_job(input, params = NULL, output_file = NULL)
input |
.Rmd document to be knitted |
params |
params to pass to the .Rmd |
output_file |
name of the output_file (and the job) |
## Not run:
render_job("document.Rmd", list(dataset = "df1"), "summary_df1.html")
## End(Not run)
Will repeat the last non-NA value. This is also known as carrying the last observation forward/backward. It's faster than zoo::na.locf http://rpubs.com/rubenarslan/repeat_last_na_locf and other alternatives. By specifying maxgap, you can choose not to bridge overly long gaps. By specifying forward = FALSE, you can carry the last observation backward.
repeat_last(x, forward = TRUE, maxgap = Inf, na.rm = FALSE)
x |
vector to be repeated |
forward |
carry the last observation forward (TRUE, the default) or backward (FALSE) |
maxgap |
bridge only gaps of up to maxgap NAs (defaults to Inf) |
na.rm |
whether to omit NAs at the beginning (defaults to FALSE) |
x = c(NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,2,3,4,NA,NA,NA,NA,NA,5, NA)
data.frame(x, repeat_last(x), repeat_last(x, forward = FALSE), repeat_last(x, maxgap = 5), check.names = FALSE)
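Carrying the last observation forward can be done in base R with a cumulative-sum index, as sketched below. This is a simplified illustration that omits `maxgap` and `na.rm`; `locf` is a hypothetical name, and the sketch is not claimed to be the package's implementation.

```r
# Hypothetical helper: last observation carried forward (or backward).
locf <- function(x, forward = TRUE) {
  if (!forward) return(rev(locf(rev(x))))
  observed <- !is.na(x)
  idx <- cumsum(observed)        # how many values were observed so far
  c(NA, x[observed])[idx + 1]    # index 0 (nothing observed yet) maps to NA
}

y <- c(NA, 1, NA, NA, 2, NA)
locf(y)
locf(y, forward = FALSE)
```

Leading NAs stay NA when carrying forward, and trailing NAs stay NA when carrying backward, because there is no observation to copy from.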
This function takes a subset of a dataset, omitting all cases with missings in the variables specified in 'keep' and omitting all variables that still have missings after that. Good for seeing how large your dataset for a certain analysis will be and which covariates are 'free' in terms of sample size.
take_nonmissing(df, keep = c())
df |
dataset |
keep |
defaults to empty vector |
data(ChickWeight)
ChickWeight[1:2,c('weight','Chick')] = NA
ChickWeight[3:4,'Diet'] = NA
names(ChickWeight); nrow(ChickWeight)
ChickWeight2 = take_nonmissing(ChickWeight, keep = c('weight'))
names(ChickWeight2); nrow(ChickWeight2)
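The two-step filtering described above (first drop rows, then drop columns) can be sketched in base R. `take_complete` is a hypothetical helper written for illustration; it follows the description but is not the package's code.

```r
# Hypothetical helper: keep rows that are complete on `keep`, then keep
# only the columns that have no missings left in those rows.
take_complete <- function(df, keep = character(0)) {
  rows <- rowSums(is.na(df[, keep, drop = FALSE])) == 0
  out <- df[rows, , drop = FALSE]
  out[, colSums(is.na(out)) == 0, drop = FALSE]
}

toy <- data.frame(
  x = c(1, NA, 3, 4),
  y = c(1, 2, NA, 4),  # still has a missing after filtering on x
  z = c(1, NA, 3, 4)   # missing only where x is missing, so it is 'free'
)
result <- take_complete(toy, keep = "x")
print(result)
```

Here `z` survives because its only missing value sits in a row that was already dropped for `x`, whereas `y` would cost an additional case and is removed.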
Simple helper, so I don't complain about the sluggishness of RStudio's View so much
view_in_excel(x)
x |
a dataframe to open in Excel |
## Not run:
view_in_excel(Titanic)
## End(Not run)