---
title: "Codebook example with formr.org data"
author: "Ruben Arslan"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_depth: 4
    toc_float: true
    code_folding: 'hide'
    self_contained: true
    fig_width: 7
    fig_height: 7
    fig_retina: null
vignette: >
  %\VignetteIndexEntry{Using formr.org data and metadata}
  %\VignetteKeyword{codebook}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

In this vignette, you can see what a codebook generated from a dataset with rich metadata looks like. This dataset includes mock data for a short German Big Five personality inventory and an age variable. The dataset follows the format created when importing data from [formr.org](https://formr.org). However, data imported using the `haven` package uses similar metadata. You can also add such metadata yourself, or use the codebook package for unannotated datasets.

As you can see below, the `codebook` package automatically computes reliabilities for multi-item inventories, generates nicely labelled plots and outputs summary statistics. The same information is also stored in a table, which you can export to various formats. Additionally, `codebook` can show you different kinds of (labelled) missing values, and show you common missingness patterns.

As _you_ cannot see, but _[search engines](https://datasetsearch.research.google.com)_ will, the `codebook` package also generates [JSON-LD](https://json-ld.org/) metadata for the [dataset](https://developers.google.com/search/docs/data-types/dataset). If you share your codebook as an HTML file online, this metadata should make it easier for others to find your data. [See what Google sees here](https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Frubenarslan.github.io%2Fcodebook%2Farticles%2Fcodebook.html).

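
To give a rough idea of what such JSON-LD looks like, here is a hand-built sketch of a [schema.org](https://schema.org/) `Dataset` description, serialised with the `jsonlite` package. This is not the output of `codebook` itself, whose generated metadata is more extensive. The object name `dataset_meta` is made up for illustration, and the field values are borrowed from the metadata set later in this vignette.

```r
library(jsonlite)

# Hypothetical, hand-written description of a dataset using schema.org terms.
# Assumption: only a handful of fields are shown; real JSON-LD is usually richer.
dataset_meta <- list(
  "@context" = "https://schema.org/",
  "@type" = "Dataset",
  name = "MOCK Big Five Inventory dataset (German metadata demo)",
  description = "a small mock Big Five Inventory dataset",
  identifier = "doi:10.5281/zenodo.1326520"
)

# auto_unbox = TRUE writes length-one vectors as JSON scalars instead of arrays
json_ld <- toJSON(dataset_meta, auto_unbox = TRUE, pretty = TRUE)
cat(json_ld)
```

Search engines read a block like this from a `<script type="application/ld+json">` element in the HTML page, which is why it stays invisible to human readers.
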
```{r warning=FALSE,message=FALSE}
# pkgdown sets fig.retina, so a non-NULL value tells us who is knitting
knit_by_pkgdown <- !is.null(knitr::opts_chunk$get("fig.retina"))
knitr::opts_chunk$set(warning = FALSE, message = TRUE, error = FALSE)
ggplot2::theme_set(ggplot2::theme_bw())

library(codebook)
data("bfi", package = 'codebook')
if (!knit_by_pkgdown) {
  # Keep the vignette small when it is not built by pkgdown
  library(dplyr)
  bfi <- bfi %>% select(-starts_with("BFIK_extra"),
                        -starts_with("BFIK_open"),
                        -starts_with("BFIK_consc"))
}
set.seed(1)
bfi$age <- rpois(nrow(bfi), 30)
library(labelled)
var_label(bfi$age) <- "Alter" # German for "age"
```

By default, we only set the required metadata attributes `name` and `description` to sensible values. However, there are a number of attributes you can set to describe the data better. [Find out more](https://developers.google.com/search/docs/data-types/dataset).

```{r}
metadata(bfi)$name <- "MOCK Big Five Inventory dataset (German metadata demo)"
metadata(bfi)$description <- "a small mock Big Five Inventory dataset"
metadata(bfi)$identifier <- "doi:10.5281/zenodo.1326520"
metadata(bfi)$datePublished <- "2016-06-01"
metadata(bfi)$creator <- list(
  "@type" = "Person",
  givenName = "Ruben", familyName = "Arslan",
  email = "ruben.arslan@gmail.com",
  affiliation = list("@type" = "Organization",
                     name = "MPI Human Development, Berlin"))
metadata(bfi)$citation <- "Arslan (2016). Mock BFI data."
metadata(bfi)$url <- "https://rubenarslan.github.io/codebook/articles/codebook.html"
metadata(bfi)$temporalCoverage <- "2016"
metadata(bfi)$spatialCoverage <- "Goettingen, Germany"
```

```{r}
# We don't want to look at the code in the codebook.
knitr::opts_chunk$set(warning = TRUE, message = TRUE, echo = FALSE)
```

```{r cb}
codebook(bfi, metadata_table = knit_by_pkgdown, metadata_json = TRUE)
```

`r ifelse(knit_by_pkgdown, '', '### Codebook table')`

```{r}
if (!knit_by_pkgdown) {
  codebook:::escaped_table(codebook_table(bfi))
}
```
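
The table returned by `codebook_table()` can also be written to disk. The following is a minimal sketch, assuming a CSV export with base R's `write.csv()`. The file name `bfi_codebook.csv`, the use of `tempdir()`, and the defensive removal of list-columns are illustrative choices, not something the package prescribes.

```r
library(codebook)
data("bfi", package = "codebook")

# One row per variable, with labels and summary statistics
cb <- codebook_table(bfi)

# Assumption: a flat file cannot store list-columns, so drop any that exist
is_flat <- !vapply(cb, is.list, logical(1))
cb_flat <- as.data.frame(cb[, is_flat, drop = FALSE])

# Hypothetical output location; replace with a path of your choice
out_file <- file.path(tempdir(), "bfi_codebook.csv")
utils::write.csv(cb_flat, out_file, row.names = FALSE)
```

Other formats work the same way: pass `cb_flat` to whichever writer function you normally use for data frames.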