class: inverse, center, title-slide, middle # Writing Functions with the {tidyverse} ### Jessica A. Lavery & Daniel D. Sjoberg #### Last Updated: October 28, 2022 #### Originally Presented: February 15, 2022 <p align="center"><img src="Images/White-Transparent.png" width=30%></p> .medium[ @jessicalavs <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#FFFFFF;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> @statistishdan @jalavery <svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#FFFFFF;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> @ddsjoberg ] --- class: inverse # Poll Have you ever heard of tidy evaluation? * No * I've heard of it, but I'm not sure what it is * I know what it is, but I'm not comfortable using it * Yes, I feel comfortable using it --- # Goals for Today Be able to write a function that utilizes `{tidyverse}` functions (e.g., `mutate`, `select`, `filter`, etc.) by using tidy evaluation --- # Outline * Function Basics * Nuances of writing functions with the tidyverse - data-masking - tidy-select * Writing functions that use data-masking tidyverse functions * Writing functions that use tidy-select tidyverse functions * Practice * Resources & Additional Material --- # Functions in R **Rule of thumb**: If you're copy/pasting more than twice, write a function! Functions: * Prevent inconsistencies because they force multiple computations to follow a single recipe * Emphasize what varies (arguments) and what is constant (everything else) * Make updates easier since you only modify one place * Make code clearer, especially if you give function and arguments informative names If you're a `{tidyverse}` user, you'll likely want to use the `{tidyverse}` for your functions. However, because of some technical features of the `{tidyverse}` (*data masking*), there are a few nuances to writing tidyverse-style functions. --- # Anatomy of a Function ```r function_name <- function(argument1, argument2){ # body of the function # code to repeat } # the function call function_name(argument1 = "arg1", argument2 = "arg2") ``` --- # What's Returned by a Function * By default, only the last known operation is returned by the function * This is *different* from SAS macros * Can use a named list to return multiple objects ```r example_return_list <- function(arg1){ # function body # create object 1 # create object 2 return(list("object1" = object1, "object2" = object2)) } # store the function call call_return_list <- example_return_list(arg1) # will return call_return_list$object1 call_return_list$object2 ``` --- # Steps for Writing a Function 1. Write and test your code in R, outside of a function 2. Enclose in `function(){[working code here]}` with an informatively named function and arguments 3. Adapt code to replace variable names and/or values with function arguments, as needed --- class: inverse, center, middle # Writing tidyverse-style functions --- # Attempt #1: Writing a tiyverse-style function ```r # outside of a function, this code works gtsummary::trial %>% select(trt) # A tibble: 200 x 1 # trt # <chr> # 1 Drug A # 2 Drug B # etc. ``` ```r # putting the exact (working) code into a function test_function <- function(select_var){ gtsummary::trial %>% select(select_var) } # Error: object 'trt' not found test_function(select_var = trt) ``` * Why does this error occur? Because the `{tidyverse}` utilizes *tidy evaluation* --- # What is tidy evaluation? **Tidy evaluation:** A framework for controlling how expressions and variables in your code are evaluated by tidyverse functions. * Allows programmers to select variables based on their position, name, or type * Useful for passing variable names as inputs to functions that use tidyverse packages like `dplyr` and `ggplot2` * {dplyr} verbs rely on tidy evaluation to resolve programming commands --- # Tidy evaluation * Two types of tidy evaluation: - **Data-masking**, used by: `arrange()`, `count()`, `filter()`, `group_by()`, `mutate()`, and `summarise()` - **Tidy-select**, used by: `across()`, `relocate()`, `rename()`, `select()`, and `pull()` * To determine which type of tidy evaluation a function uses, look at the help file <!-- <p align="center"><img src="Images/filter_help_file.png" height=10%></p> --> --- class: inverse, center, middle # data-masking --- # Data-masking .pull-left[ * Data-masking is a distinctive feature of R whereby programming is performed directly on a data set, with columns defined as normal objects. ] .pull-right[ * While data-masking makes it easy to program interactively with data frames, it makes it harder to create functions. ] ```r # Almost all base R functions use unmasked programming mean(mtcars$cyl + mtcars$am) #> [1] 6.59375 # Referring to columns without `$` is an error - Where is the data? mean(cyl + am) #> Error in mean(cyl + am): object 'cyl' not found # R is looking in the global environment for an object named 'cyl' # Equivalent code with functions from dplyr that use data masking # Data masking allows you to reference columns without using $ mtcars %>% summarize(new_mean = mean(cyl + am)) #> new_mean #> dbl #> 6.59375 ``` --- # Writing functions using {dplyr} verbs that use data-masking * Data masking introduces ambiguity with respect to what you mean by "variable". * **env-variables**: objects (variables) that live in the environment; usually created with `<-` ```r # example of an env-variable x <- 3 ``` * **data-variables**: variables that live in the data frame; usually arise from reading in data or manipulating data that was read in to create new variables in a data frame ```r # example of a data variable: mpg on df mtcars (and mtcars is an env-variable!) mtcars$mpg ``` * Relation to data masking: Data masking allows you to reference `data-variables` without specifying the `env-variable` they arise from. Allows for shorthand in code, but introduces problems when writing functions. --- # Writing functions using {dplyr} verbs that use data-masking * {dplyr} verbs that use data masking: - `arrange()`, `count()`, `filter()`, `group_by()`, `mutate()`, and `summarise()` * Data masking introduces ambiguity with respect to what you mean by "variable". * Ambiguity is clarified by indicating to R where to look for an object (within a data frame or within the environment) via `.data$varname`, `.data[[group_var]]`, `.env$global_var_name` --- # Passing quoted arguments to group_by() A first attempt: ```r my_group_function <- function(group_vars){ gtsummary::trial %>% dplyr::group_by(group_vars) %>% dplyr::summarize(n = n()) } my_group_function(group_vars = "trt") # Error: Must group by variables found in `.data`. # * Column `group_vars` is not found. ``` --- # Passing quoted arguments to group_by() * Passing a single variable to group by ```r my_group_function <- function(group_var){ gtsummary::trial %>% dplyr::group_by(.data[[group_var]]) } my_group_function(group_var = "trt") # A tibble: 200 x 8 # Groups: trt [2] # trt age marker stage grade response death ttdeath # <chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl> # 1 Drug A 23 0.16 T1 II 0 0 24 # 2 Drug B 9 1.11 T2 I 1 0 24 ``` --- # Passing quoted arguments to group_by() * Passing multiple variables to group by * `across()`: allows you to use `select()` semantics inside data-masking functions ```r my_group_function <- function(group_vars){ gtsummary::trial %>% dplyr::group_by(across(group_vars)) } my_group_function(group_vars = c("trt", "stage")) # A tibble: 200 x 8 # Groups: trt, stage [8] # trt age marker stage grade response death ttdeath # <chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl> # 1 Drug A 23 0.16 T1 II 0 0 24 # 2 Drug B 9 1.11 T2 I 1 0 24 ``` --- # Passing quoted arguments to filter() A first attempt: ```r # using a function to select a single variable my_filter_function <- function(filter_condition){ gtsummary::trial %>% dplyr::filter(filter_condition) } my_filter_function(filter_condition = "age > 65") # Error: Problem with `filter()` input `..1`. # i Input `..1` is `filter_condition`. # x Input `..1` must be a logical vector, not a character. ``` Why did we get this note? *The character condition needs to be an expression* --- # Passing quoted arguments to filter() * Can use `!!` injector & `rlang::parse_expr()` * `rlang::parse_expr`: transforms text into an un-evaluated expression (i.e., it removes the quotation marks but doesn't evaluate the text) * `!!` to be covered in next section ```r my_filter_function <- function(filter_condition){ gtsummary::trial %>% dplyr::filter(!!rlang::parse_expr(filter_condition)) } my_filter_function(filter_condition = "age > 65") # A tibble: 23 x 8 # trt age marker stage grade response death ttdeath # <chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl> # 1 Drug B 71 0.445 T4 III 0 1 8.71 # 2 Drug B 67 1.16 T1 II 0 0 24 # 3 Drug B 68 0.105 T4 II 0 1 15.4 # etc. ``` --- # Passing quoted arguments to mutate() A first attempt: ```r my_mutate_function <- function(mutate_var){ gtsummary::trial %>% dplyr::mutate(mean = mean(mutate_var, na.rm = TRUE)) } my_mutate_function(mutate_var = "age") # mean column entirely missing # A tibble: 200 x 9 # trt age marker stage grade response death ttdeath mean # <chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl> <dbl> # 1 Drug A 23 0.16 T1 II 0 0 24 NA # 2 Drug B 9 1.11 T2 I 1 0 24 NA # 3 Drug A 31 0.277 T1 II 0 0 24 NA # etc. # Warning message: # Problem with `mutate()` column `mean`. # i `mean = mean(mutate_var, na.rm = TRUE)`. ``` --- # Passing quoted arguments to mutate() ```r my_mutate_function <- function(mutate_var){ gtsummary::trial %>% dplyr::mutate(mean = mean(.data[[mutate_var]], na.rm = TRUE)) } my_mutate_function(mutate_var = "age") # A tibble: 200 x 9 # trt age marker stage grade response death ttdeath mean # <chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl> <dbl> # 1 Drug A 23 0.16 T1 II 0 0 24 47.2 # 2 Drug B 9 1.11 T2 I 1 0 24 47.2 ``` --- # Passing unquoted arguments to Data-Masking Functions * Passing data-masked arguments to functions requires injection (also known as quasiquotation) with the embracing operator `{{ · }}` or, in more complex cases, the injection operator `!!`. * This is needed because under the hood data-masking works by defusing R code to prevent its immediate evaluation. * The defused code is resumed later on in a context where data frame columns are defined. ```r my_mean <- function(data, var1, var2) { dplyr::summarise(data, mean(var1 + var2)) } my_mean(mtcars, cyl, am) #> Error in `dplyr::summarise()`: #> ! Problem while computing `..1 = mean(var1 + var2)`. #> Caused by error in `mean()`: #> ! object 'cyl' not found ``` ??? - The problem here is that `summarise()` defuses the R code it was supplied, i.e. mean(var1 + var2). - Instead we want it to see mean(cyl + am). - This is why we need injection, we need to modify that piece of code by injecting the code supplied to the function in place of var1 and var2. --- # How to use `{{ · }}`? * To inject a function argument in data-masked context, just embrace it with `{{ · }}` ```r my_mean <- function(data, var1, var2) { dplyr::summarise(data, mean({{ var1 }} + {{ var2 }})) } my_mean(mtcars, cyl, am) #> # A tibble: 1 x 1 #> `mean(cyl + am)` #> <dbl> #> 1 6.59 ``` --- class: inverse, center, middle # dplyr verbs that use tidy-select --- # Writing functions using {dplyr} verbs that use tidy-select * Details of tidy-select available [online](https://tidyselect.r-lib.org/articles/syntax.html) for the curious programmer * Slides reflect {tidyselect} 1.2.0, released October 2022 * To write functions using {dplyr} verbs that use tidy-select: In selecting functions, can put the variable name in quotes or use `all_of()`, `any_of()` helper functions to select variables --- # Writing functions using {dplyr} verbs that use tidy-select * `any_of()`: selecting **any** of the listed variables * `all_of()`: for **strict** selection. If any of the variables in the character vector is missing, an error is thrown * Can also use `!all_of()` to select all variables not found in the character vector supplied to `all_of()` ```r # using a function to select multiple variables my_select_function <- function(select_variable){ gtsummary::trial %>% dplyr::select(dplyr::all_of(select_variable)) } my_select_function(select_variable = c("trt", "age")) # A tibble: 200 x 2 # trt age # <chr> <dbl> # 1 Drug A 23 # 2 Drug B 9 # etc. ``` --- # Passing quoted arguments to select() First attempt: ```r # using a function to select a single variable my_select_function <- function(select_variables){ gtsummary::trial %>% dplyr::select(select_variables) } my_select_function(select_variables = c("trt", "age")) # Warning message: # Using an external vector in selections was deprecated in tidyselect 1.1.0. # i Please use `all_of()` or `any_of()` instead. # # Was: # data %>% select(select_variables) # # # Now: # data %>% select(all_of(select_variables)) # # See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>. ``` --- # Passing quoted arguments to select() * No messages or warnings returned after `all_of` is added to `select` ```r # using a function to select a single variable my_select_function <- function(select_variables){ gtsummary::trial %>% dplyr::select(all_of(select_variables)) } my_select_function(select_variables = c("trt", "age")) # # A tibble: 200 x 1 # trt # <chr> # 1 Drug A # 2 Drug B # 3 Drug A # etc. ``` --- # Passing unquoted arguments to dplyr verbs that use tidy-select First attempt: ```r # using a function to select a single variable my_select_function <- function(select_variable){ gtsummary::trial %>% dplyr::select(select_variable) } my_select_function(select_variable = trt) # Error: object 'trt' not found ``` --- # Passing unquoted arguments to dplyr verbs that use tidy-select * Similar to with data-masking functions, `{{ }}` ("curly-curly") allows us to pass unquoted arguments to dplyr verbs ```r # using a function to select a single variable my_select_function <- function(select_variable){ gtsummary::trial %>% dplyr::select({{select_variable}}) } my_select_function(select_variable = trt) # Error: object 'trt' not found ``` See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>. --- # Passing quoted arguments to select() We need to account for `select()` using tidyselect that by using `any_of()` or `all_of()` ```r my_select_function <- function(select_variable){ gtsummary::trial %>% dplyr::select(all_of(select_variable)) } my_select_function(select_variable = "trt") # A tibble: 200 x 1 # trt # <chr> # 1 Drug A # 2 Drug B # etc. ``` --- class: inverse, center, middle # Questions? --- # Conclusion 1. Don't copy/paste your code >2x, write a function! 2. When writing functions using the `{tidyverse}`, need to account for the back-end design of the `{tidyverse}`, namely data-masking and tidy-select 3. This is tricky! (and not super consistent, though recent updates to {tidyselect} make it easier to distinguish data-masking from tidy-select evaluation) --- class: inverse, center, middle # Let's Practice --- # Practice with {ggplot} Write a function that uses `{ggplot}` to create a scatterplot of age versus marker value from the `gtsummary::trial` dataset, where you can vary: * The subset of the data that is plotted (e.g., trt = Drug A or age > 65) * The variable used to color code the points (stage or grade) ```r head(gtsummary::trial) # A tibble: 6 x 8 # trt age marker stage grade response death ttdeath # <chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl> # 1 Drug A 23 0.16 T1 II 0 0 24 # 2 Drug B 9 1.11 T2 I 1 0 24 # 3 Drug A 31 0.277 T1 II 0 0 24 # 4 Drug A NA 2.07 T3 III 1 1 17.6 # 5 Drug A 51 2.77 T4 III 1 1 16.4 # 6 Drug B 39 0.613 T4 I 0 1 15.6 ``` --- # Step 1 Write ggplot code that works and is close to what we want --- # Step 1 Write ggplot code that works and is close to what we want ```r library(ggplot2) gtsummary::trial %>% dplyr::filter(trt == "Drug A") %>% ggplot(aes(x = age, y = marker, color = stage)) + geom_point() ``` ![](index_files/figure-html/unnamed-chunk-30-1.png)<!-- --> --- # Step 2 Put the working code into a function (haven't yet used function arguments) --- # Step 2 Put the working code into a function (haven't yet used function arguments) ```r plot_age_marker <- function(filter_condition, color_variable){ gtsummary::trial %>% dplyr::filter(trt == "Drug A") %>% ggplot(aes(x = age, y = marker, color = stage)) + geom_point() } ``` --- # Step 3 Replace the value of stage with the color_variable function input --- # Step 3 Replace the value of stage with the color_variable function input ```r plot_age_marker <- function(filter_condition, color_variable){ gtsummary::trial %>% dplyr::filter(!!rlang::parse_expr(filter_condition)) %>% ggplot(aes(x = age, y = marker, color = .data[[color_variable]])) + geom_point() } ``` --- # Try to run our function ```r # try to run our function plot_age_marker(filter_condition = "age > 40", color_variable = "stage") ``` ![](index_files/figure-html/unnamed-chunk-33-1.png)<!-- --> --- # Try to run our function Now, what if we want to subset the data based on a categorical variable? ```r plot_age_marker(filter_condition = 'trt == "Drug A"', color_variable = "stage") ``` ```r plot_age_marker(filter_condition = "trt == \"Drug A\"", color_variable = "stage") ``` ![](index_files/figure-html/unnamed-chunk-35-1.png)<!-- --> --- # Step 4 Re-write the function using bare (unquoted) arguments --- # Step 4 Re-write the function using bare (unquoted) arguments ```r plot_age_marker_bare <- function(filter_condition, color_variable){ gtsummary::trial %>% dplyr::filter({{filter_condition}}) %>% ggplot(aes(x = age, y = marker, color = {{color_variable}})) + geom_point() } plot_age_marker_bare(filter_condition = age > 40, color_variable = stage) ``` ![](index_files/figure-html/unnamed-chunk-36-1.png)<!-- --> --- # Resources * [Tidy Evaluation: Getting up to Speed](https://tidyeval.tidyverse.org/sec-up-to-speed.html) * [Programming with `dplyr` Vignette](https://dplyr.tidyverse.org/articles/programming.html) * `?dplyr_data_masking` and `?dplyr_tidy_select` * [What is data-masking and why do I need {{?](https://rlang.r-lib.org/reference/topic-data-mask.html) * [{tidyselect} 1.2.0 Release Notes](https://github.com/r-lib/tidyselect/blob/main/NEWS.md#tidyselect-120) * [Using ggplot2 in packages](https://ggplot2.tidyverse.org/articles/ggplot2-in-packages.html) * [Functional Programming: Tidy Evaluation](https://dcl-prog.stanford.edu/tidy-eval-detailed.html) * [GitHub repo for this presentation](https://github.com/MSKCC-Epi-Bio/Writing-Function-with-the-tidyverse) Advanced: * [rlang](https://rlang.r-lib.org/) (the underlying package that implements tidy evaluation) * [Advanced R](https://adv-r.hadley.nz/) --- class: inverse, center, middle # Additional Material --- # Alternatives to `{{ · }}`? * `{{ · }}` is a shortcut for `!!enquo(·)` * Use the `!!` and `rlang::enquo(·)` combination when you need to pass `var1` and `var2` around before it's injected. * `!!` is a part of `{rlang}` ```r my_mean <- function(data, var1, var2) { var1_quo <- rlang::enquo(var1) var2_quo <- rlang::enquo(var2) dplyr::summarise(data, mean(!!var1_quo + !!var2_quo)) } my_mean(mtcars, cyl, am) #> # A tibble: 1 x 1 #> `mean(cyl + am)` #> <dbl> #> 1 6.59 ``` --- # Can I avoid `{{ · }}` and `!!enquo(·)`? * YES! (mostly) * Immediately convert inputs into column name **strings**, and utilize all the what you learned in the first portion of this presentation. ```r my_mean <- function(data, var1, var2) { var1 <- dplyr::select(data, {{ var1 }}) |> names() var2 <- dplyr::select(data, {{ var2 }}) |> names() dplyr::summarise(data, mean(.data[[var1]] + .data[[var2]])) } my_mean(mtcars, cyl, am) #> # A tibble: 1 x 1 #> `mean(.data[["cyl"]] + .data[["am"]])` #> <dbl> #> 1 6.59 ``` --- # Naming new variables created within your function * Cue, the walrus operator `:=` * Part of the `rlang` package * `{glue}` syntax is automatically recognized on the left-hand side of the walrus operator ```r my_naming_function <- function(variable){ gtsummary::trial %>% dplyr::mutate("mean_{variable}" := mean(.data[[variable]], na.rm = TRUE)) } my_naming_function(variable = "age") # A tibble: 200 x 9 # trt age marker stage grade response death ttdeath mean_age # <chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl> <dbl> # 1 Drug A 23 0.16 T1 II 0 0 24 47.2 # 2 Drug B 9 1.11 T2 I 1 0 24 47.2 ``` --- # rlang * Examples for when you can't use the walrus operator and have to use `rlang::sym()` or `rlang::syms()` * `rlang::sym()` is used when you have a string and you want it to be unquoted * `rlang::syms()` is used when you have a vector and you want it to be unquoted ```r # e.g., a vector of c("A", "B", "C") vec <- c("mpg", "hp") # Error: Can't subset columns that don't exist. # x Column `vec` doesn't exist. mtcars %>% dplyr::select(vec) # now works # similar to `all_of()` but there are times when you need the below mtcars %>% dplyr::select(!!!rlang::syms(vec)) ``` * Can also be used to name variables within functions --- # rlang * Can use `rlang::expr()` to see what `rlang::sym()` is evaluating to ```r vec <- c("mpg", "hp") rlang::expr(mtcars %>% dplyr::select(!!!rlang::syms(vec))) # mtcars %>% dplyr::select(mpg, hp) ``` --- # rlang: unquoted expresssions * You can also write functions that accept unquoted expressions as arguments ```r my_filter <- function(data, condition) { condition <- rlang::enquo(condition) dplyr::filter(data, !!condition) } my_filter(gtsummary::trial, age < 30) |> head() #> # A tibble: 6 x 8 #> trt age marker stage grade response death ttdeath #> <chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl> #> 1 Drug A 23 0.16 T1 II 0 0 24 #> 2 Drug B 9 1.11 T2 I 1 0 24 #> 3 Drug B 21 0.258 T4 I 0 1 12.9 #> 4 Drug B 28 0.803 T4 II 0 1 18 #> 5 Drug B 25 2.45 T1 I 1 0 24 #> 6 Drug B 25 0.531 T4 III 0 1 23.2 ```