Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Provide option to reject a script if using unapproved packages #168

Open
parmsam-pfizer opened this issue Mar 3, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@parmsam-pfizer
Copy link
Collaborator

Feature Idea

Allow the user to select a "ignore", "warn", "error" with a "warn" default when running aexecute() on unapproved packages. The aexecute function could show the selected diagnostic message for unapproved package use to the user. This would be a good feature enhancement to the vignette on logging unapproved packages. I think this could be added in the part of the log_write function that checks for log.rx.approved:

logrx/R/log.R

Lines 233 to 243 in ef8428e

if (file.exists(getOption("log.rx.approved"))) {
approved_functions <- readRDS(getOption("log.rx.approved"))
unapproved_functions <- get_unapproved_use(used_functions, approved_functions)
set_log_element("unapproved_packages_functions", unapproved_functions)
cleaned_log_vec <- c(cleaned_log_vec,
write_log_header("Unapproved Package and Functions"),
write_unapproved_functions())
cleaned_log <- cleaned_log[!(names(cleaned_log)) %in% "unapproved_packages_functions"]
}
Curious about others thoughts on this.

Relevant Input

No response

Relevant Output

No response

Reproducible Example/Pseudo Code

No response

@nicholas-masel
Copy link
Collaborator

I like the feature, but I'm wondering what the expected outcome is here?

Function and package use is found after the script has executed. Is the expectation is the script would error immediately when an unapproved function is used?

@parmsam-pfizer
Copy link
Collaborator Author

Yes, that's a great point you raise, @nicholas-masel. I think currently we can only support the feedback after script execution. That's what I was leaning towards. Not sure if we would have the ability to find the functions/packages used prior to executing the script.

@thomas-neitmann
Copy link

How about parsing the script prior to executing and then looking for all instances of library(), require() or pkg::fun() to discover any packages used? That way the script would not even get executed.

@thomas-neitmann
Copy link

So I found this to be an intriguing problem and went down the rabbit hole. Here's what I've come up with:

library(testthat)

extract_used_pkgs <- function(script) {
  ast <- parse(script)
  
  .extract_used_pkgs <- function(expr) {
    if (!is.call(expr)) {
      return(character())
    }
    
    if (expr[[1L]] == quote(library) || expr[[1L]] == quote(require) || expr[[1L]] == quote(`::`)) {
      if (isFALSE(expr$character.only) || is.null(expr$character.only)) {
        return(as.character(expr[[2L]]))
      }
      stop("Do not set `character.only` to `TRUE` inside `library()` or `require()`", call. = FALSE)
    }
    
    unlist(lapply(expr, .extract_used_pkgs))
  }
  
  pkgs <- lapply(ast, .extract_used_pkgs)
  unique(unlist(pkgs))
}

test_that("packages attached with `library()` are detected", {
  script <- c(
    "library(dplyr)",
    "library(ggplot2)",
    "library(stringr)",
    "x <- 0",
    "y <- x + 1"
  )
  file <- tempfile()
  writeLines(script, file)
  
  expect_identical(extract_used_pkgs(file), c("dplyr", "ggplot2", "stringr"))
})

test_that("packages attached with `require()` are detected", {
  script <- c(
    "require(dplyr)",
    "require(ggplot2)",
    "require(stringr)",
    "x <- 0",
    "y <- x + 1"
  )
  file <- tempfile()
  writeLines(script, file)
  
  expect_identical(extract_used_pkgs(file), c("dplyr", "ggplot2", "stringr"))
})

test_that("packages loaded with `::` are detected", {
  script <- c(
    "adsl <- haven::read_sas('./data/adsl.sas7bdat')",
    "dplyr::select(adsl, USUBJID, AGE, SEX)"
  )
  file <- tempfile()
  writeLines(script, file)
  
  expect_identical(extract_used_pkgs(file), c("haven", "dplyr"))
})

test_that("packages loaded with `::` are detected", {
  script <- c(
    "library('magrittr')",
    "adsl <- haven::read_sas('./data/adsl.sas7bdat')",
    "adsl %>% dplyr::select(USUBJID, AGE, SEX) %>% tidyr::filter(SEX == 'M')",
    "pkg1::select(pkg2::filter(adsl, SEX == 'M'), USUBJID, AGE, SEX)"
  )
  file <- tempfile()
  writeLines(script, file)
  
  expect_identical(extract_used_pkgs(file), c("magrittr", "haven", "dplyr", "tidyr", "pkg1", "pkg2"))
})

test_that("error when `character.only = TRUE`", {
  script <- c(
    "pkg1 <- 'dplyr'",
    "pkg2 <- 'ggplot2'",
    "library(pkg1, character.only = TRUE)",
    "require(pkg2, character.only = TRUE)"
  )
  file <- tempfile()
  writeLines(script, file)
  
  expect_error(extract_used_pkgs(file))
})

@parmsam-pfizer
Copy link
Collaborator Author

Great suggestion! That's exactly what we discussed at our last logrx team meeting. Thanks for sharing your code! We'll add this to our roadmap.

@nicholas-masel
Copy link
Collaborator

nicholas-masel commented Mar 6, 2025

@thomas-neitmann @parmsam-pfizer We finally got around to this feature and it falls over with base and default package usage. More details here: #222 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
No open projects
Status: 📋 Backlog
Development

No branches or pull requests

5 participants