Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

request parser #99

Closed
kennedymwavu opened this issue Feb 3, 2025 · 9 comments · Fixed by #100
Closed

request parser #99

kennedymwavu opened this issue Feb 3, 2025 · 9 comments · Fixed by #100
Assignees

Comments

@kennedymwavu
Copy link
Contributor

ambiorix currently only has parse_multipart(), which is very slow for file uploads as noted in #65

my proposal is we create a request parser which can comfortably handle the 3 most common content types:

  • multipart/form-data
  • application/x-www-form-urlencoded
  • application/json

webutils::parse_http() instantly comes to mind. it has proven to be fast enough in the past for me.

following the comments in #95, we can:

  • make use of yyjsonr::read_json_raw() to parse "application/json".
  • allow user to specify custom parsers globally

once implemented:

  • no need to deprecate ambiorix::parse_multipart(), just make it use the new parser.
@kennedymwavu kennedymwavu self-assigned this Feb 3, 2025
@kennedymwavu
Copy link
Contributor Author

for "application/x-www-form-urlencoded" content type, fields come in as query parameters and ambiorix handles that well. that means we only need to focus on "multipart/form-data" and "application/json".

@kennedymwavu
Copy link
Contributor Author

orrrr... we can make use of webutils::parse_query() and replace this whole function:

.parse_query_string = function(query){

with:

    .parse_query_string = function(query) {
      is_empty <- is.null(query) || identical(query, character())
      if (is_empty) {
        return()
      }

      self$query <- webutils::parse_query(query)
      invisible()
    }

@JohnCoene
Copy link
Collaborator

This function is used for the query in the URL part of the request though I believe, not for application/x-www-form-urlencoded (I'm not sure how/if we support that).

But happy to replace the query parser too

@kennedymwavu kennedymwavu linked a pull request Feb 3, 2025 that will close this issue
@kennedymwavu
Copy link
Contributor Author

This function is used for the query in the URL part of the request though I believe, not for application/x-www-form-urlencoded (I'm not sure how/if we support that).

yes, you're right, i've addressed that in the linked PR. you can test with this example:

Click to expand
devtools::load_all()
library(htmltools)
library(readxl)

page_links <- \() {
  Map(
    f = \(href, label) {
      tags$a(href = href, label)
    },
    c("/", "/about", "/contact"),
    c("Home", "About", "Contact")
  )
}

forms <- \() {
  form1 <- tags$form(
    action = "/url-form-encoded",
    method = "GET",
    enctype = "application/x-www-form-urlencoded",
    tags$h4("form-url-encoded:"),
    tags$label(`for` = "first_name", "First Name"),
    tags$input(id = "first_name", name = "first_name", value = "John"),
    tags$label(`for` = "last_name", "Last Name"),
    tags$input(id = "last_name", name = "last_name", value = "Coene"),
    tags$button(type = "submit", "Submit")
  )

  form2 <- tags$form(
    action = "/multipart-form-data",
    method = "POST",
    enctype = "multipart/form-data",
    tags$h4("multipart/form-data:"),
    tags$label(`for` = "email", "Email"),
    tags$input(id = "email", name = "email", value = "[email protected]"),
    tags$label(`for` = "framework", "Framework"),
    tags$input(id = "framework", name = "framework", value = "ambiorix"),
    tags$label(`for` = "file", "Upload CSV file"),
    tags$input(type = "file", id = "file", name = "file", accept = ".csv"),
    tags$label(`for` = "file2", "Upload xlsx file"),
    tags$input(type = "file", id = "file2", name = "file2", accept = ".xlsx"),
    tags$button(type = "submit", "Submit")
  )

  form3 <- tags$form(
    action = "/multipart-form-data2",
    method = "POST",
    enctype = "multipart/form-data",
    tags$h4("multipart/form-data (specific fields extracted & renamed):"),
    tags$label(`for` = "family_name", "Family Name"),
    tags$input(id = "family_name", name = "family_name", value = "the johns"),
    tags$label(`for` = "user_book", "User Book"),
    tags$input(id = "user_book", name = "user_book", value = "JavaScript for R"),
    tags$label(`for` = "user_age", "User Age"),
    tags$input(id = "user_age", name = "user_age", value = "15"),
    tags$button(type = "submit", "Submit")
  )

  tagList(form1, form2, form3)
}

home_get <- \(req, res) {
  html <- tagList(
    page_links(),
    tags$h3("hello, world!"),
    forms()
  )

  res$send(html)
}

url_form_encoded_get <- \(req, res) {
  query <- req$query
  list_items <- lapply(
    X = names(query),
    FUN = \(nm) {
      tags$li(
        nm,
        ":",
        query[[nm]]
      )
    }
  )
  input_vals <- tags$ul(list_items)

  html <- tagList(
    page_links(),
    tags$h3("Request processed"),
    input_vals
  )

  res$send(html)
}

multipart_form_data_post <- \(req, res) {
  body <- parse_req(req)

  list_items <- lapply(
    X = names(body),
    FUN = \(nm) {
      field <- body[[nm]]

      # if 'field' is a file, parse it & print on console:
      is_file <- "filename" %in% names(field)
      is_csv <- is_file && identical(field[["content_type"]], "text/csv")
      is_xlsx <- is_file &&
        identical(
          field[["content_type"]],
          "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
        )

      if (is_file) {
        file_path <- tempfile()
        writeBin(object = field$value, con = file_path)
        on.exit(unlink(x = file_path))
      }

      if (is_csv) {
        read.csv(file = file_path) |> print()
      }

      if (is_xlsx) {
        readxl::read_xlsx(path = file_path) |> print()
      }

      tags$li(
        nm,
        ":",
        if (is_file) "printed on console" else field
      )
    }
  )
  input_vals <- tags$ul(list_items)

  html <- tagList(
    page_links(),
    tags$h3("Request processed"),
    input_vals
  )

  res$send(html)
}

multipart_form_data_post2 <- \(req, res) {
  body <- parse_req(
    req,
    fields_to_extract = c("user_book", "user_age"),
    new_field_names = c("book", "age")
  )

  list_items <- lapply(
    X = names(body),
    FUN = \(nm) {
      tags$li(nm, ":", body[[nm]])
    }
  )
  input_vals <- tags$ul(list_items)

  html <- tagList(
    page_links(),
    tags$h3("Request processed, only these fields extracted & renamed:"),
    input_vals
  )

  res$send(html)
}

about_get <- \(req, res) {
  html <- tagList(
    page_links(),
    tags$h3("About Us")
  )
  res$send(html)
}

contact_get <- \(req, res) {
  html <- tagList(
    page_links(),
    tags$h3("Get In Touch!")
  )
  res$send(html)
}

home_post <- \(req, res) {
  body <- parse_req(req)
  response <- list(
    code = 200L,
    msg = "hello, world"
  )
  res$json(response)
}

app <- Ambiorix$new(port = 5000L)

app$
  get("/", home_get)$
  post("/", home_post)$
  get("/about", about_get)$
  get("/contact", contact_get)$
  get("/url-form-encoded", url_form_encoded_get)$
  post("/multipart-form-data", multipart_form_data_post)$
  post("/multipart-form-data2", multipart_form_data_post2)

app$start()

@kennedymwavu
Copy link
Contributor Author

in retrospect, i don't think this point i mentioned earlier makes sense at all:

allow user to specify custom parsers globally

this is just the same as a user defining their own custom request parser and calling it in the request handlers: something ambiorix has allowed since the beginning. we even have it in a blog post: custom parsers!!!

@JohnCoene
Copy link
Collaborator

Ah hmmmm but it feels odd to me.

If I were to do it I'd likely want to have it as a Middleware: it's quite literally pre-processing the request.

But then as a global it makes more sense, i think.

  1. the default parser still executes so in the example you gave (slow for files) it's bad :(
  2. I would want the same parser across an entire app and not parse some JSON one way in one place and another way elsewhere.

I believe we should have sensible defaults (as we currently have but can be improved by webutils) but allow the "advanced" user to overwrite them globally.

What do you think?

@kennedymwavu
Copy link
Contributor Author

ahaa! i see what you mean... so the user will still call ambiorix::parse_req() which would then:

  • use a globally set "data parser" if found, or...
  • proceed to use the default parser

@JohnCoene
Copy link
Collaborator

Ah yes, I'm not sure why I have that option for the JSON serialiser but didn't do deserialisers

But that's my opinion, what do you think?

@kennedymwavu
Copy link
Contributor Author

totally agree! that's the way.
i've made an update on the linked PR. try it and lmk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants