request parser #99

kennedymwavu · 2025-02-03T12:10:52Z

ambiorix currently only has parse_multipart(), which is very slow for file uploads as noted in #65

my proposal is we create a request parser which can comfortably handle the 3 most common content types:

multipart/form-data
application/x-www-form-urlencoded
application/json

webutils::parse_http() instantly comes to mind. it has proven to be fast enough in the past for me.

following the comments in #95, we can:

make use of yyjsonr::read_json_raw() to parse "application/json".
allow user to specify custom parsers globally

once implemented:

no need to deprecate ambiorix::parse_multipart(), just make it use the new parser.

The text was updated successfully, but these errors were encountered:

kennedymwavu · 2025-02-03T12:56:48Z

for "application/x-www-form-urlencoded" content type, fields come in as query parameters and ambiorix handles that well. that means we only need to focus on "multipart/form-data" and "application/json".

kennedymwavu · 2025-02-03T13:06:22Z

orrrr... we can make use of webutils::parse_query() and replace this whole function:

ambiorix/R/request.R

Line 180 in d602a8a

.parse_query_string = function(query){

with:

    .parse_query_string = function(query) {
      is_empty <- is.null(query) || identical(query, character())
      if (is_empty) {
        return()
      }

      self$query <- webutils::parse_query(query)
      invisible()
    }

JohnCoene · 2025-02-03T13:59:51Z

This function is used for the query in the URL part of the request though I believe, not for application/x-www-form-urlencoded (I'm not sure how/if we support that).

But happy to replace the query parser too

kennedymwavu · 2025-02-03T15:28:49Z

This function is used for the query in the URL part of the request though I believe, not for application/x-www-form-urlencoded (I'm not sure how/if we support that).

yes, you're right, i've addressed that in the linked PR. you can test with this example:

Click to expand

devtools::load_all()
library(htmltools)
library(readxl)

page_links <- \() {
  Map(
    f = \(href, label) {
      tags$a(href = href, label)
    },
    c("/", "/about", "/contact"),
    c("Home", "About", "Contact")
  )
}

forms <- \() {
  form1 <- tags$form(
    action = "/url-form-encoded",
    method = "GET",
    enctype = "application/x-www-form-urlencoded",
    tags$h4("form-url-encoded:"),
    tags$label(`for` = "first_name", "First Name"),
    tags$input(id = "first_name", name = "first_name", value = "John"),
    tags$label(`for` = "last_name", "Last Name"),
    tags$input(id = "last_name", name = "last_name", value = "Coene"),
    tags$button(type = "submit", "Submit")
  )

  form2 <- tags$form(
    action = "/multipart-form-data",
    method = "POST",
    enctype = "multipart/form-data",
    tags$h4("multipart/form-data:"),
    tags$label(`for` = "email", "Email"),
    tags$input(id = "email", name = "email", value = "[email protected]"),
    tags$label(`for` = "framework", "Framework"),
    tags$input(id = "framework", name = "framework", value = "ambiorix"),
    tags$label(`for` = "file", "Upload CSV file"),
    tags$input(type = "file", id = "file", name = "file", accept = ".csv"),
    tags$label(`for` = "file2", "Upload xlsx file"),
    tags$input(type = "file", id = "file2", name = "file2", accept = ".xlsx"),
    tags$button(type = "submit", "Submit")
  )

  form3 <- tags$form(
    action = "/multipart-form-data2",
    method = "POST",
    enctype = "multipart/form-data",
    tags$h4("multipart/form-data (specific fields extracted & renamed):"),
    tags$label(`for` = "family_name", "Family Name"),
    tags$input(id = "family_name", name = "family_name", value = "the johns"),
    tags$label(`for` = "user_book", "User Book"),
    tags$input(id = "user_book", name = "user_book", value = "JavaScript for R"),
    tags$label(`for` = "user_age", "User Age"),
    tags$input(id = "user_age", name = "user_age", value = "15"),
    tags$button(type = "submit", "Submit")
  )

  tagList(form1, form2, form3)
}

home_get <- \(req, res) {
  html <- tagList(
    page_links(),
    tags$h3("hello, world!"),
    forms()
  )

  res$send(html)
}

url_form_encoded_get <- \(req, res) {
  query <- req$query
  list_items <- lapply(
    X = names(query),
    FUN = \(nm) {
      tags$li(
        nm,
        ":",
        query[[nm]]
      )
    }
  )
  input_vals <- tags$ul(list_items)

  html <- tagList(
    page_links(),
    tags$h3("Request processed"),
    input_vals
  )

  res$send(html)
}

multipart_form_data_post <- \(req, res) {
  body <- parse_req(req)

  list_items <- lapply(
    X = names(body),
    FUN = \(nm) {
      field <- body[[nm]]

      # if 'field' is a file, parse it & print on console:
      is_file <- "filename" %in% names(field)
      is_csv <- is_file && identical(field[["content_type"]], "text/csv")
      is_xlsx <- is_file &&
        identical(
          field[["content_type"]],
          "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
        )

      if (is_file) {
        file_path <- tempfile()
        writeBin(object = field$value, con = file_path)
        on.exit(unlink(x = file_path))
      }

      if (is_csv) {
        read.csv(file = file_path) |> print()
      }

      if (is_xlsx) {
        readxl::read_xlsx(path = file_path) |> print()
      }

      tags$li(
        nm,
        ":",
        if (is_file) "printed on console" else field
      )
    }
  )
  input_vals <- tags$ul(list_items)

  html <- tagList(
    page_links(),
    tags$h3("Request processed"),
    input_vals
  )

  res$send(html)
}

multipart_form_data_post2 <- \(req, res) {
  body <- parse_req(
    req,
    fields_to_extract = c("user_book", "user_age"),
    new_field_names = c("book", "age")
  )

  list_items <- lapply(
    X = names(body),
    FUN = \(nm) {
      tags$li(nm, ":", body[[nm]])
    }
  )
  input_vals <- tags$ul(list_items)

  html <- tagList(
    page_links(),
    tags$h3("Request processed, only these fields extracted & renamed:"),
    input_vals
  )

  res$send(html)
}

about_get <- \(req, res) {
  html <- tagList(
    page_links(),
    tags$h3("About Us")
  )
  res$send(html)
}

contact_get <- \(req, res) {
  html <- tagList(
    page_links(),
    tags$h3("Get In Touch!")
  )
  res$send(html)
}

home_post <- \(req, res) {
  body <- parse_req(req)
  response <- list(
    code = 200L,
    msg = "hello, world"
  )
  res$json(response)
}

app <- Ambiorix$new(port = 5000L)

app$
  get("/", home_get)$
  post("/", home_post)$
  get("/about", about_get)$
  get("/contact", contact_get)$
  get("/url-form-encoded", url_form_encoded_get)$
  post("/multipart-form-data", multipart_form_data_post)$
  post("/multipart-form-data2", multipart_form_data_post2)

app$start()

kennedymwavu · 2025-02-03T19:31:34Z

in retrospect, i don't think this point i mentioned earlier makes sense at all:

allow user to specify custom parsers globally

this is just the same as a user defining their own custom request parser and calling it in the request handlers: something ambiorix has allowed since the beginning. we even have it in a blog post: custom parsers!!!

JohnCoene · 2025-02-03T19:38:48Z

Ah hmmmm but it feels odd to me.

If I were to do it I'd likely want to have it as a Middleware: it's quite literally pre-processing the request.

But then as a global it makes more sense, i think.

the default parser still executes so in the example you gave (slow for files) it's bad :(
I would want the same parser across an entire app and not parse some JSON one way in one place and another way elsewhere.

I believe we should have sensible defaults (as we currently have but can be improved by webutils) but allow the "advanced" user to overwrite them globally.

What do you think?

kennedymwavu · 2025-02-03T19:48:11Z

ahaa! i see what you mean... so the user will still call ambiorix::parse_req() which would then:

use a globally set "data parser" if found, or...
proceed to use the default parser

JohnCoene · 2025-02-03T20:00:36Z

Ah yes, I'm not sure why I have that option for the JSON serialiser but didn't do deserialisers

But that's my opinion, what do you think?

kennedymwavu · 2025-02-03T20:53:41Z

totally agree! that's the way.
i've made an update on the linked PR. try it and lmk.

kennedymwavu self-assigned this Feb 3, 2025

kennedymwavu linked a pull request Feb 3, 2025 that will close this issue

ft: request parser #100

Merged

kennedymwavu mentioned this issue Feb 3, 2025

ft: request parser #100

Merged

JohnCoene closed this as completed in #100 Feb 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

request parser #99

request parser #99

kennedymwavu commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

JohnCoene commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

JohnCoene commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

JohnCoene commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

request parser #99

request parser #99

Comments

kennedymwavu commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

JohnCoene commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

JohnCoene commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025

JohnCoene commented Feb 3, 2025

kennedymwavu commented Feb 3, 2025