feat: support new metrics firehose api with get_usage() #404

Open
wants to merge 9 commits into
base: main
2 changes: 1 addition & 1 deletion .lintr
@@ -1,7 +1,7 @@
linters: linters_with_defaults(
line_length_linter = line_length_linter(120L),
object_name_linter = object_name_linter(styles = c("snake_case", "symbols", "CamelCase")),
- cyclocomp_linter = cyclocomp_linter(30L),
+ cyclocomp_linter = NULL, # Issues with R6 classes.
Contributor

We can get rid of any::cyclocomp here too, yeah?

extra-packages: local::., any::lintr, any::devtools, any::testthat, any::cyclocomp
needs: lint

object_length_linter(32L),
indentation_linter = indentation_linter(hanging_indent_style = "tidy"),
return_linter = NULL
1 change: 1 addition & 0 deletions NAMESPACE
@@ -98,6 +98,7 @@ export(get_tag_data)
export(get_tags)
export(get_thumbnail)
export(get_timezones)
export(get_usage)
export(get_usage_shiny)
export(get_usage_static)
export(get_user_permission)
5 changes: 5 additions & 0 deletions NEWS.md
@@ -1,5 +1,10 @@
# connectapi (development version)

## New features

- New `get_usage()` function returns content usage data from Connect's `GET
v1/instrumentation/content/hits` endpoint on Connect v2025.04.0 and higher.
(#390)

## Enhancements and fixes

27 changes: 27 additions & 0 deletions R/connect.R
@@ -818,6 +818,33 @@ Connect <- R6::R6Class(
self$GET(path, query = query)
},

#' @description Get content usage data.
#' @param from Optional `Date` or `POSIXt`; start of the time window. If a
#' `Date`, coerced to `YYYY-MM-DDT00:00:00` in the caller's time zone.
#' @param to Optional `Date` or `POSIXt`; end of the time window. If a
#' `Date`, coerced to `YYYY-MM-DDT23:59:59` in the caller's time zone.
Comment on lines +821 to +825
Contributor

I'm not sure this is doing what we expect with time zones, and I'm not totally sure what is intended either. Maybe we should work that out here and then adapt the code to fit? When we send timestamps to Connect with this function, do we want them converted from the caller's local time zone to UTC before being sent, or some other behavior?

One thing to note: if I'm reading this correctly, make_timestamp() behaves slightly differently for character and non-character input. A character string is returned as-is: it is neither parsed nor converted to UTC. This function will therefore do the same.

connectapi/R/parse.R

Lines 16 to 28 in e8c8075

make_timestamp <- function(input) {
if (is.character(input)) {
# TODO: make sure this is the right timestamp format
return(input)
}
# In the call to `safe_format`:
# - The format specifier adds a literal "Z" to the end of the timestamp, which
# tells Connect "This is UTC".
# - The `tz` argument tells R to produce times in the UTC time zone.
# - The `usetz` argument says "Don't concatenate ' UTC' to the end of the string".
safe_format(input, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC", usetz = FALSE)
}
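
To make the asymmetry described above concrete, here is a minimal sketch. It assumes a stand-in named `make_timestamp_sketch()` (hypothetical) that mirrors the quoted function but calls base `format()` in place of the package's internal `safe_format()` helper, and it uses `America/New_York` as an illustrative caller time zone.

```r
# Stand-in for the quoted make_timestamp(); base format() replaces the
# package-internal safe_format() helper (an assumption for illustration).
make_timestamp_sketch <- function(input) {
  if (is.character(input)) {
    # Character input is passed through: no parsing, no UTC conversion.
    return(input)
  }
  format(input, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC", usetz = FALSE)
}

# The same wall-clock time, once as a string and once as a date-time.
as_string <- make_timestamp_sketch("2025-05-02 12:40:00")
as_datetime <- make_timestamp_sketch(
  as.POSIXct("2025-05-02 12:40:00", tz = "America/New_York")
)

as_string   # "2025-05-02 12:40:00"
as_datetime # "2025-05-02T16:40:00Z"
```

The string is sent exactly as typed, while the date-time is shifted to UTC and gains the trailing `Z`, so the two inputs would describe different instants to the server.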

inst_content_hits = function(from = NULL, to = NULL) {
error_if_less_than(self$version, "2025.04.0")

# If this is called with date objects with no timestamp attached, it's
# reasonable to assume that the caller is indicating the days as an
# inclusive range.
if (inherits(from, "Date")) {
from <- as.POSIXct(paste(from, "00:00:00"))
}
if (inherits(to, "Date")) {
to <- as.POSIXct(paste(to, "23:59:59"))
}

self$GET(
v1_url("instrumentation", "content", "hits"),
query = list(
from = make_timestamp(from),
to = make_timestamp(to)
)
)
},

#' @description Get running processes.
procs = function() {
warn_experimental("procs")
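
As a rough illustration of the `Date` handling in `inst_content_hits()` above, the sketch below expands a single day into an inclusive window and shows what it becomes once formatted as UTC. The method itself calls `as.POSIXct()` without a `tz` argument and so uses the session time zone; the explicit `tz_caller` value here is a hypothetical choice made only so the result is reproducible.

```r
day <- as.Date("2025-05-02")
tz_caller <- "America/New_York" # hypothetical caller time zone

# Expand the date into an inclusive start/end of day in the caller's zone.
from <- as.POSIXct(paste(day, "00:00:00"), tz = tz_caller)
to <- as.POSIXct(paste(day, "23:59:59"), tz = tz_caller)

# Format as UTC with a literal "Z", as make_timestamp() does for date-times.
from_utc <- format(from, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
to_utc <- format(to, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

from_utc # "2025-05-02T04:00:00Z"
to_utc   # "2025-05-03T03:59:59Z"
```

Under these assumptions the window sent to the server covers the caller's local day rather than the UTC day, which is the behavior the review comment asks to confirm.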
66 changes: 66 additions & 0 deletions R/get.R
@@ -526,6 +526,72 @@ get_usage_static <- function(
return(out)
}

#' Get usage information for deployed content
#'
#' @description

#' Retrieve content hits for all available content on the server. Available
#' content depends on the user whose API key is in use. Administrator accounts
#' will receive data for all content on the server. Publishers will receive data
#' for all content they own or collaborate on.
#'
#' If no date-times are provided, all usage data will be returned.

#' @param client A `Connect` R6 client object.
#' @param from Optional `Date` or date-time (`POSIXct` or `POSIXlt`). Only
#' records after this time are returned. If a `Date`, treated as the start of
#' that day in the local time zone; if a date-time, used verbatim.
#' @param to Optional `Date` or date-time (`POSIXct` or `POSIXlt`). Only records
#' before this time are returned. If a `Date`, treated as end of that day
#' (`23:59:59`) in the local time zone; if a date-time, used verbatim.
#'
#' @return A tibble with columns:
#' * `id`: An identifier for the record.
#' * `user_guid`: The GUID of logged-in visitors, NA for anonymous.
#' * `content_guid`: The GUID of the content.
#' * `timestamp`: The time of the hit as `POSIXct`.
#' * `path`: The path of the hit. Not recorded for all content types.
#' * `user_agent`: If available, the user agent string for the hit. Not
#' available for all records.
#'
#' @details
#'
#' The data returned by `get_usage()` includes all content types. For Shiny
#' content, the `timestamp` indicates the *start* of the Shiny session.
#' Additional fields for Shiny and non-Shiny are available respectively from
#' `get_usage_shiny()` and `get_usage_static()`.
#'
#' When possible, however, we recommend using `get_usage()` over
#' `get_usage_static()` or `get_usage_shiny()`, as it will be much faster for
#' large datasets.
#'
#' @examples
#' \dontrun{
#' client <- connect()
#'
#' # Fetch the last 2 days of hits
#' usage <- get_usage(client, from = Sys.Date() - 2, to = Sys.Date())
#'
#' # Fetch usage after a specified date
#' usage <- get_usage(
#' client,
#' from = as.POSIXct("2025-05-02 12:40:00", tz = "UTC")
#' )
#'
#' # Fetch all usage
#' usage <- get_usage(client)
#' }
#'
#' @export
get_usage <- function(client, from = NULL, to = NULL) {
usage_raw <- client$inst_content_hits(
from = from,
to = to
)

usage <- parse_connectapi_typed(usage_raw, connectapi_ptypes$usage)
fast_unnest_character(usage, "data")
}

#' Get Audit Logs from Posit Connect Server
#'
63 changes: 63 additions & 0 deletions R/parse.R
@@ -58,15 +58,19 @@ ensure_column <- function(data, default, name) {
# manual fix because vctrs::vec_cast cannot cast double -> datetime or char -> datetime
col <- coerce_datetime(col, default, name = name)
}

if (inherits(default, "fs_bytes") && !inherits(col, "fs_bytes")) {
col <- coerce_fsbytes(col, default)
}

if (inherits(default, "integer64") && !inherits(col, "integer64")) {
col <- bit64::as.integer64(col)
}

if (inherits(default, "list") && !inherits(col, "list")) {
col <- list(col)
}

col <- vctrs::vec_cast(col, default, x_arg = name)
}
data[[name]] <- col
@@ -101,6 +105,65 @@ parse_connectapi <- function(data) {
))
}

# nolint start
Contributor

What linting are we escaping here?

# Unnests a list column similarly to `tidyr::unnest_wider()`, bringing the
# entries of each list-item up to the top level. Makes some simplifying
# assumptions for the sake of performance:
# 1. All inner variables are treated as character vectors;
# 2. The names of the first entry of the list-column are used as the
# names of variables to extract.
# Performance example:
# > nrow(x_raw)
# [1] 373632
# > nrow(x_raw)
# [1] 373632
# > t_tidyr <- system.time(
# + x_tidyr <- tidyr::unnest_wider(x_raw, data)
# + )
# > t_custom <- system.time(
# + x_custom <- fast_unnest_character(x_raw, "data")
# + )
# > identical(x_tidyr, x_custom)
# [1] TRUE
# > t_tidyr
# user system elapsed
# 7.018 0.137 7.172
# > t_custom
# user system elapsed
# 0.281 0.005 0.285
# nolint end
fast_unnest_character <- function(df, col_name) {
if (!is.character(col_name)) {
stop("col_name must be a character vector")
}
if (!col_name %in% names(df)) {
stop("col_name is not present in df")
}

list_col <- df[[col_name]]

new_cols <- names(list_col[[1]])

df2 <- df
for (col in new_cols) {
df2[[col]] <- vapply(
list_col,
function(row) {
if (is.null(row[[col]])) {
NA_character_
} else {
row[[col]]
}
},
"1",
USE.NAMES = FALSE
)
}

df2[[col_name]] <- NULL
df2
}
Comment on lines +135 to +165
Contributor

The data returned from the endpoint includes path and user_agent fields nested under a data field. Without special treatment these are returned as a list-column, which is awkward. I initially experimented with tidyr::unnest(), but that was slow on the larger datasets returned by this endpoint, so I wrote a custom fast_unnest_character() function, which runs in about 5% (!) of the time that tidyr::unnest() takes.

Thinking about this a bit more: this isn't a huge chunk of code, of course, but it is another chunk that we will have to maintain if we go this route. This is another example where having all of our data interchange within connectapi be data frames means we have to worry about the performance of converting JSON-parsed list responses into data frames, and make sure those data frames have a natural structure for folks to use. If we instead relied only on the parsed list data as our interchange format and gave folks as.data.frame() methods, we could defer the (sometimes expensive) reshaping until late in the process. Eking out performance gains like this would then matter much less, and we could rely on more off-the-shelf tools.
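
For readers following the discussion, here is a small sketch of what the unnesting does to a parsed response. The function body is a condensed copy of `fast_unnest_character()` from the diff with the argument checks dropped, and the input is a hand-built base data frame (an assumption; the package works with tibbles) using three records from the test fixture in this PR.

```r
# Condensed copy of fast_unnest_character() from the diff (checks omitted).
fast_unnest_character <- function(df, col_name) {
  list_col <- df[[col_name]]
  # Column names are taken from the first entry only.
  new_cols <- names(list_col[[1]])
  for (col in new_cols) {
    df[[col]] <- vapply(
      list_col,
      function(row) if (is.null(row[[col]])) NA_character_ else row[[col]],
      character(1),
      USE.NAMES = FALSE
    )
  }
  df[[col_name]] <- NULL
  df
}

# Three fixture records; JSON nulls appear as NULL after parsing.
usage <- data.frame(id = c(8966707L, 8966708L, 8966214L))
usage$data <- list(
  list(path = "/hello", user_agent = "Datadog/Synthetics"),
  list(path = "/world", user_agent = NULL),
  list(path = NULL, user_agent = NULL)
)

flat <- fast_unnest_character(usage, "data")
names(flat)     # "id" "path" "user_agent"
flat$path       # "/hello" "/world" NA
flat$user_agent # "Datadog/Synthetics" NA NA
```

Because the `vapply()` template is a single string, a non-character value inside `data` would raise an error rather than be coerced, which is one of the simplifying assumptions listed in the code comment.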


coerce_fsbytes <- function(x, to, ...) {
if (is.numeric(x)) {
fs::as_fs_bytes(x)
9 changes: 8 additions & 1 deletion R/ptype.R
@@ -1,5 +1,5 @@
NA_datetime_ <- # nolint: object_name_linter
- vctrs::new_datetime(NA_real_, tzone = "UTC")
+ vctrs::new_datetime(NA_real_, tzone = Sys.timezone())
NA_list_ <- # nolint: object_name_linter
list(list())

@@ -38,6 +38,13 @@ connectapi_ptypes <- list(
"bundle_id" = NA_character_,
"data_version" = NA_integer_
),
usage = tibble::tibble(
"id" = NA_integer_,
"user_guid" = NA_character_,
"content_guid" = NA_character_,
"timestamp" = NA_datetime_,
"data" = NA_list_
),
content = tibble::tibble(
"guid" = NA_character_,
"name" = NA_character_,
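
The change to `NA_datetime_` above swaps the `tzone` attribute of the datetime prototype from `"UTC"` to `Sys.timezone()`. As a hedged base R sketch of what a `tzone` attribute does in general (not a statement about how connectapi's parsing applies it), the example below takes one timestamp from the fixture, truncated to whole seconds, and relabels it with a hypothetical session time zone.

```r
hit_utc <- as.POSIXct("2025-04-30 12:49:16", tz = "UTC")

# Relabel the same instant with a different display time zone.
hit_local <- hit_utc
attr(hit_local, "tzone") <- "America/Chicago" # hypothetical session time zone

# The underlying instant is unchanged; only the printed form differs.
same_instant <- as.numeric(hit_utc) == as.numeric(hit_local)
shown_utc <- format(hit_utc, "%Y-%m-%d %H:%M:%S")
shown_local <- format(hit_local, "%Y-%m-%d %H:%M:%S")

same_instant # TRUE
shown_utc    # "2025-04-30 12:49:16"
shown_local  # "2025-04-30 07:49:16"
```

In other words, a `tzone` attribute controls how a `POSIXct` value is displayed, not which moment it represents.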
22 changes: 22 additions & 0 deletions man/PositConnect.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

67 changes: 67 additions & 0 deletions man/get_usage.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

@@ -0,0 +1,52 @@
[
{
"id": 8966707,
"user_guid": null,
"content_guid": "475618c9",
"timestamp": "2025-04-30T12:49:16.269904Z",
"data": {
"path": "/hello",
"user_agent": "Datadog/Synthetics"
}
},
{
"id": 8966708,
"user_guid": null,
"content_guid": "475618c9",
"timestamp": "2025-04-30T12:49:17.002848Z",
"data": {
"path": "/world",
"user_agent": null
}
},
{
"id": 8967206,
"user_guid": null,
"content_guid": "475618c9",
"timestamp": "2025-04-30T13:01:47.40738Z",
"data": {
"path": "/chinchilla",
"user_agent": "Datadog/Synthetics"
}
},
{
"id": 8967210,
"user_guid": null,
"content_guid": "475618c9",
"timestamp": "2025-04-30T13:04:13.176791Z",
"data": {
"path": "/lava-lamp",
"user_agent": "Datadog/Synthetics"
}
},
{
"id": 8966214,
"user_guid": "fecbd383",
"content_guid": "b0eaf295",
"timestamp": "2025-04-30T12:36:13.818466Z",
"data": {
"path": null,
"user_agent": null
}
}
]