Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] API Cache Control Options #5035

Open
RayBB opened this issue Oct 29, 2024 · 5 comments
Open

[Feature request] API Cache Control Options #5035

RayBB opened this issue Oct 29, 2024 · 5 comments
Labels
feature-request Request of a new feature

Comments

@RayBB
Copy link

RayBB commented Oct 29, 2024

Is your feature request related to a problem? Please describe.

I want to collect data from YouTube videos via the Invidious API, specifically tracking metrics like like counts at regular intervals. However, the current caching behavior makes this difficult as there is no documented way to force fresh data retrieval (aka cache busting) or configure cache duration.

Describe the solution you'd like

I propose implementing one or more of the following solutions:

  1. Add support for cache control headers in the API to allow clients to request fresh data when needed
  2. Provide configuration options for cache duration
  3. Document the existing caching behavior and any available cache control mechanisms

Describe alternatives you've considered

The main alternative would be to extract YouTube data scraping, bypassing Invidious entirely. However, this would be difficult to keep up with for a small side proejct.

Additional context

For public Invidious instances, any cache control features should probably be disabled by default to prevent abuse and maintain performance. The feature would primarily benefit private instances and specific use cases requiring real-time or near-real-time data.

@RayBB RayBB added the feature-request Request of a new feature label Oct 29, 2024
@unixfox
Copy link
Member

unixfox commented Oct 29, 2024

Sounds like we could allow setting the value for force_refresh

def get_video(id, refresh = true, region = nil, force_refresh = false)
in the URL like we have other parameters: https://docs.invidious.io/url-parameters/

The big drawback is that this may allow some for abuse on the public instances. But I guess if one allows the API publicly, one should consider that people will abuse it.

It seems like an easy feature to implement, would you want to implement it @RayBB?

@RayBB
Copy link
Author

RayBB commented Oct 29, 2024

I think it should be disabled by default and configurable via an environment variable since it's probably not useful to the majority of hosters. Do you agree?

I'm guessing it would be a change here to effect the videos API?

id = env.params.url["id"]
region = env.params.query["region"]?
proxy = {"1", "true"}.any? &.== env.params.query["local"]?
begin
video = get_video(id, region: region)

Or would we want it implemented across all the routes, in which case maybe the code would go elsewhere? @unixfox

@unixfox
Copy link
Member

unixfox commented Oct 30, 2024

ping @syeopite and @SamantazFox for some feedback

Maybe we could instead have force_refresh being activated using a crystal build flag. This way, we don't introduce yet another new parameter just for that.

The use case is quite niche.

@SamantazFox
Copy link
Member

#4441 should add the required parameter on the API side to bypass the cache. Now the question of putting that behind some config or compilation flag is interesting, but complicated. How can the API client know whether or not this was respected? The API should inform the client about it in soem way.

@RayBB If you are not worried to compile from source, you can use the patch below to completely remove the video cache:

Details

diff --git a/src/invidious/videos.cr b/src/invidious/videos.cr
index c218b4ef..de1cb57c 100644
--- a/src/invidious/videos.cr
+++ b/src/invidious/videos.cr
@@ -363,31 +363,6 @@ struct Video
 end
 
 def get_video(id, refresh = true, region = nil, force_refresh = false)
-  if (video = Invidious::Database::Videos.select(id)) && !region
-    # If record was last updated over 10 minutes ago, or video has since premiered,
-    # refresh (expire param in response lasts for 6 hours)
-    if (refresh &&
-       (Time.utc - video.updated > 10.minutes) ||
-       (video.premiere_timestamp.try &.< Time.utc)) ||
-       force_refresh ||
-       video.schema_version != Video::SCHEMA_VERSION # cache control
-      begin
-        video = fetch_video(id, region)
-        Invidious::Database::Videos.update(video)
-      rescue ex
-        Invidious::Database::Videos.delete(id)
-        raise ex
-      end
-    end
-  else
-    video = fetch_video(id, region)
-    Invidious::Database::Videos.insert(video) if !region
-  end
-
-  return video
-rescue DB::Error
-  # Avoid common `DB::PoolRetryAttemptsExceeded` error and friends
-  # Note: All DB errors inherit from `DB::Error`
   return fetch_video(id, region)
 end
 

@iBicha
Copy link
Contributor

iBicha commented Nov 11, 2024

#4441 should add the required parameter on the API side to bypass the cache. Now the question of putting that behind

This is not accurate, my PR makes it possible to rely on the cache more than the default, not less.

def get_video(id, refresh = true, region = nil, force_refresh = false)

refresh is default to true - which is saying if there's cache, validate its freshness.
When you pass refresh=false, you're saying you don't care how old the cache is, just get it. Which is what I needed, because I wanted the title and other metadata of the video, so I don't care how old this information is, it's usually good enough.

What @RayBB I believe wants is the option to pass force_refresh=true, which defaults to false.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request Request of a new feature
Projects
None yet
Development

No branches or pull requests

4 participants