
API: consider using APIv3 standard endpoints #356

Open
humitos opened this issue Aug 6, 2024 · 9 comments · May be fixed by #468

@humitos
Member

humitos commented Aug 6, 2024

Now that we are allowing public access to APIv3 endpoints, we can think about making different API queries using the common APIv3 endpoints to fetch small chunks of data instead of doing one big call to the specific /_/addons/ one.

This would be transparent to our own JavaScript library and to users of the CustomEvent, because it will be managed internally and the exact same JSON-like object will be returned.

As an example, the client could fetch project, version, and build data from their standard APIv3 endpoints and merge the responses into the same object the addons endpoint returns today (see the sketch below).

This seems worth exploring since I'm seeing benefits on caching (not all of the responses will be purged on each new build) and in maintenance, since it's one less big endpoint to maintain over time. Besides, we will become users of our own APIv3, which will push us to keep improving it over time.
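
For illustration only, a rough sketch of what that could look like on the client side. The endpoint paths, query parameters, and the shape of the merged object below are hypothetical, not the actual APIv3 or addons contracts:

```javascript
// Sketch only: fetch small per-resource payloads from public APIv3 in
// parallel and merge them into a single JSON-like object, so the addons
// library and CustomEvent listeners keep receiving one object as today.
// URLs and field names are illustrative.
async function fetchAddonsData(projectSlug, versionSlug) {
  const api = "https://readthedocs.org/api/v3";
  const getJSON = (url) => fetch(url).then((response) => response.json());

  const [project, version, builds] = await Promise.all([
    getJSON(`${api}/projects/${projectSlug}/`),
    getJSON(`${api}/projects/${projectSlug}/versions/${versionSlug}/`),
    getJSON(`${api}/projects/${projectSlug}/versions/${versionSlug}/builds/?limit=1`),
  ]);

  // Assemble the same kind of object the /_/addons/ endpoint returns today,
  // so the change stays transparent to consumers.
  return {
    projects: { current: project },
    versions: { current: version },
    builds: { current: builds.results ? builds.results[0] : null },
  };
}
```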

@humitos humitos added the Needed: design decision label Aug 6, 2024
@github-project-automation github-project-automation bot moved this to Planned in 📍Roadmap Dec 2, 2024
@humitos humitos removed the Needed: design decision label Dec 3, 2024
@humitos humitos self-assigned this Dec 3, 2024
@agjohnson
Contributor

Noting:

  • This could stack up 5 or more API requests. We should test what the timing would look like for all of these requests at once. We can probably even just mock up a test against production with a one-off script.
  • It looked like caching at Cloudflare and in the browser wasn't what I expected. I tested the project detail view endpoint and expected a cache hit, but didn't get one.

@humitos
Member Author

humitos commented Dec 5, 2024

I did a test for this using this script: https://gist.github.com/humitos/f7f5e48881868616ebb63b417887515d

Test performing multiple API requests using JavaScript Promises
Start Date: 1733400128907
End Date: 1733400129443

... stripped ...

Elapsed time in milliseconds: 536

Executing this function multiple times, I get times from 300ms to 600ms in the worst cases. Taking a look at New Relic, the average time for the current addons endpoint is 250ms and the median is 150ms. However, it seems that APIv3 responses are not cached by Cloudflare and they always hit our servers -- I see the header cf-cache-status: DYNAMIC instead of the cf-cache-status: HIT that the /_/addons/ endpoint returns. Note that we are getting almost the same result in the best case, so I think we will get the same time or even better if we enable caching on APIv3.
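
For illustration only, a minimal version of this kind of test -- timing a batch of parallel requests and logging the Cloudflare cache status of each response -- could look roughly like this (this is not the gist itself, and the URLs are placeholders):

```javascript
// Sketch only: time several API requests fired in parallel and print each
// response's cf-cache-status header. Runs under Node 18+ (global fetch).
async function timeRequests(urls) {
  const start = Date.now();
  console.log("Start Date:", start);

  const responses = await Promise.all(urls.map((url) => fetch(url)));
  for (const response of responses) {
    console.log(response.url, "cf-cache-status:", response.headers.get("cf-cache-status"));
  }

  const end = Date.now();
  console.log("End Date:", end);
  console.log("Elapsed time in milliseconds:", end - start);
}

// Placeholder endpoints for illustration.
timeRequests([
  "https://readthedocs.org/api/v3/projects/docs/",
  "https://readthedocs.org/api/v3/projects/docs/versions/latest/",
]);
```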

Another thing I noticed is that APIv3 endpoints are not proxied via /_/api/v3/, and we cannot hit readthedocs.org from our documentation domains, so we will need to proxy the APIv3 endpoints for this to work. Currently we are only proxying the search and embed endpoints from APIv3.

Required changes

  • Proxy APIv3 endpoints to be consumed from documentation sites
  • Enable Cloudflare cache on APIv3 endpoints

Is there any reason why these two points aren't already implemented? cc @readthedocs/backend

@humitos humitos moved this from Planned to In progress in 📍Roadmap Dec 5, 2024
@stsewd
Member

stsewd commented Dec 5, 2024

I think we should try to proxy only the required endpoints; we don't want to proxy all endpoints (bring the entire jungle just to get the banana). We should also expose only read-only operations, as write/delete operations could be abused by an attacker if there is a vulnerability in the docs pages.

We also need to take into consideration permissions from sharing tokens on .com, and we shouldn't cache responses there, as we check for authn/authz on .com.

I'm +1 on splitting the addons API into several endpoints, but maybe just split by functionality rather than trying to re-implement all API v3 under docs pages.

We would also need to apply purge rules for each proxied endpoint on .org, which also raises the question: are we really saving resources by splitting the cached resources (some endpoints will still need to be purged frequently on some actions)? Is it worth the extra complexity (we now have to manage several caches)?

@humitos
Member Author

humitos commented Dec 5, 2024

but maybe just split by functionality rather than trying to re-implement all API v3 under docs pages

I'm not saying we should re-implement APIv3 on documentation pages, just expose the same endpoints/backend code under /_/api/v3/. I'm fine if it's read-only; that's all we need for now.

We would also need to apply purge rules for each proxied endpoint on .org, which also raises the question: are we really saving resources by splitting the cached resources

We aren't caching APIv3 at all right now. We will have the same caching complexity when we add caching, even if we don't use it for addons; so that's not extra work.

We will be saving a lot, since there will be a few endpoints that don't need to be purged if, for example, only the project has changed (e.g. file tree diff, builds, versions, etc. won't be purged). Otherwise, changing any field in any object purges everything (the current behavior).

@stsewd
Member

stsewd commented Dec 5, 2024

We aren't caching APIv3 at all right now. We will have the same caching complexity when we add caching, even if we don't use it for addons; so that's not extra work.

And we shouldn't cache API v3, since it's served from our main application (we don't want to cache anything in our main application, as it serves dynamic content).

Otherwise, changing any field in any object purges everything (the current behavior)

But we will still need to purge the other endpoints; we will be saving something like 1/3 of the response, at the added complexity of tracking each action that needs to purge each endpoint instead of doing one purge per build. And I'm +1 on splitting things by functionality (but mostly to avoid having one big response), so things like FTD will be cached independently (but probably most endpoints will need to be purged on each build). This also makes me wonder how much we are really saving by not just purging on each build, unless lots of projects trigger builds every minute. Our current browser cache TTL is set to 20 min, and our edge cache TTL is two hours (https://developers.cloudflare.com/cache/how-to/configure-cache-status-code/#edge-ttl), which means each response is cached for 2 hours and 20 min at most.

@humitos
Member Author

humitos commented Dec 9, 2024

And we shouldn't cache API v3, since it's served from our main application (we don't want to cache anything in our main application, as it serves dynamic content).

I don't understand why you are saying we shouldn't cache APIv3. Can you expand on that? What are the cons you are seeing there? We can still serve uncached content from our application and cached content from our APIv3. I don't see any problem with that.

we will be saving something like 1/3 of the response
This also makes me wonder how much we are really saving by not just purging on each build

No, if you take a look at the example I shared, it performs 5 requests and only one depends on the build object. That means we will need to re-fetch only 1 request when the build object changes, since all the other endpoints will already be cached. We will be saving 80% of the requests compared to the current behavior.

Our current browser cache TTL is set to 20 min, and our edge cache TTL is two hours

This is fine. Note that this 2-hour cache will be shared among all users, which means we will avoid a lot of requests hitting our servers. Besides, we can eventually increase that TTL if we want to save even more.

humitos added a commit that referenced this issue Dec 9, 2024
Initial implementation as a POC to give it a try. It works fine at first sight, and I think it could be a good idea to move forward with it. I haven't found any blockers yet.

Closes #356
@humitos humitos linked a pull request Dec 9, 2024 that will close this issue
humitos added a commit to readthedocs/readthedocs.org that referenced this issue Dec 9, 2024
@stsewd
Member

stsewd commented Dec 9, 2024

I don't understand why you are saying we shouldn't cache APIv3. Can you expand on that? What are the cons you are seeing there? We can still serve uncached content from our application and cached content from our APIv3. I don't see any problem with that.

We should only cache read-only/static content. API v3 isn't read-only, and we also serve dynamic content from our main application (content that depends on the user session). We could create very specific cache rules to avoid things like cache poisoning, but I'd just avoid caching content on our main application domain. We can cache content on docs domains (assuming we only expose read-only resources, and on .org only).

No, if you take a look at the example I shared, it performs 5 requests and only one depends on the build object.

Version objects do depend on build objects: fields like identifier (commit), built, and downloads depend on the result of the latest build.

Like I said, I'm fine that we are splitting the addons response, but purging the cache should still happen after each build; trying to purge the cache for specific cases will add more complexity and not save us much in the end.

@humitos
Member Author

humitos commented Dec 10, 2024

I'm fine that we are splitting the addons response

I understand you are OK with proxying APIv3 and serving it under /_/api/v3/ from documentation domains, right? With that, I can start the implementation for this issue.

but purging the cache should still happen after each build; trying to purge the cache for specific cases will add more complexity and not save us much in the end.

I'm fine with purging the cache after each build for now. We can make it more performant in the future if we want. We can continue the discussion around cache invalidation in https://github.com/readthedocs/meta/discussions/162.

@humitos
Member Author

humitos commented Dec 10, 2024

Summarizing, the work required here so far is:

  • Expose /_/api/v3/ on documentation domains
  • Make it read-only
  • Enable Cloudflare cache
  • Invalidate cache after each build
  • Don't cache auth/authn requests on .com
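
For context, once those pieces land, consuming the proxied prefix from a documentation page could look roughly like this. This is a hedged sketch, not the actual addons client code; the helper name and paths are illustrative:

```javascript
// Sketch only: read-only, same-origin requests against the proxied
// /_/api/v3/ prefix on a documentation domain. Only GET is expected to be
// allowed through the proxy, and on .com (where responses depend on
// auth/sharing tokens) responses would not be cached by Cloudflare.
async function getFromProxiedAPIv3(path) {
  const response = await fetch(`/_/api/v3/${path}`);
  if (!response.ok) {
    throw new Error(`APIv3 request failed with ${response.status} for ${path}`);
  }
  return await response.json();
}

// Example usage (illustrative):
// const project = await getFromProxiedAPIv3("projects/docs/");
```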
