@shssoichiro shssoichiro commented Aug 7, 2025

Description

  • Re-adds the optional rootUrl param for docs sites; pages found while crawling are matched against it to select subpages
  • If rootUrl is not specified, it is inferred from the startUrl
  • The crawlers use rootUrl so that only pages belonging to the desired documentation are included
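
For readers following along, here is a minimal sketch of what inferring rootUrl from startUrl could look like. It is not taken from this PR's diff: the helper names `inferRootUrl` and `resolveRootUrl`, the `{ startUrl, rootUrl? }` shape, and the rule "drop the query, the fragment, and a trailing file-like segment" are all assumptions made for illustration.

```typescript
// Hypothetical helpers for illustration only; the PR's actual inference rule may differ.

interface DocsSite {
  startUrl: string;
  rootUrl?: string; // optional, as described in the PR
}

// Assumed rule: strip query and fragment, and treat a last path segment
// containing a dot (e.g. "index.html") as a file name to be dropped.
function inferRootUrl(startUrl: string): string {
  const url = new URL(startUrl);
  url.hash = "";
  url.search = "";

  const segments = url.pathname.split("/");
  const last = segments[segments.length - 1];
  if (last.includes(".")) {
    segments.pop();
  }

  // Normalize to a trailing slash so the result reads as a directory prefix.
  let path = segments.join("/");
  if (!path.endsWith("/")) {
    path += "/";
  }
  url.pathname = path;
  return url.toString();
}

// Prefer an explicit rootUrl; fall back to the inferred one.
function resolveRootUrl(site: DocsSite): string {
  return site.rootUrl ?? inferRootUrl(site.startUrl);
}
```

One caveat of the assumed rule: a versioned last segment such as `/docs/v1.2` would also be dropped, which is one reason an explicit rootUrl remains useful.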

Fixes #3823

Checklist

  • I've read the contributing guide
  • The relevant docs, if any, have been updated or created
  • [] The relevant tests, if any, have been updated or created

Screen recording or screenshot

N/A

Tests

TODO


Summary by cubic

Added an optional rootUrl parameter for docs sites to improve how subpages are selected during crawling. If rootUrl is not set, it is now inferred from startUrl for more accurate page inclusion.

  • New Features
    • rootUrl can be specified in docs config to control which pages are crawled.
    • Crawlers use rootUrl to match and include only relevant subpages.
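
As a rough illustration of "match and include only relevant subpages", the sketch below filters candidate links by origin and path prefix. The function names `parseUrl`, `isUnderRoot`, and `filterLinks` are hypothetical, and prefix matching is an assumed strategy, not necessarily what the crawlers in this PR do.

```typescript
// Hypothetical sketch; not the crawler code from this PR.

// Parse a URL without throwing on malformed input.
function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch (_err) {
    return null;
  }
}

// A candidate is kept when it shares the root's origin and its path
// lies at or below the root's path.
function isUnderRoot(candidate: string, rootUrl: string): boolean {
  const page = parseUrl(candidate);
  const root = parseUrl(rootUrl);
  if (page === null || root === null) {
    return false;
  }
  if (page.origin !== root.origin) {
    return false;
  }
  // Compare against a slash-terminated prefix so "/docs-old" does not match "/docs".
  const rootPath = root.pathname.endsWith("/") ? root.pathname : root.pathname + "/";
  return page.pathname.startsWith(rootPath) || page.pathname + "/" === rootPath;
}

// Keep only the links that belong to the documentation rooted at rootUrl.
function filterLinks(links: string[], rootUrl: string): string[] {
  return links.filter((link) => isUnderRoot(link, rootUrl));
}
```

With a rootUrl of `https://example.com/docs/`, this keeps `https://example.com/docs/guide/install` and drops `https://example.com/blog/post` and `https://other.example/docs/`.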

    private readonly maxRequestsPerCrawl: number,
    private readonly maxDepth: number,
  ) {}

  async crawl(): Promise<PageData[]> {
    // TODO: How do we edit the remote crawler? I see there is a `control-plane` folder
Contributor Author

I opened this as a draft because I'm stuck on this portion. I'm not sure where the code for the remote crawler lives, but it will need to be updated in order to use the root URL.

Contributor Author

@RomneyDa Would you happen to have any knowledge on the remote crawler?

Successfully merging this pull request may close these issues.

Optional "rootUrl" option for crawling docs was useful and should be re-added