Description
Summary
Add a toggle to enable crawling and indexing subpages when scraping a URL, with tabbed display for each subpage's content. Useful for comprehensive indexing of documentation sites like vercel.com/docs.
Problem
Currently, indexing a URL scrapes only the specific page's content, excluding subpages. This limits utility for sites with distributed content, such as documentation or blogs.
Proposed Solution
Add a boolean toggle in the URL input dialog to enable subpage crawling.
Update scraping logic to crawl and index subpages if toggled.
Modify the content display UI to use tabs for each subpage, replacing the single content container.
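As a rough sketch of what the toggle could mean for the scraping logic, the snippet below does a breadth-first crawl limited to pages under the submitted URL's path. Everything here is hypothetical and not taken from the existing codebase: the `LinkExtractor` callback stands in for whatever the scraper already uses to pull links out of a page, and `maxPages` is an assumed safety cap that the proposal does not mention.

```typescript
// Hypothetical callback: returns the raw href values found on a page.
type LinkExtractor = (pageUrl: string) => Promise<string[]>;

interface CrawlOptions {
  crawlSubpages: boolean; // the proposed toggle from the URL input dialog
  maxPages: number;       // assumed safety cap, not part of the proposal
}

// A URL counts as a subpage when it shares the root's origin
// and its path sits strictly under the root path.
function isSubpage(root: URL, candidate: URL): boolean {
  if (candidate.origin !== root.origin) return false;
  const base = root.pathname.endsWith("/") ? root.pathname : root.pathname + "/";
  return candidate.pathname.startsWith(base);
}

// Returns the list of page URLs to index, starting with the root itself.
async function collectPages(
  rootUrl: string,
  extractLinks: LinkExtractor,
  options: CrawlOptions
): Promise<string[]> {
  const root = new URL(rootUrl);
  root.hash = "";
  const visited = new Set<string>([root.href]);
  if (!options.crawlSubpages) return [root.href]; // current behaviour: one page only

  const queue: string[] = [root.href];
  while (queue.length > 0 && visited.size < options.maxPages) {
    const current = queue.shift() as string;
    for (const href of await extractLinks(current)) {
      let candidate: URL;
      try {
        candidate = new URL(href, current); // resolves relative links
      } catch {
        continue; // skip malformed hrefs
      }
      candidate.hash = ""; // treat #fragments as the same page
      if (!isSubpage(root, candidate) || visited.has(candidate.href)) continue;
      if (visited.size >= options.maxPages) break;
      visited.add(candidate.href);
      queue.push(candidate.href);
    }
  }
  return Array.from(visited);
}
```

With the toggle off, the function returns only the submitted URL, so the existing single-page behaviour would be unchanged.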
Alternatives Considered
Implement a separate "crawl site" feature instead of integrating into URL indexing.
Use a third-party crawling service for subpage discovery.
Limit to manual subpage selection rather than automatic crawling.
Additional Context
UI could feature tabs labeled by subpage path (e.g., /docs/api, /docs/guides) for easy navigation of indexed content.
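One possible way to derive those tab labels is sketched below. The `IndexedPage` and `Tab` shapes and the `buildTabs` helper are illustrative names only, assumed for this example rather than taken from the project.

```typescript
// Hypothetical shape of one scraped page.
interface IndexedPage {
  url: string;
  content: string;
}

// Hypothetical shape of one tab in the content display.
interface Tab {
  label: string; // the subpage path, e.g. "/docs/api"
  url: string;
  content: string;
}

// Uses the URL path as the label, dropping a trailing slash except for the site root.
function tabLabel(pageUrl: string): string {
  const path = new URL(pageUrl).pathname;
  return path.length > 1 && path.endsWith("/") ? path.slice(0, -1) : path;
}

// Builds one tab per indexed page, preserving crawl order.
function buildTabs(pages: IndexedPage[]): Tab[] {
  return pages.map((page) => ({
    label: tabLabel(page.url),
    url: page.url,
    content: page.content,
  }));
}
```

Keeping the full path as the label means that two subpages with the same last segment (for example /docs/api/auth and /docs/guides/auth) would still get distinct tabs.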
I'm happy to start working on this myself, but I'd like to hear thoughts first.