fix(website): improve agent readability score by aidenybai · Pull Request #330 · aidenybai/react-grab

aidenybai · 2026-05-09T05:18:49Z

Addresses the Vercel agent-readability spec audit findings on react-grab.com.

What changed

"Can agents find you?"

- llms.txt: rewritten to follow the llmstxt.org format with an H1 title, blockquote summary, and [name](url) markdown link sections. The previous detailed install content moved to a new /llms-full.txt (linked from llms.txt).
- /sitemap.md: new dynamic route that lists every page with its HTML and .md URL so agents can crawl without parsing XML.
- JSON-LD: WebSite + Organization + SoftwareApplication graph injected into <head> of every page so agents can extract title, description, logo, and schema-typed metadata without DOM parsing.
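
As a rough illustration of the JSON-LD item, a graph with those three node types could be assembled as below. The `buildJsonLdGraph` helper, the `@id` fragments, the site name, and the description string are assumptions made for this sketch, not the PR's actual code.

```typescript
// Hypothetical helper: builds a schema.org @graph containing the three node
// types named in the PR description. All field values are illustrative.
interface JsonLdNode {
  "@type": string;
  "@id": string;
  [key: string]: unknown;
}

interface JsonLdGraph {
  "@context": "https://schema.org";
  "@graph": JsonLdNode[];
}

function buildJsonLdGraph(siteUrl: string, name: string, description: string): JsonLdGraph {
  const organizationId = `${siteUrl}/#organization`;
  return {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "WebSite",
        "@id": `${siteUrl}/#website`,
        url: siteUrl,
        name,
        description,
        publisher: { "@id": organizationId },
      },
      { "@type": "Organization", "@id": organizationId, url: siteUrl, name },
      {
        "@type": "SoftwareApplication",
        "@id": `${siteUrl}/#software`,
        name,
        description,
        applicationCategory: "DeveloperApplication",
      },
    ],
  };
}

// Serialized form that would be placed inside <script type="application/ld+json">.
const jsonLd = JSON.stringify(
  buildJsonLdGraph("https://react-grab.com", "React Grab", "Example description"),
);
```

Linking the nodes through `@id` references keeps the graph flat, so a consumer can look up any node by identifier after a single JSON parse.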

"Can agents read you?"

- proxy.ts (Next.js 16's renamed middleware):
  - Detects AI user-agents (ChatGPT, Claude, Perplexity, Cursor, GPTBot, OAI-SearchBot, Anthropic, OpenCode, aider, CCBot, Bytespider, Amazonbot, Applebot-Extended, Diffbot, MistralAI, YouBot, Cohere) and Accept: text/markdown headers, then rewrites to the corresponding .md mirror.
  - Returns 200 + markdown (instead of HTML 404) for unknown URLs requested by agents, since agents discard 404 bodies.
  - Appends Vary: Accept, User-Agent only on proxy responses (HTML pass-through and markdown rewrites). Static assets keep their default headers.
- Markdown mirrors at {url}.md:
  - /index.md and /privacy.md as static files in public/.
  - /changelog.md and /sitemap.md as Next.js route handlers (dynamic from CHANGELOG.md and the app/ directory).
  - /404.md for the agent-friendly missing-page response.
  - Content-Type: text/markdown; charset=utf-8 and Access-Control-Allow-Origin: * headers configured in next.config.ts for every .md URL.
- <link rel="alternate" type="text/markdown">: each page emits its own page-specific alternate (/, /privacy, /changelog → index.md, privacy.md, changelog.md) via metadata.alternates.types. A single global <link> to /llms.txt lives in the layout.
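
The rewrite decision described for proxy.ts can be sketched as a pure function, which keeps it testable without a running Next.js server. This is a minimal sketch under assumptions: the pattern covers only a subset of the listed agents, and `AgentRequest`, `MARKDOWN_MIRRORS`, `wantsMarkdown`, and `resolveRewrite` are hypothetical names, not the PR's implementation.

```typescript
// Matched case-insensitively against the User-Agent header. This is a subset
// of the agents listed above; the exact matching rules are an assumption.
const AGENT_UA_PATTERN = /chatgpt|claude|perplexity|cursor|gptbot|oai-searchbot|anthropic|ccbot/i;

// Pages that have a markdown mirror (hypothetical lookup table).
const MARKDOWN_MIRRORS = new Map<string, string>([
  ["/", "/index.md"],
  ["/privacy", "/privacy.md"],
  ["/changelog", "/changelog.md"],
]);

interface AgentRequest {
  pathname: string;
  userAgent: string;
  accept: string;
}

// True when the client asks for markdown explicitly or looks like an AI agent.
function wantsMarkdown(request: AgentRequest): boolean {
  const acceptsMarkdown = request.accept.toLowerCase().includes("text/markdown");
  return acceptsMarkdown || AGENT_UA_PATTERN.test(request.userAgent);
}

// Returns the path to rewrite to, or null to pass the request through as HTML.
function resolveRewrite(request: AgentRequest): string | null {
  if (!wantsMarkdown(request)) return null;
  // Requests that already target a file (e.g. /index.md, /logo.png) are left alone.
  if (request.pathname.includes(".")) return null;
  // Unknown pages fall back to the markdown 404 body, served with status 200.
  return MARKDOWN_MIRRORS.get(request.pathname) ?? "/404.md";
}
```

In a real proxy the returned path would presumably be passed to `NextResponse.rewrite`, with `NextResponse.next()` for the `null` case; that wiring is omitted here.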

"Is your HTML agent-friendly?"

- Hierarchical headings added on all three indexed pages:
  - Homepage: visually-hidden h1 → h2 → h3 over the existing demo so agents can chunk the page.
  - Privacy: promoted each section heading from <p> to <h2> (already had <h1>).
  - Changelog: "Changelog" promoted to <h1>, version to <h2>, change type to <h3> (skipped when the entry has no change type so the heading outline stays clean).
- Canonical URL added to every page's metadata (alternates.canonical + metadataBase).
- Skip-to-content link added to layout for keyboard users.
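
For the canonical URL and the per-page markdown alternate, the shape of the metadata might look like the sketch below. `PageMetadataSketch` is a local stand-in for the relevant subset of the Next.js `Metadata` type so the block stays self-contained, and `buildPageMetadata` and `SITE_ORIGIN` are hypothetical; the real pages may rely on `metadataBase` to resolve relative URLs instead of building absolute ones.

```typescript
// Local stand-in for the subset of Next.js `Metadata` used here. In an app
// the real type would be imported from "next".
interface PageMetadataSketch {
  alternates: {
    canonical: string;
    types: Record<string, string>;
  };
}

// Assumed site origin, taken from the domain named in the PR.
const SITE_ORIGIN = "https://react-grab.com";

// Each page declares its own canonical URL and its own markdown alternate,
// so nothing is inherited from the root layout.
function buildPageMetadata(pathname: string): PageMetadataSketch {
  const markdownPath = pathname === "/" ? "/index.md" : `${pathname}.md`;
  return {
    alternates: {
      canonical: new URL(pathname, SITE_ORIGIN).toString(),
      types: { "text/markdown": markdownPath },
    },
  };
}

const privacyMetadata = buildPageMetadata("/privacy");
```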

Verification

Tested locally with next start and against the deployed Vercel preview:

GET /llms.txt                              → llmstxt-spec markdown with link sections
GET /index.md / /privacy.md / /changelog.md → 200 text/markdown
GET /sitemap.md                             → 200 text/markdown
GET / (Accept: text/markdown)               → 200 → rewritten to /index.md
GET / (User-Agent: ChatGPT)                 → 200 → rewritten to /index.md
GET /privacy (Accept: text/markdown)        → 200 → rewritten to /privacy.md
GET /foobar (Accept: text/markdown)         → 200 → rewritten to /404.md (not 404)
GET /script.js / /logo.png                  → no Vary: User-Agent, CDN cache intact
GET / (browser)                             → HTML with h1, JSON-LD, canonical, alternate

pnpm typecheck, pnpm lint, pnpm format all pass; production next build succeeds; lint/build/test-build/test-cli/test-e2e/typecheck CI all green.

Bot review notes

Cursor Bugbot flagged five issues: an empty <h3> on the changelog page when entry.changeType is "", a duplicate <link rel="alternate" type="text/markdown"> from the root layout, an over-broad Vary: Accept, User-Agent on /:path* that defeated CDN caching for static assets, a duplicate www → apex redirect across proxy.ts and next.config.ts, and (high severity) a proxy-level www → apex redirect that would create an infinite loop while the Vercel dashboard still redirects apex → www. All five are fixed.
cubic-dev-ai flagged that the <Script> snippet in llms-full.txt was missing its import Script from "next/script" line. Fixed in both llms-full.txt and the pre-existing install.md.
Vercel Agent Review suggested explicitly excluding the sitemap.md and changelog.md route folders from sitemap generation. The existing if (entry.includes(".")) filter already excludes them, so the code is left as-is. The same review's other comment ("proxy.ts not executed") was incorrect: the deployed preview returns markdown to agent UAs through the proxy (x-matched-path: /index.md).

Manual follow-up (cannot be done from code)

The audit's "Redirect behavior cross-host redirect to www.react-grab.com" finding can only be addressed in the Vercel dashboard: in the project's Domains settings, mark react-grab.com (apex) as canonical so requests to react-grab.com no longer 30x to www.react-grab.com. A code-level www → apex redirect is intentionally not added — Vercel's edge redirect runs before the proxy, so a proxy redirect in the opposite direction would create an infinite loop and take the site down.

The remaining audit finding — homepage/changelog HTML over the 100 KB page size threshold — is mitigated by the markdown alternates: agent UAs no longer receive HTML for those pages. Reducing the HTML further would require turning off experimental.inlineCss, which costs a render-blocking round-trip for human visitors.

Summary by cubic

Improve react-grab.com agent readability by serving markdown to agents, adding structured metadata, and tightening semantics to align with Vercel’s agent-readability spec. Agents now get clean, crawlable content with better discovery and fewer dead ends.

New Features
- Markdown delivery: .md mirrors for key pages (/index.md, /privacy.md, /changelog.md, /sitemap.md) plus /404.md; a proxy detects agent user-agents or Accept: text/markdown and rewrites; unknown URLs return 200 markdown; Content-Type/CORS set on .md and llms*.txt.
- Discoverability: llms.txt rewritten to llmstxt.org format; added llms-full.txt; injected JSON-LD (WebSite, Organization, SoftwareApplication); canonical URLs and <link rel="alternate" type="text/markdown"> scoped per page.
- Semantics: <main> landmarks, hierarchical headings on Home/Privacy/Changelog, and a skip-to-content link for better parsing and navigation.
- Deployment note: In Vercel → Domains, set react-grab.com (apex) as canonical to resolve the cross-host redirect finding.

Bug Fixes
- Metadata scoping: moved the homepage canonical and index.md alternate from the root layout into Home; added a canonical and noindex, nofollow to /open-file to prevent incorrect inheritance.
- Alternates: removed the root layout’s hardcoded /index.md alternate; each page now declares its own markdown alternate; a single global alternate for /llms.txt remains.
- Caching: scoped Vary: Accept, User-Agent to proxy responses only to avoid fragmenting CDN caches for static assets.
- Redirects: removed duplicate www → apex rule from next.config.ts and dropped the proxy-level redirect to avoid apex↔www loops with Vercel’s edge redirect.
- Changelog: skip empty <h3> when an entry has no change type.
- Docs: add missing next/script import in the Next.js Pages Router snippet in llms-full.txt and install.md.
- Refactor: extracted shared route discovery to utils/discover-page-routes and reused in sitemap.ts and /sitemap.md to keep exclusions consistent.

Written for commit 903f7b9. Summary will update on new commits.

- Move detailed install instructions into llms-full.txt
- Make llms.txt a curated link list with H1, blockquote summary, and markdown link sections per the llmstxt.org spec
Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>

- /index.md, /privacy.md as static markdown mirrors
- /changelog.md and /sitemap.md as dynamic route handlers
- /404.md fallback markdown for missing-page agent responses
Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>

- Detect AI user-agents (ChatGPT, Claude, Cursor, Perplexity, GPTBot, etc) and Accept: text/markdown header, then rewrite to the page's markdown mirror
- Return 200 with /404.md (instead of an HTML 404 body) for unknown URLs requested by agents, since agents discard 404 bodies
- www.react-grab.com -> react-grab.com permanent redirect to avoid cross-host redirects (still requires Vercel apex domain to be marked canonical for the audit to pass)
- Set Content-Type: text/markdown and CORS headers on all .md and llms*.txt responses
- Add Vary: Accept, User-Agent so caches respect content negotiation
Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>

- WebSite + Organization + SoftwareApplication JSON-LD so agents can extract structured data without DOM parsing
- alternates.canonical and metadataBase so duplicates are not indexed
- <link rel=alternate type=text/markdown> pointing at /index.md and /llms.txt
- Skip-to-content link for keyboard users
Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>

- Homepage: wrap in <main id=main-content> with screen-reader h1/h2/h3 so agents can chunk content
- Privacy: promote each section title from <p> to <h2>, switch wrapper to <main>
- Changelog: switch "Changelog" label to <h1>, version to <h2>, change type to <h3>, switch wrapper to <main>
- Add canonical URL and text/markdown alternate to /privacy and /changelog metadata
Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>

vercel · 2026-05-09T05:18:53Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
react-grab-storybook	Ready	Preview, Comment	May 9, 2026 6:44am
react-grab-website	Ready	Preview, Comment	May 9, 2026 6:44am

pkg-pr-new · 2026-05-09T05:19:47Z

npm i https://pkg.pr.new/aidenybai/react-grab/@react-grab/cli@330

npm i https://pkg.pr.new/aidenybai/react-grab/grab@330

npm i https://pkg.pr.new/aidenybai/react-grab/@react-grab/mcp@330

npm i https://pkg.pr.new/aidenybai/react-grab@330

commit: 903f7b9

cubic-dev-ai

1 issue found across 13 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/website/public/llms-full.txt">

<violation number="1" location="apps/website/public/llms-full.txt:126">
P2: The Next.js Pages Router snippet uses `<Script>` without importing it from `next/script`, so the documented example will fail when copied.</violation>
</file>

vercel · 2026-05-09T05:38:46Z

@@ -0,0 +1,85 @@
+import { NextResponse, type NextRequest } from "next/server";


Middleware not being executed - proxy.ts logic should be moved to middleware.ts with default export

parseChangelog initializes changeType to "" for versions without a ### line. The HTML page rendered an empty <h3> in that case, breaking the heading outline. The markdown route already guarded with if (entry.changeType); now the page does too. Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
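
The guard described in this commit can be sketched for the markdown route as follows. The `ChangelogEntry` shape and the `renderChangelogMarkdown` name are assumptions for illustration; only the `if (entry.changeType)` check itself comes from the commit message.

```typescript
// Assumed entry shape: changeType is "" when a version has no "### ..." line.
interface ChangelogEntry {
  version: string;
  changeType: string;
  changes: string[];
}

function renderChangelogMarkdown(entries: ChangelogEntry[]): string {
  const lines: string[] = ["# Changelog", ""];
  for (const entry of entries) {
    lines.push(`## ${entry.version}`, "");
    // Emit the third-level heading only when a change type exists, so the
    // outline never contains an empty heading.
    if (entry.changeType) {
      lines.push(`### ${entry.changeType}`, "");
    }
    for (const change of entry.changes) {
      lines.push(`- ${change}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

const sampleChangelog = renderChangelogMarkdown([
  { version: "0.1.1", changeType: "", changes: ["Fix"] },
  { version: "0.1.0", changeType: "Patch Changes", changes: ["Initial release"] },
]);
```

The HTML page would apply the same condition around its `<h3>` element, which keeps the two outputs structurally aligned.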

The Pages Router example used <Script> without importing it from next/script. Anyone copying the snippet would get a ReferenceError. Add the import in both llms-full.txt and install.md. Reported by cubic for llms-full.txt; install.md was carrying the same pre-existing bug. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>

The layout had a hardcoded <link rel=alternate type=text/markdown href=/index.md> that rendered on every page. On /privacy and /changelog this conflicted with the page-specific markdown alternate from metadata.alternates.types (e.g. /privacy.md), so agents saw two text/markdown alternates pointing at different URLs. Now each page's metadata is the single source of truth for its markdown alternate. The global <link rel=alternate> for /llms.txt stays since it's a site-level resource that's correct on every page. Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>

Previously next.config set Vary: Accept, User-Agent on /:path*, which matched static assets too. Vary: User-Agent on cached static assets defeats CDN caching since each unique UA gets its own cache entry. Now the Vary header is appended by proxy.ts on the responses that actually vary (HTML pass-through and markdown rewrites). Static .md files keep their Content-Type and CORS headers; .png/.svg/.js/.css and other static assets no longer carry Vary. Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
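
A minimal sketch of appending Vary only on the responses that actually vary, using the standard Fetch `Headers` API. The `appendVary` helper is hypothetical, and the duplicate check is an addition of this sketch rather than something the commit describes.

```typescript
// Appends values to the Vary header without duplicating ones already present.
function appendVary(headers: Headers, values: string[]): void {
  const existing = (headers.get("Vary") ?? "")
    .split(",")
    .map((value) => value.trim().toLowerCase())
    .filter((value) => value.length > 0);
  for (const value of values) {
    if (!existing.includes(value.toLowerCase())) {
      headers.append("Vary", value);
      existing.push(value.toLowerCase());
    }
  }
}

// A proxied HTML response gets the header; static assets never reach this call.
const proxiedHeaders = new Headers({ "Content-Type": "text/html" });
appendVary(proxiedHeaders, ["Accept", "User-Agent"]);
```

Because the call sits in the proxy's response path rather than in a `/:path*` header rule, files such as .png or .js keep a single cache entry per URL.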

Both proxy.ts and next.config.ts had a www.react-grab.com -> react-grab.com 308 redirect. Since the proxy runs before next.config redirects in Next 16, the next.config entry was dead code. Keep only the proxy implementation as the single source of truth. Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>

Vercel's domain config still redirects apex -> www at the edge, before the proxy runs. A proxy-level www -> apex 308 would create an infinite redirect loop:
1. Browser hits react-grab.com
2. Vercel edge redirects apex -> www
3. Proxy runs: 308 redirect www -> apex
4. Vercel edge redirects apex -> www
5. ... loops until browser gives up
The cross-host redirect finding from the audit can only be addressed by flipping the canonical domain in the Vercel dashboard. Drop the proxy redirect so shipping this PR cannot break the site. Reported by Cursor Bugbot (high severity).
Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
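
The loop can be reproduced with a small simulation. This is only a model of the two rules as the commit message describes them; `RedirectRule`, `followRedirects`, and both rule functions are hypothetical and do not call any Vercel or Next.js API.

```typescript
// A rule returns the host to redirect to, or null when it does not apply.
type RedirectRule = (host: string) => string | null;

// Edge rule as described: the domain config sends apex to www.
const edgeApexToWww: RedirectRule = (host) =>
  host === "react-grab.com" ? "www.react-grab.com" : null;

// The proxy rule that was dropped: www back to apex.
const proxyWwwToApex: RedirectRule = (host) =>
  host === "www.react-grab.com" ? "react-grab.com" : null;

// Follows redirects until no rule matches, a host repeats, or maxHops is hit.
function followRedirects(
  startHost: string,
  rules: RedirectRule[],
  maxHops = 10,
): { hosts: string[]; loops: boolean } {
  const hosts = [startHost];
  let current = startHost;
  for (let hop = 0; hop < maxHops; hop++) {
    let next: string | null = null;
    for (const rule of rules) {
      next = rule(current);
      if (next !== null) break;
    }
    if (next === null) return { hosts, loops: false };
    // Revisiting a host means the chain can never terminate.
    if (hosts.includes(next)) return { hosts: [...hosts, next], loops: true };
    hosts.push(next);
    current = next;
  }
  // Gave up after maxHops, which is treated as a loop as well.
  return { hosts, loops: true };
}

const withProxyRedirect = followRedirects("react-grab.com", [edgeApexToWww, proxyWwwToApex]);
const withoutProxyRedirect = followRedirects("react-grab.com", [edgeApexToWww]);
```

With both rules active the chain revisits the apex host, while the edge rule alone settles on www after one hop.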

vercel

Additional Suggestion:

The middleware is named proxy.ts and uses a named export instead of the default export required by Next.js, preventing the middleware from executing

The root layout's metadata.alternates was inherited by every page that did not declare its own. /open-file (which has its own metadata block) ended up with the homepage's canonical URL and the homepage's text/markdown alternate. Move the homepage canonical + index.md alternate from the root layout into app/page.tsx, where it belongs. Add a canonical and noindex/nofollow to app/open-file/layout.tsx so the page is correctly self-canonical and excluded from indexing (matching the existing sitemap exclusion). Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 3c9cb45.

Both app/sitemap.ts and app/sitemap.md/route.ts walked the app/ directory to discover page routes, with two near-identical copies of the recursion + exclusion set. Move that logic into utils/discover-page-routes.ts so the exclusion set (api, open-file, anything with a dot) is defined once and shared. Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
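
A sketch of the shared discovery logic, written against an in-memory tree instead of the file system so it can run anywhere. The `AppTree` type, the function names, and the sitemap line format are assumptions; only the exclusion set (api, open-file, anything with a dot) comes from the commit message.

```typescript
// In-memory stand-in for the app/ directory: a folder maps child names to
// subfolders, and a file is represented by null.
type AppTree = { [name: string]: AppTree | null };

// Defined once so every consumer applies the same exclusions.
const EXCLUDED_SEGMENTS = new Set(["api", "open-file"]);

function discoverPageRoutes(tree: AppTree, prefix = ""): string[] {
  const routes: string[] = [];
  // A folder is a page route only if it contains a page file.
  if ("page.tsx" in tree) routes.push(prefix === "" ? "/" : prefix);
  for (const [name, child] of Object.entries(tree)) {
    // Skip files, excluded folders, and anything with a dot (e.g. sitemap.md/).
    if (child === null || EXCLUDED_SEGMENTS.has(name) || name.includes(".")) continue;
    routes.push(...discoverPageRoutes(child, `${prefix}/${name}`));
  }
  return routes.sort();
}

// Lists each page with its HTML URL and its markdown mirror.
function renderSitemapMarkdown(routes: string[], origin: string): string {
  const lines = routes.map((route) => {
    const markdownPath = route === "/" ? "/index.md" : `${route}.md`;
    return `- [${route}](${origin}${route}) ([markdown](${origin}${markdownPath}))`;
  });
  return ["# Sitemap", "", ...lines].join("\n");
}

const exampleTree: AppTree = {
  "page.tsx": null,
  "layout.tsx": null,
  privacy: { "page.tsx": null },
  changelog: { "page.tsx": null },
  "sitemap.md": { "route.ts": null },
  api: { check: { "route.ts": null } },
  "open-file": { "page.tsx": null },
};
const exampleRoutes = discoverPageRoutes(exampleTree);
```

Both the XML sitemap and the markdown sitemap would call the same discovery function, so an exclusion added later applies to both outputs.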

cursoragent and others added 5 commits May 9, 2026 05:17

feat(website): add markdown mirrors for each page

b407123

aidenybai marked this pull request as ready for review May 9, 2026 05:26

cursor Bot reviewed May 9, 2026

Comment thread apps/website/app/changelog/page.tsx Outdated

cubic-dev-ai Bot reviewed May 9, 2026

Comment thread apps/website/public/llms-full.txt

vercel Bot reviewed May 9, 2026

cursoragent and others added 2 commits May 9, 2026 05:40

vercel Bot deployed to Preview – react-grab-storybook May 9, 2026 05:40 View deployment

vercel Bot deployed to Preview – react-grab-website May 9, 2026 05:40 View deployment

cursor Bot reviewed May 9, 2026

Comment thread apps/website/app/layout.tsx Outdated

Comment thread apps/website/next.config.ts Outdated

cursoragent and others added 2 commits May 9, 2026 05:55

vercel Bot deployed to Preview – react-grab-storybook May 9, 2026 05:56 View deployment

vercel Bot deployed to Preview – react-grab-website May 9, 2026 05:56 View deployment

cursor Bot reviewed May 9, 2026

Comment thread apps/website/next.config.ts Outdated

vercel Bot deployed to Preview – react-grab-storybook May 9, 2026 06:07 View deployment

vercel Bot deployed to Preview – react-grab-website May 9, 2026 06:08 View deployment

cursor Bot reviewed May 9, 2026

Comment thread apps/website/proxy.ts Outdated

vercel Bot deployed to Preview – react-grab-storybook May 9, 2026 06:21 View deployment

vercel Bot deployed to Preview – react-grab-website May 9, 2026 06:21 View deployment

vercel Bot reviewed May 9, 2026

cursor Bot reviewed May 9, 2026

Comment thread apps/website/app/layout.tsx Outdated

vercel Bot deployed to Preview – react-grab-storybook May 9, 2026 06:34 View deployment

vercel Bot deployed to Preview – react-grab-website May 9, 2026 06:35 View deployment

cursor Bot reviewed May 9, 2026

Comment thread apps/website/app/sitemap.md/route.ts Outdated

vercel Bot deployed to Preview – react-grab-storybook May 9, 2026 06:44 View deployment

vercel Bot deployed to Preview – react-grab-website May 9, 2026 06:44 View deployment

aidenybai wants to merge 13 commits into main from cursor/fix-agent-readability-aed3
