
fix(website): improve agent readability score#330

Open
aidenybai wants to merge 13 commits into main from cursor/fix-agent-readability-aed3

Conversation

@aidenybai
Owner

@aidenybai aidenybai commented May 9, 2026

Addresses the Vercel agent-readability spec audit findings on react-grab.com.

What changed

"Can agents find you?"

  • llms.txt: rewritten to follow the llmstxt.org format with an H1 title, blockquote summary, and [name](url) markdown link sections. The previous detailed install content moved to a new /llms-full.txt (linked from llms.txt).
  • /sitemap.md: new dynamic route that lists every page with its HTML and .md URL so agents can crawl without parsing XML.
  • JSON-LD: WebSite + Organization + SoftwareApplication graph injected into <head> of every page so agents can extract title, description, logo, and schema-typed metadata without DOM parsing.
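
As a rough illustration of the JSON-LD item above, the three schema.org nodes could be assembled into a single `@graph` document like this. This is a minimal sketch, not the PR's actual code: the helper names, the `@id` scheme, the `applicationCategory` value, and the `/logo.png` path are assumptions made for the example.

```typescript
// Hypothetical helper names; the PR does not show its JSON-LD code.
interface JsonLdNode {
  "@type": string;
  "@id": string;
  [key: string]: unknown;
}

// Build one @graph document holding the three node types named in the PR.
function buildJsonLdGraph(siteUrl: string, siteName: string, description: string) {
  const organizationId = `${siteUrl}/#organization`;
  const graph: JsonLdNode[] = [
    {
      "@type": "WebSite",
      "@id": `${siteUrl}/#website`,
      url: siteUrl,
      name: siteName,
      description,
      publisher: { "@id": organizationId },
    },
    {
      "@type": "Organization",
      "@id": organizationId,
      name: siteName,
      url: siteUrl,
      logo: `${siteUrl}/logo.png`, // assumed logo location
    },
    {
      "@type": "SoftwareApplication",
      "@id": `${siteUrl}/#software`,
      name: siteName,
      description,
      applicationCategory: "DeveloperApplication", // assumed category
      url: siteUrl,
    },
  ];
  return { "@context": "https://schema.org", "@graph": graph };
}

// Serialize for a <script type="application/ld+json"> tag.
// Escaping "<" keeps a stray "</script>" in the data from closing the tag early.
function serializeJsonLd(doc: unknown): string {
  return JSON.stringify(doc).replace(/</g, "\\u003c");
}
```

The serialized string would then be placed in a `<script type="application/ld+json">` element in the layout's `<head>`.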

"Can agents read you?"

  • proxy.ts (Next.js 16's renamed middleware):
    • Detects AI user-agents (ChatGPT, Claude, Perplexity, Cursor, GPTBot, OAI-SearchBot, Anthropic, OpenCode, aider, CCBot, Bytespider, Amazonbot, Applebot-Extended, Diffbot, MistralAI, YouBot, Cohere) and Accept: text/markdown headers, then rewrites to the corresponding .md mirror.
    • Returns 200 + markdown (instead of HTML 404) for unknown URLs requested by agents, since agents discard 404 bodies.
    • Appends Vary: Accept, User-Agent only on proxy responses (HTML pass-through and markdown rewrites). Static assets keep their default headers.
  • Markdown mirrors at {url}.md:
    • /index.md and /privacy.md as static files in public/.
    • /changelog.md and /sitemap.md as Next.js route handlers (dynamic from CHANGELOG.md and the app/ directory).
    • /404.md for the agent-friendly missing-page response.
    • Content-Type: text/markdown; charset=utf-8 and Access-Control-Allow-Origin: * headers configured in next.config.ts for every .md URL.
  • <link rel="alternate" type="text/markdown">: each page emits its own page-specific alternate (/, /privacy, /changelog → index.md, privacy.md, changelog.md) via metadata.alternates.types. A single global <link> to /llms.txt lives in the layout.
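
The proxy's decision logic described above can be sketched as two pure functions. This is an illustrative approximation, not the contents of proxy.ts: the function names, the shortened user-agent list, the hard-coded page set, and the rule that dotted paths pass through untouched are all assumptions.

```typescript
// Subset of the user-agent tokens listed in the PR; matching is case-insensitive.
const AGENT_UA_PATTERN =
  /chatgpt|claude|perplexity|cursor|gptbot|oai-searchbot|anthropic|ccbot/i;

// Hypothetical set of pages that have a markdown mirror.
const MIRRORED_PAGES = new Set(["/", "/privacy", "/changelog"]);

// An agent is identified by an Accept: text/markdown header or a known UA token.
function wantsMarkdown(userAgent: string | null, accept: string | null): boolean {
  const acceptsMarkdown = (accept ?? "").toLowerCase().includes("text/markdown");
  return acceptsMarkdown || AGENT_UA_PATTERN.test(userAgent ?? "");
}

// Returns the path to rewrite to, or null to let the request pass through.
function resolveMarkdownRewrite(
  pathname: string,
  userAgent: string | null,
  accept: string | null,
): string | null {
  if (!wantsMarkdown(userAgent, accept)) return null;
  // Assumption: paths with a dot are files (assets, .md mirrors) and are left alone.
  if (pathname.includes(".")) return null;
  const normalized = pathname.replace(/\/+$/, "") || "/";
  // Unknown pages get the markdown 404 body, served with a 200 status.
  if (!MIRRORED_PAGES.has(normalized)) return "/404.md";
  return normalized === "/" ? "/index.md" : `${normalized}.md`;
}
```

In a real proxy the returned path would presumably be passed to a rewrite (for example `NextResponse.rewrite`), with `null` mapping to a plain pass-through.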

"Is your HTML agent-friendly?"

  • Hierarchical headings added on all three indexed pages:
    • Homepage: visually hidden h1 → h2 → h3 hierarchy over the existing demo so agents can chunk the page.
    • Privacy: promoted each section heading from <p> to <h2> (already had <h1>).
    • Changelog: "Changelog" promoted to <h1>, version to <h2>, change type to <h3> (skipped when the entry has no change type so the heading outline stays clean).
  • Canonical URL added to every page's metadata (alternates.canonical + metadataBase).
  • Skip-to-content link added to layout for keyboard users.

Verification

Tested locally with next start and against the deployed Vercel preview:

GET /llms.txt                              → llmstxt-spec markdown with link sections
GET /index.md / /privacy.md / /changelog.md → 200 text/markdown
GET /sitemap.md                             → 200 text/markdown
GET / (Accept: text/markdown)               → 200 → rewritten to /index.md
GET / (User-Agent: ChatGPT)                 → 200 → rewritten to /index.md
GET /privacy (Accept: text/markdown)        → 200 → rewritten to /privacy.md
GET /foobar (Accept: text/markdown)         → 200 → rewritten to /404.md (not 404)
GET /script.js / /logo.png                  → no Vary: User-Agent, CDN cache intact
GET / (browser)                             → HTML with h1, JSON-LD, canonical, alternate

pnpm typecheck, pnpm lint, pnpm format all pass; production next build succeeds; lint/build/test-build/test-cli/test-e2e/typecheck CI all green.

Bot review notes

  • Cursor Bugbot flagged an empty <h3> on changelog when entry.changeType is "", a duplicate <link rel="alternate" type="text/markdown"> from the root layout, an over-broad Vary: Accept, User-Agent on /:path* defeating CDN caching for static assets, a duplicate www → apex redirect across proxy.ts and next.config.ts, and (high severity) that a proxy-level www → apex redirect would create an infinite loop while the Vercel dashboard still redirects apex → www. All five fixed.
  • cubic-dev-ai flagged the <Script> snippet in llms-full.txt was missing its import Script from "next/script" line. Fixed in both llms-full.txt and the pre-existing install.md.
  • Vercel Agent Review suggested explicitly excluding the sitemap.md and changelog.md route folders from sitemap generation. The existing if (entry.includes(".")) filter already excludes them, so leaving as-is. The same review's other comment ("proxy.ts not executed") was incorrect — verified the deployed preview returns markdown to agent UAs through the proxy (x-matched-path: /index.md).

Manual follow-up (cannot be done from code)

The audit's "Redirect behavior cross-host redirect to www.react-grab.com" finding can only be addressed in the Vercel dashboard: in the project's Domains settings, mark react-grab.com (apex) as canonical so requests to react-grab.com no longer 30x to www.react-grab.com. A code-level www → apex redirect is intentionally not added — Vercel's edge redirect runs before the proxy, so a proxy redirect in the opposite direction would create an infinite loop and take the site down.

The remaining audit finding — homepage/changelog HTML over the 100 KB page size threshold — is mitigated by the markdown alternates: agent UAs no longer receive HTML for those pages. Reducing the HTML further would require turning off experimental.inlineCss, which costs a render-blocking round-trip for human visitors.


Summary by cubic

Improve react-grab.com agent readability by serving markdown to agents, adding structured metadata, and tightening semantics to align with Vercel’s agent-readability spec. Agents now get clean, crawlable content with better discovery and fewer dead ends.

  • New Features

    • Markdown delivery: .md mirrors for key pages (/index.md, /privacy.md, /changelog.md, /sitemap.md) plus /404.md; a proxy detects agent user-agents or Accept: text/markdown and rewrites; unknown URLs return 200 markdown; Content-Type/CORS set on .md and llms*.txt.
    • Discoverability: llms.txt rewritten to llmstxt.org format; added llms-full.txt; injected JSON-LD (WebSite, Organization, SoftwareApplication); canonical URLs and <link rel="alternate" type="text/markdown"> scoped per page.
    • Semantics: <main> landmarks, hierarchical headings on Home/Privacy/Changelog, and a skip-to-content link for better parsing and navigation.
    • Deployment note: In Vercel → Domains, set react-grab.com (apex) as canonical to resolve the cross-host redirect finding.
  • Bug Fixes

    • Metadata scoping: moved the homepage canonical and index.md alternate from the root layout into Home; added a canonical and noindex, nofollow to /open-file to prevent incorrect inheritance.
    • Alternates: removed the root layout’s hardcoded /index.md alternate; each page now declares its own markdown alternate; a single global alternate for /llms.txt remains.
    • Caching: scoped Vary: Accept, User-Agent to proxy responses only to avoid fragmenting CDN caches for static assets.
    • Redirects: removed duplicate www → apex rule from next.config.ts and dropped the proxy-level redirect to avoid apex↔www loops with Vercel’s edge redirect.
    • Changelog: skip empty <h3> when an entry has no change type.
    • Docs: add missing next/script import in the Next.js Pages Router snippet in llms-full.txt and install.md.
    • Refactor: extracted shared route discovery to utils/discover-page-routes and reused in sitemap.ts and /sitemap.md to keep exclusions consistent.

Written for commit 903f7b9. Summary will update on new commits.

cursoragent and others added 5 commits May 9, 2026 05:17
- Move detailed install instructions into llms-full.txt
- Make llms.txt a curated link list with H1, blockquote summary, and markdown link sections per the llmstxt.org spec

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
- /index.md, /privacy.md as static markdown mirrors
- /changelog.md and /sitemap.md as dynamic route handlers
- /404.md fallback markdown for missing-page agent responses

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
- Detect AI user-agents (ChatGPT, Claude, Cursor, Perplexity, GPTBot, etc) and Accept: text/markdown header, then rewrite to the page's markdown mirror
- Return 200 with /404.md (instead of an HTML 404 body) for unknown URLs requested by agents, since agents discard 404 bodies
- www.react-grab.com -> react-grab.com permanent redirect to avoid cross-host redirects (still requires Vercel apex domain to be marked canonical for the audit to pass)
- Set Content-Type: text/markdown and CORS headers on all .md and llms*.txt responses
- Add Vary: Accept, User-Agent so caches respect content negotiation

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
- WebSite + Organization + SoftwareApplication JSON-LD so agents can extract structured data without DOM parsing
- alternates.canonical and metadataBase so duplicates are not indexed
- <link rel=alternate type=text/markdown> pointing at /index.md and /llms.txt
- Skip-to-content link for keyboard users

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
- Homepage: wrap in <main id=main-content> with screen-reader h1/h2/h3 so agents can chunk content
- Privacy: promote each section title from <p> to <h2>, switch wrapper to <main>
- Changelog: switch "Changelog" label to <h1>, version to <h2>, change type to <h3>, switch wrapper to <main>
- Add canonical URL and text/markdown alternate to /privacy and /changelog metadata

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
@vercel
Contributor

vercel Bot commented May 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project               Deployment  Actions           Updated (UTC)
react-grab-storybook  Ready       Preview, Comment  May 9, 2026 6:44am
react-grab-website    Ready       Preview, Comment  May 9, 2026 6:44am

@pkg-pr-new

pkg-pr-new Bot commented May 9, 2026


npm i https://pkg.pr.new/aidenybai/react-grab/@react-grab/cli@330
npm i https://pkg.pr.new/aidenybai/react-grab/grab@330
npm i https://pkg.pr.new/aidenybai/react-grab/@react-grab/mcp@330
npm i https://pkg.pr.new/aidenybai/react-grab@330

commit: 903f7b9

@aidenybai aidenybai marked this pull request as ready for review May 9, 2026 05:26
Comment thread apps/website/app/changelog/page.tsx Outdated
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 13 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/website/public/llms-full.txt">

<violation number="1" location="apps/website/public/llms-full.txt:126">
P2: The Next.js Pages Router snippet uses `<Script>` without importing it from `next/script`, so the documented example will fail when copied.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread apps/website/public/llms-full.txt
Comment thread apps/website/proxy.ts
@@ -0,0 +1,85 @@
import { NextResponse, type NextRequest } from "next/server";
Contributor

@vercel vercel Bot May 9, 2026


Middleware not being executed - proxy.ts logic should be moved to middleware.ts with default export


Comment thread apps/website/app/sitemap.md/route.ts Outdated
cursoragent and others added 2 commits May 9, 2026 05:40
parseChangelog initializes changeType to "" for versions without a ### line. The HTML page rendered an empty <h3> in that case, breaking the heading outline. The markdown route already guarded with if (entry.changeType); now the page does too.
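
The guard described above can be illustrated with a markdown renderer that emits the change-type heading only when it is non-empty. This is a sketch under assumptions: the `ChangelogEntry` shape and `renderChangelogMarkdown` name are hypothetical, and only the `if (entry.changeType)` check comes from the commit message.

```typescript
// Hypothetical entry shape; the PR only confirms that changeType defaults to "".
interface ChangelogEntry {
  version: string;
  changeType: string;
  changes: string[];
}

// Render entries as "# Changelog" > "## version" > "### change type".
function renderChangelogMarkdown(entries: ChangelogEntry[]): string {
  const lines: string[] = ["# Changelog", ""];
  for (const entry of entries) {
    lines.push(`## ${entry.version}`, "");
    // Skip the heading when the entry has no change type, so no empty "###" is emitted.
    if (entry.changeType) {
      lines.push(`### ${entry.changeType}`, "");
    }
    for (const change of entry.changes) {
      lines.push(`- ${change}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}
```

The HTML page would apply the same condition around its `<h3>` element.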

Reported by Cursor Bugbot.

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
The Pages Router example used <Script> without importing it from next/script. Anyone copying the snippet would get a ReferenceError. Add the import in both llms-full.txt and install.md.

Reported by cubic for llms-full.txt; install.md was carrying the same pre-existing bug.

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
Comment thread apps/website/app/layout.tsx Outdated
Comment thread apps/website/next.config.ts Outdated
cursoragent and others added 2 commits May 9, 2026 05:55
The layout had a hardcoded <link rel=alternate type=text/markdown href=/index.md> that rendered on every page. On /privacy and /changelog this conflicted with the page-specific markdown alternate from metadata.alternates.types (e.g. /privacy.md), so agents saw two text/markdown alternates pointing at different URLs.

Now each page's metadata is the single source of truth for its markdown alternate. The global <link rel=alternate> for /llms.txt stays since it's a site-level resource that's correct on every page.

Reported by Cursor Bugbot.

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
Previously next.config set Vary: Accept, User-Agent on /:path*, which matched static assets too. Vary: User-Agent on cached static assets defeats CDN caching since each unique UA gets its own cache entry.

Now the Vary header is appended by proxy.ts on the responses that actually vary (HTML pass-through and markdown rewrites). Static .md files keep their Content-Type and CORS headers; .png/.svg/.js/.css and other static assets no longer carry Vary.
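
One way to append the Vary fields without clobbering or duplicating existing values is a small merge helper. This is a minimal sketch: `mergeVary` is a hypothetical name, and how proxy.ts actually sets the header is not shown in the PR.

```typescript
// Merge extra field names into an existing Vary header value, case-insensitively,
// keeping whatever the response already varied on.
function mergeVary(existing: string | null, fields: string[]): string {
  const merged = (existing ?? "")
    .split(",")
    .map((value) => value.trim())
    .filter((value) => value.length > 0);
  const seen = new Set(merged.map((value) => value.toLowerCase()));
  for (const field of fields) {
    if (!seen.has(field.toLowerCase())) {
      merged.push(field);
      seen.add(field.toLowerCase());
    }
  }
  return merged.join(", ");
}
```

The proxy would call something like `response.headers.set("Vary", mergeVary(response.headers.get("Vary"), ["Accept", "User-Agent"]))` on its own responses only, so static assets never receive the header.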

Reported by Cursor Bugbot.

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
Comment thread apps/website/next.config.ts Outdated
Both proxy.ts and next.config.ts had a www.react-grab.com -> react-grab.com 308 redirect. Since the proxy runs before next.config redirects in Next 16, the next.config entry was dead code. Keep only the proxy implementation as the single source of truth.

Reported by Cursor Bugbot.

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
Comment thread apps/website/proxy.ts Outdated
Vercel's domain config still redirects apex -> www at the edge, before the proxy runs. A proxy-level www -> apex 308 would create an infinite redirect loop:
  1. Browser hits react-grab.com
  2. Vercel edge redirects apex -> www
  3. Proxy runs: 308 redirect www -> apex
  4. Vercel edge redirects apex -> www
  5. ... loops until browser gives up

The cross-host redirect finding from the audit can only be addressed by flipping the canonical domain in the Vercel dashboard. Drop the proxy redirect so shipping this PR cannot break the site.
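
The loop above can be reproduced with a tiny simulation. This is only a model of the two redirect rules, with hypothetical names throughout; it does not reflect how Vercel or Next.js evaluate redirects internally.

```typescript
// A rule maps a host to its redirect target, or null when it does not apply.
type RedirectRule = (host: string) => string | null;

// Vercel dashboard behaviour at the time of the PR: apex -> www.
const edgeApexToWww: RedirectRule = (host) =>
  host === "react-grab.com" ? "www.react-grab.com" : null;

// The proxy-level redirect that was dropped: www -> apex.
const proxyWwwToApex: RedirectRule = (host) =>
  host === "www.react-grab.com" ? "react-grab.com" : null;

// Follow redirects until no rule matches, a host repeats, or maxHops is reached.
function followRedirects(startHost: string, rules: RedirectRule[], maxHops = 10) {
  const seen = new Set<string>([startHost]);
  let host = startHost;
  for (let hops = 0; hops < maxHops; hops++) {
    // Rules are tried in order, modelling "edge runs before the proxy".
    let next: string | null = null;
    for (const rule of rules) {
      next = rule(host);
      if (next !== null) break;
    }
    if (next === null) return { finalHost: host, hops, looped: false };
    if (seen.has(next)) return { finalHost: next, hops: hops + 1, looped: true };
    seen.add(next);
    host = next;
  }
  return { finalHost: host, hops: maxHops, looped: true };
}
```

With both rules active, a request to the apex bounces back to the apex after two hops; with only the edge rule, it settles on www after one.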

Reported by Cursor Bugbot (high severity).

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
Contributor

@vercel vercel Bot left a comment


Additional Suggestion:

The middleware is named proxy.ts and uses a named export instead of the default export required by Next.js, preventing the middleware from executing


Comment thread apps/website/app/layout.tsx Outdated
The root layout's metadata.alternates was inherited by every page that did not declare its own. /open-file (which has its own metadata block) ended up with the homepage's canonical URL and the homepage's text/markdown alternate.

Move the homepage canonical + index.md alternate from the root layout into app/page.tsx, where it belongs. Add a canonical and noindex/nofollow to app/open-file/layout.tsx so the page is correctly self-canonical and excluded from indexing (matching the existing sitemap exclusion).
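
The inheritance problem can be modelled with a shallow merge over per-segment metadata objects, which is how Next.js documents metadata merging across segments. The sketch below is a simplification: `PageMetadata` is a cut-down stand-in for the real `Metadata` type, `resolveMetadata` is a hypothetical helper, and the concrete values are illustrative.

```typescript
// Cut-down stand-in for Next.js's Metadata type (assumption for the example).
interface PageMetadata {
  title?: string;
  alternates?: { canonical?: string; types?: Record<string, string> };
  robots?: { index: boolean; follow: boolean };
}

// Shallow merge from root to leaf: a later segment replaces whole top-level fields,
// and any field it omits is inherited from the segment above.
function resolveMetadata(...segments: PageMetadata[]): PageMetadata {
  return segments.reduce<PageMetadata>((merged, segment) => ({ ...merged, ...segment }), {});
}

// Before: the root layout carried the homepage's alternates, so /open-file inherited them.
const rootLayoutBefore: PageMetadata = {
  alternates: { canonical: "/", types: { "text/markdown": "/index.md" } },
};
const openFileBefore: PageMetadata = { title: "Open file" };

// After: alternates live on the pages themselves, and /open-file is self-canonical and noindex.
const rootLayoutAfter: PageMetadata = {};
const homePageAfter: PageMetadata = {
  alternates: { canonical: "/", types: { "text/markdown": "/index.md" } },
};
const openFileAfter: PageMetadata = {
  title: "Open file",
  alternates: { canonical: "/open-file" },
  robots: { index: false, follow: false },
};
```

Resolving `rootLayoutBefore` with `openFileBefore` yields the homepage canonical on /open-file, which is the bug; resolving the "after" objects gives each page its own canonical.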

Reported by Cursor Bugbot.

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 3c9cb45.

Comment thread apps/website/app/sitemap.md/route.ts Outdated
Both app/sitemap.ts and app/sitemap.md/route.ts walked the app/ directory to discover page routes, with two near-identical copies of the recursion + exclusion set. Move that logic into utils/discover-page-routes.ts so the exclusion set (api, open-file, anything with a dot) is defined once and shared.
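
A shared discovery helper along these lines could feed both the XML sitemap and /sitemap.md. This sketch walks an in-memory tree instead of the real app/ directory so it stays self-contained; the `DirTree` type, the function names, the "directory containing page.tsx" rule, and the markdown layout are assumptions, while the exclusion set (api, open-file, anything with a dot) comes from the commit message.

```typescript
// In-memory stand-in for the app/ directory: null marks a file, an object marks a folder.
type DirTree = { [name: string]: DirTree | null };

// Exclusions defined once and shared by every caller.
const EXCLUDED_SEGMENTS = new Set(["api", "open-file"]);

// Assumption: a folder is a page route when it directly contains page.tsx.
function discoverPageRoutes(dir: DirTree, prefix = ""): string[] {
  const routes: string[] = [];
  if ("page.tsx" in dir) routes.push(prefix === "" ? "/" : prefix);
  for (const [name, child] of Object.entries(dir)) {
    if (child === null) continue; // files are not traversed
    // Dotted folders (e.g. sitemap.md) are route handlers, not pages.
    if (name.includes(".") || EXCLUDED_SEGMENTS.has(name)) continue;
    routes.push(...discoverPageRoutes(child, `${prefix}/${name}`));
  }
  return routes;
}

// List every page with its HTML URL and its .md mirror URL.
function renderSitemapMarkdown(baseUrl: string, routes: string[]): string {
  const lines = routes.map((route) => {
    const mirror = route === "/" ? "/index.md" : `${route}.md`;
    return `- [${route}](${baseUrl}${route}) ([markdown](${baseUrl}${mirror}))`;
  });
  return ["# Sitemap", "", ...lines, ""].join("\n");
}
```

Because both callers go through `discoverPageRoutes`, adding a new exclusion changes the XML and markdown sitemaps together.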

Reported by Cursor Bugbot.

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>