fix(website): improve agent readability score #330
Open
aidenybai wants to merge 13 commits into
- Move detailed install instructions into llms-full.txt
- Make llms.txt a curated link list with H1, blockquote summary, and markdown link sections per the llmstxt.org spec

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
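As a rough illustration of the llms.txt shape this commit describes (H1 title, blockquote summary, markdown link sections), the sketch below renders such a file from data. The `LlmsSection` type, the `renderLlmsTxt` helper, the "React Grab" title, and the sample summary are hypothetical placeholders, not code or copy from the repository.

```typescript
// Hypothetical types for illustration; not the repository's actual code.
interface LlmsLink {
  name: string;
  url: string;
  note?: string;
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

// Renders an llms.txt body: H1 title, blockquote summary, then one
// H2 section per group, each holding a markdown link list.
function renderLlmsTxt(title: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${title}`, "", `> ${summary}`, ""];
  for (const section of sections) {
    lines.push(`## ${section.heading}`, "");
    for (const link of section.links) {
      const suffix = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.name}](${link.url})${suffix}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

// Placeholder content; only the /llms-full.txt path comes from this PR.
const llmsTxt = renderLlmsTxt("React Grab", "Placeholder summary.", [
  {
    heading: "Docs",
    links: [{ name: "Full install guide", url: "https://react-grab.com/llms-full.txt" }],
  },
]);
```

Keeping the curated file this small is the point of the split: the long install content lives in llms-full.txt and is only linked from here.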
- /index.md, /privacy.md as static markdown mirrors
- /changelog.md and /sitemap.md as dynamic route handlers
- /404.md fallback markdown for missing-page agent responses

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
- Detect AI user-agents (ChatGPT, Claude, Cursor, Perplexity, GPTBot, etc.) and the Accept: text/markdown header, then rewrite to the page's markdown mirror
- Return 200 with /404.md (instead of an HTML 404 body) for unknown URLs requested by agents, since agents discard 404 bodies
- www.react-grab.com -> react-grab.com permanent redirect to avoid cross-host redirects (still requires Vercel apex domain to be marked canonical for the audit to pass)
- Set Content-Type: text/markdown and CORS headers on all .md and llms*.txt responses
- Add Vary: Accept, User-Agent so caches respect content negotiation

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
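The detection and rewrite steps in this commit could look roughly like the sketch below. It is a minimal illustration, not the code in proxy.ts: the `AGENT_UA_PATTERN` regex covers only a subset of the tokens named in this PR, case-insensitive matching is an assumption, and both helper names are hypothetical.

```typescript
// Hypothetical subset of the user-agent tokens named in this PR.
// Case-insensitive matching is an assumption of this sketch.
const AGENT_UA_PATTERN = /ChatGPT|Claude|Cursor|Perplexity|GPTBot/i;

// True when the request looks like it comes from an AI agent: either the
// User-Agent matches a known token or the Accept header asks for markdown.
function wantsMarkdown(userAgent: string | null, accept: string | null): boolean {
  if (userAgent !== null && AGENT_UA_PATTERN.test(userAgent)) return true;
  return accept !== null && accept.toLowerCase().includes("text/markdown");
}

// Maps a page path to its markdown mirror:
// "/" becomes "/index.md", "/privacy" becomes "/privacy.md".
function markdownMirrorPath(pathname: string): string {
  const trimmed = pathname.replace(/\/+$/, "");
  return trimmed === "" ? "/index.md" : `${trimmed}.md`;
}
```

A proxy would call `wantsMarkdown` with the two request headers and, when it returns true, rewrite to `markdownMirrorPath(pathname)`, falling back to /404.md with a 200 status when no mirror exists.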
- WebSite + Organization + SoftwareApplication JSON-LD so agents can extract structured data without DOM parsing
- alternates.canonical and metadataBase so duplicates are not indexed
- <link rel=alternate type=text/markdown> pointing at /index.md and /llms.txt
- Skip-to-content link for keyboard users

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
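For readers unfamiliar with a JSON-LD graph, the sketch below builds one with the three node types named in this commit. The `buildJsonLdGraph` helper, the `@id` fragments, the `applicationCategory` value, and the name and description strings are all assumptions for illustration; the actual fields in the PR may differ.

```typescript
// Hypothetical builder; field values are placeholders, not the site's real data.
function buildJsonLdGraph(siteUrl: string, name: string, description: string) {
  const organizationId = `${siteUrl}/#organization`;
  return {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "WebSite",
        "@id": `${siteUrl}/#website`,
        url: siteUrl,
        name,
        description,
        publisher: { "@id": organizationId },
      },
      { "@type": "Organization", "@id": organizationId, url: siteUrl, name },
      {
        "@type": "SoftwareApplication",
        "@id": `${siteUrl}/#software`,
        url: siteUrl,
        name,
        description,
        applicationCategory: "DeveloperApplication",
      },
    ],
  };
}

// Escape "<" so the JSON is safe to embed inside a <script> element.
const jsonLdScriptBody = JSON.stringify(
  buildJsonLdGraph("https://react-grab.com", "React Grab", "Placeholder description."),
).replace(/</g, "\\u003c");
```

The resulting string would go inside a `<script type="application/ld+json">` element in the document head.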
- Homepage: wrap in <main id=main-content> with screen-reader h1/h2/h3 so agents can chunk content
- Privacy: promote each section title from <p> to <h2>, switch wrapper to <main>
- Changelog: switch "Changelog" label to <h1>, version to <h2>, change type to <h3>, switch wrapper to <main>
- Add canonical URL and text/markdown alternate to /privacy and /changelog metadata

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
1 issue found across 13 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/website/public/llms-full.txt">
<violation number="1" location="apps/website/public/llms-full.txt:126">
P2: The Next.js Pages Router snippet uses `<Script>` without importing it from `next/script`, so the documented example will fail when copied.</violation>
</file>
parseChangelog initializes changeType to "" for versions without a ### line. The HTML page rendered an empty <h3> in that case, breaking the heading outline. The markdown route already guarded with if (entry.changeType); now the page does too. Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
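The guard described here can be sketched for the markdown side as follows. The `ChangelogEntry` shape and `renderChangelogMarkdown` are hypothetical; only the `changeType` field and the `if (entry.changeType)` check are taken from the commit message.

```typescript
// Hypothetical entry shape; only `changeType` is named in the commit message.
interface ChangelogEntry {
  version: string;
  changeType: string; // "" when the version has no "###" line
  changes: string[];
}

// Emits "# Changelog", then one "##" per version and a "###" per change type,
// skipping the "###" heading when changeType is empty.
function renderChangelogMarkdown(entries: ChangelogEntry[]): string {
  const lines: string[] = ["# Changelog", ""];
  for (const entry of entries) {
    lines.push(`## ${entry.version}`, "");
    if (entry.changeType) {
      lines.push(`### ${entry.changeType}`, "");
    }
    for (const change of entry.changes) {
      lines.push(`- ${change}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}
```

Because an empty string is falsy, the same truthiness check keeps both the HTML page and the markdown route from emitting an empty third-level heading.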
The Pages Router example used <Script> without importing it from next/script. Anyone copying the snippet would get a ReferenceError. Add the import in both llms-full.txt and install.md. Reported by cubic for llms-full.txt; install.md was carrying the same pre-existing bug. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
The layout had a hardcoded <link rel=alternate type=text/markdown href=/index.md> that rendered on every page. On /privacy and /changelog this conflicted with the page-specific markdown alternate from metadata.alternates.types (e.g. /privacy.md), so agents saw two text/markdown alternates pointing at different URLs. Now each page's metadata is the single source of truth for its markdown alternate. The global <link rel=alternate> for /llms.txt stays since it's a site-level resource that's correct on every page. Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
Previously next.config set Vary: Accept, User-Agent on /:path*, which matched static assets too. Vary: User-Agent on cached static assets defeats CDN caching since each unique UA gets its own cache entry. Now the Vary header is appended by proxy.ts on the responses that actually vary (HTML pass-through and markdown rewrites). Static .md files keep their Content-Type and CORS headers; .png/.svg/.js/.css and other static assets no longer carry Vary. Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
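Appending Vary only on responses that actually vary could be done with a small merge helper like the one below. This is a sketch under assumptions: `mergeVary` and `AGENT_VARY_FIELDS` are hypothetical names, and the helper works on plain header strings rather than on the response objects proxy.ts uses.

```typescript
// Fields the proxy's content negotiation depends on, per this commit.
const AGENT_VARY_FIELDS = ["Accept", "User-Agent"];

// Merges new field names into an existing Vary header value without
// duplicating entries (header field names compare case-insensitively).
function mergeVary(existing: string | null, fields: string[]): string {
  const merged = (existing ?? "")
    .split(",")
    .map((value) => value.trim())
    .filter((value) => value.length > 0);
  const seen = new Set(merged.map((value) => value.toLowerCase()));
  for (const field of fields) {
    const key = field.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(field);
    }
  }
  return merged.join(", ");
}
```

Static assets never pass through this helper, so their cache key stays independent of the requesting user-agent.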
Both proxy.ts and next.config.ts had a www.react-grab.com -> react-grab.com 308 redirect. Since the proxy runs before next.config redirects in Next 16, the next.config entry was dead code. Keep only the proxy implementation as the single source of truth. Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
Vercel's domain config still redirects apex -> www at the edge, before the proxy runs. A proxy-level www -> apex 308 would create an infinite redirect loop:

1. Browser hits react-grab.com
2. Vercel edge redirects apex -> www
3. Proxy runs: 308 redirect www -> apex
4. Vercel edge redirects apex -> www
5. ... loops until browser gives up

The cross-host redirect finding from the audit can only be addressed by flipping the canonical domain in the Vercel dashboard. Drop the proxy redirect so shipping this PR cannot break the site.

Reported by Cursor Bugbot (high severity).

Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
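The loop can be reproduced with a toy model of the two redirect layers. Everything in this sketch is hypothetical (the `RedirectRule` type, `followRedirects`, and the two rule functions); it only models host-to-host redirects and is not how Vercel or the proxy are implemented.

```typescript
// A rule returns the host to redirect to, or null when it does not apply.
type RedirectRule = (host: string) => string | null;

// Follows redirects through the rules in order until no rule applies,
// a host repeats, or the hop limit is reached (both count as a loop).
function followRedirects(
  startHost: string,
  rules: RedirectRule[],
  maxHops = 10,
): { hosts: string[]; looped: boolean } {
  const hosts = [startHost];
  let current = startHost;
  for (let hop = 0; hop < maxHops; hop++) {
    let next: string | null = null;
    for (const rule of rules) {
      next = rule(current);
      if (next !== null) break;
    }
    if (next === null) return { hosts, looped: false };
    if (hosts.includes(next)) return { hosts: [...hosts, next], looped: true };
    hosts.push(next);
    current = next;
  }
  return { hosts, looped: true };
}

// Edge layer (runs first): apex -> www, as the Vercel domain config does today.
const edgeApexToWww: RedirectRule = (host) =>
  host === "react-grab.com" ? "www.react-grab.com" : null;

// Proxy layer: the www -> apex redirect this commit removes.
const proxyWwwToApex: RedirectRule = (host) =>
  host === "www.react-grab.com" ? "react-grab.com" : null;

const withProxyRedirect = followRedirects("react-grab.com", [edgeApexToWww, proxyWwwToApex]);
const edgeOnly = followRedirects("react-grab.com", [edgeApexToWww]);
```

With both rules active the walk returns to the apex host and is flagged as a loop; with only the edge rule it settles on the www host after one hop.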
The root layout's metadata.alternates was inherited by every page that did not declare its own. /open-file (which has its own metadata block) ended up with the homepage's canonical URL and the homepage's text/markdown alternate. Move the homepage canonical + index.md alternate from the root layout into app/page.tsx, where it belongs. Add a canonical and noindex/nofollow to app/open-file/layout.tsx so the page is correctly self-canonical and excluded from indexing (matching the existing sitemap exclusion). Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
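The inheritance problem can be illustrated with a simplified model in which segment metadata is merged shallowly, so a page that omits `alternates` inherits the parent's value wholesale. The `SegmentMetadata` type, `resolveMetadata`, and the "Open file" title are hypothetical; this is a sketch of the behavior the commit describes, not Next.js's implementation.

```typescript
// Simplified, hypothetical model of metadata fields; not Next.js's real types.
interface SegmentMetadata {
  title?: string;
  alternates?: { canonical?: string; types?: Record<string, string> };
  robots?: { index: boolean; follow: boolean };
}

// Shallow merge: later segments overwrite whole top-level keys, and any key
// they omit is inherited from earlier segments.
function resolveMetadata(...segments: SegmentMetadata[]): SegmentMetadata {
  return Object.assign({}, ...segments);
}

const homeAlternates = {
  canonical: "https://react-grab.com",
  types: { "text/markdown": "/index.md" },
};

// Before: the root layout carried the homepage alternates, and /open-file
// (which declared metadata but no alternates) inherited them.
const before = resolveMetadata({ alternates: homeAlternates }, { title: "Open file" });

// After: the root layout declares no alternates; /open-file is self-canonical
// and excluded from indexing.
const after = resolveMetadata(
  {},
  {
    title: "Open file",
    alternates: { canonical: "https://react-grab.com/open-file" },
    robots: { index: false, follow: false },
  },
);
```

In this model, moving the homepage alternates into the page that owns them is what stops them from leaking into sibling routes.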
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 3c9cb45.
Both app/sitemap.ts and app/sitemap.md/route.ts walked the app/ directory to discover page routes, with two near-identical copies of the recursion + exclusion set. Move that logic into utils/discover-page-routes.ts so the exclusion set (api, open-file, anything with a dot) is defined once and shared. Reported by Cursor Bugbot. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
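A shared discovery helper along these lines would keep the exclusion set in one place. To stay self-contained the sketch walks an in-memory tree instead of the filesystem; the `DirNode` type, the `page.tsx` file name check, and the function body are assumptions, and only the exclusions (api, open-file, anything with a dot) come from the commit message.

```typescript
// Hypothetical in-memory stand-in for a directory under app/.
interface DirNode {
  name: string;
  files: string[];
  children: DirNode[];
}

// Segments that never become sitemap entries, per the commit message.
const EXCLUDED_SEGMENTS = new Set(["api", "open-file"]);

// Recursively collects routes for directories that contain a page file,
// skipping excluded segments and any segment with a dot (e.g. "sitemap.md").
function discoverPageRoutes(dir: DirNode, prefix = ""): string[] {
  const routes: string[] = [];
  if (dir.files.includes("page.tsx")) {
    routes.push(prefix === "" ? "/" : prefix);
  }
  for (const child of dir.children) {
    if (EXCLUDED_SEGMENTS.has(child.name) || child.name.includes(".")) continue;
    routes.push(...discoverPageRoutes(child, `${prefix}/${child.name}`));
  }
  return routes;
}

// Example tree mirroring the routes mentioned in this PR.
const appTree: DirNode = {
  name: "app",
  files: ["page.tsx", "layout.tsx"],
  children: [
    { name: "privacy", files: ["page.tsx"], children: [] },
    { name: "changelog", files: ["page.tsx"], children: [] },
    { name: "open-file", files: ["page.tsx", "layout.tsx"], children: [] },
    { name: "api", files: [], children: [] },
    { name: "sitemap.md", files: ["route.ts"], children: [] },
    { name: "changelog.md", files: ["route.ts"], children: [] },
  ],
};

const discoveredRoutes = discoverPageRoutes(appTree);
```

Both the XML sitemap and the markdown sitemap can then map over the same route list, so an exclusion added once applies to both outputs.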

Addresses the Vercel agent-readability spec audit findings on react-grab.com.

What changed
"Can agents find you?"
- llms.txt: rewritten to follow the llmstxt.org format with an H1 title, blockquote summary, and [name](url) markdown link sections. The previous detailed install content moved to a new /llms-full.txt (linked from llms.txt).
- /sitemap.md: new dynamic route that lists every page with its HTML and .md URL so agents can crawl without parsing XML.
- JSON-LD: WebSite + Organization + SoftwareApplication graph injected into <head> of every page so agents can extract title, description, logo, and schema-typed metadata without DOM parsing.

"Can agents read you?"
- proxy.ts (Next.js 16's renamed middleware):
  - Detects AI user-agents (ChatGPT, Claude, Perplexity, Cursor, GPTBot, OAI-SearchBot, Anthropic, OpenCode, aider, CCBot, Bytespider, Amazonbot, Applebot-Extended, Diffbot, MistralAI, YouBot, Cohere) and Accept: text/markdown headers, then rewrites to the corresponding .md mirror.
  - Sets Vary: Accept, User-Agent only on proxy responses (HTML pass-through and markdown rewrites). Static assets keep their default headers.
- Markdown mirrors at {url}.md:
  - /index.md and /privacy.md as static files in public/.
  - /changelog.md and /sitemap.md as Next.js route handlers (dynamic from CHANGELOG.md and the app/ directory).
  - /404.md for the agent-friendly missing-page response.
  - Content-Type: text/markdown; charset=utf-8 and Access-Control-Allow-Origin: * headers configured in next.config.ts for every .md URL.
- <link rel="alternate" type="text/markdown">: each page emits its own page-specific alternate (/, /privacy, /changelog → index.md, privacy.md, changelog.md) via metadata.alternates.types. A single global <link> to /llms.txt lives in the layout.

"Is your HTML agent-friendly?"
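- Homepage: screen-reader h1 → h2 → h3 over the existing demo so agents can chunk the page.
- Privacy: section titles promoted from <p> to <h2> (already had <h1>).
- Changelog: "Changelog" label switched to <h1>, version to <h2>, change type to <h3> (skipped when the entry has no change type so the heading outline stays clean).
- Canonical URLs (alternates.canonical + metadataBase).

Verification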
Tested locally with next start and against the deployed Vercel preview:

pnpm typecheck, pnpm lint, pnpm format all pass; production next build succeeds; lint/build/test-build/test-cli/test-e2e/typecheck CI all green.

Bot review notes

- Cursor Bugbot flagged an empty <h3> on changelog when entry.changeType is "", a duplicate <link rel="alternate" type="text/markdown"> from the root layout, an over-broad Vary: Accept, User-Agent on /:path* defeating CDN caching for static assets, a duplicate www → apex redirect across proxy.ts and next.config.ts, and (high severity) that a proxy-level www → apex redirect would create an infinite loop while the Vercel dashboard still redirects apex → www. All five fixed.
- cubic flagged that the Pages Router <Script> snippet in llms-full.txt was missing its import Script from "next/script" line. Fixed in both llms-full.txt and the pre-existing install.md.
- A further bot review suggested excluding the sitemap.md and changelog.md route folders from sitemap generation. The existing if (entry.includes(".")) filter already excludes them, so leaving as-is. The same review's other comment ("proxy.ts not executed") was incorrect: verified the deployed preview returns markdown to agent UAs through the proxy (x-matched-path: /index.md).

Manual follow-up (cannot be done from code)
The audit's "Redirect behavior cross-host redirect to www.react-grab.com" finding can only be addressed in the Vercel dashboard: in the project's Domains settings, mark react-grab.com (apex) as canonical so requests to react-grab.com no longer 30x to www.react-grab.com. A code-level www → apex redirect is intentionally not added: Vercel's edge redirect runs before the proxy, so a proxy redirect in the opposite direction would create an infinite loop and take the site down.

The remaining audit finding (homepage/changelog HTML over the 100 KB page size threshold) is mitigated by the markdown alternates: agent UAs no longer receive HTML for those pages. Reducing the HTML further would require turning off experimental.inlineCss, which costs a render-blocking round-trip for human visitors.

Summary by cubic
Improve react-grab.com agent readability by serving markdown to agents, adding structured metadata, and tightening semantics to align with Vercel’s agent-readability spec. Agents now get clean, crawlable content with better discovery and fewer dead ends.
New Features
- Added .md mirrors for key pages (/index.md, /privacy.md, /changelog.md, /sitemap.md) plus /404.md; a proxy detects agent user-agents or Accept: text/markdown and rewrites; unknown URLs return 200 markdown; Content-Type/CORS set on .md and llms*.txt.
- llms.txt rewritten to llmstxt.org format; added llms-full.txt; injected JSON-LD (WebSite, Organization, SoftwareApplication); canonical URLs and <link rel="alternate" type="text/markdown"> scoped per page.
- Added <main> landmarks, hierarchical headings on Home/Privacy/Changelog, and a skip-to-content link for better parsing and navigation.
- Manual follow-up: mark react-grab.com (apex) as canonical in the Vercel dashboard to resolve the cross-host redirect finding.

Bug Fixes
- Moved the homepage canonical and index.md alternate from the root layout into Home; added a canonical and noindex, nofollow to /open-file to prevent incorrect inheritance.
- Removed the hardcoded global /index.md alternate; each page now declares its own markdown alternate; a single global alternate for /llms.txt remains.
- Scoped Vary: Accept, User-Agent to proxy responses only to avoid fragmenting CDN caches for static assets.
- Removed the duplicate www → apex rule from next.config.ts and dropped the proxy-level redirect to avoid apex↔www loops with Vercel’s edge redirect.
- Skipped the empty <h3> when an entry has no change type.
- Added the missing next/script import in the Next.js Pages Router snippet in llms-full.txt and install.md.
- Extracted route discovery into utils/discover-page-routes and reused in sitemap.ts and /sitemap.md to keep exclusions consistent.

Written for commit 903f7b9. Summary will update on new commits.