feat(seo): emit sitemap.xml + robots.txt at site root #28

Open
mastermanas805 wants to merge 1 commit into main from feat/seo-sitemap-robots

Conversation

@mastermanas805
Member

Summary

  • New post-prerender step in scripts/prerender.mjs emits dist/sitemap.xml and dist/robots.txt at the site root so Google, Bing, and generative-engine crawlers (Perplexity, ChatGPT search) can discover every public URL on instanode.dev.
  • The sitemap lists 232 URLs: all 116 HTML routes (homepage, /pricing, /docs, /for-agents, /status, /blog + 5 posts, /use-cases + 104 case detail pages) plus their 116 .md mirror counterparts. Each <url> has <loc> (no trailing slash), <lastmod> (today's ISO date), and <changefreq>.
  • <changefreq> policy: daily for the /blog and /use-cases index pages, weekly for blog and use-case detail pages, monthly for marketing copy (/pricing, /docs, /for-agents), weekly for everything else. .md mirrors inherit their HTML counterpart's frequency.
  • Auth-gated routes (/app/*, /login, /claim) are excluded from the sitemap and listed under Disallow in robots.txt. Both lists are derived from a single AUTH_GATED_PREFIXES constant so they cannot drift.
  • robots.txt points crawlers at https://instanode.dev/sitemap.xml so they find it without guessing.
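
A minimal sketch of how the policy in the bullets above could be expressed, not the actual code in scripts/prerender.mjs. Only AUTH_GATED_PREFIXES is a name taken from this PR; SITE_ORIGIN, isAuthGated, changefreqFor, buildSitemap, buildRobots, the one-tag-per-line XML layout, and the example route /blog/hello are assumptions made for illustration.

```javascript
// Hypothetical sketch; only AUTH_GATED_PREFIXES is named in the PR.
const SITE_ORIGIN = "https://instanode.dev";
const AUTH_GATED_PREFIXES = ["/app", "/login", "/claim"];

// True when a route is a gated prefix itself or nested beneath one.
function isAuthGated(route) {
  return AUTH_GATED_PREFIXES.some(
    (prefix) => route === prefix || route.startsWith(prefix + "/")
  );
}

// Map a route to its <changefreq>; a .md mirror inherits from its HTML route.
function changefreqFor(route) {
  const html = route.endsWith(".md") ? route.slice(0, -3) : route;
  if (html === "/blog" || html === "/use-cases") return "daily";
  if (html.startsWith("/blog/") || html.startsWith("/use-cases/")) return "weekly";
  if (["/pricing", "/docs", "/for-agents"].includes(html)) return "monthly";
  return "weekly";
}

// Build the sitemap body. Routes are assumed to need no XML escaping.
function buildSitemap(routes, lastmod) {
  const entries = routes
    .filter((route) => !isAuthGated(route))
    .map((route) => {
      // No trailing slash in <loc>, so the homepage becomes the bare origin.
      const loc = SITE_ORIGIN + (route === "/" ? "" : route);
      return [
        "  <url>",
        `    <loc>${loc}</loc>`,
        `    <lastmod>${lastmod}</lastmod>`,
        `    <changefreq>${changefreqFor(route)}</changefreq>`,
        "  </url>",
      ].join("\n");
    });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
    "",
  ].join("\n");
}

// Build robots.txt from the same constant so the two lists stay in sync.
// Note: robots.txt rules are prefix matches, so "Disallow: /app" is broader
// than isAuthGated and would also cover a path such as "/apple".
function buildRobots() {
  return [
    "User-agent: *",
    "Allow: /",
    ...AUTH_GATED_PREFIXES.map((prefix) => `Disallow: ${prefix}`),
    "",
    `Sitemap: ${SITE_ORIGIN}/sitemap.xml`,
    "",
  ].join("\n");
}

// Example usage with a handful of routes and today's ISO date.
const today = new Date().toISOString().slice(0, 10);
console.log(buildSitemap(["/", "/pricing", "/blog", "/blog/hello", "/app/settings"], today));
console.log(buildRobots());
```

With this assumed layout each URL occupies five lines and the wrapper adds three, which is consistent with the line count quoted in the test plan.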

Test plan

  • npm run build produces dist/sitemap.xml and dist/robots.txt
  • xmllint --noout dist/sitemap.xml exits cleanly, confirming well-formed XML
  • wc -l dist/sitemap.xml shows 1163 lines (232 URLs × 5 + header/footer)
  • grep -c '<loc>' dist/sitemap.xml reports 232, double the 116-route HTML baseline (each route also has a .md mirror)
  • No /app/, /login, or /claim paths leak into the sitemap
  • robots.txt contents: allow all, disallow auth-gated paths, sitemap pointer
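
The count and leak checks above could also be scripted. The following is a hedged sketch, not part of this PR: checkSitemap and expectedLineCount are hypothetical helpers, and the line-count formula assumes the one-tag-per-line layout implied by the wc -l figure (five lines per URL plus three wrapper lines). It assumes every <loc> holds an absolute URL.

```javascript
// Hypothetical verification helpers mirroring the test plan.
const AUTH_GATED_PREFIXES = ["/app", "/login", "/claim"];

// Return a list of human-readable problems; an empty list means the checks passed.
function checkSitemap(xml, expectedUrls) {
  const problems = [];
  const locs = [...xml.matchAll(/<loc>([^<]*)<\/loc>/g)].map((m) => m[1]);
  if (locs.length !== expectedUrls) {
    problems.push(`expected ${expectedUrls} <loc> entries, found ${locs.length}`);
  }
  for (const loc of locs) {
    const path = new URL(loc).pathname; // the bare origin yields "/"
    const gated = AUTH_GATED_PREFIXES.some(
      (prefix) => path === prefix || path.startsWith(prefix + "/")
    );
    if (gated) problems.push(`auth-gated path leaked: ${path}`);
    if (path !== "/" && path.endsWith("/")) problems.push(`trailing slash: ${loc}`);
  }
  return problems;
}

// Lines expected from wc -l: five per <url> entry plus XML declaration,
// <urlset> open tag, and </urlset> close tag.
function expectedLineCount(urlCount) {
  return urlCount * 5 + 3;
}

// Example usage on an inline sample; a real run would read dist/sitemap.xml instead.
const sample = [
  "<urlset>",
  "<url><loc>https://instanode.dev/pricing</loc></url>",
  "<url><loc>https://instanode.dev/app/settings</loc></url>",
  "</urlset>",
].join("\n");
console.log(checkSitemap(sample, 2));
console.log(expectedLineCount(232)); // 232 * 5 + 3 = 1163
```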

🤖 Generated with Claude Code

Adds a post-prerender step that writes dist/sitemap.xml and
dist/robots.txt so Google, Bing, and generative-engine crawlers
(Perplexity, ChatGPT search) can discover every public URL —
all 116 HTML routes plus their 116 .md mirrors, 232 total.

Per-route <changefreq> policy: daily for /blog and /use-cases
index pages, weekly for blog/use-case detail pages, monthly for
the marketing copy (/pricing, /docs, /for-agents), weekly for
everything else. Auth-gated routes (/app/*, /login, /claim) are
excluded from the sitemap and Disallow'd in robots.txt; the two
lists are derived from a single AUTH_GATED_PREFIXES constant so
they cannot drift.

robots.txt points at https://instanode.dev/sitemap.xml so
crawlers find it without guessing.

Verified: npm run build emits both files, xmllint validates
sitemap.xml as well-formed XML, the URL count matches the
prerender route count + .md mirror count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>