infra(seo): llms.txt + RSS metadata upgrades + site description #394

Merged: mergify[bot] merged 4 commits into main from infra/seo-discovery-surfaces on May 17, 2026

Conversation

@julianken
Owner

Summary

Three discovery-surface improvements: ship llms.txt + llms-full.txt for IDE-agent ingestion, upgrade the RSS feed with proper channel + per-item metadata, and refresh the generic siteDescription.

Changes

public/llms.txt (new, static)

Disclaimer + content pointers for AI training / IDE ingestion (Cursor, Continue, Cline). Major AI engines mostly ignore llms.txt (~10% adoption per Q1 2026 research), but IDE coding agents actively consume it — that's the audience.

src/app/llms-full.txt/route.ts (new, dynamic)

Markdown index generated from the live pattern catalog (PATTERNS) + published posts (getPublishedPosts). Returned with Content-Type: text/plain; charset=utf-8 and a 1-hour s-maxage cache.
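A minimal sketch of how a route like this could assemble its body. The `CatalogPattern` and `PublishedPost` shapes, the `BASE_URL` placeholder, and the `/patterns/` and `/posts/` path prefixes are all assumptions for illustration; the real `PATTERNS` entries and the return type of `getPublishedPosts` are not shown on this page. The cache values mirror the ones the review quotes for this route.

```typescript
// Hypothetical shapes: the real PATTERNS entries and post records may differ.
interface CatalogPattern {
  slug: string;
  title: string;
  summary: string;
  archived?: boolean;
}

interface PublishedPost {
  slug: string;
  title: string;
}

// Placeholder origin and path prefixes, not the site's real URLs.
const BASE_URL = "https://example.com";

// Build the Markdown index: active (non-archived) patterns first, then posts.
function buildLlmsFullText(
  patterns: CatalogPattern[],
  posts: PublishedPost[],
): string {
  const lines: string[] = ["# Full content index", "", "## Patterns", ""];
  for (const pattern of patterns.filter((p) => !p.archived)) {
    lines.push(
      `- [${pattern.title}](${BASE_URL}/patterns/${pattern.slug}): ${pattern.summary}`,
    );
  }
  lines.push("", "## Posts", "");
  for (const post of posts) {
    lines.push(`- [${post.title}](${BASE_URL}/posts/${post.slug})`);
  }
  return lines.join("\n") + "\n";
}

// Wrap the body in a plain-text response with a 1-hour shared-cache lifetime.
function toLlmsResponse(body: string): Response {
  return new Response(body, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "s-maxage=3600, stale-while-revalidate=86400",
    },
  });
}
```

Keeping the string builder separate from the response wrapper means the index can be unit-tested without starting the server.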

src/app/feed.xml/route.ts

Channel additions: <author>, <copyright>, <managingEditor>, <generator>, <image>. Per-item: <author>. Defer <content:encoded> — requires a Payload Lexical → HTML server-side exporter that doesn't exist (follow-up issue recommended).

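src/lib/site-config.ts

Refreshes the generic siteDescription with a more specific, citation-friendly framing, and defensively adds a CONTACT_EMAIL constant.
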
Test plan

  • curl http://localhost:3000/llms.txt returns 200 + plaintext
  • curl http://localhost:3000/llms-full.txt returns 200 + Markdown listing all 24 patterns + posts
  • curl http://localhost:3000/feed.xml returns valid RSS XML (verified with xmllint)
  • pnpm lint, pnpm test:unit, pnpm typecheck
  • CI (4 E2E shards + bundle + build)

Follow-up issue (not in this PR)

<content:encoded> for the RSS feed: needs a Payload Lexical → HTML server-side exporter (the existing JSXConverters are React/client-side). Filed separately.

Closes #390

- Ship public/llms.txt (static) and /llms-full.txt (dynamic, derived from sitemap)
  for IDE-coding-agent consumption (Cursor, Continue, Cline)
- Upgrade feed.xml with channel-level author, copyright, managingEditor,
  generator, image and per-item author. Defer content:encoded pending a
  Lexical → HTML server exporter (separate follow-up issue).
- Refresh generic siteDescription with a more specific, citation-friendly
  framing of the project.
- Defensively add CONTACT_EMAIL constant (also added by #387; idempotent).

Closes #390
@julianken added the status:in-review (PR open, waiting for review) and area:seo (SEO + AI-discovery strategy work) labels May 17, 2026
julianken-bot previously approved these changes May 17, 2026
Collaborator

@julianken-bot julianken-bot left a comment

APPROVE — ship-ready. Two findings worth addressing; neither blocks merge.

Verification ledger (commands run in this turn)

  • gh pr view 394 --json … — head 3d87314f, base 51369a1d, mergeable, not draft, no prior reviews
  • gh pr diff 394 — read in full (4 files, +92 −3)
  • git fetch origin pull/394/head:pr-394 + git show pr-394:<each touched file> — read all 4 files at PR head
  • git grep -n "CONTACT_EMAIL" pr-394 — confirmed PR is the only definer at head
  • git grep -n "siteDescription" — found 2 pre-existing duplicate copies not refreshed (see finding 2)
  • git grep -n "A tech blog and reference catalog" — confirmed exact stale strings
  • git show pr-394:src/data/agentic-design-patterns/index.ts — counted PATTERNS array: 24 active + 1 archived; PR's "24 patterns" claim verified
  • git show pr-394:src/lib/queries/posts.ts — confirmed getPublishedPosts() has internal try/catch (returns [] on failure), so the asymmetry between feed.xml outer-try and llms-full.txt no-try is not a defect
  • gh api .../collaborators/julianken-bot/permission → write
  • gh pr view 394 --json statusCheckRollup — all 11 required checks SUCCESS on 3d87314f (ESLint, TypeScript, Vitest, Next.js Build, Analyze Bundle, CodeQL, all 4 E2E shards)
  • WebFetch https://www.rssboard.org/rss-specification — verified <author> is item-only per spec; channel uses <managingEditor> / <webMaster>

Findings ToC

  1. IMPORTANT — channel-level <author> is not RSS 2.0 spec-compliant (src/app/feed.xml/route.ts:59)
  2. SUGGESTION (plan-controlled): siteDescription refresh doesn't propagate to JSON-LD schema or PWA manifest (src/lib/site-config.ts:14)

Specific things this PR did right

  • PATTERNS.filter((p) => !p.archived) mirrors the sitemap's archived-filtering — keeps the AI-discovery surface and search-discovery surface in lockstep, no silent divergence.
  • Per-item <author> in email (Name) format is the canonical RSS 2.0 shape, not the common mistake of bare name or bare email.
  • Deferral of <content:encoded> is named in the PR body with the concrete blocker (Payload Lexical → HTML server exporter absent). That is the right way to defer — naming what would be required, not waving at "future work".
  • Cache headers on llms-full.txt (s-maxage=3600, stale-while-revalidate=86400) match feed.xml — consistent caching contract across the discovery surfaces.
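The `email (Name)` shape praised above can be sketched as a small helper. This is an illustration only: the `escapeXml` here is a stand-in for the route's own helper (whose implementation is not shown on this page), and `formatRssPerson` and `renderItemAuthor` are hypothetical names.

```typescript
// Stand-in for the route's escapeXml: escapes the five XML special characters.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// RSS 2.0 person fields (<author>, <managingEditor>) take the form "email (Name)".
function formatRssPerson(email: string, name: string): string {
  return `${escapeXml(email)} (${escapeXml(name)})`;
}

// Per-item author element, valid only inside <item>.
function renderItemAuthor(email: string, name: string): string {
  return `<author>${formatRssPerson(email, name)}</author>`;
}
```

Sharing one formatter between the per-item `<author>` and the channel-level `<managingEditor>` keeps the two fields from drifting into different shapes.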

Bottom line

Ship after finding 1 (one-line delete). Finding 2 is a follow-up either way — note it in the merge commit or open a separate issue. CI is green on 3d87314f. No BLOCKERs.

Same-tier risk

I am running as opus. Implementer model tier is unknown to me (the dispatcher did not provide it, per the no-narrative rule). If the implementer was also opus, R12 same-tier risk applies and Julian should treat the second pass with extra weight.

@julianken-bot, posted via the reviewing-as-julianken-bot skill rubric.

Comment thread src/app/feed.xml/route.ts Outdated
<description>${escapeXml(SITE_DESCRIPTION)}</description>
<language>en</language>
<lastBuildDate>${new Date().toUTCString()}</lastBuildDate>
<author>${escapeXml(CONTACT_EMAIL)} (${escapeXml(SITE_TITLE)})</author>
Collaborator

IMPORTANT — Channel-level <author> is not in the RSS 2.0 spec.

Per rssboard.org/rss-specification, <author> is documented only under "Elements of <item>":

<author> is an optional sub-element of <item>. It's the email address of the author of the item.

There is no <author> listed in either the required or optional channel elements section. Channel-level author info uses <managingEditor> (which this PR already adds correctly on the next line) and <webMaster>.

Practical impact: the per-item <author> additions are spec-correct. The channel-level one will trigger warnings from validator.w3.org/feed and is silently dropped by strict aggregators. xmllint --noout only verifies well-formed XML — it does not validate against the RSS 2.0 schema, so the green test plan does not catch this.

Fix: delete this line; <managingEditor> already covers the role.
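Since xmllint --noout only checks well-formedness, a regression guard for this finding would need to look at element placement. A rough sketch, assuming a feed with a single <channel> and <item> elements without attributes; it uses regular expressions rather than a real XML parser, and `hasChannelLevelAuthor` is a hypothetical name:

```typescript
// Returns true when <author> appears as a direct part of <channel>
// rather than inside an <item>, which RSS 2.0 does not define.
function hasChannelLevelAuthor(feedXml: string): boolean {
  const channel = /<channel>([\s\S]*?)<\/channel>/.exec(feedXml);
  if (!channel) return false;
  // Drop every <item>...</item> so only channel-level content remains.
  const channelOnly = (channel[1] ?? "").replace(/<item>[\s\S]*?<\/item>/g, "");
  return /<author[\s>]/.test(channelOnly);
}
```

A unit test could call this on the rendered feed and fail when it returns true, which would have caught the original line without relying on an external validator.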

Comment thread src/lib/site-config.ts

export const siteDescription =
- "A tech blog and reference catalog on agentic AI.";
+ "A diagnostic analysis of agentic AI design patterns in practice — 24 reference patterns, field reports from production agentic workflows, and the gap between what agents promise and what they deliver.";
Collaborator

SUGGESTION (plan-controlled, not implementer-controlled per R9): siteDescription is not refreshed in the schema-emitted copy or the PWA manifest.

Two pre-existing duplicate copies of the old "A tech blog and reference catalog on agentic AI." string remain on this branch:

  • src/lib/schema/config.ts:20 — used by JSON-LD schema generators; this is the copy Google + AI citation crawlers pick up
  • src/app/manifest.webmanifest:4 — PWA install prompt + browser bookmark metadata

This is plan-controlled — issue #390 names only src/lib/site-config.ts, so the implementer correctly followed scope. But the PR's stated goal is "citation-friendly framing", and the schema config copy is exactly what citation crawlers consume. Updating only site-config.ts leaves the goal partially unmet.

Two options for follow-up:

  1. Amend this PR to also update both copies (small, mechanical, low-risk).
  2. File a follow-up issue to extend the description-refresh to all three call sites.

Either is reasonable; flagging here so the gap doesn't silently survive.
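One way to keep a gap like this from surviving silently is a check that flags any file still carrying the old string. A minimal sketch, assuming the caller supplies file contents keyed by path (for example, read in a unit test); `findStaleCopies` is a hypothetical helper, not something in this repository:

```typescript
// Given file contents keyed by path, list the paths that still contain the stale text.
function findStaleCopies(
  files: Record<string, string>,
  staleText: string,
): string[] {
  return Object.entries(files)
    .filter(([, content]) => content.includes(staleText))
    .map(([path]) => path)
    .sort();
}
```

Run against the three call sites named above, an empty result would mean the refresh reached every copy.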

The RSS 2.0 spec (rssboard.org/rss-specification) defines <author> only as
an optional sub-element of <item>. Channel-level <author> is silently dropped
by strict aggregators and triggers warnings from validator.w3.org/feed.

<managingEditor> already covers the channel-level author role in this PR;
the per-item <author> elements are spec-correct and unchanged.

Addresses bot review on #394.
julianken-bot previously approved these changes May 17, 2026
Collaborator

@julianken-bot julianken-bot left a comment

APPROVE — ship-ready after the channel-level <author> fix in d8c3cee. One spec-compliance finding worth addressing; one pre-existing duplication carried forward from the prior review.

Verification ledger (commands run in this turn)

  • gh pr view 394 --json … — head d8c3cee6, base 51369a1d, mergeable, not draft. New commit d8c3cee on top of 3d87314 (the prior review's HEAD).
  • gh pr diff 394 — read all 4 touched files in full (+92 −4).
  • git fetch origin pull/394/head:pr-394 + git show pr-394:<each file> — read all files at current HEAD.
  • git -C . show pr-394:src/app/feed.xml/route.ts | grep -n author — confirmed channel-level <author> is gone in d8c3cee; per-item <author> remains on line 45 (spec-correct).
  • git grep -n "A tech blog and reference catalog" pr-394 — confirmed two pre-existing stale duplicates still ship: src/lib/schema/config.ts:20, src/app/manifest.webmanifest:4.
  • git show pr-394:src/lib/queries/posts.ts — confirmed getPublishedPosts() has internal try/catch returning []. The asymmetry between feed.xml's outer try/catch and llms-full.txt's lack of one is therefore not a defect.
  • file /Users/j/repos/tech-blog/public/og-default.png → PNG image data, 1200 x 630, 8-bit/color RGB.
  • WebFetch https://cyber.harvard.edu/rss/rss.html — verified RSS 2.0 spec language: image max width 144, max height 400.
  • git show pr-394:src/data/agentic-design-patterns/index.ts + grep archived — counted 25 pattern files, 1 archived (funnel-method) → 24 active. PR body's "24 patterns" claim verified.
  • gh pr checks 394 — 8 checks pass on d8c3cee (ESLint, TypeScript, Vitest, Next.js Build, Analyze Bundle, CodeQL, CodeQL Analysis, E2E Shard 2/4); E2E Shards 1, 3, 4 pending (re-running for the new commit).
  • gh api .../collaborators/julianken-bot/permission → write.

Findings ToC

  1. IMPORTANT: <image> block embeds og-default.png (1200×630), which exceeds the RSS 2.0 max dimensions (144 × 400). Inline comment on src/app/feed.xml/route.ts:62.
  2. SUGGESTION (plan-controlled): siteDescription refresh leaves two stale pre-existing duplicates downstream. Inline comment on src/lib/site-config.ts:14.

Specific things this PR did right

  • The d8c3cee fix isn't just a delete — the commit message names the spec section, the strict-aggregator failure mode, and explicitly notes <managingEditor> already covers the channel-author role. That's exactly the diff hygiene that lets the next reviewer move on in one read.
  • llms.txt includes an explicit use-policy and citation-format hint instead of leaving it ambiguous — agents that respect site-level signals will have a defined target to comply with, not a guess.
  • The PATTERNS.filter((p) => !p.archived) filter in llms-full.txt mirrors the sitemap's archived-filtering convention (already verified in the prior review). Worth re-emphasizing because it prevents the AI-surface and search-surface from silently diverging on retired patterns.

Bottom line

Merge after one decision on finding 1 (either swap to a 144×144 RSS-spec icon, or delete the <image> block). Finding 2 is the same un-acted-on SUGGESTION from before — fold it in or file the follow-up; either is fine.

Same-tier risk

Running as opus. Implementer model tier is unknown (per the no-narrative rule). If the implementer was also opus, R12 applies — treat the second-pass find (the 1200×630 image dimension violation) with extra weight, because it's the kind of detail same-tier reviewers are most likely to miss.

@julianken-bot, posted via the reviewing-as-julianken-bot skill rubric.

Comment thread src/app/feed.xml/route.ts
<copyright>Copyright © 2024–${new Date().getFullYear()} ${escapeXml(SITE_TITLE)}. All rights reserved.</copyright>
<managingEditor>${escapeXml(CONTACT_EMAIL)} (${escapeXml(SITE_TITLE)})</managingEditor>
<generator>Next.js + Payload CMS</generator>
<image>
Collaborator

IMPORTANT — RSS 2.0 <image> sub-element exceeds maximum dimensions.

og-default.png is 1200 × 630, but the RSS 2.0 spec (Harvard original, RSSBoard) is unambiguous:

Maximum value for width is 144, default value is 88.
Maximum value for height is 400, default value is 31.

Strict aggregators (and validator.w3.org/feed) will reject this; lenient ones silently drop the <image> block. Same spec-compliance class as the channel-level <author> you just removed in d8c3cee.

Two ways out:

  1. Generate a 144×144 (or 88×88) variant — e.g. public/rss-icon-144.png — and reference it here instead. This is what most production RSS feeds do; the OG image is the wrong asset for this slot.
  2. Drop the <image> block entirely. <image> is optional in RSS 2.0; absence is preferable to a spec-violating one that some readers will reject.

Option 1 is the right end-state, but option 2 is a clean one-line revert if you want to unblock merge today and file the icon work as a follow-up.
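The dimension caps quoted above can also be enforced where the block is rendered. A minimal sketch under stated assumptions: `RssImage`, `rssImageViolations`, and `renderRssImage` are hypothetical names, and the caller is assumed to know the asset's pixel size (this code does not read the PNG).

```typescript
// RSS 2.0 caps for the channel <image>: width at most 144, height at most 400.
const RSS_IMAGE_MAX_WIDTH = 144;
const RSS_IMAGE_MAX_HEIGHT = 400;

interface RssImage {
  url: string;
  title: string;
  link: string;
  width: number;
  height: number;
}

// List every way the image breaks the spec's dimension caps.
function rssImageViolations(image: RssImage): string[] {
  const problems: string[] = [];
  if (image.width > RSS_IMAGE_MAX_WIDTH) {
    problems.push(`width ${image.width} exceeds ${RSS_IMAGE_MAX_WIDTH}`);
  }
  if (image.height > RSS_IMAGE_MAX_HEIGHT) {
    problems.push(`height ${image.height} exceeds ${RSS_IMAGE_MAX_HEIGHT}`);
  }
  return problems;
}

// Render the <image> block, refusing to emit one that violates the caps.
function renderRssImage(image: RssImage): string {
  const problems = rssImageViolations(image);
  if (problems.length > 0) {
    throw new Error(`Invalid RSS <image>: ${problems.join("; ")}`);
  }
  const esc = (s: string): string =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return [
    "<image>",
    `  <url>${esc(image.url)}</url>`,
    `  <title>${esc(image.title)}</title>`,
    `  <link>${esc(image.link)}</link>`,
    `  <width>${image.width}</width>`,
    `  <height>${image.height}</height>`,
    "</image>",
  ].join("\n");
}
```

Throwing at render time is one design choice; returning an empty string instead would match option 2, where the block is simply omitted.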

Comment thread src/lib/site-config.ts

export const siteDescription =
- "A tech blog and reference catalog on agentic AI.";
+ "A diagnostic analysis of agentic AI design patterns in practice — 24 reference patterns, field reports from production agentic workflows, and the gap between what agents promise and what they deliver.";
Collaborator

SUGGESTION (plan-controlled, R9): siteDescription refresh isn't propagated to the two other copies that still ship the old "tech blog and reference catalog" string:

  • src/lib/schema/config.ts:19-20 — JSON-LD SITE_CONFIG.description (rendered into the WebSite schema on every page)
  • src/app/manifest.webmanifest:4 — PWA install prompt description

This is the same finding as the prior bot review's #2, restated because it's unaddressed in d8c3cee. Framing as plan-controlled (R9) rather than implementer-controlled: the PR scope as written touched siteDescription in one file; nothing said "and propagate everywhere". The downstream consumer here is the JSON-LD WebSite entity, which is exactly the surface infra(seo) PRs should keep in lockstep — leaving it stale means search engines and Schema.org consumers see the old description even after this ships.

Either fold both edits into this PR (3-line change total) or file a follow-up issue tagged seo and reference it in the merge commit.

RSS 2.0 caps <image> at 144x400. The prior <image> referenced og-default.png
(1200x630) which strict aggregators silently drop.

Downscale public/android-chrome-192x192.png (the existing brand mark) to a
144x144 sibling at public/rss-icon-144.png via sharp. Reference the new asset
from feed.xml/route.ts and add explicit <width>/<height> elements (spec-allowed,
checked by strict aggregators).

scripts/generate-rss-icon.ts is the regeneration path for future updates.

Addresses second-pass bot review on #394.
Collaborator

@julianken-bot julianken-bot left a comment

Verdict: APPROVE

Three discovery surfaces land cleanly: static public/llms.txt, dynamic /llms-full.txt route generated from PATTERNS + getPublishedPosts, and an RSS feed upgraded with channel/item metadata.

Verification ledger (commands run this turn)

  • pnpm typecheck — clean (no output from tsc --noEmit).
  • pnpm test:unit — 525/525 passed, 31 files.
  • pnpm exec eslint on the four PR-modified files — clean.
  • pnpm dev + curl on all three routes:
    • /llms.txt → 200, text/plain; charset=UTF-8, 1252 bytes.
    • /llms-full.txt → 200, text/plain; charset=utf-8, 5846 bytes, lists 24 patterns + 4 posts.
    • /feed.xml → 200, application/xml; charset=utf-8; xmllint --noout parses without error.
  • RSS 2.0 spec cross-check (harvard.edu RSS 2.0 spec): <image> width=144/height=144 within the 144x400 cap; <copyright>/<managingEditor>/<generator> formats conformant; <image> has all three required sub-elements (url, title, link). The 2nd-pass icon downscale to a sibling 144x144 PNG (rather than reusing the 1200x630 og-default) is the right call — confirmed PNG file is 144x144 RGBA via file public/rss-icon-144.png.
  • Merge-conflict check on CONTACT_EMAIL (same const added on main via #393): cloned the PR branch fresh, ran git merge --no-commit origin/main → Auto-merging src/lib/site-config.ts ... Automatic merge went well. No textual conflict; the PR body is correct that the duplicate is idempotent.
  • CI status via gh pr checks 394: all 11 required checks pass (ESLint, TypeScript, Vitest, Next.js Build, Analyze Bundle, CodeQL, 4 E2E shards).
  • HEAD SHA at review time: 778114539dd62db814b263560e116cdf66fc57f2.

Specific praise (one item, decision-named)

The second-pass fix to generate a spec-compliant 144x144 RSS image (fix(seo): spec-compliant 144x144 RSS feed icon) instead of reusing og-default.png is the right read of RSS 2.0 — strict aggregators (FeedValidator, Feedly) silently drop images outside the 144x400 cap, and the 1200x630 OG default would have been dropped. Adding <width>/<height> elements explicitly is the spec-recommended hint that helps validators avoid heuristic guessing.

Findings

1 SUGGESTION inline at src/app/feed.xml/route.ts:59 — copyright "All rights reserved" contradicts the AI use policy in public/llms.txt. Non-blocking.

Bottom line

Routes work, XML validates, RSS spec conformant, content correct, CI green, no merge conflicts despite stale base SHA. Approve and ship.


Reviewed by @julianken-bot (fresh context, model: opus) against the 15-rule anti-slop rubric. Implementer/reviewer same-tier risk: yes (likely opus×opus). R8 mandatory second pass performed; one substantive finding surfaced (copyright vs use-policy inconsistency).

Comment thread src/app/feed.xml/route.ts
<description>${escapeXml(SITE_DESCRIPTION)}</description>
<language>en</language>
<lastBuildDate>${new Date().toUTCString()}</lastBuildDate>
<copyright>Copyright © 2024–${new Date().getFullYear()} ${escapeXml(SITE_TITLE)}. All rights reserved.</copyright>
Collaborator

SUGGESTION — Mixed signal on AI/content use policy.

public/llms.txt:13 explicitly states the content is "available for AI training, research, and citation with attribution." But this <copyright> element emits "All rights reserved" into the RSS feed.

Strict aggregators and AI ingestion pipelines that read both surfaces will see a contradiction. Three options:

  1. Drop "All rights reserved" — e.g., Copyright © 2024–${year} ${SITE_TITLE}. (no rights clause).
  2. Switch to a Creative Commons hint: CC BY 4.0 — citation requested.
  3. Match the llms.txt language: Copyright © 2024–${year} ${SITE_TITLE}. Available for AI training, research, and citation with attribution.

Verified via curl http://localhost:3000/feed.xml | xmllint --noout (parses) and side-by-side with public/llms.txt.

Non-blocking — neither a spec violation nor a runtime issue.

@julianken
Owner Author

@Mergifyio queue

@mergify
Contributor

mergify Bot commented May 17, 2026

Merge Queue Status

  • Entered queue: 2026-05-17 18:35 UTC · Rule: default
  • Checks passed · in-place
  • Merged: 2026-05-17 18:39 UTC · at b1348ec350168f29c944e680586f8139d3200f42 · squash

This pull request spent 3 minutes 50 seconds in the queue, including 3 minutes 28 seconds running CI.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • github-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = ESLint
    • check-neutral = ESLint
    • check-skipped = ESLint
  • any of [🛡 GitHub branch protection]:
    • check-success = TypeScript
    • check-neutral = TypeScript
    • check-skipped = TypeScript
  • any of [🛡 GitHub branch protection]:
    • check-success = Vitest
    • check-neutral = Vitest
    • check-skipped = Vitest
  • any of [🛡 GitHub branch protection]:
    • check-success = Next.js Build
    • check-neutral = Next.js Build
    • check-skipped = Next.js Build
  • any of [🛡 GitHub branch protection]:
    • check-success = Analyze Bundle
    • check-neutral = Analyze Bundle
    • check-skipped = Analyze Bundle
  • any of [🛡 GitHub branch protection]:
    • check-success = CodeQL Analysis
    • check-neutral = CodeQL Analysis
    • check-skipped = CodeQL Analysis
  • any of [🛡 GitHub branch protection]:
    • check-success = E2E Shard 1/4
    • check-neutral = E2E Shard 1/4
    • check-skipped = E2E Shard 1/4
  • any of [🛡 GitHub branch protection]:
    • check-success = E2E Shard 2/4
    • check-neutral = E2E Shard 2/4
    • check-skipped = E2E Shard 2/4
  • any of [🛡 GitHub branch protection]:
    • check-success = E2E Shard 3/4
    • check-neutral = E2E Shard 3/4
    • check-skipped = E2E Shard 3/4
  • any of [🛡 GitHub branch protection]:
    • check-success = E2E Shard 4/4
    • check-neutral = E2E Shard 4/4
    • check-skipped = E2E Shard 4/4

@mergify mergify Bot added the queued label May 17, 2026
@mergify mergify Bot merged commit 3dc645b into main May 17, 2026
13 checks passed
@mergify mergify Bot deleted the infra/seo-discovery-surfaces branch May 17, 2026 18:39
@mergify mergify Bot removed the queued label May 17, 2026
mergify Bot pushed a commit that referenced this pull request May 18, 2026
* chore(docs): drop seo-strategy folder; align README to renamed post slug

Removes docs/seo-strategy/ — research artifacts from the SEO + AI-
discovery analysis funnel, no longer load-bearing now that the
gate-1/2/3 work has shipped (#393 #394 #395 #396 #397 #400 #402 #404
#406 #408). History preserved in git.

README: align "Recent essays" entry with the renamed post
slug (where-agentic-patterns-actually-live →
agentic-patterns-in-your-coding-workflow). The rename satisfies Bing
Site Scan's 70-char title cap.

No redirect deployed — article is two days old, no significant
external link equity to preserve.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: defer README slug update; gitignore docs/seo-strategy

Address julianken-bot review of PR #409:

BLOCKER (README:75) — New slug URL serves an SSR 404 fallback because
the Payload post slug hasn't been renamed yet (intentionally deferred
until the in-flight Bing Site Scan completes). Reverting the README
link change here; it will land in a follow-up PR after the actual
Payload slug rename, so the link is never broken in main.

Plus: add /docs/seo-strategy/ to .gitignore so future analysis-funnel
artifacts (phase-*, context-packets, STATUS.md, issues/) stay on disk
without polluting the index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Labels

area:seo SEO + AI-discovery strategy work status:in-review PR open, waiting for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

infra(seo): ship llms.txt + llms-full.txt, upgrade RSS feed, refresh site description

2 participants