infra(seo): llms.txt + RSS metadata upgrades + site description by julianken · Pull Request #394 · julianken/detached-node

julianken · 2026-05-17T14:34:27Z

Summary

Three discovery-surface improvements: ship llms.txt + llms-full.txt for IDE-agent ingestion, upgrade the RSS feed with proper channel + per-item metadata, and refresh the generic siteDescription.

Changes

`public/llms.txt` (new, static)

Disclaimer + content pointers for AI training / IDE ingestion (Cursor, Continue, Cline). Major AI engines mostly ignore llms.txt (~10% adoption per Q1 2026 research), but IDE coding agents actively consume it — that's the audience.

`src/app/llms-full.txt/route.ts` (new, dynamic)

Markdown index generated from the live pattern catalog (PATTERNS) + published posts (getPublishedPosts). Returned with Content-Type: text/plain; charset=utf-8 and a 1-hour s-maxage cache.
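A minimal sketch of how such an index could be assembled, assuming hypothetical `Pattern` and `Post` shapes, a placeholder `SITE_URL`, and made-up `/patterns/` and `/posts/` URL paths (none of these are shown in the PR). The cache values are the ones quoted in the review. A real route handler would presumably wrap the result in a `Response` with these headers.

```typescript
// Hypothetical shapes: the real PATTERNS entries and post type are not shown in this PR.
interface Pattern { slug: string; title: string; archived?: boolean }
interface Post { slug: string; title: string }

const SITE_URL = "https://example.com"; // placeholder origin, not the real site

// Headers as described in the PR body and review: plain text, 1-hour shared cache.
export const LLMS_FULL_HEADERS: Record<string, string> = {
  "Content-Type": "text/plain; charset=utf-8",
  "Cache-Control": "s-maxage=3600, stale-while-revalidate=86400",
};

// Builds a Markdown index of active patterns and published posts.
export function buildLlmsFullIndex(patterns: Pattern[], posts: Post[]): string {
  // Archived patterns are skipped, mirroring the filter praised in the review.
  const active = patterns.filter((p) => !p.archived);
  const lines = [
    "# Site index",
    "",
    "## Patterns",
    ...active.map((p) => `- [${p.title}](${SITE_URL}/patterns/${p.slug})`),
    "",
    "## Posts",
    ...posts.map((p) => `- [${p.title}](${SITE_URL}/posts/${p.slug})`),
  ];
  return lines.join("\n") + "\n";
}
```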

`src/app/feed.xml/route.ts`

Channel additions: <author>, <copyright>, <managingEditor>, <generator>, <image>. Per-item: <author>. Defer <content:encoded> — requires a Payload Lexical → HTML server-side exporter that doesn't exist (follow-up issue recommended).

`src/lib/site-config.ts`

siteDescription rewritten to a more specific, citation-friendly framing.
CONTACT_EMAIL added defensively (also being added in parallel by issue #387, "fix(seo): broken sameAs URL + 5 other quick-wins", Category B; the duplicate is idempotent).

Test plan

- `curl http://localhost:3000/llms.txt` returns 200 + plaintext
- `curl http://localhost:3000/llms-full.txt` returns 200 + Markdown listing all 24 patterns + posts
- `curl http://localhost:3000/feed.xml` returns valid RSS XML (verified with xmllint)
- `pnpm lint`, `pnpm test:unit`, `pnpm typecheck`
- CI (4 E2E shards + bundle + build)

Follow-up issue (not in this PR)

<content:encoded> for the RSS feed: needs a Payload Lexical → HTML server-side exporter (the existing JSXConverters are React/client-side). Filed separately.

Closes #390

- Ship public/llms.txt (static) and /llms-full.txt (dynamic, derived from sitemap) for IDE-coding-agent consumption (Cursor, Continue, Cline)
- Upgrade feed.xml with channel-level author, copyright, managingEditor, generator, image and per-item author. Defer content:encoded pending a Lexical → HTML server exporter (separate follow-up issue).
- Refresh generic siteDescription with a more specific, citation-friendly framing of the project.
- Defensively add CONTACT_EMAIL constant (also added by #387; idempotent).

Closes #390

julianken-bot

APPROVE — ship-ready. Two findings worth addressing; neither blocks merge.

Verification ledger (commands run in this turn)

gh pr view 394 --json … — head 3d87314f, base 51369a1d, mergeable, not draft, no prior reviews
gh pr diff 394 — read in full (4 files, +92 −3)
git fetch origin pull/394/head:pr-394 + git show pr-394:<each touched file> — read all 4 files at PR head
git grep -n "CONTACT_EMAIL" pr-394 — confirmed PR is the only definer at head
git grep -n "siteDescription" — found 2 pre-existing duplicate copies not refreshed (see finding 2)
git grep -n "A tech blog and reference catalog" — confirmed exact stale strings
git show pr-394:src/data/agentic-design-patterns/index.ts — counted PATTERNS array: 24 active + 1 archived; PR's "24 patterns" claim verified
git show pr-394:src/lib/queries/posts.ts — confirmed getPublishedPosts() has internal try/catch (returns [] on failure), so the asymmetry between feed.xml outer-try and llms-full.txt no-try is not a defect
gh api .../collaborators/julianken-bot/permission → write
gh pr view 394 --json statusCheckRollup — all 11 required checks SUCCESS on 3d87314f (ESLint, TypeScript, Vitest, Next.js Build, Analyze Bundle, CodeQL, all 4 E2E shards)
WebFetch https://www.rssboard.org/rss-specification — verified <author> is item-only per spec; channel uses <managingEditor> / <webMaster>

Findings ToC

IMPORTANT — channel-level <author> is not RSS 2.0 spec-compliant (src/app/feed.xml/route.ts:59)
SUGGESTION (plan-controlled) — siteDescription refresh doesn't propagate to JSON-LD schema or PWA manifest (src/lib/site-config.ts:14)

Specific things this PR did right

PATTERNS.filter((p) => !p.archived) mirrors the sitemap's archived-filtering — keeps the AI-discovery surface and search-discovery surface in lockstep, no silent divergence.
Per-item <author> in email (Name) format is the canonical RSS 2.0 shape, not the common mistake of bare name or bare email.
Deferral of <content:encoded> is named in the PR body with the concrete blocker (Payload Lexical → HTML server exporter absent). That is the right way to defer — naming what would be required, not waving at "future work".
Cache headers on llms-full.txt (s-maxage=3600, stale-while-revalidate=86400) match feed.xml — consistent caching contract across the discovery surfaces.
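For readers unfamiliar with the `email (Name)` shape mentioned above, here is a minimal sketch. The `escapeXml` below is a stand-in written for this example (the PR's own helper is not shown), and `formatRssPerson` / `authorElement` are hypothetical names.

```typescript
// Stand-in XML text escaper; assumed to behave like the PR's escapeXml helper.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// RSS 2.0 person fields use "email (Name)": the address first, the display name in parentheses.
export function formatRssPerson(email: string, name: string): string {
  return `${escapeXml(email)} (${escapeXml(name)})`;
}

// Per-item author element built from the person string.
export function authorElement(email: string, name: string): string {
  return `<author>${formatRssPerson(email, name)}</author>`;
}
```

The same person string would also fit `<managingEditor>` at channel level, which is how the diff in this PR uses it.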

Bottom line

Ship after finding 1 (one-line delete). Finding 2 is a follow-up either way — note it in the merge commit or open a separate issue. CI is green on 3d87314f. No BLOCKERs.

Same-tier risk

I am running as opus. Implementer model tier is unknown to me (the dispatcher did not provide it, per the no-narrative rule). If the implementer was also opus, R12 same-tier risk applies and Julian should treat the second pass with extra weight.

— @julianken-bot, posted via the reviewing-as-julianken-bot skill rubric.

julianken-bot · 2026-05-17T14:50:17Z

    <description>${escapeXml(SITE_DESCRIPTION)}</description>
    <language>en</language>
    <lastBuildDate>${new Date().toUTCString()}</lastBuildDate>
+    <author>${escapeXml(CONTACT_EMAIL)} (${escapeXml(SITE_TITLE)})</author>


IMPORTANT — Channel-level <author> is not in the RSS 2.0 spec.

Per rssboard.org/rss-specification, <author> is documented only under "Elements of <item>":

<author> is an optional sub-element of <item>. It's the email address of the author of the item.

There is no <author> listed in either the required or optional channel elements section. Channel-level author info uses <managingEditor> (which this PR already adds correctly on the next line) and <webMaster>.

Practical impact: the per-item <author> additions are spec-correct. The channel-level one will trigger warnings from validator.w3.org/feed and is silently dropped by strict aggregators. xmllint --noout only verifies well-formed XML — it does not validate against the RSS 2.0 schema, so the green test plan does not catch this.

Fix: delete this line; <managingEditor> already covers the role.
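Since xmllint does not catch this class of problem, a small check along the following lines could flag it in a unit test. This is a sketch under assumptions: the caller supplies the channel's child element names (how they are extracted from the feed is left out), and `findNonSpecChannelElements` is a hypothetical name. The allow-list is the set of channel sub-elements in the RSS 2.0 specification; namespaced extension elements are skipped.

```typescript
// Channel sub-elements defined by RSS 2.0 (required and optional), plus <item>.
const RSS2_CHANNEL_ELEMENTS: string[] = [
  "title", "link", "description",
  "language", "copyright", "managingEditor", "webMaster", "pubDate",
  "lastBuildDate", "category", "generator", "docs", "cloud", "ttl",
  "image", "rating", "textInput", "skipHours", "skipDays", "item",
];

// Returns the element names that are neither in the spec nor namespaced extensions.
export function findNonSpecChannelElements(names: string[]): string[] {
  return names.filter(
    (n) => n.indexOf(":") === -1 && RSS2_CHANNEL_ELEMENTS.indexOf(n) === -1
  );
}
```

With the pre-fix channel (`title`, `link`, `description`, `language`, `lastBuildDate`, `author`, ...), this would report `author` and nothing else.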

julianken-bot · 2026-05-17T14:50:17Z


 export const siteDescription =
-  "A tech blog and reference catalog on agentic AI.";
+  "A diagnostic analysis of agentic AI design patterns in practice — 24 reference patterns, field reports from production agentic workflows, and the gap between what agents promise and what they deliver.";


SUGGESTION (plan-controlled, not implementer-controlled per R9) — siteDescription is not refreshed in the schema-emitted copy or the PWA manifest.

Two pre-existing duplicate copies of the old "A tech blog and reference catalog on agentic AI." string remain on this branch:

src/lib/schema/config.ts:20 — used by JSON-LD schema generators; this is the copy Google + AI citation crawlers pick up

src/app/manifest.webmanifest:4 — PWA install prompt + browser bookmark metadata

This is plan-controlled — issue #390 names only src/lib/site-config.ts, so the implementer correctly followed scope. But the PR's stated goal is "citation-friendly framing", and the schema config copy is exactly what citation crawlers consume. Updating only site-config.ts leaves the goal partially unmet.

Two options for follow-up:

Amend this PR to also update both copies (small, mechanical, low-risk).

File a follow-up issue to extend the description-refresh to all three call sites.

Either is reasonable; flagging here so the gap doesn't silently survive.
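If the three copies stay separate, a drift check is one way to keep them aligned. The sketch below assumes the description strings have already been loaded from each file (the loading is not shown), and `findStaleCopies` is a hypothetical helper, not something in the repository.

```typescript
// One downstream copy of the site description, identified by the file it lives in.
interface DescriptionCopy { path: string; description: string }

// Returns the paths whose description no longer matches the canonical string.
export function findStaleCopies(canonical: string, copies: DescriptionCopy[]): string[] {
  return copies
    .filter((c) => c.description !== canonical)
    .map((c) => c.path);
}

// Illustrative usage with the two copies named in this comment.
const stale = findStaleCopies("New description.", [
  { path: "src/lib/schema/config.ts", description: "A tech blog and reference catalog on agentic AI." },
  { path: "src/app/manifest.webmanifest", description: "New description." },
]);
console.log(stale); // lists only the schema config path in this made-up scenario
```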

The RSS 2.0 spec (rssboard.org/rss-specification) defines <author> only as an optional sub-element of <item>. Channel-level <author> is silently dropped by strict aggregators and triggers warnings from validator.w3.org/feed. <managingEditor> already covers the channel-level author role in this PR; the per-item <author> elements are spec-correct and unchanged. Addresses bot review on #394.

julianken-bot

APPROVE — ship-ready after the channel-level <author> fix in d8c3cee. One spec-compliance finding worth addressing; one pre-existing duplication carried forward from the prior review.

Verification ledger (commands run in this turn)

gh pr view 394 --json … — head d8c3cee6, base 51369a1d, mergeable, not draft. New commit d8c3cee on top of 3d87314 (the prior review's HEAD).
gh pr diff 394 — read all 4 touched files in full (+92 −4).
git fetch origin pull/394/head:pr-394 + git show pr-394:<each file> — read all files at current HEAD.
git -C . show pr-394:src/app/feed.xml/route.ts | grep -n author — confirmed channel-level <author> is gone in d8c3cee; per-item <author> remains on line 45 (spec-correct).
git grep -n "A tech blog and reference catalog" pr-394 — confirmed two pre-existing stale duplicates still ship: src/lib/schema/config.ts:20, src/app/manifest.webmanifest:4.
git show pr-394:src/lib/queries/posts.ts — confirmed getPublishedPosts() has internal try/catch returning []. The asymmetry between feed.xml's outer try/catch and llms-full.txt's lack of one is therefore not a defect.
file /Users/j/repos/tech-blog/public/og-default.png → PNG image data, 1200 x 630, 8-bit/color RGB.
WebFetch https://cyber.harvard.edu/rss/rss.html — verified RSS 2.0 spec language: image max width 144, max height 400.
git show pr-394:src/data/agentic-design-patterns/index.ts + grep archived — counted 25 pattern files, 1 archived (funnel-method) → 24 active. PR body's "24 patterns" claim verified.
gh pr checks 394 — 8 checks pass on d8c3cee (ESLint, TypeScript, Vitest, Next.js Build, Analyze Bundle, CodeQL, CodeQL Analysis, E2E Shard 2/4); E2E Shards 1, 3, 4 pending (re-running for the new commit).
gh api .../collaborators/julianken-bot/permission → write.

Findings ToC

IMPORTANT — <image> block embeds og-default.png (1200×630), which exceeds the RSS 2.0 max dimensions (144 × 400). Inline comment on src/app/feed.xml/route.ts:62.
SUGGESTION (plan-controlled) — siteDescription refresh leaves two stale pre-existing duplicates downstream. Inline comment on src/lib/site-config.ts:14.

Specific things this PR did right

The d8c3cee fix isn't just a delete — the commit message names the spec section, the strict-aggregator failure mode, and explicitly notes <managingEditor> already covers the channel-author role. That's exactly the diff hygiene that lets the next reviewer move on in one read.
llms.txt includes an explicit use-policy and citation-format hint instead of leaving it ambiguous — agents that respect site-level signals will have a defined target to comply with, not a guess.
The PATTERNS.filter((p) => !p.archived) filter in llms-full.txt mirrors the sitemap's archived-filtering convention (already verified in the prior review). Worth re-emphasizing because it prevents the AI-surface and search-surface from silently diverging on retired patterns.

Bottom line

Merge after one decision on finding 1 (either swap to a 144×144 RSS-spec icon, or delete the <image> block). Finding 2 is the same un-acted-on SUGGESTION from before — fold it in or file the follow-up; either is fine.

Same-tier risk

Running as opus. Implementer model tier is unknown (per the no-narrative rule). If the implementer was also opus, R12 applies — treat the second-pass find (the 1200×630 image dimension violation) with extra weight, because it's the kind of detail same-tier reviewers are most likely to miss.

— @julianken-bot, posted via the reviewing-as-julianken-bot skill rubric.

julianken-bot · 2026-05-17T15:45:47Z

+    <copyright>Copyright © 2024–${new Date().getFullYear()} ${escapeXml(SITE_TITLE)}. All rights reserved.</copyright>
+    <managingEditor>${escapeXml(CONTACT_EMAIL)} (${escapeXml(SITE_TITLE)})</managingEditor>
+    <generator>Next.js + Payload CMS</generator>
+    <image>


IMPORTANT — RSS 2.0 <image> sub-element exceeds maximum dimensions.

og-default.png is 1200 × 630, but the RSS 2.0 spec (Harvard original, RSSBoard) is unambiguous:

Maximum value for width is 144, default value is 88.
Maximum value for height is 400, default value is 31.

Strict aggregators (and validator.w3.org/feed) will reject this; lenient ones silently drop the <image> block. Same spec-compliance class as the channel-level <author> you just removed in d8c3cee.

Two ways out:

Generate a 144×144 (or 88×88) variant — e.g. public/rss-icon-144.png — and reference it here instead. This is what most production RSS feeds do; the OG image is the wrong asset for this slot.

Drop the <image> block entirely. <image> is optional in RSS 2.0; absence is preferable to a spec-violating one that some readers will reject.

Option 1 is the right end-state, but option 2 is a clean one-line revert if you want to unblock merge today and file the icon work as a follow-up.
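The dimension caps quoted above are easy to encode as a guard. A minimal sketch, with `rssImageViolations` as a hypothetical helper name:

```typescript
// Limits from the RSS 2.0 <image> definition quoted in this comment.
const RSS_IMAGE_MAX_WIDTH = 144;
const RSS_IMAGE_MAX_HEIGHT = 400;

// Returns a human-readable problem list; an empty list means the image fits.
export function rssImageViolations(width: number, height: number): string[] {
  const problems: string[] = [];
  if (width > RSS_IMAGE_MAX_WIDTH) {
    problems.push(`width ${width} exceeds max ${RSS_IMAGE_MAX_WIDTH}`);
  }
  if (height > RSS_IMAGE_MAX_HEIGHT) {
    problems.push(`height ${height} exceeds max ${RSS_IMAGE_MAX_HEIGHT}`);
  }
  return problems;
}
```

For og-default.png (1200 × 630) both checks fail; a 144 × 144 icon passes.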

julianken-bot · 2026-05-17T15:45:48Z


 export const siteDescription =
-  "A tech blog and reference catalog on agentic AI.";
+  "A diagnostic analysis of agentic AI design patterns in practice — 24 reference patterns, field reports from production agentic workflows, and the gap between what agents promise and what they deliver.";


SUGGESTION (plan-controlled, R9) — siteDescription refresh isn't propagated to the two other copies that still ship the old "tech blog and reference catalog" string:

src/lib/schema/config.ts:19-20 — JSON-LD SITE_CONFIG.description (rendered into the WebSite schema on every page)

src/app/manifest.webmanifest:4 — PWA install prompt description

This is the same finding as the prior bot review's #2, restated because it's unaddressed in d8c3cee. Framing as plan-controlled (R9) rather than implementer-controlled: the PR scope as written touched siteDescription in one file; nothing said "and propagate everywhere". The downstream consumer here is the JSON-LD WebSite entity, which is exactly the surface infra(seo) PRs should keep in lockstep — leaving it stale means search engines and Schema.org consumers see the old description even after this ships.

Either fold both edits into this PR (3-line change total) or file a follow-up issue tagged seo and reference it in the merge commit.

RSS 2.0 caps <image> at 144x400. The prior <image> referenced og-default.png (1200x630) which strict aggregators silently drop. Downscale public/android-chrome-192x192.png (the existing brand mark) to a 144x144 sibling at public/rss-icon-144.png via sharp. Reference the new asset from feed.xml/route.ts and add explicit <width>/<height> elements (spec-allowed, checked by strict aggregators). scripts/generate-rss-icon.ts is the regeneration path for future updates. Addresses second-pass bot review on #394.
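A sketch of what the resulting `<image>` block might look like when built as a string, with the explicit `<width>`/`<height>` elements this commit describes. The `RssImage` shape, `buildRssImage`, and the stand-in `escapeXml` are assumptions for illustration; the actual route code is not reproduced here.

```typescript
// Stand-in XML text escaper; assumed to behave like the route's escapeXml helper.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical input shape: the three required sub-elements plus dimensions.
interface RssImage { url: string; title: string; link: string; width: number; height: number }

// Emits an RSS 2.0 <image> block; url, title, and link are required, width and height optional.
export function buildRssImage(img: RssImage): string {
  return [
    "<image>",
    `  <url>${escapeXml(img.url)}</url>`,
    `  <title>${escapeXml(img.title)}</title>`,
    `  <link>${escapeXml(img.link)}</link>`,
    `  <width>${img.width}</width>`,
    `  <height>${img.height}</height>`,
    "</image>",
  ].join("\n");
}
```

In practice the image `<title>` and `<link>` are usually set to the channel's own title and link.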

julianken-bot

Verdict: APPROVE

Three discovery surfaces land cleanly: static public/llms.txt, dynamic /llms-full.txt route generated from PATTERNS + getPublishedPosts, and an RSS feed upgraded with channel/item metadata.

Verification ledger (commands run this turn)

pnpm typecheck — clean (no output from tsc --noEmit).
pnpm test:unit — 525/525 passed, 31 files.
pnpm exec eslint on the four PR-modified files — clean.
pnpm dev + curl on all three routes:
- /llms.txt → 200, text/plain; charset=UTF-8, 1252 bytes.
- /llms-full.txt → 200, text/plain; charset=utf-8, 5846 bytes, lists 24 patterns + 4 posts.
- /feed.xml → 200, application/xml; charset=utf-8; xmllint --noout parses without error.
RSS 2.0 spec cross-check (harvard.edu RSS 2.0 spec): <image> width=144/height=144 within the 144x400 cap; <copyright>/<managingEditor>/<generator> formats conformant; <image> has all three required sub-elements (url, title, link). The 2nd-pass icon downscale to a sibling 144x144 PNG (rather than reusing the 1200x630 og-default) is the right call — confirmed PNG file is 144x144 RGBA via file public/rss-icon-144.png.
Merge-conflict check on CONTACT_EMAIL (same const added on main via #393): cloned the PR branch fresh, ran git merge --no-commit origin/main — Auto-merging src/lib/site-config.ts ... Automatic merge went well. No textual conflict; the PR body is correct that the duplicate is idempotent.
CI status via gh pr checks 394: all 11 required checks pass (ESLint, TypeScript, Vitest, Next.js Build, Analyze Bundle, CodeQL, 4 E2E shards).
HEAD SHA at review time: 778114539dd62db814b263560e116cdf66fc57f2.

Specific praise (one item, decision-named)

The second-pass fix to generate a spec-compliant 144x144 RSS image (fix(seo): spec-compliant 144x144 RSS feed icon) instead of reusing og-default.png is the right read of RSS 2.0 — strict aggregators (FeedValidator, Feedly) silently drop images outside the 144x400 cap, and the 1200x630 OG default would have been dropped. Adding <width>/<height> elements explicitly is the spec-recommended hint that helps validators avoid heuristic guessing.

Findings

Bottom line

Routes work, XML validates, RSS spec conformant, content correct, CI green, no merge conflicts despite stale base SHA. Approve and ship.

Reviewed by @julianken-bot (fresh context, model: opus) against the 15-rule anti-slop rubric. Implementer/reviewer same-tier risk: yes (likely opus×opus). R8 mandatory second pass performed; one substantive finding surfaced (copyright vs use-policy inconsistency).

julianken-bot · 2026-05-17T17:44:20Z

    <description>${escapeXml(SITE_DESCRIPTION)}</description>
    <language>en</language>
    <lastBuildDate>${new Date().toUTCString()}</lastBuildDate>
+    <copyright>Copyright © 2024–${new Date().getFullYear()} ${escapeXml(SITE_TITLE)}. All rights reserved.</copyright>


SUGGESTION — Mixed signal on AI/content use policy.

public/llms.txt:13 explicitly states the content is "available for AI training, research, and citation with attribution." But this <copyright> element emits "All rights reserved" into the RSS feed.

Strict aggregators and AI ingestion pipelines that read both surfaces will see a contradiction. Three options:

Drop "All rights reserved" — e.g., Copyright © 2024–${year} ${SITE_TITLE}. (no rights clause).

Switch to a Creative Commons hint: CC BY 4.0 — citation requested.

Match the llms.txt language: Copyright © 2024–${year} ${SITE_TITLE}. Available for AI training, research, and citation with attribution.

Verified via curl http://localhost:3000/feed.xml | xmllint --noout (parses) and side-by-side with public/llms.txt.

Non-blocking — neither a spec violation nor a runtime issue.
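To make two of the options concrete (no rights clause, or matching the llms.txt language), here is a minimal sketch. `buildCopyright` and the `RightsClause` type are hypothetical, the year is passed in rather than read from the clock so the output is deterministic, and the result would still need XML escaping before being placed in the feed.

```typescript
// Which rights clause to append: none, or the wording quoted from public/llms.txt.
type RightsClause = "none" | "llms-policy";

// Builds the copyright text without the "All rights reserved" clause.
export function buildCopyright(
  siteTitle: string,
  startYear: number,
  currentYear: number,
  clause: RightsClause
): string {
  // Collapse to a single year when the range would be empty.
  const years = currentYear > startYear ? `${startYear}–${currentYear}` : `${startYear}`;
  const base = `Copyright © ${years} ${siteTitle}.`;
  if (clause === "llms-policy") {
    return `${base} Available for AI training, research, and citation with attribution.`;
  }
  return base;
}
```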

julianken · 2026-05-17T18:35:49Z

@Mergifyio queue

mergify · 2026-05-17T18:35:58Z

* chore(docs): drop seo-strategy folder; align README to renamed post slug

  Removes docs/seo-strategy/, the research artifacts from the SEO + AI-discovery analysis funnel, no longer load-bearing now that the gate-1/2/3 work has shipped (#393 #394 #395 #396 #397 #400 #402 #404 #406 #408). History preserved in git.

  README: align "Recent essays" entry with the renamed post slug (where-agentic-patterns-actually-live → agentic-patterns-in-your-coding-workflow). The rename satisfies Bing Site Scan's 70-char title cap. No redirect deployed: the article is two days old, with no significant external link equity to preserve.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: defer README slug update; gitignore docs/seo-strategy

  Address julianken-bot review of PR #409:

  BLOCKER (README:75): New slug URL serves an SSR 404 fallback because the Payload post slug hasn't been renamed yet (intentionally deferred until the in-flight Bing Site Scan completes). Reverting the README link change here; it will land in a follow-up PR after the actual Payload slug rename, so the link is never broken in main.

  Plus: add /docs/seo-strategy/ to .gitignore so future analysis-funnel artifacts (phase-*, context-packets, STATUS.md, issues/) stay on disk without polluting the index.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

julianken added the status:in-review (PR open, waiting for review) and area:seo (SEO + AI-discovery strategy work) labels May 17, 2026

julianken mentioned this pull request May 17, 2026

infra(seo): ship llms.txt + llms-full.txt, upgrade RSS feed, refresh site description #390

Closed

13 tasks

julianken-bot previously approved these changes May 17, 2026


julianken dismissed julianken-bot’s stale review via d8c3cee May 17, 2026 15:40

julianken-bot previously approved these changes May 17, 2026


julianken dismissed julianken-bot’s stale review via 7781145 May 17, 2026 17:28

julianken-bot approved these changes May 17, 2026


mergify Bot added the queued label May 17, 2026

Merge branch 'main' into infra/seo-discovery-surfaces

b1348ec

mergify Bot merged commit 3dc645b into main May 17, 2026
13 checks passed

mergify Bot deleted the infra/seo-discovery-surfaces branch May 17, 2026 18:39

mergify Bot removed the queued label May 17, 2026

mergify[bot] merged 4 commits into main from infra/seo-discovery-surfaces
