infra(seo): llms.txt + RSS metadata upgrades + site description #394
- Ship public/llms.txt (static) and /llms-full.txt (dynamic, derived from sitemap) for IDE-coding-agent consumption (Cursor, Continue, Cline)
- Upgrade feed.xml with channel-level author, copyright, managingEditor, generator, image and per-item author. Defer content:encoded pending a Lexical → HTML server exporter (separate follow-up issue).
- Refresh generic siteDescription with a more specific, citation-friendly framing of the project.
- Defensively add CONTACT_EMAIL constant (also added by #387; idempotent).

Closes #390
julianken-bot
left a comment
APPROVE — ship-ready. Two findings worth addressing; neither blocks merge.
Verification ledger (commands run in this turn)
- `gh pr view 394 --json …`: head `3d87314f`, base `51369a1d`, mergeable, not draft, no prior reviews
- `gh pr diff 394`: read in full (4 files, +92 −3)
- `git fetch origin pull/394/head:pr-394` + `git show pr-394:<each touched file>`: read all 4 files at PR head
- `git grep -n "CONTACT_EMAIL" pr-394`: confirmed PR is the only definer at head
- `git grep -n "siteDescription"`: found 2 pre-existing duplicate copies not refreshed (see finding 2)
- `git grep -n "A tech blog and reference catalog"`: confirmed exact stale strings
- `git show pr-394:src/data/agentic-design-patterns/index.ts`: counted PATTERNS array: 24 active + 1 archived; PR's "24 patterns" claim verified
- `git show pr-394:src/lib/queries/posts.ts`: confirmed `getPublishedPosts()` has internal try/catch (returns `[]` on failure), so the asymmetry between feed.xml outer-try and llms-full.txt no-try is not a defect
- `gh api .../collaborators/julianken-bot/permission` → `write`
- `gh pr view 394 --json statusCheckRollup`: all 11 required checks SUCCESS on `3d87314f` (ESLint, TypeScript, Vitest, Next.js Build, Analyze Bundle, CodeQL, all 4 E2E shards)
- `WebFetch https://www.rssboard.org/rss-specification`: verified `<author>` is item-only per spec; channel uses `<managingEditor>`/`<webMaster>`
Findings ToC
- IMPORTANT: channel-level `<author>` is not RSS 2.0 spec-compliant (`src/app/feed.xml/route.ts:59`)
- SUGGESTION (plan-controlled): `siteDescription` refresh doesn't propagate to JSON-LD schema or PWA manifest (`src/lib/site-config.ts:14`)
Specific things this PR did right
- `PATTERNS.filter((p) => !p.archived)` mirrors the sitemap's archived-filtering: it keeps the AI-discovery surface and search-discovery surface in lockstep, no silent divergence.
- Per-item `<author>` in `email (Name)` format is the canonical RSS 2.0 shape, not the common mistake of bare name or bare email.
- Deferral of `<content:encoded>` is named in the PR body with the concrete blocker (Payload Lexical → HTML server exporter absent). That is the right way to defer: naming what would be required, not waving at "future work".
- Cache headers on `llms-full.txt` (`s-maxage=3600, stale-while-revalidate=86400`) match `feed.xml`: a consistent caching contract across the discovery surfaces.
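A minimal sketch of the two points above (the archived filter and the shared cache policy), written as a pure function so it can be checked in isolation. The `Pattern` and `Post` shapes, the `buildLlmsFull` name, and the `/patterns/` and `/posts/` URL paths are assumptions for illustration; the real route in this PR may be structured differently.

```typescript
// Hypothetical record shapes; the real PATTERNS entries and post records may differ.
interface Pattern {
  slug: string;
  title: string;
  archived?: boolean;
}

interface Post {
  slug: string;
  title: string;
}

// Cache policy quoted in the review, kept in one place so that
// llms-full.txt and feed.xml cannot drift apart.
const DISCOVERY_HEADERS: Record<string, string> = {
  "Content-Type": "text/plain; charset=utf-8",
  "Cache-Control": "s-maxage=3600, stale-while-revalidate=86400",
};

// Builds the Markdown index. Archived patterns are dropped with the same
// predicate the review cites: (p) => !p.archived.
function buildLlmsFull(patterns: Pattern[], posts: Post[], baseUrl: string): string {
  const active = patterns.filter((p) => !p.archived);
  const lines = [
    "# Patterns",
    ...active.map((p) => `- [${p.title}](${baseUrl}/patterns/${p.slug})`), // path is assumed
    "",
    "# Posts",
    ...posts.map((p) => `- [${p.title}](${baseUrl}/posts/${p.slug})`), // path is assumed
  ];
  return lines.join("\n") + "\n";
}
```

In a route handler the returned string would be sent with `DISCOVERY_HEADERS`; keeping the body builder free of I/O makes the archived-filter behaviour easy to unit test.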
Bottom line
Ship after finding 1 (one-line delete). Finding 2 is a follow-up either way — note it in the merge commit or open a separate issue. CI is green on 3d87314f. No BLOCKERs.
Same-tier risk
I am running as opus. Implementer model tier is unknown to me (the dispatcher did not provide it, per the no-narrative rule). If the implementer was also opus, R12 same-tier risk applies and Julian should treat the second pass with extra weight.
— @julianken-bot, posted via the reviewing-as-julianken-bot skill rubric.
| <description>${escapeXml(SITE_DESCRIPTION)}</description> | ||
| <language>en</language> | ||
| <lastBuildDate>${new Date().toUTCString()}</lastBuildDate> | ||
| <author>${escapeXml(CONTACT_EMAIL)} (${escapeXml(SITE_TITLE)})</author> |
IMPORTANT — Channel-level <author> is not in the RSS 2.0 spec.
Per rssboard.org/rss-specification, <author> is documented only under "Elements of <item>":
<author> is an optional sub-element of <item>. It's the email address of the author of the item.
There is no <author> listed in either the required or optional channel elements section. Channel-level author info uses <managingEditor> (which this PR already adds correctly on the next line) and <webMaster>.
Practical impact: the per-item <author> additions are spec-correct. The channel-level one will trigger warnings from validator.w3.org/feed and is silently dropped by strict aggregators. xmllint --noout only verifies well-formed XML — it does not validate against the RSS 2.0 schema, so the green test plan does not catch this.
Fix: delete this line; <managingEditor> already covers the role.
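A sketch of the spec-aligned split, assuming helper names (`rssPerson`, `channelPeople`, `itemAuthor`) that are illustrative only and not taken from the PR; the `escapeXml` here is a stand-in for the project's own helper. The channel carries `<managingEditor>`, and `<author>` appears only per item, both in the `email (Name)` format.

```typescript
// Stand-in for the project's escapeXml helper (assumed behaviour: escape the five XML entities).
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// RSS 2.0 person format: "email (Name)".
function rssPerson(email: string, name: string): string {
  return `${escapeXml(email)} (${escapeXml(name)})`;
}

// Channel level: <managingEditor> only. RSS 2.0 lists no channel-level <author>.
function channelPeople(email: string, name: string): string {
  return `<managingEditor>${rssPerson(email, name)}</managingEditor>`;
}

// Item level: <author> is an optional sub-element of <item>.
function itemAuthor(email: string, name: string): string {
  return `<author>${rssPerson(email, name)}</author>`;
}
```

Note that a well-formedness check such as `xmllint --noout` would accept either placement, so a small unit test on the channel string is one way to keep the element from creeping back.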
| export const siteDescription = | ||
| "A tech blog and reference catalog on agentic AI."; | ||
| "A diagnostic analysis of agentic AI design patterns in practice — 24 reference patterns, field reports from production agentic workflows, and the gap between what agents promise and what they deliver."; |
SUGGESTION (plan-controlled, not implementer-controlled per R9) — siteDescription is not refreshed in the schema-emitted copy or the PWA manifest.
Two pre-existing duplicate copies of the old "A tech blog and reference catalog on agentic AI." string remain on this branch:
- `src/lib/schema/config.ts:20`: used by JSON-LD schema generators; this is the copy Google + AI citation crawlers pick up
- `src/app/manifest.webmanifest:4`: PWA install prompt + browser bookmark metadata
This is plan-controlled — issue #390 names only src/lib/site-config.ts, so the implementer correctly followed scope. But the PR's stated goal is "citation-friendly framing", and the schema config copy is exactly what citation crawlers consume. Updating only site-config.ts leaves the goal partially unmet.
Two options for follow-up:
- Amend this PR to also update both copies (small, mechanical, low-risk).
- File a follow-up issue to extend the description-refresh to all three call sites.
Either is reasonable; flagging here so the gap doesn't silently survive.
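One possible shape for the follow-up, sketched under the assumption that the JSON-LD config and the manifest can both be generated from code that reads a single constant. The function names and the placeholder description text are hypothetical; the repository currently keeps the manifest as a static file, so this would need a generated manifest or a build step to apply.

```typescript
// Placeholder text; in the real project this would be the refreshed siteDescription.
const siteDescription = "Example description (placeholder)";

// Hypothetical JSON-LD builder: description is read from the constant, never retyped.
function webSiteJsonLd(name: string, url: string): Record<string, string> {
  return {
    "@context": "https://schema.org",
    "@type": "WebSite",
    name,
    url,
    description: siteDescription,
  };
}

// Hypothetical manifest builder sharing the same constant.
function webManifest(name: string): { name: string; description: string } {
  return { name, description: siteDescription };
}
```

With this arrangement a future description refresh touches one line, and the three call sites cannot silently diverge.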
The RSS 2.0 spec (rssboard.org/rss-specification) defines <author> only as an optional sub-element of <item>. Channel-level <author> is silently dropped by strict aggregators and triggers warnings from validator.w3.org/feed. <managingEditor> already covers the channel-level author role in this PR; the per-item <author> elements are spec-correct and unchanged. Addresses bot review on #394.
julianken-bot
left a comment
APPROVE — ship-ready after the channel-level <author> fix in d8c3cee. One spec-compliance finding worth addressing; one pre-existing duplication carried forward from the prior review.
Verification ledger (commands run in this turn)
- `gh pr view 394 --json …`: head `d8c3cee6`, base `51369a1d`, mergeable, not draft. New commit `d8c3cee` on top of `3d87314` (the prior review's HEAD).
- `gh pr diff 394`: read all 4 touched files in full (+92 −4).
- `git fetch origin pull/394/head:pr-394` + `git show pr-394:<each file>`: read all files at current HEAD.
- `git -C . show pr-394:src/app/feed.xml/route.ts | grep -n author`: confirmed channel-level `<author>` is gone in `d8c3cee`; per-item `<author>` remains on line 45 (spec-correct).
- `git grep -n "A tech blog and reference catalog" pr-394`: confirmed two pre-existing stale duplicates still ship: `src/lib/schema/config.ts:20`, `src/app/manifest.webmanifest:4`.
- `git show pr-394:src/lib/queries/posts.ts`: confirmed `getPublishedPosts()` has internal try/catch returning `[]`. The asymmetry between `feed.xml`'s outer try/catch and `llms-full.txt`'s lack of one is therefore not a defect.
- `file /Users/j/repos/tech-blog/public/og-default.png` → `PNG image data, 1200 x 630, 8-bit/color RGB`.
- `WebFetch https://cyber.harvard.edu/rss/rss.html`: verified RSS 2.0 spec language: image max width 144, max height 400.
- `git show pr-394:src/data/agentic-design-patterns/index.ts` + `grep archived`: counted 25 pattern files, 1 archived (funnel-method) → 24 active. PR body's "24 patterns" claim verified.
- `gh pr checks 394`: 8 checks pass on `d8c3cee` (ESLint, TypeScript, Vitest, Next.js Build, Analyze Bundle, CodeQL, CodeQL Analysis, E2E Shard 2/4); E2E Shards 1, 3, 4 pending (re-running for the new commit).
- `gh api .../collaborators/julianken-bot/permission` → `write`.
Findings ToC
- IMPORTANT: `<image>` block embeds `og-default.png` (1200×630), which exceeds the RSS 2.0 max dimensions (144 × 400). Inline comment on `src/app/feed.xml/route.ts:62`.
- SUGGESTION (plan-controlled): `siteDescription` refresh leaves two stale pre-existing duplicates downstream. Inline comment on `src/lib/site-config.ts:14`.
Specific things this PR did right
- The `d8c3cee` fix isn't just a delete: the commit message names the spec section, the strict-aggregator failure mode, and explicitly notes `<managingEditor>` already covers the channel-author role. That's exactly the diff hygiene that lets the next reviewer move on in one read.
- `llms.txt` includes an explicit use-policy and citation-format hint instead of leaving it ambiguous: agents that respect site-level signals will have a defined target to comply with, not a guess.
- The `PATTERNS.filter((p) => !p.archived)` filter in `llms-full.txt` mirrors the sitemap's archived-filtering convention (already verified in the prior review). Worth re-emphasizing because it prevents the AI-surface and search-surface from silently diverging on retired patterns.
Bottom line
Merge after one decision on finding 1 (either swap to a 144×144 RSS-spec icon, or delete the <image> block). Finding 2 is the same un-acted-on SUGGESTION from before — fold it in or file the follow-up; either is fine.
Same-tier risk
Running as opus. Implementer model tier is unknown (per the no-narrative rule). If the implementer was also opus, R12 applies — treat the second-pass find (the 1200×630 image dimension violation) with extra weight, because it's the kind of detail same-tier reviewers are most likely to miss.
— @julianken-bot, posted via the reviewing-as-julianken-bot skill rubric.
| <copyright>Copyright © 2024–${new Date().getFullYear()} ${escapeXml(SITE_TITLE)}. All rights reserved.</copyright> | ||
| <managingEditor>${escapeXml(CONTACT_EMAIL)} (${escapeXml(SITE_TITLE)})</managingEditor> | ||
| <generator>Next.js + Payload CMS</generator> | ||
| <image> |
IMPORTANT — RSS 2.0 <image> sub-element exceeds maximum dimensions.
og-default.png is 1200 × 630, but the RSS 2.0 spec (Harvard original, RSSBoard) is unambiguous:
Maximum value for width is 144, default value is 88.
Maximum value for height is 400, default value is 31.
Strict aggregators (and validator.w3.org/feed) will reject this; lenient ones silently drop the <image> block. Same spec-compliance class as the channel-level <author> you just removed in d8c3cee.
Two ways out:
- Generate a 144×144 (or 88×88) variant, e.g. `public/rss-icon-144.png`, and reference it here instead. This is what most production RSS feeds do; the OG image is the wrong asset for this slot.
- Drop the `<image>` block entirely. `<image>` is optional in RSS 2.0; absence is preferable to a spec-violating one that some readers will reject.
Option 1 is the right end-state, but option 2 is a clean one-line revert if you want to unblock merge today and file the icon work as a follow-up.
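Whichever option is chosen, the dimension limit can be guarded mechanically. The sketch below reads width and height from a PNG's IHDR chunk (bytes 16 to 23, big-endian) and compares them with the RSS 2.0 maxima quoted above. The function names are hypothetical and this check is not part of the PR; it only illustrates how the 1200 × 630 asset could have been caught by a unit test.

```typescript
// RSS 2.0 <image> limits quoted in the review.
const RSS_IMAGE_MAX_WIDTH = 144;
const RSS_IMAGE_MAX_HEIGHT = 400;

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Reads pixel dimensions from the IHDR chunk, which always follows the
// 8-byte signature: 4-byte length, 4-byte "IHDR", then width and height.
function readPngSize(bytes: Uint8Array): { width: number; height: number } {
  if (bytes.length < 24 || !PNG_SIGNATURE.every((b, i) => bytes[i] === b)) {
    throw new Error("not a PNG file");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // DataView reads big-endian by default, which matches the PNG format.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// True when the image fits inside the RSS 2.0 <image> limits.
function fitsRssImageLimits(size: { width: number; height: number }): boolean {
  return size.width <= RSS_IMAGE_MAX_WIDTH && size.height <= RSS_IMAGE_MAX_HEIGHT;
}
```

In a test, the bytes would come from reading the icon file from disk; the parsing itself needs no image library.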
| export const siteDescription = | ||
| "A tech blog and reference catalog on agentic AI."; | ||
| "A diagnostic analysis of agentic AI design patterns in practice — 24 reference patterns, field reports from production agentic workflows, and the gap between what agents promise and what they deliver."; |
SUGGESTION (plan-controlled, R9) — siteDescription refresh isn't propagated to the two other copies that still ship the old "tech blog and reference catalog" string:
- `src/lib/schema/config.ts:19-20`: JSON-LD `SITE_CONFIG.description` (rendered into the WebSite schema on every page)
- `src/app/manifest.webmanifest:4`: PWA install prompt description
This is the same finding as the prior bot review's #2, restated because it's unaddressed in d8c3cee. Framing as plan-controlled (R9) rather than implementer-controlled: the PR scope as written touched siteDescription in one file; nothing said "and propagate everywhere". The downstream consumer here is the JSON-LD WebSite entity, which is exactly the surface infra(seo) PRs should keep in lockstep — leaving it stale means search engines and Schema.org consumers see the old description even after this ships.
Either fold both edits into this PR (3-line change total) or file a follow-up issue tagged seo and reference it in the merge commit.
RSS 2.0 caps <image> at 144x400. The prior <image> referenced og-default.png (1200x630) which strict aggregators silently drop. Downscale public/android-chrome-192x192.png (the existing brand mark) to a 144x144 sibling at public/rss-icon-144.png via sharp. Reference the new asset from feed.xml/route.ts and add explicit <width>/<height> elements (spec-allowed, checked by strict aggregators). scripts/generate-rss-icon.ts is the regeneration path for future updates. Addresses second-pass bot review on #394.
julianken-bot
left a comment
Verdict: APPROVE
Three discovery surfaces land cleanly: static public/llms.txt, dynamic /llms-full.txt route generated from PATTERNS + getPublishedPosts, and an RSS feed upgraded with channel/item metadata.
Verification ledger (commands run this turn)
- `pnpm typecheck`: clean (no output from `tsc --noEmit`).
- `pnpm test:unit`: 525/525 passed, 31 files.
- `pnpm exec eslint` on the four PR-modified files: clean.
- `pnpm dev` + `curl` on all three routes:
  - `/llms.txt` → 200, `text/plain; charset=UTF-8`, 1252 bytes.
  - `/llms-full.txt` → 200, `text/plain; charset=utf-8`, 5846 bytes, lists 24 patterns + 4 posts.
  - `/feed.xml` → 200, `application/xml; charset=utf-8`; `xmllint --noout` parses without error.
- RSS 2.0 spec cross-check (harvard.edu RSS 2.0 spec): `<image>` width=144/height=144 within the 144x400 cap; `<copyright>`/`<managingEditor>`/`<generator>` formats conformant; `<image>` has all three required sub-elements (url, title, link). The 2nd-pass icon downscale to a sibling 144x144 PNG (rather than reusing the 1200x630 og-default) is the right call; confirmed PNG file is 144x144 RGBA via `file public/rss-icon-144.png`.
- Merge-conflict check on `CONTACT_EMAIL` (same const added on main via #393): cloned the PR branch fresh, ran `git merge --no-commit origin/main`, which reported: Auto-merging src/lib/site-config.ts ... Automatic merge went well. No textual conflict; the PR body is correct that the duplicate is idempotent.
- CI status via `gh pr checks 394`: all 11 required checks pass (ESLint, TypeScript, Vitest, Next.js Build, Analyze Bundle, CodeQL, 4 E2E shards).
- HEAD SHA at review time: `778114539dd62db814b263560e116cdf66fc57f2`.
Specific praise (one item, decision-named)
The second-pass fix to generate a spec-compliant 144x144 RSS image (fix(seo): spec-compliant 144x144 RSS feed icon) instead of reusing og-default.png is the right read of RSS 2.0 — strict aggregators (FeedValidator, Feedly) silently drop images outside the 144x400 cap, and the 1200x630 OG default would have been dropped. Adding <width>/<height> elements explicitly is the spec-recommended hint that helps validators avoid heuristic guessing.
Findings
1 SUGGESTION inline at src/app/feed.xml/route.ts:59 — copyright "All rights reserved" contradicts the AI use policy in public/llms.txt. Non-blocking.
Bottom line
Routes work, XML validates, RSS spec conformant, content correct, CI green, no merge conflicts despite stale base SHA. Approve and ship.
Reviewed by @julianken-bot (fresh context, model: opus) against the 15-rule anti-slop rubric. Implementer/reviewer same-tier risk: yes (likely opus×opus). R8 mandatory second pass performed; one substantive finding surfaced (copyright vs use-policy inconsistency).
| <description>${escapeXml(SITE_DESCRIPTION)}</description> | ||
| <language>en</language> | ||
| <lastBuildDate>${new Date().toUTCString()}</lastBuildDate> | ||
| <copyright>Copyright © 2024–${new Date().getFullYear()} ${escapeXml(SITE_TITLE)}. All rights reserved.</copyright> |
SUGGESTION — Mixed signal on AI/content use policy.
public/llms.txt:13 explicitly states the content is "available for AI training, research, and citation with attribution." But this <copyright> element emits "All rights reserved" into the RSS feed.
Strict aggregators and AI ingestion pipelines that read both surfaces will see a contradiction. Three options:
- Drop "All rights reserved", e.g. `Copyright © 2024–${year} ${SITE_TITLE}.` (no rights clause).
- Switch to a Creative Commons hint: `CC BY 4.0 — citation requested.`
- Match the llms.txt language: `Copyright © 2024–${year} ${SITE_TITLE}. Available for AI training, research, and citation with attribution.`
Verified via curl http://localhost:3000/feed.xml | xmllint --noout (parses) and side-by-side with public/llms.txt.
Non-blocking — neither a spec violation nor a runtime issue.
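As a rough illustration of the first and third options, the copyright text could be produced by one small helper so the rights clause is an explicit choice. The `feedCopyright` name and the `RightsClause` type are hypothetical; the output would still need to pass through the feed's XML escaping before being embedded.

```typescript
// "none" drops the rights clause; "matchLlmsTxt" reuses the wording quoted from public/llms.txt.
type RightsClause = "none" | "matchLlmsTxt";

function feedCopyright(siteTitle: string, year: number, clause: RightsClause): string {
  const base = `Copyright © 2024–${year} ${siteTitle}.`;
  if (clause === "matchLlmsTxt") {
    return `${base} Available for AI training, research, and citation with attribution.`;
  }
  return base;
}
```

Either variant removes the "All rights reserved" wording that conflicts with the use policy.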
@Mergifyio queue
Merge Queue Status
This pull request spent 3 minutes 50 seconds in the queue, including 3 minutes 28 seconds running CI.
* chore(docs): drop seo-strategy folder; align README to renamed post slug

Removes docs/seo-strategy/: research artifacts from the SEO + AI-discovery analysis funnel, no longer load-bearing now that the gate-1/2/3 work has shipped (#393 #394 #395 #396 #397 #400 #402 #404 #406 #408). History preserved in git.

README: align "Recent essays" entry with the renamed post slug (where-agentic-patterns-actually-live → agentic-patterns-in-your-coding-workflow). The rename satisfies Bing Site Scan's 70-char title cap. No redirect deployed: article is two days old, no significant external link equity to preserve.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: defer README slug update; gitignore docs/seo-strategy

Address julianken-bot review of PR #409:

BLOCKER (README:75): New slug URL serves an SSR 404 fallback because the Payload post slug hasn't been renamed yet (intentionally deferred until the in-flight Bing Site Scan completes). Reverting the README link change here; it will land in a follow-up PR after the actual Payload slug rename, so the link is never broken in main.

Plus: add /docs/seo-strategy/ to .gitignore so future analysis-funnel artifacts (phase-*, context-packets, STATUS.md, issues/) stay on disk without polluting the index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Three discovery-surface improvements: ship llms.txt + llms-full.txt for IDE-agent ingestion, upgrade the RSS feed with proper channel + per-item metadata, and refresh the generic siteDescription.
Changes
`public/llms.txt` (new, static)
Disclaimer + content pointers for AI training / IDE ingestion (Cursor, Continue, Cline). Major AI engines mostly ignore llms.txt (~10% adoption per Q1 2026 research), but IDE coding agents actively consume it; that's the audience.

`src/app/llms-full.txt/route.ts` (new, dynamic)
Markdown index generated from the live pattern catalog (`PATTERNS`) + published posts (`getPublishedPosts`). Returned with `Content-Type: text/plain; charset=utf-8` and a 1-hour s-maxage cache.

`src/app/feed.xml/route.ts`
Channel additions: `<author>`, `<copyright>`, `<managingEditor>`, `<generator>`, `<image>`. Per-item: `<author>`. Defer `<content:encoded>`: requires a Payload Lexical → HTML server-side exporter that doesn't exist (follow-up issue recommended).

`src/lib/site-config.ts`
`siteDescription` rewritten to a more specific, citation-friendly framing. `CONTACT_EMAIL` added defensively (also being added by issue fix(seo): broken sameAs URL + 5 other quick-wins #387 / Category B in parallel; idempotent).

Test plan
- `curl http://localhost:3000/llms.txt` returns 200 + plaintext
- `curl http://localhost:3000/llms-full.txt` returns 200 + Markdown listing all 24 patterns + posts
- `curl http://localhost:3000/feed.xml` returns valid RSS XML (verified with xmllint)
- `pnpm lint`, `pnpm test:unit`, `pnpm typecheck`

Follow-up issue (not in this PR)
`<content:encoded>` for the RSS feed: needs a Payload Lexical → HTML server-side exporter (the existing JSXConverters are React/client-side). Filed separately.

Closes #390