114 changes: 111 additions & 3 deletions REFERENCE_IMPL.md
@@ -11,19 +11,20 @@ Each pattern gets up to two implementation folders:
| `langgraph/` | LangGraph `StateGraph` | The pattern as a visible graph with explicit nodes, edges, and conditional routing |
| `langchain/` | LangChain v1 middleware | The pattern as middleware plugged into `create_agent` — less code, less visibility |

-Both share the same hook factories ([`hooks.py`](action/d-guardrail-sandwich/hooks.py)) and model configuration ([`model_config.py`](model_config.py)).
+Both share the same hook factories ([`shared.py`](action/d-guardrail-sandwich/shared.py)) and model configuration ([`model_config.py`](model_config.py)).

### Implemented

| Pattern | Path | LangGraph | LangChain | Status |
|---------|------|:---------:|:---------:|--------|
| Guardrail Sandwich | `action/d-guardrail-sandwich/` | notebook + html + md | notebook + html + md | Done |
| Prompt Chaining | `action/c-prompt-chaining/` | notebook + html + md | notebook + html + md | Done |

### Roadmap

| Wave | Patterns | Status |
|------|----------|--------|
-| 1 | Context Triage, Semantic Compaction, Chain of Thought, Prompt Chaining, Generator-Critic | Planned |
+| 1 | ~~Prompt Chaining~~ ✓ · Context Triage, Semantic Compaction, Chain of Thought, Generator-Critic | In progress |
| 2 | Progressive Discovery, Complexity Routing, Iterative Hypothesis, Self-Heal Loop | Planned |
| 3 | Multimodal Fusion, Parallel Exploration, Fan-out & Gather | Planned |
| 4 | Hierarchical Retention, Progress Tracking, Failure Journals, Experience Replay | Planned |
@@ -39,6 +40,101 @@ Both share the same hook factories ([`hooks.py`](action/d-guardrail-sandwich/hoo

---

## How to draft a new reference implementation

Each reference implementation (RI) follows the same recipe (extracted from the Guardrail Sandwich PR):

### 1. Create `shared.py` (shared factories)

A small Python file at `<pattern-folder>/shared.py` that:
- Defines the framework-agnostic helper factories this pattern needs (gates, hooks, schemas, plus any supporting enums or config-dict builders)
- Is imported by **both** the LangGraph and LangChain notebooks
- Keeps the two notebooks in sync without code duplication

Name it `shared.py` after its *role* (the file both notebooks share), not its *contents* — contents vary by pattern (Prompt Chaining shares gate factories; Guardrail Sandwich shares hook factories + a hook runner), but the role is always the same.
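
As a rough illustration of that role, a `shared.py` for a gate-based pattern might look like the sketch below. The factory names `length_gate`, `regex_gate`, and `all_gate` appear in this repository's file listing, but their signatures, the `Verdict` enum, and the `Gate` alias are assumptions made for the example, not the actual implementation.

```python
from __future__ import annotations

import re
from enum import Enum
from typing import Callable


class Verdict(str, Enum):
    """Hypothetical verdict type; (str, Enum) so values serialize as plain strings."""
    PASS = "pass"
    REJECT = "reject"


# Assumed gate shape: model output in, verdict out.
Gate = Callable[[str], Verdict]


def length_gate(max_chars: int) -> Gate:
    """Build a gate that passes output of at most `max_chars` characters."""
    def gate(text: str) -> Verdict:
        return Verdict.PASS if len(text) <= max_chars else Verdict.REJECT
    return gate


def regex_gate(pattern: str) -> Gate:
    """Build a gate that passes output containing a match for `pattern`."""
    compiled = re.compile(pattern)

    def gate(text: str) -> Verdict:
        return Verdict.PASS if compiled.search(text) else Verdict.REJECT
    return gate


def all_gate(*gates: Gate) -> Gate:
    """Combine gates: pass only if every gate passes."""
    def gate(text: str) -> Verdict:
        ok = all(g(text) is Verdict.PASS for g in gates)
        return Verdict.PASS if ok else Verdict.REJECT
    return gate


# Both notebooks would import the same factories and build identical gates.
check = all_gate(length_gate(20), regex_gate(r"\d"))
```

Because nothing here imports LangGraph or LangChain, either notebook can use `check` unchanged, which is the point of keeping the file framework-agnostic.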

### 2. Write the LangGraph notebook (`langgraph/tutorial.ipynb`)

Structure (cell order):

| Section | Cell type | Content |
|---------|-----------|---------|
| Title + quote | markdown | Pattern name, one-line thesis, pointer to langchain version |
| What this pattern does | markdown | 3-sentence explanation + comparison table (LangGraph vs LangChain) |
| Setup | markdown + code | Imports from `shared.py`, `model_config.py`, `nbtools.py`, langgraph |
| State | markdown + code | `TypedDict` state definition, minimal fields |
| Helper utilities | code | ToolNode wrappers, decoders, renderers |
| Core nodes | markdown + code | One section per graph node (explain → implement) |
| Build the graph | code | `StateGraph` assembly, `add_conditional_edges` |
| Mock tools | code | Deterministic fakes for the demo |
| Assemble demo | code | Instantiate factories + graph, display graph PNG |
| Mock runs (3–4) | markdown + code pairs | One run per scenario (happy path, pre-block, post-block, error) |
| Real backend | markdown + code | `get_model()` + free-text → structured args → same graph |
| What to remember | markdown | 5-bullet recap |
| Further reading | markdown | Links to langchain version, pattern README, REFERENCE_IMPL.md, official docs |
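
The State, Mock tools, and routing rows can be pictured with a framework-free sketch. Everything named here is hypothetical (`ChainState` and its fields, `make_fake_llm`, `make_rewrite_node`, `route_after_gate`); in a real notebook the node would be registered on the `StateGraph` and the router handed to `add_conditional_edges`, whereas this sketch drives them with a plain loop.

```python
from __future__ import annotations

from itertools import cycle
from typing import Callable, TypedDict


class ChainState(TypedDict):
    """Minimal state: only the fields the nodes read or write (hypothetical)."""
    text: str
    attempts: int
    passed: bool


def make_fake_llm(replies: list[str]) -> Callable[[str], str]:
    """Deterministic fake: returns replies in call order, cycling when exhausted."""
    source = cycle(replies)

    def llm_call(prompt: str) -> str:
        return next(source)
    return llm_call


def make_rewrite_node(llm_call: Callable[[str], str], max_chars: int):
    """Build a node that calls the model and records whether the output passed."""
    def rewrite_node(state: ChainState) -> ChainState:
        out = llm_call(f"Rewrite: {state['text']}")
        return {
            "text": out,
            "attempts": state["attempts"] + 1,
            "passed": len(out) <= max_chars,
        }
    return rewrite_node


def route_after_gate(state: ChainState, max_retries: int = 2) -> str:
    """Router: continue on pass, retry while attempts remain, otherwise give up."""
    if state["passed"]:
        return "next_step"
    return "retry" if state["attempts"] <= max_retries else "give_up"


# Mock run: the first reply is too long, the second passes.
llm = make_fake_llm(["this reply is far too long", "short"])
node = make_rewrite_node(llm, max_chars=10)
state: ChainState = {"text": "draft", "attempts": 0, "passed": False}
state = node(state)
while route_after_gate(state) == "retry":
    state = node(state)
```

Keeping the fake a plain function means the mock runs need neither an API key nor a chat-model class.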

### 3. Write the LangChain notebook (`langchain/tutorial.ipynb`)

Follow the same conceptual flow as the LangGraph notebook, but use LangChain middleware or LCEL instead of explicit graph nodes, and highlight the **less code, less visibility** trade-off.

### 4. Export artifacts

From each notebook directory:
```bash
uv run jupyter nbconvert --to html tutorial.ipynb
uv run jupyter nbconvert --to markdown tutorial.ipynb
```

### 5. Update this file

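- Mark the pattern as done in the roadmap table (strike it through and add ✓, as the Prompt Chaining entry does) and update the wave's status, e.g. from "Planned" to "In progress"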
- Add a row to the **Implemented** table with path + status

### Key conventions

- Both notebooks add `shared.py`, `model_config.py`, and `nbtools.py` to `sys.path` by **searching upward for each file by name** (`next(p for p in (Path.cwd(), *Path.cwd().parents) if (p / marker).exists())`) — not by counting `../..`, which breaks if the folder depth changes
- Name the per-pattern shared-factory file `shared.py` (after its role), not after its contents (`gates.py`/`hooks.py`) — the contents differ per pattern, the role doesn't
- Root-level shared helpers live in `nbtools.py` (display/util plumbing, e.g. `show_graph`) and `model_config.py` (model loading) — the counterpart to each pattern's `shared.py`. Only put genuinely cross-cutting, framework-agnostic helpers there; pattern logic stays in `shared.py` or the notebook
- Default model: `ernie:ernie-5.1` via AI Studio (OpenAI-compatible)
- Mock runs first (no API key needed), real LLM section last
- Provide `print_trace()` or an equivalent that prints short, readable audit output
- Graph visualization via the shared `show_graph()` from root `nbtools.py` (`from nbtools import show_graph`) — it tries `draw_mermaid_png()` and falls back to ASCII (`draw_ascii()`, needs `grandalf`) when the remote Mermaid API is unreachable. Don't re-implement it per notebook; pass an `alt=` label.
- **Avoid Mermaid reserved words as node/step ids** (`style`, `end`, `graph`, `subgraph`, `class`, …). A node literally named `style` makes `draw_mermaid_png()` return HTTP 400 — rename it (e.g. `restyle`)
- For deterministic mock runs (no API key), use **`FakeListChatModel`** from `langchain_core.language_models.fake_chat_models` — it returns a fixed list of replies **in call order and cycles** when exhausted, so a multi-step pipeline's call sequence lines up with the list and retries keep working. Do **not** use `GenericFakeChatModel` (its `messages` iterator *exhausts* and raises `StopIteration` on re-invocation, breaking any retry demo) or `FakeChatModel` (constant `"fake response"`, can't satisfy gates). The `fake` module's `FakeListLLM`/`FakeStreamingListLLM` are old-style string LLMs, not chat models — wrong type for a `ChatPromptTemplate | model` pipe.
- Keep framework-agnostic mock helpers framework-agnostic: the LangGraph `llm_call` is a bare `Callable`, so its fakes are plain Python functions (no langchain dependency) — don't reach for `FakeListChatModel` there.
- Use `from __future__ import annotations` + type hints everywhere
- Enums are `(str, Enum)` for JSON serialization
- Dataclasses (e.g. a `StepSpec`): only fields the logic reads — drop set-but-unused fields (e.g. a `description` nobody renders), and prefer a required field over a verbose `field(default_factory=...)` default that's never exercised
- Hook/gate crashes → fail-closed (BLOCK/reject), never silently pass
- No trailing `print("X ready")` cell-end noise; keep only prints that show real results (traces, smoke tests, accumulated state)
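
The upward search from the first convention can be wrapped in a small helper. This is a sketch, not the notebooks' actual setup cell: `find_up` and `add_to_sys_path` are hypothetical names, and the optional `start` argument exists only so the helper can be exercised on a throwaway directory tree.

```python
from __future__ import annotations

import sys
import tempfile
from pathlib import Path


def find_up(marker: str, start: Path | None = None) -> Path:
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = (start or Path.cwd()).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"{marker!r} not found at or above {start}")


def add_to_sys_path(*markers: str, start: Path | None = None) -> None:
    """Put the folder holding each marker file on sys.path (once)."""
    for marker in markers:
        folder = str(find_up(marker, start))
        if folder not in sys.path:
            sys.path.insert(0, folder)


# Demo on a temporary tree: <root>/model_config.py and <root>/action/pattern/langgraph/
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "model_config.py").touch()
    nested = root / "action" / "pattern" / "langgraph"
    nested.mkdir(parents=True)
    found_ok = find_up("model_config.py", start=nested) == root
```

In a notebook the call would be along the lines of `add_to_sys_path("shared.py", "model_config.py", "nbtools.py")` before the imports; moving the notebook one folder deeper would not change it.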

### Keep the two impls aligned

A reader should be able to flip between the LangGraph and LangChain notebooks
and recognize the same pattern. Align everything that isn't *forced* to differ
by the framework:

- **Same vocabulary for the same concept.** If one side calls a parameter
`prompt_template` / `max_retries`, the other side uses the same names — not
`template` / `retries`. Same for the core abstraction (e.g. a "step", a
"gate"), the gate-runner function, and the trace fields.
- **Same demo, same data, same gates.** Both run the identical scenario
(e.g. the 2-step `rewrite → factcheck` pipeline) on the identical input,
with gates built from the **same `shared.py` factories** — so outputs line up
side by side.
- **Same safety semantics.** Fail-closed, bounded-retry, and any anchored
checks (e.g. `starts_with_gate` for a one-word verdict, not a substring
match) must behave identically on both sides. Fix a bug in one → fix it in
the other in the same pass.
- **Let only the framework mechanism differ**, and name that difference
explicitly in the comparison table (StateGraph nodes/edges + manual retry
loop vs. LCEL `prompt | model | parser | gate` + `.with_retry()`). The
*shape* differs; the *concepts and names* should not.
- **No dead imports / unused factories.** When a fix changes which gate a
notebook uses, drop the now-unused import from **both** notebooks.
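
To make the safety semantics concrete, the sketch below contrasts an anchored verdict check with a substring match and wraps a gate in a fail-closed, bounded-retry loop. The names `starts_with_gate` and `max_retries` come from this document; the signatures, the boolean gate shape, `contains_gate`, `run_step`, and the trace format are assumptions for illustration.

```python
from __future__ import annotations

from typing import Callable

Gate = Callable[[str], bool]  # assumed shape: True means pass


def starts_with_gate(prefix: str) -> Gate:
    """Anchored check: the verdict must open the reply."""
    def gate(text: str) -> bool:
        return text.strip().upper().startswith(prefix.upper())
    return gate


def contains_gate(word: str) -> Gate:
    """Substring check, shown only as the counter-example."""
    def gate(text: str) -> bool:
        return word.upper() in text.upper()
    return gate


def run_step(
    llm_call: Callable[[str], str],
    gate: Gate,
    prompt: str,
    max_retries: int = 2,
) -> tuple[str | None, list[str]]:
    """Call the model up to 1 + max_retries times; a crashing gate counts as a reject."""
    trace: list[str] = []
    for attempt in range(1, max_retries + 2):
        output = llm_call(prompt)
        try:
            ok = gate(output)
        except Exception:
            ok = False  # fail closed: never let a broken gate pass output through
        trace.append(f"attempt {attempt}: {'pass' if ok else 'reject'}")
        if ok:
            return output, trace
    return None, trace


# A substring match accepts a reply whose verdict is actually negative.
reply = "UNSUPPORTED: the claim is not SUPPORTED by the source"
anchored = starts_with_gate("SUPPORTED")(reply)
substring = contains_gate("SUPPORTED")(reply)

# Bounded retry: the first reply is rejected, the second passes.
replies = iter(["maybe", "SUPPORTED"])
output, trace = run_step(lambda p: next(replies), starts_with_gate("SUPPORTED"), "Fact-check this")


def broken_gate(text: str) -> bool:
    raise ValueError("boom")


# Fail-closed: a gate that always crashes never lets output through.
blocked, blocked_trace = run_step(lambda p: "SUPPORTED", broken_gate, "x", max_retries=1)
```

Whatever mechanism each framework uses for the retry loop, both notebooks should produce the same pass/reject sequence for the same replies.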

---

## Quick start

```bash
@@ -91,10 +187,11 @@ uv run pytest --import-mode=importlib -v
```
.env.example # LLM provider config — copy to .env
model_config.py # Shared model loader (register_model_provider + load_chat_model)
nbtools.py # Shared notebook helpers (show_graph: PNG with ASCII fallback)
REFERENCE_IMPL.md # This file

action/d-guardrail-sandwich/
-hooks.py # Shared hook factories (amount_threshold, blocklist, output_schema)
+shared.py # Shared hook factories + runner (amount_threshold, blocklist, output_schema, run_single_hook)
langgraph/
tutorial.ipynb # LangGraph StateGraph implementation
tutorial.html # Pre-rendered HTML
Expand All @@ -103,4 +200,15 @@ action/d-guardrail-sandwich/
tutorial.ipynb # LangChain middleware implementation
tutorial.html # Pre-rendered HTML
tutorial.md # Markdown export

action/c-prompt-chaining/
shared.py # Shared gate factories (length_gate, keys_gate, starts_with_gate, json_gate, regex_gate, any_gate, all_gate)
langgraph/
tutorial.ipynb # LangGraph StateGraph: one node per step + early-exit routing
tutorial.html # Pre-rendered HTML
tutorial.md # Markdown export
langchain/
tutorial.ipynb # LangChain LCEL: prompt | model | parser | gate, with .with_retry()
tutorial.html # Pre-rendered HTML
tutorial.md # Markdown export
```