Merged
88 changes: 88 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,88 @@
# CLAUDE.md

Guidance for AI coding agents (and humans) working in this repository.

## What this is

`docx_plus` is an OOXML-level extension layer for [python-docx](https://github.com/python-openxml/python-docx).
It reaches the parts of the `.docx` format python-docx does not expose — the
style cascade, content controls, fields, anchored comments, layout, bookmarks,
footnotes/endnotes, publishing (TOC/captions), tracked changes, and document
protection — while leaving the underlying `Document` object fully usable.

- **Scope discipline:** keep this a lean python-docx *extension*. It is not a
document-authoring framework and does not do live Word automation. Adjacent
ideas belong in sibling projects, not here.
- Authoritative API contract: `SPEC.md` (original design) + `ROADMAP.md` (live
shipped/deferred status). `CHANGELOG.md` is the per-release record.

## Environment & tooling

This project uses **`uv`** for everything. Never call bare `python` or `pip`.

```bash
uv sync --extra dev # install package + dev deps (single source: pyproject [project.optional-dependencies] dev)
uv run pytest # run the test suite (configured in [tool.pytest.ini_options])
uv run pytest tests/test_foo.py -k name # one file / one test
uv run mypy # strict type-check (files = ["docx_plus"])
uv run ruff check # lint (rules: E,F,W,I,B,UP,D — Google docstrings)
uv run ruff format # format (line-length 100)
uv run mkdocs serve # preview docs locally
uv run mkdocs build --strict # docs must build link-clean
```

Pre-commit mirrors the CI lint gate: `uv run pre-commit run --all-files`.

Run an example: `uv run python -m docx_plus.examples.<name>` (e.g. `track_changes`).
Run the CLI: `uv run docx-plus inspect FILE` or `uv run python -m docx_plus.cli`.

## Architecture

Layered, one-way dependencies:

- `core/` — foundation: `DocxPlusError` (base of every typed error), namespace
map (`ns`), OOXML element helpers (`oxml`), id allocation (`ids`), separate
OOXML parts (`parts`). Depends on nothing above it.
- **Capability modules** — `styles/`, `controls/`, `fields/`, `comments/`,
`layout/`, `bookmarks/`, `notes/`, `publishing/`, `revisions/`, `protection/`.
Each builds on `core/` and is largely independent of its siblings.
- `cli/` — argparse console entry point (`docx-plus`) that composes the
capability modules. This is the one layer that legitimately imports across
capabilities.
- `examples/`, `_testing/` — runnable examples and test-only OOXML assertions;
excluded from coverage and the public API.

Each subpackage's `__init__.py` `__all__` is the authoritative public surface
for that module. `docs/ARCHITECTURE.md` has the full module-by-module breakdown.

## Conventions

- **Errors:** every public exception subclasses `core.DocxPlusError`; dual-inherit
a stdlib type where it aids `except` ergonomics (e.g.
`RevisionNotFoundError(DocxPlusError, KeyError)`).
- **Typing:** mypy `strict = True` must pass with zero ignores. `warn_unused_ignores`
is on. Public APIs are fully typed (`Typing :: Typed`).
- **Docstrings:** Google convention (ruff `D`). Tests, `_testing/`, and
`examples/` are exempt from `D`.
- **Python:** target 3.10+ (`target-version = py310`); CI tests 3.10–3.13.
- **Coverage:** `fail_under = 90`. New code needs tests.
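
The error convention above can be illustrated with a minimal, self-contained sketch. `DocxPlusError` here is a local stand-in for `core.DocxPlusError`, and `find_revision` is a hypothetical helper invented for the demonstration; neither is the library's actual code.

```python
class DocxPlusError(Exception):
    """Stand-in for core.DocxPlusError (assumption: the real class may differ)."""


class RevisionNotFoundError(DocxPlusError, KeyError):
    """Catchable as either DocxPlusError or the stdlib KeyError."""


def find_revision(revisions: dict[int, str], revision_id: int) -> str:
    """Hypothetical lookup that translates a bare KeyError into the typed error."""
    try:
        return revisions[revision_id]
    except KeyError:
        raise RevisionNotFoundError(revision_id) from None


revisions = {1: "insertion"}
# The same failure is caught by a library-wide handler and by a stdlib handler.
for handler in (DocxPlusError, KeyError):
    try:
        find_revision(revisions, 99)
    except handler as exc:
        print(f"caught via {handler.__name__}: {type(exc).__name__}")
```

Dual inheritance lets callers that already write `except KeyError` keep working, while callers that want every library failure can catch the shared base.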

## Gotchas

- **xpath:** `BaseOxmlElement.xpath()` does NOT accept a `namespaces=` kwarg.
Use `etree.XPath(expr, namespaces=NSMAP)(node)` instead.
- **Examples must be cp1252-safe:** print ASCII only to stdout so
`python -m docx_plus.examples.<name>` runs on a default Windows console.
- **Paragraph index base differs by surface:** the CLI `inspect` command numbers
paragraphs 1-based; the library `read_*` functions use 0-based `paragraph_index`.
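
Two of the gotchas above can be handled with small helpers. This is a sketch only: `ascii_safe` and `cli_to_library_index` are hypothetical names, not part of `docx_plus`.

```python
def ascii_safe(text: str) -> str:
    """Escape non-ASCII characters so the result prints on a cp1252 console."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


def cli_to_library_index(cli_number: int) -> int:
    """Convert a 1-based paragraph number from `inspect` to a 0-based paragraph_index."""
    if cli_number < 1:
        raise ValueError("CLI paragraph numbers start at 1")
    return cli_number - 1


print(ascii_safe("caf\u00e9 \u2192 done"))
print(cli_to_library_index(3))
```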

## Releasing

```bash
uv run bump-my-version bump {major|minor|patch} --dry-run -v # preview first; tree must be clean
```

Bumps `pyproject.toml` + `docx_plus/__init__.py`, commits, and tags `vX.Y.Z`.
`CHANGELOG.md` is maintained by hand. After a release, re-stamp the prose docs
(README, `docs/index.md`, `docs/API.md`, `docs/ARCHITECTURE.md`, `docs/SKILLS.md`)
for the new version — these have historically lagged behind the bump.
16 changes: 10 additions & 6 deletions README.md
@@ -5,7 +5,7 @@ Composes with python-docx rather than replacing it: callers keep their
`Document` object and use `docx_plus` for the operations python-docx
can't reach.

-**Capabilities** (v0.1, v0.2, and v0.2 in-place expansion):
+**Capabilities** (v0.1 through v0.3):

- **Style cascade**: read the effective formatting that would apply to
any paragraph/run/cell, with per-field provenance; modify styles in
@@ -41,7 +41,7 @@ can't reach.
library — `inspect` (effective formatting), `restyle` (style
remapping), and `controls` (list / set / clear control values).

-> **Status:** v0.2.0 is the current release, published on
+> **Status:** v0.3.0 is the current release, published on 2026-06-15 to
> [PyPI](https://pypi.org/project/docx-plus/). Read [`SPEC.md`](SPEC.md) for
> the API contract and [`IMPLEMENTATION.md`](IMPLEMENTATION.md) for the
> build plan.
@@ -348,17 +348,19 @@ $ docx-plus controls list form.docx --json # every content control
$ docx-plus controls set form.docx --tag name --value "Ada Lovelace" -o filled.docx
```

-Read commands (`inspect`, `controls list`) take `--json`; mutating
-commands (`restyle`, `controls set` / `clear`) require `-o/--output` (or
-`--in-place`) so the source is never overwritten by accident. Full
+Read commands (`inspect`, `controls list`) take `--json`; so does
+`restyle`, which emits its resolved target→style-id mapping as JSON.
+Mutating commands (`restyle`, `controls set` / `clear`) require
+`-o/--output` (or `--in-place`) so the source is never overwritten by
+accident. Full
reference: [`docs/cli.md`](https://thomas-villani.github.io/docx-plus/cli/).

## What's next

v0.2 ships the feature modules listed at the top of this README, plus
the in-place expansion (line numbering, page borders, conditional
table-style formatting, comment / note editing, and the publishing
-module). v0.3 adds **tracked changes** (read/write revision marks) and
+module). v0.3 added **tracked changes** (read/write revision marks) and
the **`docx-plus` CLI** (`inspect` / `restyle` / `controls`).
[`ROADMAP.md`](ROADMAP.md) tracks what comes after: the backlog
holds `STYLEREF` / sequence-field cross-references, w15 threaded
@@ -378,6 +380,8 @@ if your use case needs any of these.
`bookmarks/`, `notes/`, plus the in-place expansion (toggle
properties, in-place edit verbs, line numbering, page borders,
conditional table styles, and the `publishing/` module).
- **v0.3.0** — complete: tracked changes (`revisions/`) and the
`docx-plus` command line (`cli/`).

The per-phase log with dates lives in `IMPLEMENTATION.md` §12.

50 changes: 44 additions & 6 deletions docs/API.md
@@ -33,25 +33,27 @@ Google-style docstring (enforced by ruff's `D` ruleset on `docx_plus/`).

---

-## Public surface at v0.2
+## Public surface at v0.3

v0.1's six phases, the initial v0.2 cycle (comments, layout, bookmarks
-/ cross-references, footnotes / endnotes), and the v0.2 in-place
+/ cross-references, footnotes / endnotes), the v0.2 in-place
expansion (toggle props, in-place comment / note edits, line numbering,
-page borders, conditional table-style formatting, publishing module)
-are all complete. Nine runnable example scripts in
+page borders, conditional table-style formatting, publishing module),
+and the v0.3 cycle (tracked changes, the `docx-plus` CLI) are all
+complete. Ten runnable example scripts in
`docx_plus/examples/` demonstrate the surface: `inspect_document.py`,
`restyle_existing.py`, `build_form.py`, `populate_form.py`,
`add_comments.py`, `multi_column_layout.py`, `bookmarks_and_xrefs.py`,
-`footnotes_and_endnotes.py`, `publishing_layout.py`. Start there if you
+`footnotes_and_endnotes.py`, `publishing_layout.py`,
+`track_changes.py`. Start there if you
want to see the library in motion before reading the index.

### `docx_plus` (top-level package)

| Symbol | Kind | Notes |
|---|---|---|
| `DocxPlusError` | exception | Root of every typed library error. See [`ARCHITECTURE.md` §9](ARCHITECTURE.md#9-error-hierarchy) |
-| `__version__` | str | `"0.2.0"` |
+| `__version__` | str | `"0.3.0"` |

### `docx_plus.core`

@@ -301,6 +303,42 @@ next open. Architecture walkthrough in
| `add_caption(paragraph, label, *, caption_type="Figure", numbering="ARABIC")` | function | Label text run + `SEQ <caption_type> \* <numbering>` complex field. `caption_type` must match the `\c` switch on a downstream Table of Figures |
| `add_table_of_figures(paragraph, *, caption_type="Figure", hyperlink=True)` | function | Append a `TOC \c "<caption_type>"` complex field that collects matching captions |

### `docx_plus.revisions`

Tracked changes — read, author, and resolve OOXML revision marks
(`w:ins` / `w:del` / move wrappers / property-change markers).
python-docx cannot read or write tracked changes at all; this module
fills the gap. Scoped in `ROADMAP.md` §1 at the repo root.

| Symbol | Kind | Notes |
|---|---|---|
| `enable_track_changes(doc)` | function | Write `<w:trackChanges/>` into `settings.xml` so Word records every subsequent user edit as a revision. Idempotent (normalises a pre-existing element to "on", collapses duplicates) |
| `disable_track_changes(doc)` | function | Remove every `<w:trackChanges/>`. Idempotent. Existing body revision marks are untouched |
| `mark_insertion(target, *, author="", date=None, id_registry=None)` | function | Wrap existing run(s) in `<w:ins>`. `date=None` stamps current UTC (ms precision). Returns `RevisionRef` |
| `mark_deletion(target, *, author="", date=None, id_registry=None)` | function | Wrap existing run(s) in `<w:del>` and retag each `<w:t>` to `<w:delText>`. Returns `RevisionRef` |
| `read_revisions(doc)` | function | Enumerate every revision in document order, each paired with its metadata and affected text. Returns `list[TrackedChange]` |
| `accept_revision(doc, revision_id)` | function | Accept the revision(s) carrying `revision_id`, keeping the recorded edit. Raises `RevisionNotFoundError` if absent |
| `reject_revision(doc, revision_id)` | function | Reject the revision(s) carrying `revision_id`, restoring the prior state. Raises `RevisionNotFoundError` if absent |
| `accept_all_revisions(doc)` | function | Accept every tracked change. Idempotent; resolves innermost-first |
| `reject_all_revisions(doc)` | function | Reject every tracked change. Idempotent; resolves innermost-first |
| `RevisionRef` | dataclass (frozen) | Write-side handle: `revision_id`, `body_element` (the `<w:ins>` / `<w:del>` element) |
| `TrackedChange` | dataclass (frozen) | Read-side result: `revision_id`, `revision_type`, `author`, `timestamp`, `text`, `paragraph_index` |
| `RevisionIdRegistry(doc)` | class | Per-document revision-id allocator. All revision types share one `w:id` namespace; seeds from every revision-bearing element in the body |
| `RevisionType` | type alias | `Literal["insertion", "deletion", "move_from", "move_to", "format_run", "format_paragraph", "paragraph_mark_insertion", "paragraph_mark_deletion"]` |
| `RevisionTarget` | type alias | `Run | Paragraph | tuple[Run, Run]` — same target shapes as `add_comment`; a range must lie within one paragraph |
| `RevisionNotFoundError` | exception | Dual-bases: `DocxPlusError, KeyError`. `accept_revision` / `reject_revision` on a missing id |
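
To make the `mark_deletion` row concrete, here is a minimal sketch of the XML transformation it describes (wrap a run in `<w:del>`, retag `<w:t>` to `<w:delText>`). It uses the standard library's `xml.etree.ElementTree` rather than `docx_plus`. The helper `wrap_run_as_deletion` and its parameters are hypothetical; the library's real implementation, id allocation, and timestamp handling may differ.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Return the Clark-notation name for a WordprocessingML tag."""
    return f"{{{W_NS}}}{tag}"


def wrap_run_as_deletion(
    paragraph: ET.Element, run: ET.Element, *, rev_id: int, author: str, date: str
) -> ET.Element:
    """Hypothetical helper: wrap `run` in <w:del> and retag its <w:t> children."""
    position = list(paragraph).index(run)
    paragraph.remove(run)
    deletion = ET.Element(
        w("del"), {w("id"): str(rev_id), w("author"): author, w("date"): date}
    )
    # Deleted text must live in <w:delText>, not <w:t>.
    for text in run.findall(w("t")):
        text.tag = w("delText")
    deletion.append(run)
    paragraph.insert(position, deletion)
    return deletion


paragraph = ET.fromstring(
    f'<w:p xmlns:w="{W_NS}"><w:r><w:t>old text</w:t></w:r></w:p>'
)
run = paragraph.find(w("r"))
deletion = wrap_run_as_deletion(
    paragraph, run, rev_id=1, author="Ada", date="2024-01-01T00:00:00Z"
)
print(ET.tostring(paragraph, encoding="unicode"))
```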

### `docx_plus.cli`

The `docx-plus` command-line interface — a thin shell over the library
(each subcommand wraps one tested function). Full reference, including
every subcommand and flag, lives in [`cli.md`](cli.md).

| Symbol | Kind | Notes |
|---|---|---|
| `main(argv=None)` | function | Console entry point (`docx-plus = "docx_plus.cli:main"`; also `python -m docx_plus.cli`). Returns `0` on success, `1` on a handled library/CLI error, `2` when no command was given |
| `build_parser()` | function | Construct the top-level `argparse.ArgumentParser` with every subcommand registered |

---

## Internal modules (not part of the public API)