Adopt GenericXLog for WAL-based crash atomicity and replication #294
Description
Summary
pg_textsearch currently uses MarkBufferDirty + FlushOneBuffer for all buffer modifications, with no WAL logging. This means:
- No crash atomicity for multi-buffer operations (e.g., docid chain extension writes two buffers with manual flush ordering)
- No physical replication of index changes — replicas don't see buffer modifications since no WAL records are generated
- Crash safety depends on manual `FlushOneBuffer` ordering, which is fragile (BM25 index on TimescaleDB hypertable: Invalid docid page magic on chunk scans, #291)
Adopting Postgres's GenericXLog API would give us proper WAL-based crash atomicity and correct physical replication for free. Segment data pages (the bulk of writes) can remain as MarkBufferDirty + FlushRelationBuffers since they're immutable and unreachable until linked via metapage.
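
To make the atomicity argument concrete, here is a minimal standalone sketch of the register/modify/finish pattern. It is a toy in-memory model, not pg_textsearch code and not the Postgres implementation: every `toy_*` / `Toy*` name and the `TOY_DOCID_MAGIC` value are hypothetical, and no WAL is actually written. The real entry points are `GenericXLogStart`, `GenericXLogRegisterBuffer`, `GenericXLogFinish`, and `GenericXLogAbort` in `access/generic_xlog.h`. What the model borrows from them is the shape: changes are made to page copies and only reach the real pages, all together, at finish.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PAGES   4            /* hypothetical cap on pages per update */
#define TOY_DOCID_MAGIC 0x544F5944u  /* hypothetical page magic, illustration only */

typedef struct ToyPage
{
    uint32_t magic;  /* 0 means the page was never initialized */
    uint32_t next;   /* block number of the next chain page, 0 means none */
} ToyPage;

typedef struct ToyXLogState
{
    ToyPage *target[TOY_MAX_PAGES];  /* the "real" pages */
    ToyPage  copy[TOY_MAX_PAGES];    /* working copies the caller modifies */
    int      npages;
} ToyXLogState;

static void toy_xlog_start(ToyXLogState *st)
{
    st->npages = 0;
}

/* Register a page and return a private copy to modify. */
static ToyPage *toy_xlog_register(ToyXLogState *st, ToyPage *page)
{
    if (st->npages >= TOY_MAX_PAGES)
        return NULL;
    st->target[st->npages] = page;
    st->copy[st->npages] = *page;
    return &st->copy[st->npages++];
}

/* Apply every registered copy to its real page in one step. */
static void toy_xlog_finish(ToyXLogState *st)
{
    for (int i = 0; i < st->npages; i++)
        *st->target[i] = st->copy[i];
    st->npages = 0;
}

/* Discard all copies; the real pages are left untouched. */
static void toy_xlog_abort(ToyXLogState *st)
{
    st->npages = 0;
}

/*
 * Model of a docid chain extension: initialize new_page and link it from
 * old_page. fail_midway simulates an error after the first page copy was
 * modified but before the second one was.
 */
static bool toy_extend_chain(ToyPage *old_page, ToyPage *new_page,
                             uint32_t new_blkno, bool fail_midway)
{
    ToyXLogState st;
    ToyPage *new_copy;
    ToyPage *old_copy;

    toy_xlog_start(&st);
    new_copy = toy_xlog_register(&st, new_page);
    old_copy = toy_xlog_register(&st, old_page);
    if (new_copy == NULL || old_copy == NULL)
    {
        toy_xlog_abort(&st);
        return false;
    }

    new_copy->magic = TOY_DOCID_MAGIC;
    new_copy->next = 0;

    if (fail_midway)
    {
        toy_xlog_abort(&st);  /* neither real page has changed */
        return false;
    }

    old_copy->next = new_blkno;
    toy_xlog_finish(&st);     /* both pages change together */
    return true;
}
```

With this shape there is no window in which `old_page` points at an uninitialized `new_page`, which is the state the manual flush ordering has to rule out by hand. In Postgres the same guarantee would come from the WAL record being replayed as a unit during recovery, which this model does not attempt to simulate.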
Scope
Only pointer/metadata operations need GenericXLog — not segment data writes:
| Call site | Buffers | Current approach |
|---|---|---|
| `tp_add_docid_to_pages` (first page) | 2 (docid + meta) | FlushOneBuffer ordering |
| `tp_add_docid_to_pages` (chain extend) | 2 (new + old) | FlushOneBuffer ordering |
| `tp_add_docid_to_pages` (single add) | 1 (docid) | MarkBufferDirty |
| `tp_clear_docid_pages` | 1 (meta) | FlushOneBuffer |
| `tp_build_init_metapage` | 1 (meta) | FlushOneBuffer |
| `tp_buildempty` | 1 (meta) | FlushOneBuffer |
| `tp_build` corpus stats | 1 (meta) | FlushOneBuffer |
| `tp_link_l0_chain_head` | 2 (seg + meta) | MarkBufferDirty (no flush) |
| `tp_bulk_load_spill_check` | 2 (seg + meta) | MarkBufferDirty (no flush) |
| `tp_merge_level_segments` | 2 (seg + meta) | MarkBufferDirty (no flush) |
| `tp_vacuum_replace_segment` | 3 (new + prev + meta) | MarkBufferDirty (no flush) |
| `tp_bulkdelete` stats | 1 (meta) | MarkBufferDirty (no flush) |
| `build_parallel` metapage | 1 (meta) | MarkBufferDirty (no flush) |
Segment data page writes (segment.c writer, build_context.c dict backpatch, merge.c sink) stay as-is — they're immutable and not reachable until linked.
Blocker
Initial implementation found that GenericXLogFinish during aminsert (the tp_add_docid_to_pages single-docid-add path) causes a BufferContent LWLock self-deadlock on the second INSERT to any BM25 index. Key findings:
- GenericXLog in `tp_build_init_metapage` (DDL/CREATE INDEX path) works fine
- GenericXLog in `tp_add_docid_to_pages` (DML/aminsert path) deadlocks
- Even a no-op GenericXLog (register buffer, don't modify, finish) triggers it
- `GenericXLogAbort` works; only `GenericXLogFinish` causes the hang
- The `bloom` contrib extension uses GenericXLog in aminsert without issues
- Not caused by TimescaleDB (tested without it)
- Reproduces on both debug and release PG18 builds
Next step: attach a debugger to get the exact stack trace and identify which LockBuffer call blocks and which prior lock holds the conflicting BufferContent lock.
Context
This was scoped out during work on #291. The flush-ordering fix in PR #292 is the immediate tactical fix. This issue tracks the proper architectural solution.