Problem
bm25_force_merge merges all segments into a single segment, but the result is written page-by-page into whatever free pages are available (via FSM or relation extension). If the index previously had segments that were freed, the new segment's pages can be scattered across the relation file, hurting sequential I/O during scans.
Proposed approach
Add an optional defragmentation step to bm25_force_merge using a two-phase write:
- Merge to BufFile: Write the merged segment as a flat byte stream to a temporary BufFile (infrastructure already exists via tp_write_segment_to_buffile and the merge code path).
- COPY to contiguous pages: Reclaim all existing segment pages, then copy the BufFile data back into a contiguous page range at the end of the relation (similar to parallel build Phase 2 COPY path). Truncate any trailing free pages.
This would ensure the final segment has physically sequential pages, benefiting:
- Sequential read patterns during full posting list scans
- OS readahead effectiveness
- Reduced random I/O on spinning disks
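The second phase of the two-phase write above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: ToyRelation, toy_copy_contiguous, and the 8-byte page size are hypothetical names and values invented for the sketch, the byte array stands in for the temporary BufFile, and none of it uses the real PostgreSQL buffer manager or the extension's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 8          /* toy page size; PostgreSQL's default is 8192 */
#define MAX_PAGES 64         /* toy upper bound on relation size */

/* Hypothetical stand-in for an index relation: a flat array of pages. */
typedef struct {
    unsigned char pages[MAX_PAGES][PAGE_SIZE];
    size_t        npages;    /* current relation length in pages */
} ToyRelation;

/*
 * Phase 2 (hypothetical): copy a flat byte stream, standing in for the
 * temporary BufFile written in phase 1, into a contiguous run of pages
 * starting at first_block. Returns the number of pages written, or 0 if
 * the run would not fit in the toy relation.
 */
size_t
toy_copy_contiguous(ToyRelation *rel, size_t first_block,
                    const unsigned char *stream, size_t len)
{
    size_t needed = (len + PAGE_SIZE - 1) / PAGE_SIZE;

    assert(rel != NULL);
    if (first_block + needed > MAX_PAGES)
        return 0;

    for (size_t i = 0; i < needed; i++)
    {
        size_t off = i * PAGE_SIZE;
        size_t chunk = (len - off < PAGE_SIZE) ? len - off : PAGE_SIZE;

        /* Zero the page first so a short final chunk leaves clean padding. */
        memset(rel->pages[first_block + i], 0, PAGE_SIZE);
        memcpy(rel->pages[first_block + i], stream + off, chunk);
    }

    /* Models truncation: the relation now ends at the segment's last page. */
    rel->npages = first_block + needed;
    return needed;
}
```

Because the whole segment already exists as one byte stream before any index page is written, the copy loop can always target consecutive block numbers, which is the property the defragmentation step relies on.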
Vacuum
VACUUM should also defragment segments when it rewrites them. After dead tuple removal triggers a segment rewrite, the new segment should be laid out contiguously rather than scattered across freed pages. This is the same two-phase approach (merge to BufFile, COPY to contiguous pages) and would keep the index compact over time without requiring manual bm25_force_merge calls.
Notes
- The BufFile write and contiguous COPY infrastructure already exists in the parallel build code path
- Could be opt-in via a boolean parameter (e.g., bm25_force_merge('idx', defragment := true)) or always-on, since force merge is already an expensive maintenance operation
- Should measure the actual I/O improvement on large indexes before deciding on default behavior
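One possible proxy when measuring, alongside actual I/O timings, is the number of contiguous runs in a segment's page list before and after the merge. The helper below is a hypothetical sketch (count_contiguous_runs is not an existing function in the extension) and assumes the caller can supply the segment's block numbers in logical read order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the contiguous runs in a segment's page list, given block numbers
 * in logical (read) order. One run means the segment is fully sequential;
 * each extra run is one more non-sequential jump during a full scan.
 */
size_t
count_contiguous_runs(const uint32_t *blocks, size_t nblocks)
{
    if (nblocks == 0)
        return 0;
    assert(blocks != NULL);

    size_t runs = 1;
    for (size_t i = 1; i < nblocks; i++)
    {
        /* A new run starts whenever the next block is not the successor. */
        if (blocks[i] != blocks[i - 1] + 1)
            runs++;
    }
    return runs;
}
```

A segment laid out as blocks 3, 4, 9, 1, 2 has three runs, while a defragmented segment should report exactly one.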