test: add extraction regression workflow by Nawaz-khan-droid · Pull Request #1029 · safishamsi/graphify

Nawaz-khan-droid · 2026-05-26T12:17:23Z

Summary

Adds a lightweight end-to-end extraction regression workflow that:

creates a tiny local fixture corpus
runs graphify extraction
verifies graph outputs are generated successfully
checks graph.json contains nodes

Why

Current CI validates installation and tests, but does not explicitly verify minimal graph extraction behavior end-to-end.

This workflow helps catch silent extraction regressions while remaining lightweight and fully local-first.

Scope

Intentionally small and low-risk:

no architecture changes
no infrastructure additions
no runtime behavior changes
minimal fixture footprint

safishamsi · 2026-05-27T22:30:40Z

Good idea for a regression workflow — having CI catch silent extraction breakage is useful. A few fixes needed:

Missing trailing newline

The file ends without a newline after the last " — most editors and GitHub diffs flag this. Add one.

Add a paths filter

Without a paths: block the workflow runs on every PR, including doc-only and CI-config changes. Scope it to code that can actually affect extraction:

on:
  pull_request:
    paths:
      - 'graphify/**'
      - 'tests/**'
      - 'pyproject.toml'
  workflow_dispatch:

graphify fixture --no-viz is not valid syntax

The graphify CLI expects graphify extract <path> (with the extract subcommand). graphify fixture would try to look up fixture as a subcommand and fail. Also, --no-viz isn't a recognised flag for extract. The step should be:

graphify extract fixture

(Visualization is skipped automatically when there's no display.)

Consistency with the uv migration in #885

If #885 lands first, the CI stack will be uv-based. This workflow uses pip install -e ".[mcp,pdf,watch,sql]" which is fine standalone but inconsistent. Consider aligning with whichever approach the repo settles on.

Ensure the graph contains nodes in the regression test.

Nawaz-khan-droid · 2026-05-28T11:04:45Z

Thanks for the detailed feedback, I’ve updated the workflow accordingly:

added the paths filter
fixed the extraction command to graphify extract fixture
removed the invalid --no-viz flag
added the trailing newline

I left the install step unchanged for now since the repo still appears to be in transition toward the uv-based CI approach discussed in #885, but I can align it later if needed.

safishamsi · 2026-05-30T20:00:57Z

Thanks for thinking about end-to-end regression coverage — it's a real gap in current CI. However the workflow as written will fail on every run for two separate reasons, and a few things need alignment with existing CI conventions before we can merge.

Blocking issues

1. `graphify extract` requires an LLM API key — it will always exit 1

The workflow calls graphify extract fixture without any API key configured and no secrets: block. Per the CLI (__main__.py:3167-3177), when no backend is detected the command prints an error and sys.exit(1) before any extraction runs. The remaining steps never execute.

The PR description says "fully local-first" but graphify extract is the LLM-augmented pipeline. The purely local command (AST only, no LLM needed) is graphify update. Please replace graphify extract fixture with graphify update fixture.

2. Output path assertions are wrong

graphify extract/update writes outputs to <target>/graphify-out/, not to ./graphify-out/. With graphify update fixture the outputs land at fixture/graphify-out/graph.json and fixture/graphify-out/GRAPH_REPORT.md. Both test -f checks point at the wrong paths and would fail even if extraction succeeded.

Fix: either cd fixture before the assertions, or pass --out . to the update command so output goes to the CWD.

Convention mismatches with existing CI

	`ci.yml` (existing)	`extraction-regression.yml` (this PR)
Toolchain	`uv` via `astral-sh/setup-uv@v8.1.0`	bare `pip install -e`
Python	matrix 3.10 + 3.12	single 3.12 only
Checkout	`actions/checkout@v6`	`actions/checkout@v4`

Please align with the existing conventions: use uv, add Python 3.10 to the matrix (the most common failure surface), and update the checkout action.

Suggestion: fold into `ci.yml` instead of a new file

Rather than a separate workflow, consider adding one extra step to the existing CI job:

- name: End-to-end extraction smoke test
  run: |
    uv run graphify update tests/fixtures
    python -c "
    import json
    data = json.load(open('tests/fixtures/graphify-out/graph.json'))
    nodes = data.get('nodes', [])
    assert len(nodes) > 0, 'graph has no nodes'
    assert any(n.get('kind') == 'function' for n in nodes), 'no function nodes'
    assert len(data.get('edges', [])) > 0, 'graph has no edges'
    print(f'OK: {len(nodes)} nodes, {len(data[\"edges\"])} edges')
    "

This reuses the existing toolchain, runs on both Python versions, uses the already-maintained tests/fixtures/ corpus (which has Go, Rust, C#, Python etc.), and catches real regressions rather than just checking that graph.json exists.

Strengthening the assertions

len(nodes) > 0 will pass on a structurally broken graph. Better checks:

At least one node with kind == "function" or "class"
At least one edge (catches regressions that drop all edges)
Required schema keys present (id, label, kind, source_file)

Please fix the two blocking issues (wrong command + wrong output path) and align with CI conventions. If you go the folded-into-ci.yml route we can merge that quickly. Happy to re-review once updated — the intent here is good and worth getting right. Thanks!

test: add extraction regression workflow

666470b

Nawaz-khan-droid added 2 commits May 28, 2026 16:13

Update extraction regression workflow for pull requests

20aa8e5

fix: address workflow review feedback

9c284dc

Ensure the graph contains nodes in the regression test.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test: add extraction regression workflow#1029

test: add extraction regression workflow#1029
Nawaz-khan-droid wants to merge 3 commits into
safishamsi:v8from
Nawaz-khan-droid:extraction-regression

Nawaz-khan-droid commented May 26, 2026

Uh oh!

safishamsi commented May 27, 2026

Uh oh!

Nawaz-khan-droid commented May 28, 2026

Uh oh!

safishamsi commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Nawaz-khan-droid commented May 26, 2026

Summary

Why

Scope

Uh oh!

safishamsi commented May 27, 2026

Uh oh!

Nawaz-khan-droid commented May 28, 2026

Uh oh!

safishamsi commented May 30, 2026

Blocking issues

1. graphify extract requires an LLM API key — it will always exit 1

2. Output path assertions are wrong

Convention mismatches with existing CI

Suggestion: fold into ci.yml instead of a new file

Strengthening the assertions

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. `graphify extract` requires an LLM API key — it will always exit 1

Suggestion: fold into `ci.yml` instead of a new file