Skip to content

test: add extraction regression workflow#1029

Open
Nawaz-khan-droid wants to merge 3 commits into
safishamsi:v8from
Nawaz-khan-droid:extraction-regression
Open

test: add extraction regression workflow#1029
Nawaz-khan-droid wants to merge 3 commits into
safishamsi:v8from
Nawaz-khan-droid:extraction-regression

Conversation

@Nawaz-khan-droid
Copy link
Copy Markdown

Summary

Adds a lightweight end-to-end extraction regression workflow that:

  • creates a tiny local fixture corpus
  • runs graphify extraction
  • verifies graph outputs are generated successfully
  • checks graph.json contains nodes

Why

Current CI validates installation and tests, but does not explicitly verify minimal graph extraction behavior end-to-end.

This workflow helps catch silent extraction regressions while remaining lightweight and fully local-first.

Scope

Intentionally small and low-risk:

  • no architecture changes
  • no infrastructure additions
  • no runtime behavior changes
  • minimal fixture footprint

@safishamsi
Copy link
Copy Markdown
Owner

Good idea for a regression workflow — having CI catch silent extraction breakage is useful. A few fixes needed:

Missing trailing newline

The file ends without a newline after the last " — most editors and GitHub diffs flag this. Add one.

Add a paths filter

Without a paths: block the workflow runs on every PR, including doc-only and CI-config changes. Scope it to code that can actually affect extraction:

on:
  pull_request:
    paths:
      - 'graphify/**'
      - 'tests/**'
      - 'pyproject.toml'
  workflow_dispatch:

graphify fixture --no-viz is not valid syntax

The graphify CLI expects graphify extract <path> (with the extract subcommand). graphify fixture would try to look up fixture as a subcommand and fail. Also, --no-viz isn't a recognised flag for extract. The step should be:

graphify extract fixture

(Visualization is skipped automatically when there's no display.)

Consistency with the uv migration in #885

If #885 lands first, the CI stack will be uv-based. This workflow uses pip install -e ".[mcp,pdf,watch,sql]" which is fine standalone but inconsistent. Consider aligning with whichever approach the repo settles on.

@Nawaz-khan-droid
Copy link
Copy Markdown
Author

Thanks for the detailed feedback, I’ve updated the workflow accordingly:

  • added the paths filter
  • fixed the extraction command to graphify extract fixture
  • removed the invalid --no-viz flag
  • added the trailing newline

I left the install step unchanged for now since the repo still appears to be in transition toward the uv-based CI approach discussed in #885, but I can align it later if needed.

@safishamsi
Copy link
Copy Markdown
Owner

Thanks for thinking about end-to-end regression coverage — it's a real gap in current CI. However the workflow as written will fail on every run for two separate reasons, and a few things need alignment with existing CI conventions before we can merge.

Blocking issues

1. graphify extract requires an LLM API key — it will always exit 1

The workflow calls graphify extract fixture without any API key configured and no secrets: block. Per the CLI (__main__.py:3167-3177), when no backend is detected the command prints an error and sys.exit(1) before any extraction runs. The remaining steps never execute.

The PR description says "fully local-first" but graphify extract is the LLM-augmented pipeline. The purely local command (AST only, no LLM needed) is graphify update. Please replace graphify extract fixture with graphify update fixture.

2. Output path assertions are wrong

graphify extract/update writes outputs to <target>/graphify-out/, not to ./graphify-out/. With graphify update fixture the outputs land at fixture/graphify-out/graph.json and fixture/graphify-out/GRAPH_REPORT.md. Both test -f checks point at the wrong paths and would fail even if extraction succeeded.

Fix: either cd fixture before the assertions, or pass --out . to the update command so output goes to the CWD.

Convention mismatches with existing CI

ci.yml (existing) extraction-regression.yml (this PR)
Toolchain uv via astral-sh/setup-uv@v8.1.0 bare pip install -e
Python matrix 3.10 + 3.12 single 3.12 only
Checkout actions/checkout@v6 actions/checkout@v4

Please align with the existing conventions: use uv, add Python 3.10 to the matrix (the most common failure surface), and update the checkout action.

Suggestion: fold into ci.yml instead of a new file

Rather than a separate workflow, consider adding one extra step to the existing CI job:

- name: End-to-end extraction smoke test
  run: |
    uv run graphify update tests/fixtures
    python -c "
    import json
    data = json.load(open('tests/fixtures/graphify-out/graph.json'))
    nodes = data.get('nodes', [])
    assert len(nodes) > 0, 'graph has no nodes'
    assert any(n.get('kind') == 'function' for n in nodes), 'no function nodes'
    assert len(data.get('edges', [])) > 0, 'graph has no edges'
    print(f'OK: {len(nodes)} nodes, {len(data[\"edges\"])} edges')
    "

This reuses the existing toolchain, runs on both Python versions, uses the already-maintained tests/fixtures/ corpus (which has Go, Rust, C#, Python etc.), and catches real regressions rather than just checking that graph.json exists.

Strengthening the assertions

len(nodes) > 0 will pass on a structurally broken graph. Better checks:

  • At least one node with kind == "function" or "class"
  • At least one edge (catches regressions that drop all edges)
  • Required schema keys present (id, label, kind, source_file)

Please fix the two blocking issues (wrong command + wrong output path) and align with CI conventions. If you go the folded-into-ci.yml route we can merge that quickly. Happy to re-review once updated — the intent here is good and worth getting right. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants