
Normalize unit_type field shape across parsers so EntryPointDetector's unit-type check works for JS/Go #63

@ar7casper


Context

utilities/agentic_enhancer/entry_point_detector.py:176 reads:

unit_type = func_data.get('unit_type', '')
if unit_type in ENTRY_POINT_TYPES:
    reasons.append(f'unit_type:{unit_type}')

This is "Check 1" of entry-point detection. It only fires when the function's metadata has a unit_type key in snake_case.
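Here is a minimal sketch of how Check 1 behaves when it sees each of the two key shapes. The contents of ENTRY_POINT_TYPES are a hypothetical stand-in. route_middleware is included because this issue says Check 1 should catch those units. The real set in entry_point_detector.py may differ.

```python
# HYPOTHETICAL subset: the real ENTRY_POINT_TYPES lives in
# utilities/agentic_enhancer/entry_point_detector.py and may differ.
ENTRY_POINT_TYPES = {"route_handler", "route_middleware"}


def check_unit_type(func_data: dict) -> list[str]:
    """Sketch of Check 1: only a snake_case 'unit_type' key is consulted."""
    reasons: list[str] = []
    unit_type = func_data.get("unit_type", "")  # a 'unitType' key is never read
    if unit_type in ENTRY_POINT_TYPES:
        reasons.append(f"unit_type:{unit_type}")
    return reasons


# Same logical metadata, two key shapes.
python_style = {"name": "handler", "unit_type": "route_middleware"}
js_style = {"name": "handler", "unitType": "route_middleware"}

print(check_unit_type(python_style))  # ['unit_type:route_middleware']
print(check_unit_type(js_style))      # []
```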

The gap

Different parsers emit different shapes:

| Parser | unit_type key shape |
|---|---|
| Python (function_extractor.py) | unit_type (snake_case) ✅ |
| C / Ruby / PHP (builder.export()) | unit_type (snake_case) ✅ |
| Zig | unit_type (snake_case) ✅ |
| JavaScript (typescript_analyzer.js) | unitType (camelCase) ❌ |
| Go (normalized in parsers/go/test_pipeline.py) | unitType (camelCase) ❌ |

For JS and Go scans, func_data.get('unit_type', '') returns '' and Check 1 silently fails. Only Check 3 (input patterns like req.body / req.params) can still flag these functions.

Practical impact

  • Express handler bodies that touch req.body get caught by Check 3 — fine in practice
  • Minimal Express middleware (async (req, res, next) => next();) has no body content matching input patterns → silently dropped from analysis
  • route_middleware units (added in #49, fix: extract Express.js anonymous route handler callbacks) have the same issue: Check 1 should catch them but doesn't, because the key arrives as camelCase
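You can reproduce the practical impact above with a toy detector that combines Check 1 with a Check 3-style body scan. Everything here is simplified for illustration. The 'code' key, the INPUT_PATTERNS tuple, and the plain substring matching are assumptions, not the real detector's logic.

```python
# HYPOTHETICAL values; the real detector's types and patterns may differ.
ENTRY_POINT_TYPES = {"route_handler", "route_middleware"}
INPUT_PATTERNS = ("req.body", "req.params", "req.query")


def detect_reasons(func_data: dict) -> list[str]:
    """Toy detector: Check 1 (snake_case unit_type) plus a Check 3-style body scan."""
    reasons = []
    # Check 1: only the snake_case key is read
    unit_type = func_data.get("unit_type", "")
    if unit_type in ENTRY_POINT_TYPES:
        reasons.append(f"unit_type:{unit_type}")
    # Check 3: look for user-input access in the body ('code' key is assumed)
    body = func_data.get("code", "")
    matched = [p for p in INPUT_PATTERNS if p in body]
    if matched:
        reasons.append(f"input:{matched[0]}")
    return reasons


# JS-style metadata with camelCase keys, as the JavaScript pipeline emits it today.
handler = {"unitType": "route_handler",
           "code": "async (req, res) => { res.json(req.body); }"}
middleware = {"unitType": "route_middleware",
              "code": "async (req, res, next) => next();"}

print(detect_reasons(handler))     # ['input:req.body']  (Check 3 only)
print(detect_reasons(middleware))  # []  (no reason, so it is dropped)
```

The same middleware entry with a snake_case key would be caught by Check 1 alone, which is the behavior the regression test from #49 exercises.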

The regression test from #49, test_route_middleware_is_entry_point, passes in isolation because it constructs the metadata with a snake_case unit_type directly. The gap in the full JS pipeline remains.

This is documented

_docs/internal/parser-issues.md items #23-26 describe the broader normalization gap. _docs/internal/how-openant-works.md Room-for-Improvement #10 lists "Fix test_pipeline.py normalization in ALL parsers" as the highest-impact fix.

Proposed approaches

Option A — Normalize at parser output time (preferred)

Each parser's test_pipeline.py should emit unit_type (snake_case). Python / C / Ruby / PHP / Zig already do. JS and Go don't.

Concrete fix:

  • parsers/javascript/test_pipeline.py:497: normalize unitType → unit_type before passing to EntryPointDetector
  • parsers/go/test_pipeline.py:296: change 'unitType': ... to 'unit_type': ...
  • Same for the call_graph.json writes added in #50 (feat: LLM review stage for enhanced reachability detection), so the apply_reachability_filter re-filter also sees snake_case keys
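Here is a sketch of the Option A normalization as a small helper. Each parser's test_pipeline.py could call it before handing functions to EntryPointDetector and before writing call_graph.json. Several details are hypothetical: the helper names, the KEY_RENAMES table, and the assumption that functions are held in a dict keyed by function ID.

```python
from typing import Any

# HYPOTHETICAL rename table. Only unitType is named in this issue;
# other camelCase keys could be added here as they are audited.
KEY_RENAMES = {"unitType": "unit_type"}


def normalize_function_entry(entry: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one function entry with canonical snake_case keys.

    If both shapes are present, the existing snake_case value wins.
    """
    normalized = dict(entry)  # shallow copy so the caller's dict is untouched
    for camel, snake in KEY_RENAMES.items():
        if camel in normalized:
            value = normalized.pop(camel)
            normalized.setdefault(snake, value)
    return normalized


def normalize_functions(functions: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Normalize every entry before the detector or call_graph.json sees it."""
    return {func_id: normalize_function_entry(data) for func_id, data in functions.items()}


functions = {"app.js:mw": {"name": "mw", "unitType": "route_middleware"}}
print(normalize_functions(functions))
# {'app.js:mw': {'name': 'mw', 'unit_type': 'route_middleware'}}
```

The helper pops the camelCase key instead of copying it. That leaves exactly one shape in the output, so a schema test only has to assert on unit_type.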

Option B — Make EntryPointDetector accept both

unit_type = func_data.get('unit_type') or func_data.get('unitType', '')

A one-line change, but it doesn't align the underlying schema.
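For comparison, here is Option B confined to the detector. The lookup expression is the one proposed above. ENTRY_POINT_TYPES is again a hypothetical subset.

```python
# HYPOTHETICAL subset of the real ENTRY_POINT_TYPES.
ENTRY_POINT_TYPES = {"route_handler", "route_middleware"}


def read_unit_type(func_data: dict):
    # Option B: prefer snake_case and fall back to camelCase.
    # `or` also falls through when unit_type is present but empty or None.
    return func_data.get("unit_type") or func_data.get("unitType", "")


def check_unit_type(func_data: dict) -> list[str]:
    unit_type = read_unit_type(func_data)
    return [f"unit_type:{unit_type}"] if unit_type in ENTRY_POINT_TYPES else []
```

Because `or` treats an empty string as missing, an entry with `unit_type: ''` falls through to unitType. Any other consumer of the metadata (for example, the apply_reachability_filter re-filter) would presumably need the same fallback. That is the cost of leaving the schema unaligned.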

Recommendation

Option A: establish snake_case as the canonical schema and update tests/test_call_graph_output.py to assert that unit_type exists in each function entry.
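Here is a sketch of the schema check that the recommendation calls for, written as plain helpers that a test in tests/test_call_graph_output.py could call. The call_graph.json layout (`{"functions": {id: entry}}`) and the helper names are assumptions, so adapt the lookup to the real file structure. It also flags leftover camelCase unitType keys, which would catch a parser that adds the snake_case key without removing the old one.

```python
import json
from pathlib import Path

# ASSUMPTION: call_graph.json looks like {"functions": {"<func_id>": {...}}}.
# Adjust the lookup if the real schema nests functions differently.


def schema_violations(call_graph: dict) -> list[str]:
    """List problems with function entries against the snake_case unit_type schema."""
    problems = []
    for func_id, entry in sorted(call_graph.get("functions", {}).items()):
        if "unit_type" not in entry:
            problems.append(f"{func_id}: missing unit_type")
        if "unitType" in entry:
            problems.append(f"{func_id}: leftover camelCase unitType")
    return problems


def check_call_graph_file(path: Path) -> None:
    """Raise AssertionError if any function entry in the file violates the schema."""
    call_graph = json.loads(path.read_text(encoding="utf-8"))
    problems = schema_violations(call_graph)
    assert not problems, "\n".join(problems)
```

A test function would then call check_call_graph_file on the call_graph.json produced by each parser's pipeline run. How those files are generated or located in the test suite is left open here.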

References

  • utilities/agentic_enhancer/entry_point_detector.py:176
  • parsers/javascript/test_pipeline.py:497
  • parsers/go/test_pipeline.py:296

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request), javascript (Pull requests that update javascript code), python (Pull requests that update python code)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
