feat: extract markdown links as first-class edges (fixes #951) #1066
Open
adityachaudhary99 wants to merge 6 commits into
- Add _file_node_id(path) helper that returns _make_id(_file_stem(path))
- Use _file_node_id for all file-level node IDs instead of _make_id(str(path))
- Update all import resolution targets to reference _file_node_id format
- Update extract() legacy remap to handle both old formats
- Update tests to use _file_node_id

This ensures AST and semantic subagent nodes for the same file use identical node IDs (parent_dir_stem), fixing the split-node bug where one physical file appeared as two disconnected nodes (safishamsi#1033).
- ensure_named_node now always uses stem-qualified IDs
- Same fix for superclass/inheritance resolution in walk()
- Same fix for C#, Swift, C++, Java base type fallbacks
- Removes bare-name fallback that caused cross-file collisions

Previously, _make_id(name) (bare, no stem) was used as fallback when _make_id(stem, name) was not in the per-file seen_ids set, causing identically-named entities in different files to produce colliding IDs. This caused the second entity's node to overwrite the first in the NetworkX graph, losing one entity entirely (safishamsi#952).
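The collision described above can be reproduced in a few lines. This is a sketch under assumptions: a plain dict stands in for the NetworkX graph's node table, `_make_id` is a hypothetical normalizer rather than the project's implementation, and the file stems and the entity name `Config` are invented for the example.

```python
import re


def _make_id(*parts: str) -> str:
    """Assumed normalizer: join parts with '_', collapse non-alphanumerics, lowercase."""
    joined = "_".join(p for p in parts if p)
    return re.sub(r"[^A-Za-z0-9]+", "_", joined).strip("_").lower()


# Two different files that each define an entity called "Config" (hypothetical data).
entities = [("models_user", "Config"), ("services_auth", "Config")]

nodes_bare = {}       # keyed by bare name: the old fallback behaviour
nodes_qualified = {}  # keyed by stem + name: the behaviour after this commit

for stem, name in entities:
    # Bare ID: both entities map to "config", so the second write replaces the first.
    nodes_bare[_make_id(name)] = {"label": name, "file": stem}
    # Stem-qualified ID: "models_user_config" and "services_auth_config" stay distinct.
    nodes_qualified[_make_id(stem, name)] = {"label": name, "file": stem}

print(len(nodes_bare), len(nodes_qualified))  # 1 2
```

Adding a node under an ID that already exists updates it in place, which is why the bare-name fallback silently dropped one of the two entities.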
- Add _resolve_markdown_link() for [text](path) resolution
- Add _resolve_markdown_wikilink() for [[page-name]] resolution
- Extract links_to edges in extract_markdown() for all resolvable links
- Skips external URLs, anchors, and unresolvable paths
- Edge context stores the link text/name as metadata

This adds a deterministic pre-pass that captures human-authored inter-document links as high-confidence edges, dramatically reducing isolated nodes in documentation-heavy corpora (safishamsi#951).
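The link pre-pass listed above might look roughly like the following sketch. It is not the code in this PR: the function names (`resolve_md_link`, `resolve_wikilink`, `extract_link_edges`), the regular expressions, the `known` set of corpus files, and the edge dict shape are all assumptions chosen for illustration. Only the `links_to` relation, the skip rules, and the idea of storing the link text as context come from the commit message.

```python
import os
import re
from pathlib import Path
from typing import Dict, List, Optional, Set

# [text](target), excluding images (![alt](src)); hypothetical patterns.
INLINE_LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")
# [[page]], [[page#section]], or [[page|alias]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|([^\]]+))?\]\]")
URL_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def resolve_md_link(source: Path, target: str, known: Set[Path]) -> Optional[Path]:
    """Resolve a relative link target against the linking file's directory."""
    if URL_SCHEME.match(target) or target.startswith("#"):
        return None  # external URL or in-page anchor
    rel = target.split("#", 1)[0]  # drop any trailing #fragment
    candidate = Path(os.path.normpath(source.parent / rel))
    return candidate if candidate in known else None  # unresolvable -> skipped


def resolve_wikilink(name: str, known: Set[Path]) -> Optional[Path]:
    """Resolve [[page-name]] by case-insensitive match on the file stem."""
    wanted = name.strip().lower()
    for path in sorted(known):
        if path.stem.lower() == wanted:
            return path
    return None


def extract_link_edges(source: Path, text: str, known: Set[Path]) -> List[Dict[str, str]]:
    """Return one links_to edge per resolvable link, with the link text as context."""
    edges = []
    for label, target in INLINE_LINK.findall(text):
        resolved = resolve_md_link(source, target, known)
        if resolved is not None:
            edges.append({"source": source.as_posix(), "target": resolved.as_posix(),
                          "relation": "links_to", "context": label})
    for match in WIKILINK.finditer(text):
        name, alias = match.group(1), match.group(2)
        resolved = resolve_wikilink(name, known)
        if resolved is not None:
            edges.append({"source": source.as_posix(), "target": resolved.as_posix(),
                          "relation": "links_to", "context": alias or name.strip()})
    return edges


known_files = {Path("docs/guide.md"), Path("docs/api/index.md"), Path("docs/setup.md")}
sample = ("See [the API](api/index.md#auth), [[setup]], [site](https://example.com), "
          "[top](#intro), and [gone](missing.md).")
edges = extract_link_edges(Path("docs/guide.md"), sample, known_files)
```

With this sample, only the `api/index.md` link and the `[[setup]]` wikilink produce edges; the external URL, the in-page anchor, and the missing file are skipped. Resolving against a known file set rather than the filesystem keeps the pass deterministic and easy to test, though a real implementation may resolve differently.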
f06e859 to 3a2f0c5
Problem
Markdown documents often contain rich cross-references through links and [[wikilinks]], but graphify ignored these as first-class graph relationships.
What changed
Depends on