diff --git a/CLAUDE.md b/CLAUDE.md index 3e274c12..309b3363 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,22 +2,24 @@ Tree-sitter powered code analysis for massive context savings (60-90% token reduction). -## MANDATORY: Always Use AFT First +## Start With Outline, Escalate From There -**CRITICAL**: AFT semantic commands are the DEFAULT, not optional. Grep/Read with limited context (e.g., 3 lines) misses the bigger picture. We want to SEE the full picture, not shoot in the dark. +**Outline is the default entry point.** Before reading full files, run `aft outline` to get structure — ~10% the tokens of a full read. This applies to code, markdown, config, and docs. -**AFT applies to ALL file types** — not just code. Markdown, config, docs, JSON, YAML all benefit. Even for "just checking what files are" — outline first. +**Escalate to semantic commands only when the task needs them:** +- `aft zoom ` — when you need to read a specific function body. +- `aft call_tree` / `aft callers` — when you need cross-file call relationships (grep can't infer these). +- `aft impact` — before a refactor, to see what breaks. +- `aft trace_to` — when debugging how execution reaches a point. +- `aft trace_data` — when tracking where a value came from or where it flows next. -**Before reading ANY files:** -1. `aft outline` FIRST - understand structure before diving in -2. `aft zoom` for symbols - never read full files when you need one function -3. `aft callers`/`aft call_tree` for flow - grep misses cross-file relationships +**Don't use semantic commands reflexively.** For verification tasks — "does this symbol still exist?", "is this doc accurate?" — outline alone is usually enough. Reaching for zoom/call_tree on every task inflates work without improving answers. ## AFT CLI Commands Use `aft` commands via Bash for code navigation. These provide structured output optimized for LLM consumption. 
-### Semantic Commands (USE THESE BY DEFAULT) +### Semantic Commands ```bash # Get structure without content (~10% of full read tokens) @@ -35,11 +37,38 @@ aft callers # Impact analysis - what breaks if this changes? aft impact -# Trace analysis - how does execution reach this? +# Control flow - how does execution reach this function? aft trace_to + +# Data flow - how does a value flow through assignments and across calls? +aft trace_data [depth] ``` -### Basic Commands (fallback only) +## Tracing: control flow vs. data flow + +Two different questions, two commands: +- **"How does execution reach this function?"** → `aft trace_to` (control flow). + Example: `aft trace_to api/handler.go ChargePayment` — shows the call chain that lands on ChargePayment. +- **"Where did this value come from / where does it go next?"** → `aft trace_data` (data flow through assignments and parameter passing). + Example: `aft trace_data api/handler.go ChargePayment merchantID` — traces how `merchantID` propagates within and across function boundaries. + +For a bug like "this field got the wrong value," `trace_data` is usually the right starting point; for "why did this handler run," `trace_to` is. + +### Patterns trace_data handles + +`trace_data` follows values across these constructs — use it confidently on idiomatic code instead of manually reading every caller: + +- **Direct args**: `f(x)` → hop into `f`'s matching parameter. +- **Reference args**: `f(&x)` → hop into `f`'s pointer parameter. +- **Field-access args**: `f(x.Field)` → approximate hop into `f`'s matching parameter (propagation continues). +- **Struct-literal wraps**: `w := Wrapper{Field: x}` → approximate assignment hop to `w`, then tracking continues on `w`. +- **Pointer-write intrinsics** (`json.Unmarshal`, `yaml.Unmarshal`, `xml.Unmarshal`, `toml.Unmarshal`, `proto.Unmarshal`, `bson.Unmarshal`, `msgpack.Unmarshal`): `json.Unmarshal(raw, &out)` binds `raw`'s flow into `out`, and further uses of `out` are tracked. 
+- **Method receivers**: `x.Method(...)` → hop into the receiver parameter name (Go `func (u *T) Method(...)`, Rust `&self`). +- **Destructuring assigns**: `a, b := f()` and `{a, b} = f()` → tracking splits onto the new bindings. + +Hops marked `"approximate": true` are lossy (field access, struct wraps, writer intrinsics) — the flow exists but the exact subfield is not resolved. + +### Basic Commands ```bash aft read [start_line] [limit] # Read with line numbers @@ -71,7 +100,10 @@ Need to understand files? | -> aft impact | +-- Debugging how execution reaches a point? - -> aft trace_to + | -> aft trace_to + | + +-- Tracking where a value came from or where it flows? + -> aft trace_data ``` ## When to Use What @@ -84,17 +116,26 @@ Need to understand files? | Understanding dependencies | `aft call_tree` | Structured graph | | Finding usage sites | `aft callers` | All call sites | | Planning refactors | `aft impact` | Change propagation | -| Debugging call paths | `aft trace_to` | Execution paths | +| Debugging control flow | `aft trace_to` | Execution paths | +| Debugging data flow | `aft trace_data` | Value propagation | + +## Rules + +Match the command to the task type. Outline is universal; the semantic graph tools (zoom/call_tree/callers/impact/trace_to) pay off for *comprehension* tasks, not for *verification* tasks. + +**Verification tasks** — "does X still exist?", "is this doc still accurate?", "what files are in this dir?": +1. **ALWAYS start with outline** - `aft outline` to confirm structure and anchor symbols. +2. **Outline is usually enough.** Don't reach for zoom/call_tree/callers unless you need to see actual behavior, not just presence. +3. **ALWAYS outline before delegating** - When briefing a subagent to explore a repo or directory, run `aft outline ` yourself first and include the output in the subagent prompt. Never leave outline as a mid-step instruction — subagents don't follow ordering guarantees. 
+ +**Comprehension tasks** — "how does this flow work?", "what breaks if I change X?", "where is this called?": +4. **Use zoom** to read a specific function body without reading the whole file. +5. **Use call_tree / callers** to map cross-file relationships that grep cannot see. +6. **Use impact before a refactor** to understand blast radius before editing. -## Rules (NOT suggestions) +**When grep is fine.** `aft grep` for a bare identifier is correct when you just need to know "does this string appear, and where." Reach for semantic commands when you need to understand *behavior* behind the name, not every time a name shows up. -1. **ALWAYS start with outline** - Before reading ANY file, use `aft outline` to understand structure -2. **ALWAYS zoom to symbols** - Never read full files when you need specific functions -3. **ALWAYS use call graphs** - For understanding code flow, `call_tree` and `callers` reveal what grep cannot -4. **ALWAYS impact before refactor** - Run `aft impact` before making changes to understand blast radius -5. **NEVER grep with limited context** - If you need more than the symbol name, use AFT semantic commands -6. **ALWAYS outline before sampling** - Even for "just checking what files are" tasks, outline first -7. **ALWAYS outline before delegating** - When briefing a subagent to explore a repo or directory, run `aft outline ` yourself first and include the output in the subagent prompt. Never leave outline as a mid-step instruction — subagents don't follow ordering guarantees. +**Context protection still applies.** See the Context Protection section — even when a task is verification-only, don't read full files; outline first and selectively read what you need. 
## Context Protection diff --git a/Cargo.lock b/Cargo.lock index b5822af8..3eea5678 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -10,7 +10,7 @@ checksum = "320119579fcad9c21884f5c4861d16174d0e06250625266f50fe6898340abefa" [[package]] name = "agent-file-tools" -version = "0.12.2" +version = "0.13.0" dependencies = [ "ast-grep-core", "content_inspector", diff --git a/crates/aft/src/callgraph.rs b/crates/aft/src/callgraph.rs index d24577da..4b2db8ee 100644 --- a/crates/aft/src/callgraph.rs +++ b/crates/aft/src/callgraph.rs @@ -335,15 +335,196 @@ pub struct TraceDataResult { pub depth_limited: bool, } +impl TraceDataResult { + /// Compact LLM-friendly rendering. + pub fn render_text(&self) -> String { + let mut out = String::new(); + out.push_str(&format!( + "trace_data {} in {} ({})\n", + self.expression, self.origin_symbol, self.origin_file + )); + if self.hops.is_empty() { + out.push_str(" (no hops)\n"); + } else { + for (i, hop) in self.hops.iter().enumerate() { + let approx = if hop.approximate { " approximate" } else { "" }; + out.push_str(&format!( + " {}. {}.{} {}:{} {}{}\n", + i + 1, + hop.symbol, + hop.variable, + hop.file, + hop.line, + hop.flow_type, + approx, + )); + } + } + if self.depth_limited { + out.push_str("depth limit reached\n"); + } + out + } +} + +impl TraceToResult { + /// Compact LLM-friendly rendering. 
+ pub fn render_text(&self) -> String { + let mut out = String::new(); + out.push_str(&format!( + "trace_to {} ({}) paths={} entries={} truncated={}{}\n", + self.target_symbol, + self.target_file, + self.total_paths, + self.entry_points_found, + self.truncated_paths, + if self.max_depth_reached { + " max_depth_reached" + } else { + "" + }, + )); + if self.paths.is_empty() { + out.push_str(" (no paths)\n"); + } else { + for (i, path) in self.paths.iter().enumerate() { + out.push_str(&format!(" path {}:\n", i + 1)); + for hop in &path.hops { + let entry = if hop.is_entry_point { " [entry]" } else { "" }; + out.push_str(&format!( + " {} ({}:{}){}\n", + hop.symbol, hop.file, hop.line, entry + )); + } + } + } + out + } +} + +impl CallersResult { + /// Compact LLM-friendly rendering. + pub fn render_text(&self) -> String { + let mut out = String::new(); + out.push_str(&format!( + "callers of {} ({}) total={} files={} scanned={}\n", + self.symbol, + self.file, + self.total_callers, + self.callers.len(), + self.scanned_files, + )); + if self.callers.is_empty() { + out.push_str(" (no callers)\n"); + } else { + for group in &self.callers { + out.push_str(&format!(" {} ({}):\n", group.file, group.callers.len())); + for c in &group.callers { + out.push_str(&format!(" - {}:{}\n", c.symbol, c.line)); + } + } + } + out + } +} + +impl ImpactResult { + /// Compact LLM-friendly rendering. 
+ pub fn render_text(&self) -> String { + let mut out = String::new(); + let sig = self.signature.as_deref().unwrap_or(""); + out.push_str(&format!( + "impact {} ({}) affected={} files={}\n", + self.symbol, self.file, self.total_affected, self.affected_files, + )); + if !sig.is_empty() { + out.push_str(&format!(" signature: {}\n", sig)); + } + if !self.parameters.is_empty() { + out.push_str(&format!(" parameters: {}\n", self.parameters.join(", "))); + } + if self.callers.is_empty() { + out.push_str(" (no callers)\n"); + } else { + for c in &self.callers { + let entry = if c.is_entry_point { " [entry]" } else { "" }; + out.push_str(&format!( + " - {} ({}:{}){}\n", + c.caller_symbol, c.caller_file, c.line, entry + )); + if let Some(expr) = &c.call_expression { + out.push_str(&format!(" {}\n", expr)); + } + } + } + out + } +} + +impl CallTreeNode { + /// Compact LLM-friendly rendering (indented tree). + pub fn render_text(&self) -> String { + let mut out = String::new(); + self.render_into(&mut out, 0); + out + } + + fn render_into(&self, out: &mut String, depth: usize) { + let indent = " ".repeat(depth); + let unresolved = if self.resolved { "" } else { " [unresolved]" }; + let sig = self + .signature + .as_deref() + .map(|s| format!(" {}", s)) + .unwrap_or_default(); + out.push_str(&format!( + "{}- {} ({}:{}){}{}\n", + indent, self.name, self.file, self.line, unresolved, sig + )); + for child in &self.children { + child.render_into(out, depth + 1); + } + } +} + /// Extract parameter names from a function signature string. /// /// Strips language-specific receivers (`self`, `&self`, `&mut self` for Rust, /// `self` for Python) and type annotations / default values. Returns just /// the parameter names. pub fn extract_parameters(signature: &str, lang: LangId) -> Vec<String> { + // Go methods look like `func (recv Type) Name(params) ret`. The first + // parenthesised block is the receiver — skip it so we find the real + // parameter list. 
Other languages have no receiver block, so start from + // the first `(` as usual. + let scan_from = if lang == LangId::Go { + let trimmed = signature.trim_start(); + let offset = signature.len() - trimmed.len(); + if let Some(rest) = trimmed.strip_prefix("func ") { + let rest = rest.trim_start(); + if rest.starts_with('(') { + // Walk the receiver parens to find its matching close. + if let Some(close) = first_top_level(&rest[1..], ')') { + // Absolute index of the char after the receiver's ')'. + let receiver_abs_end = + offset + (trimmed.len() - rest.len()) + 1 + close + 1; + receiver_abs_end + } else { + 0 + } + } else { + 0 + } + } else { + 0 + } + } else { + 0 + }; + // Find the parameter list between parentheses - let start = match signature.find('(') { - Some(i) => i + 1, + let start = match signature[scan_from..].find('(') { + Some(i) => scan_from + i + 1, None => return Vec::new(), }; let end = match signature[start..].find(')') { @@ -1414,8 +1595,15 @@ impl CallGraph { let root = tree.root_node(); - // Find the symbol's body node (the function/method definition node) - let body_node = match find_node_covering_range(root, body_start, body_end) { + // Find the symbol's function/method-declaration node. The symbol's + // reported range may include leading doc comments that are siblings of + // the function node — in that case a simple covers-range lookup falls + // back to the source_file root and the walker leaks into unrelated + // functions. Prefer a name-matched declaration whose range is inside + // the reported range; fall back to covering-range if nothing matches. 
+ let body_node = find_function_decl_node(root, &source, body_start, body_end, symbol) + .or_else(|| find_node_covering_range(root, body_start, body_end)); + let body_node = match body_node { Some(n) => n, None => return, }; @@ -1468,6 +1656,7 @@ impl CallGraph { | "assignment_expression" | "augmented_assignment_expression" | "assignment" + | "assignment_statement" | "let_declaration" | "short_var_declaration" ); @@ -1476,30 +1665,53 @@ impl CallGraph { if let Some((new_name, init_text, line, is_approx)) = self.extract_assignment_info(node, source, lang, tracked_names) { - // The RHS references a tracked name — add assignment hop - if !is_approx { + // If the LHS is a destructuring pattern, we can't attribute the + // flow to a single name — emit an approximate hop carrying the + // pattern text and stop tracking through this branch. + let is_destructuring_pattern = + new_name.starts_with('{') || new_name.starts_with('['); + + if is_destructuring_pattern { hops.push(DataFlowHop { file: rel_file.to_string(), symbol: symbol.to_string(), - variable: new_name.clone(), + variable: init_text, line, flow_type: "assignment".to_string(), - approximate: false, + approximate: true, }); - tracked_names.push(new_name); - } else { - // Destructuring or pattern — approximate + return; + } + + // LHS is a field/index write (e.g. `m.Field = x` or `a[i] = x`). + // Emit a distinct `field_write` hop. Don't extend tracked_names + // with the compound path — downstream tracking treats names as + // bare identifiers, and "m.Field" would produce noisy matches. 
+ let is_field_write = new_name.contains('.') + || new_name.contains('[') + || new_name.starts_with('*'); + + if is_field_write { hops.push(DataFlowHop { file: rel_file.to_string(), symbol: symbol.to_string(), - variable: init_text, + variable: new_name, line, - flow_type: "assignment".to_string(), - approximate: true, + flow_type: "field_write".to_string(), + approximate: is_approx, }); - // Don't track further through this branch return; } + + hops.push(DataFlowHop { + file: rel_file.to_string(), + symbol: symbol.to_string(), + variable: new_name.clone(), + line, + flow_type: "assignment".to_string(), + approximate: is_approx, + }); + tracked_names.push(new_name); } } @@ -1549,7 +1761,12 @@ impl CallGraph { } /// Check if an assignment/declaration node assigns from a tracked name. - /// Returns (new_name, init_text, line, is_approximate). + /// Returns `(new_name, init_text, line, is_approximate)`. + /// + /// Uses [`classify_expr_text`] so all assignment forms uniformly handle: + /// direct identifier matches, reference/deref prefixes (`&x`/`*x`), + /// field/index accesses (`x.F`/`x[i]`), and composite literals containing + /// a tracked value (`Foo{F: x}`). 
fn extract_assignment_info( &self, node: tree_sitter::Node, @@ -1568,22 +1785,16 @@ impl CallGraph { let name_text = node_text(name_node, source); let value_text = node_text(value_node, source); - // Check if name is a destructuring pattern if name_node.kind() == "object_pattern" || name_node.kind() == "array_pattern" { - // Check if value references a tracked name if tracked_names.iter().any(|t| value_text.contains(t)) { return Some((name_text.clone(), name_text, line, true)); } return None; } - // Check if value references any tracked name - if tracked_names.iter().any(|t| { - value_text == *t - || value_text.starts_with(&format!("{}.", t)) - || value_text.starts_with(&format!("{}[", t)) - }) { - return Some((name_text, value_text, line, false)); + if let Some((_matched, kind)) = classify_expr_text(&value_text, tracked_names) { + let is_approx = matches!(kind, ExprMatch::Derived); + return Some((name_text, value_text, line, is_approx)); } None } @@ -1594,8 +1805,9 @@ impl CallGraph { let left_text = node_text(left, source); let right_text = node_text(right, source); - if tracked_names.iter().any(|t| right_text == *t) { - return Some((left_text, right_text, line, false)); + if let Some((_matched, kind)) = classify_expr_text(&right_text, tracked_names) { + let is_approx = matches!(kind, ExprMatch::Derived); + return Some((left_text, right_text, line, is_approx)); } None } @@ -1606,8 +1818,22 @@ impl CallGraph { let left_text = node_text(left, source); let right_text = node_text(right, source); - if tracked_names.iter().any(|t| right_text == *t) { - return Some((left_text, right_text, line, false)); + if let Some((_matched, kind)) = classify_expr_text(&right_text, tracked_names) { + let is_approx = matches!(kind, ExprMatch::Derived); + return Some((left_text, right_text, line, is_approx)); + } + None + } + "assignment_statement" => { + // Go: x = (tree-sitter-go wraps both sides in expression_list). 
+ let left = node.child_by_field_name("left")?; + let right = node.child_by_field_name("right")?; + let left_text = node_text(left, source); + let right_text = node_text(right, source); + + if let Some((_matched, kind)) = classify_expr_text(&right_text, tracked_names) { + let is_approx = matches!(kind, ExprMatch::Derived); + return Some((left_text, right_text, line, is_approx)); } None } @@ -1622,8 +1848,9 @@ impl CallGraph { let left_text = node_text(left, source); let right_text = node_text(right, source); - if tracked_names.iter().any(|t| right_text == *t) { - return Some((left_text, right_text, line, false)); + if let Some((_matched, kind)) = classify_expr_text(&right_text, tracked_names) { + let is_approx = matches!(kind, ExprMatch::Derived); + return Some((left_text, right_text, line, is_approx)); } None } @@ -1631,14 +1858,27 @@ impl CallGraph { } } - /// Check if a call expression uses a tracked name as an argument, and if so, - /// resolve the callee and recurse into its body tracking the parameter name. + /// Check if a call expression uses a tracked name (as an argument or as a + /// method receiver), and if so resolve the callee and recurse into its body + /// tracking the parameter name. + /// + /// Handles: + /// - Generalized arg matching via [`classify_expr_text`]: `x`, `&x`, `*x` + /// are `Direct`; `x.F`, `x[i]`, `Foo{F: x}` are `Derived` (approximate). + /// - Method receivers: `x.Method(...)` where `x` is tracked binds to the + /// method's receiver param (via [`extract_receiver_name`]). + /// - Known-writer intrinsics ([`known_writer_spec`]): e.g. + /// `json.Unmarshal(data, &out)` — when the input arg is tracked, the + /// pointer output arg is added as a new tracked name via a `writer` hop. + /// + /// `tracked_names` is mutable so known-writer outputs propagate to later + /// siblings in the same function body. 
#[allow(clippy::too_many_arguments)] fn check_call_for_data_flow( &mut self, node: tree_sitter::Node, source: &str, - tracked_names: &[String], + tracked_names: &mut Vec<String>, file: &Path, _symbol: &str, rel_file: &str, @@ -1649,81 +1889,154 @@ impl CallGraph { depth_limited: &mut bool, visited: &mut HashSet<(PathBuf, String, String)>, ) { - // Find the arguments node - let args_node = find_child_by_kind(node, "arguments") - .or_else(|| find_child_by_kind(node, "argument_list")); - - let args_node = match args_node { - Some(n) => n, + // Extract callee names (full = "pkg.Name" or "x.Method", short = "Method"). + let (full_callee, short_callee) = extract_callee_names(node, source); + let full_callee = match full_callee { + Some(f) => f, + None => return, + }; + let short_callee = match short_callee { + Some(s) => s, None => return, }; - // Collect argument texts and find which position a tracked name appears at - let mut arg_positions: Vec<(usize, String)> = Vec::new(); // (position, tracked_name) - let mut arg_idx = 0; + // Detect method-call receiver: if the callee is a selector/member + // expression (`.Method`) and `` references a tracked name. + let receiver_match: Option<ExprMatch> = node + .child_by_field_name("function") + .and_then(|fn_node| { + let k = fn_node.kind(); + if matches!( + k, + "selector_expression" + | "member_expression" + | "field_expression" + | "attribute" + ) { + let base = fn_node + .child_by_field_name("operand") + .or_else(|| fn_node.child_by_field_name("object")) + .or_else(|| fn_node.child_by_field_name("value")); + base.and_then(|b| { + let text = node_text(b, source); + classify_expr_text(&text, tracked_names).map(|(_, k)| k) + }) + } else { + None + } + }); - let mut cursor = args_node.walk(); - if cursor.goto_first_child() { - loop { - let child = cursor.node(); - let child_kind = child.kind(); + // Find the arguments node. 
+ let args_node = find_child_by_kind(node, "arguments") + .or_else(|| find_child_by_kind(node, "argument_list")); - // Skip punctuation (parentheses, commas) - if child_kind == "(" || child_kind == ")" || child_kind == "," { - if !cursor.goto_next_sibling() { - break; + // Collect arg matches via classifier: (position, arg_text, match_kind). + let mut arg_matches: Vec<(usize, String, ExprMatch)> = Vec::new(); + + if let Some(args_node) = args_node { + let mut arg_idx = 0; + let mut cursor = args_node.walk(); + if cursor.goto_first_child() { + loop { + let child = cursor.node(); + let child_kind = child.kind(); + + if child_kind == "(" || child_kind == ")" || child_kind == "," { + if !cursor.goto_next_sibling() { + break; + } + continue; } - continue; - } - let arg_text = node_text(child, source); + let arg_text = node_text(child, source); - // Check for spread element — approximate - if child_kind == "spread_element" || child_kind == "dictionary_splat" { - if tracked_names.iter().any(|t| arg_text.contains(t)) { - hops.push(DataFlowHop { - file: rel_file.to_string(), - symbol: _symbol.to_string(), - variable: arg_text, - line: child.start_position().row as u32 + 1, - flow_type: "parameter".to_string(), - approximate: true, - }); + // Spread / splat — approximate, no param binding (position + // mapping is ambiguous). 
+ if child_kind == "spread_element" || child_kind == "dictionary_splat" { + if tracked_names.iter().any(|t| arg_text.contains(t.as_str())) { + hops.push(DataFlowHop { + file: rel_file.to_string(), + symbol: _symbol.to_string(), + variable: arg_text.clone(), + line: child.start_position().row as u32 + 1, + flow_type: "parameter".to_string(), + approximate: true, + }); + } + arg_idx += 1; + if !cursor.goto_next_sibling() { + break; + } + continue; + } + + if let Some((_matched, kind)) = classify_expr_text(&arg_text, tracked_names) { + arg_matches.push((arg_idx, arg_text.clone(), kind)); } + + arg_idx += 1; if !cursor.goto_next_sibling() { break; } - arg_idx += 1; - continue; - } - - if tracked_names.iter().any(|t| arg_text == *t) { - arg_positions.push((arg_idx, arg_text)); - } - - arg_idx += 1; - if !cursor.goto_next_sibling() { - break; } } } - if arg_positions.is_empty() { + if arg_matches.is_empty() && receiver_match.is_none() { return; } - // Resolve the callee - let (full_callee, short_callee) = extract_callee_names(node, source); - let full_callee = match full_callee { - Some(f) => f, - None => return, - }; - let short_callee = match short_callee { - Some(s) => s, - None => return, - }; + // Known-writer intrinsic: e.g. json.Unmarshal(data, &out). If a tracked + // name appears at an input position, mark pointer output args as new + // tracked names and emit "writer" hops. Skip parameter-hop recursion + // since the intrinsic's body isn't meaningful for the tracker. 
+ if let Some((input_positions, output_positions)) = known_writer_spec(&full_callee) { + let tracked_at_input = arg_matches + .iter() + .any(|(pos, _, _)| input_positions.contains(pos)); + if tracked_at_input { + if let Some(args_node) = args_node { + let mut oarg_idx = 0; + let mut oc = args_node.walk(); + if oc.goto_first_child() { + loop { + let child = oc.node(); + let ck = child.kind(); + if ck == "(" || ck == ")" || ck == "," { + if !oc.goto_next_sibling() { + break; + } + continue; + } + if output_positions.contains(&oarg_idx) { + let txt = node_text(child, source); + let stripped = strip_ref_deref_prefix(&txt).trim().to_string(); + if !stripped.is_empty() + && !tracked_names.iter().any(|t| t == &stripped) + { + hops.push(DataFlowHop { + file: rel_file.to_string(), + symbol: _symbol.to_string(), + variable: stripped.clone(), + line: node.start_position().row as u32 + 1, + flow_type: "writer".to_string(), + approximate: true, + }); + tracked_names.push(stripped); + } + } + oarg_idx += 1; + if !oc.goto_next_sibling() { + break; + } + } + } + } + return; + } + } - // Try to resolve cross-file edge + // Resolve the callee (cross-file via imports, else same-file lookup). let import_block = { match self.data.get(file) { Some(fd) => fd.import_block.clone(), @@ -1733,7 +2046,8 @@ impl CallGraph { let edge = self.resolve_cross_file_edge(&full_callee, &short_callee, file, &import_block); - match edge { + // Unify resolution into (target_file, target_symbol, params, target_lang). 
+ let resolved: Option<(PathBuf, String, Vec<String>, LangId)> = match edge { EdgeResolution::Resolved { file: target_file, symbol: target_symbol, @@ -1742,8 +2056,6 @@ impl CallGraph { *depth_limited = true; return; } - - // Build target file to get parameter info if let Err(e) = self.build_file(&target_file) { log::debug!( "callgraph: skipping target file {}: {}", @@ -1751,51 +2063,20 @@ impl CallGraph { e ); } - let (params, _target_lang) = { - match self.data.get(&target_file) { - Some(fd) => { - let meta = fd.symbol_metadata.get(&target_symbol); - let sig = meta.and_then(|m| m.signature.clone()); - let params = sig - .as_deref() - .map(|s| extract_parameters(s, fd.lang)) - .unwrap_or_default(); - (params, fd.lang) - } - None => return, - } - }; - - let target_rel = self.relative_path(&target_file); - - for (pos, _tracked) in &arg_positions { - if let Some(param_name) = params.get(*pos) { - // Add parameter hop - hops.push(DataFlowHop { - file: target_rel.clone(), - symbol: target_symbol.clone(), - variable: param_name.clone(), - line: get_symbol_meta(&target_file, &target_symbol).0, - flow_type: "parameter".to_string(), - approximate: false, - }); - - // Recurse into callee's body tracking the parameter name - self.trace_data_inner( - &target_file.clone(), - &target_symbol.clone(), - param_name, - max_depth, - current_depth + 1, - hops, - depth_limited, - visited, - ); + match self.data.get(&target_file) { + Some(fd) => { + let meta = fd.symbol_metadata.get(&target_symbol); + let sig = meta.and_then(|m| m.signature.clone()); + let params = sig + .as_deref() + .map(|s| extract_parameters(s, fd.lang)) + .unwrap_or_default(); + Some((target_file, target_symbol, params, fd.lang)) } + None => None, } } EdgeResolution::Unresolved { callee_name } => { - // Check if it's a same-file call let has_local = self .data .get(file) @@ -1804,63 +2085,103 @@ impl CallGraph { || fd.symbol_metadata.contains_key(&callee_name) }) .unwrap_or(false); - - if has_local { - // Same-file call — 
get param info - let (params, _target_lang) = { - let Some(fd) = self.data.get(file) else { - return; - }; + if !has_local { + // Truly unresolved — approximate arg hops, no receiver + // binding (we don't know the target), no recursion. + for (_pos, arg_text, _kind) in &arg_matches { + hops.push(DataFlowHop { + file: self.relative_path(file), + symbol: callee_name.clone(), + variable: arg_text.clone(), + line: node.start_position().row as u32 + 1, + flow_type: "parameter".to_string(), + approximate: true, + }); + } + return; + } + if current_depth + 1 > max_depth { + *depth_limited = true; + return; + } + match self.data.get(file) { + Some(fd) => { let meta = fd.symbol_metadata.get(&callee_name); let sig = meta.and_then(|m| m.signature.clone()); let params = sig .as_deref() .map(|s| extract_parameters(s, fd.lang)) .unwrap_or_default(); - (params, fd.lang) - }; + Some((file.to_path_buf(), callee_name, params, fd.lang)) + } + None => None, + } + } + }; - let file_rel = self.relative_path(file); + let (target_file, target_symbol, params, target_lang) = match resolved { + Some(t) => t, + None => return, + }; - for (pos, _tracked) in &arg_positions { - if let Some(param_name) = params.get(*pos) { - hops.push(DataFlowHop { - file: file_rel.clone(), - symbol: callee_name.clone(), - variable: param_name.clone(), - line: get_symbol_meta(file, &callee_name).0, - flow_type: "parameter".to_string(), - approximate: false, - }); + let target_rel = self.relative_path(&target_file); - // Recurse into same-file function - self.trace_data_inner( - file, - &callee_name.clone(), - param_name, - max_depth, - current_depth + 1, - hops, - depth_limited, - visited, - ); - } - } - } else { - // Truly unresolved — approximate hop - for (_pos, tracked) in &arg_positions { - hops.push(DataFlowHop { - file: self.relative_path(file), - symbol: callee_name.clone(), - variable: tracked.clone(), - line: node.start_position().row as u32 + 1, - flow_type: "parameter".to_string(), - approximate: 
true, - }); - } + // Receiver hop: bind tracked base to the method's receiver param name. + if let Some(recv_kind) = receiver_match { + let target_sig = self + .data + .get(&target_file) + .and_then(|fd| fd.symbol_metadata.get(&target_symbol)) + .and_then(|m| m.signature.clone()); + if let Some(sig) = target_sig { + if let Some(recv_name) = extract_receiver_name(&sig, target_lang) { + let is_approx = matches!(recv_kind, ExprMatch::Derived); + hops.push(DataFlowHop { + file: target_rel.clone(), + symbol: target_symbol.clone(), + variable: recv_name.clone(), + line: get_symbol_meta(&target_file, &target_symbol).0, + flow_type: "parameter".to_string(), + approximate: is_approx, + }); + self.trace_data_inner( + &target_file.clone(), + &target_symbol.clone(), + &recv_name, + max_depth, + current_depth + 1, + hops, + depth_limited, + visited, + ); } } } + + // Explicit-arg parameter hops (approximate iff the arg match was Derived). + for (pos, _arg_text, kind) in &arg_matches { + if let Some(param_name) = params.get(*pos) { + let is_approx = matches!(kind, ExprMatch::Derived); + hops.push(DataFlowHop { + file: target_rel.clone(), + symbol: target_symbol.clone(), + variable: param_name.clone(), + line: get_symbol_meta(&target_file, &target_symbol).0, + flow_type: "parameter".to_string(), + approximate: is_approx, + }); + self.trace_data_inner( + &target_file.clone(), + &target_symbol.clone(), + param_name, + max_depth, + current_depth + 1, + hops, + depth_limited, + visited, + ); + } + } } /// Read a single source line (1-based) from a file, trimmed. @@ -2177,6 +2498,69 @@ fn find_node_covering_range( best } +/// Find a function/method declaration node by name within a byte range. +/// +/// The symbol-range reported by [`list_symbols_from_tree`] may include +/// preceding doc comments that are siblings (not ancestors) of the function +/// node. A plain covers-range lookup would fail to narrow past the source +/// file. 
This helper descends the tree looking for a declaration node whose
+/// `name` field matches `symbol_name` and whose byte range is contained in
+/// `[start, end]`.
+fn find_function_decl_node<'a>(
+    root: tree_sitter::Node<'a>,
+    source: &str,
+    start: usize,
+    end: usize,
+    symbol_name: &str,
+) -> Option<tree_sitter::Node<'a>> {
+    fn is_function_decl_kind(kind: &str) -> bool {
+        matches!(
+            kind,
+            "function_declaration"
+                | "method_declaration"
+                | "function_definition"
+                | "function_item"
+                | "method_definition"
+        )
+    }
+
+    fn node_name_matches(node: tree_sitter::Node, source: &str, name: &str) -> bool {
+        if let Some(n) = node.child_by_field_name("name") {
+            return node_text(n, source) == name;
+        }
+        false
+    }
+
+    fn dfs<'a>(
+        node: tree_sitter::Node<'a>,
+        source: &str,
+        start: usize,
+        end: usize,
+        name: &str,
+    ) -> Option<tree_sitter::Node<'a>> {
+        if node.end_byte() < start || node.start_byte() > end {
+            return None;
+        }
+        if is_function_decl_kind(node.kind()) && node_name_matches(node, source, name) {
+            return Some(node);
+        }
+        let mut cursor = node.walk();
+        if cursor.goto_first_child() {
+            loop {
+                if let Some(found) = dfs(cursor.node(), source, start, end, name) {
+                    return Some(found);
+                }
+                if !cursor.goto_next_sibling() {
+                    break;
+                }
+            }
+        }
+        None
+    }
+
+    dfs(root, source, start, end, symbol_name)
+}
+
 /// Find a direct child node by kind name.
 fn find_child_by_kind<'a>(
     node: tree_sitter::Node<'a>,
@@ -2214,6 +2598,221 @@ fn extract_callee_names(node: tree_sitter::Node, source: &str) -> (Option Option<(String, ExprMatch)> {
+    if tracked.is_empty() {
+        return None;
+    }
+    let t_expr = expr_text.trim();
+    let stripped = strip_ref_deref_prefix(t_expr);
+
+    // Direct: exact identifier (optionally behind &, *, &mut).
+    for t in tracked {
+        if t_expr == t.as_str() || stripped == t.as_str() {
+            return Some((t.clone(), ExprMatch::Direct));
+        }
+    }
+    // Derived: field / index access on a tracked name.
+    for t in tracked {
+        let dot = format!("{}.", t);
+        let idx = format!("{}[", t);
+        if t_expr.starts_with(&dot)
+            || t_expr.starts_with(&idx)
+            || stripped.starts_with(&dot)
+            || stripped.starts_with(&idx)
+        {
+            return Some((t.clone(), ExprMatch::Derived));
+        }
+    }
+    // Derived: composite literal containing a tracked value.
+    if let (Some(open), Some(close)) = (t_expr.find('{'), t_expr.rfind('}')) {
+        if open < close {
+            let body = &t_expr[open + 1..close];
+            if let Some(name) = literal_body_first_tracked(body, tracked) {
+                return Some((name, ExprMatch::Derived));
+            }
+        }
+    }
+    None
+}
+
+/// Strip a single `&` or `*` prefix, plus an optional `mut ` (Rust).
+fn strip_ref_deref_prefix(s: &str) -> &str {
+    let s = s.trim();
+    if let Some(rest) = s.strip_prefix('&') {
+        let rest = rest.trim_start();
+        return rest
+            .strip_prefix("mut ")
+            .map(|r| r.trim_start())
+            .unwrap_or(rest);
+    }
+    if let Some(rest) = s.strip_prefix('*') {
+        return rest.trim_start();
+    }
+    s
+}
+
+/// Scan a brace-body (content between `{` and `}`) looking for a tracked name
+/// appearing as a value (not a key). Handles `key: value` and bare positional
+/// entries. Nesting-aware via a depth counter.
+fn literal_body_first_tracked(body: &str, tracked: &[String]) -> Option<String> {
+    for part in split_top_level(body, ',') {
+        let value_str = match first_top_level(&part, ':') {
+            Some(colon) => part[colon + 1..].to_string(),
+            None => part,
+        };
+        let v = strip_ref_deref_prefix(value_str.trim());
+        for t in tracked {
+            if v == t.as_str() {
+                return Some(t.clone());
+            }
+            let dot = format!("{}.", t);
+            let idx = format!("{}[", t);
+            if v.starts_with(&dot) || v.starts_with(&idx) {
+                return Some(t.clone());
+            }
+        }
+    }
+    None
+}
+
+fn split_top_level(s: &str, sep: char) -> Vec<String> {
+    let mut out = Vec::new();
+    let mut depth: i32 = 0;
+    let mut start = 0;
+    for (i, c) in s.char_indices() {
+        match c {
+            '(' | '[' | '{' => depth += 1,
+            ')' | ']' | '}' => depth -= 1,
+            ch if ch == sep && depth == 0 => {
+                out.push(s[start..i].to_string());
+                start = i + c.len_utf8();
+            }
+            _ => {}
+        }
+    }
+    if start < s.len() {
+        out.push(s[start..].to_string());
+    }
+    out
+}
+
+fn first_top_level(s: &str, sep: char) -> Option<usize> {
+    let mut depth: i32 = 0;
+    for (i, c) in s.char_indices() {
+        if c == sep && depth == 0 {
+            return Some(i);
+        }
+        match c {
+            '(' | '[' | '{' => depth += 1,
+            ')' | ']' | '}' => depth -= 1,
+            _ => {}
+        }
+    }
+    None
+}
+
+// ---------------------------------------------------------------------------
+// Method receiver name extraction
+// ---------------------------------------------------------------------------
+
+/// Extract the receiver parameter name from a method signature.
+///
+/// - Go: `func (u *User) Method(...)` → `Some("u")`
+/// - Rust: `fn method(&self, ...)` / `fn method(&mut self, ...)` / `fn method(self, ...)` → `Some("self")`
+/// - Other languages or non-methods: `None`
+pub(crate) fn extract_receiver_name(signature: &str, lang: LangId) -> Option<String> {
+    let sig = signature.trim();
+    match lang {
+        LangId::Go => {
+            // Expect: "func (recv Type) Name(...)"
+            let rest = sig.strip_prefix("func ")?.trim_start();
+            if !rest.starts_with('(') {
+                return None;
+            }
+            let close = first_top_level(&rest[1..], ')')?;
+            let recv_text = &rest[1..1 + close];
+            // recv_text is like "u *User" or "u User" or "User" (anonymous recv)
+            let parts: Vec<&str> = recv_text.split_whitespace().collect();
+            if parts.len() >= 2 {
+                Some(parts[0].to_string())
+            } else {
+                None
+            }
+        }
+        LangId::Rust => {
+            let paren_pos = sig.find('(')?;
+            let after = &sig[paren_pos + 1..];
+            let close = first_top_level(after, ')')?;
+            let params = &after[..close];
+            let first = params.split(',').next()?.trim();
+            let normalized = first
+                .trim_start_matches('&')
+                .trim_start()
+                .trim_start_matches("mut ")
+                .trim();
+            if normalized == "self"
+                || normalized.starts_with("self:")
+                || normalized.starts_with("self ")
+            {
+                Some("self".to_string())
+            } else {
+                None
+            }
+        }
+        _ => None,
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Known-writer intrinsics — functions that write an input into a pointer arg
+// ---------------------------------------------------------------------------
+
+/// For a known writer function, returns `(input_positions, output_positions)`
+/// where data at `input_positions` flows into pointer args at
+/// `output_positions`.
+///
+/// Example: `json.Unmarshal(data, &out)` → input=\[0\], output=\[1\].
+pub(crate) fn known_writer_spec(full_callee: &str) -> Option<(Vec, Vec)> { + match full_callee { + // Go: Unmarshal-family — (data, &out) + "json.Unmarshal" + | "yaml.Unmarshal" + | "xml.Unmarshal" + | "toml.Unmarshal" + | "proto.Unmarshal" + | "bson.Unmarshal" + | "msgpack.Unmarshal" => Some((vec![0], vec![1])), + _ => None, + } +} + // --------------------------------------------------------------------------- // Module path resolution // --------------------------------------------------------------------------- @@ -3425,6 +4024,22 @@ function testValidation() { assert_eq!(params, vec!["input", "count"]); } + #[test] + fn extract_parameters_go_method_skips_receiver() { + // Go method: receiver block is the first `(...)`, actual params come second. + let params = extract_parameters( + "func (s *concreteSvc) concreteMethod(x int, y string) int", + LangId::Go, + ); + assert_eq!(params, vec!["x", "y"]); + } + + #[test] + fn extract_parameters_go_method_no_params() { + let params = extract_parameters("func (s *svc) noParams() int", LangId::Go); + assert!(params.is_empty(), "no params should yield empty, got {:?}", params); + } + #[test] fn extract_parameters_empty() { let params = extract_parameters("function noArgs(): void", LangId::TypeScript); diff --git a/crates/aft/src/commands/call_tree.rs b/crates/aft/src/commands/call_tree.rs index df66f784..f794d197 100644 --- a/crates/aft/src/commands/call_tree.rs +++ b/crates/aft/src/commands/call_tree.rs @@ -69,7 +69,8 @@ pub fn handle_call_tree(req: &RawRequest, ctx: &AppContext) -> Response { Ok(data) => { // Check if the symbol exists in the file (as a call-site container or exported symbol) let has_symbol = data.calls_by_symbol.contains_key(symbol) - || data.exported_symbols.contains(&symbol.to_string()); + || data.exported_symbols.contains(&symbol.to_string()) + || data.symbol_metadata.contains_key(symbol); if !has_symbol { return Response::error( &req.id, @@ -85,7 +86,11 @@ pub fn handle_call_tree(req: &RawRequest, 
ctx: &AppContext) -> Response { match graph.forward_tree(&file_path, symbol, depth) { Ok(tree) => { - let tree_json = serde_json::to_value(&tree).unwrap_or_default(); + let text = tree.render_text(); + let mut tree_json = serde_json::to_value(&tree).unwrap_or_default(); + if let Some(obj) = tree_json.as_object_mut() { + obj.insert("text".to_string(), serde_json::Value::String(text)); + } Response::success(&req.id, tree_json) } Err(e) => Response::error(&req.id, e.code(), e.to_string()), diff --git a/crates/aft/src/commands/callers.rs b/crates/aft/src/commands/callers.rs index 88216c7b..8f9283df 100644 --- a/crates/aft/src/commands/callers.rs +++ b/crates/aft/src/commands/callers.rs @@ -69,7 +69,8 @@ pub fn handle_callers(req: &RawRequest, ctx: &AppContext) -> Response { match graph.build_file(&file_path) { Ok(data) => { let has_symbol = data.calls_by_symbol.contains_key(symbol) - || data.exported_symbols.contains(&symbol.to_string()); + || data.exported_symbols.contains(&symbol.to_string()) + || data.symbol_metadata.contains_key(symbol); if !has_symbol { return Response::error( &req.id, @@ -85,7 +86,11 @@ pub fn handle_callers(req: &RawRequest, ctx: &AppContext) -> Response { match graph.callers_of(&file_path, symbol, depth) { Ok(result) => { - let result_json = serde_json::to_value(&result).unwrap_or_default(); + let text = result.render_text(); + let mut result_json = serde_json::to_value(&result).unwrap_or_default(); + if let Some(obj) = result_json.as_object_mut() { + obj.insert("text".to_string(), serde_json::Value::String(text)); + } Response::success(&req.id, result_json) } Err(e) => Response::error(&req.id, e.code(), e.to_string()), diff --git a/crates/aft/src/commands/impact.rs b/crates/aft/src/commands/impact.rs index 96dbf258..bfc6e27e 100644 --- a/crates/aft/src/commands/impact.rs +++ b/crates/aft/src/commands/impact.rs @@ -89,7 +89,11 @@ pub fn handle_impact(req: &RawRequest, ctx: &AppContext) -> Response { match graph.impact(&file_path, symbol, depth) 
{ Ok(result) => { - let result_json = serde_json::to_value(&result).unwrap_or_default(); + let text = result.render_text(); + let mut result_json = serde_json::to_value(&result).unwrap_or_default(); + if let Some(obj) = result_json.as_object_mut() { + obj.insert("text".to_string(), serde_json::Value::String(text)); + } Response::success(&req.id, result_json) } Err(e) => Response::error(&req.id, e.code(), e.to_string()), diff --git a/crates/aft/src/commands/trace_data.rs b/crates/aft/src/commands/trace_data.rs index 79353e98..c5086ad9 100644 --- a/crates/aft/src/commands/trace_data.rs +++ b/crates/aft/src/commands/trace_data.rs @@ -102,7 +102,11 @@ pub fn handle_trace_data(req: &RawRequest, ctx: &AppContext) -> Response { match graph.trace_data(&file_path, symbol, expression, depth) { Ok(result) => { - let result_json = serde_json::to_value(&result).unwrap_or_default(); + let text = result.render_text(); + let mut result_json = serde_json::to_value(&result).unwrap_or_default(); + if let Some(obj) = result_json.as_object_mut() { + obj.insert("text".to_string(), serde_json::Value::String(text)); + } Response::success(&req.id, result_json) } Err(e) => Response::error(&req.id, e.code(), e.to_string()), diff --git a/crates/aft/src/commands/trace_to.rs b/crates/aft/src/commands/trace_to.rs index 64d7d8b6..c914a9a5 100644 --- a/crates/aft/src/commands/trace_to.rs +++ b/crates/aft/src/commands/trace_to.rs @@ -89,7 +89,11 @@ pub fn handle_trace_to(req: &RawRequest, ctx: &AppContext) -> Response { match graph.trace_to(&file_path, symbol, depth) { Ok(result) => { - let result_json = serde_json::to_value(&result).unwrap_or_default(); + let text = result.render_text(); + let mut result_json = serde_json::to_value(&result).unwrap_or_default(); + if let Some(obj) = result_json.as_object_mut() { + obj.insert("text".to_string(), serde_json::Value::String(text)); + } Response::success(&req.id, result_json) } Err(e) => Response::error(&req.id, e.code(), e.to_string()), diff --git 
a/crates/aft/tests/fixtures/callgraph/data_flow_patterns.go b/crates/aft/tests/fixtures/callgraph/data_flow_patterns.go new file mode 100644 index 00000000..6a35c99e --- /dev/null +++ b/crates/aft/tests/fixtures/callgraph/data_flow_patterns.go @@ -0,0 +1,64 @@ +package callgraph + +import "encoding/json" + +type User struct { + Name string +} + +type Wrapper struct { + Name string +} + +type Ctx struct{} + +// Gap 1: reference arg (&u) should produce a direct parameter hop. +func refArgCase(u User) { + saveRef(&u) +} + +func saveRef(user *User) {} + +// Gap 2: field access as arg (u.Name) should produce an approximate parameter hop. +func fieldArgCase(u User) { + consumeString(u.Name) +} + +func consumeString(name string) {} + +// Gap 3: struct literal wrap (Wrapper{Name: name}) should produce an approximate +// assignment hop, and subsequent uses of the new binding should be tracked. +func structLitCase(name string) { + w := Wrapper{Name: name} + saveWrapper(w) +} + +func saveWrapper(wr Wrapper) {} + +// Gap 4: intrinsic pointer-arg write (json.Unmarshal(raw, &user)) should bind +// raw's flow into user, so the subsequent call passes user as tracked. +func pointerWriteCase(raw []byte) { + var user User + _ = json.Unmarshal(raw, &user) + consumeUser(user) +} + +func consumeUser(u User) {} + +// Gap 5: method receiver (u.saveMethod(...)) should produce a parameter hop +// into the method's receiver parameter. +func methodReceiverCase(u User) { + u.saveMethod(Ctx{}) +} + +func (u *User) saveMethod(c Ctx) {} + +// Gap 6: field-write origin (m.Account = name) should produce a field_write hop +// showing that the tracked value flowed into the field. 
+type Message struct { + Account string +} + +func fieldWriteCase(m *Message, name string) { + m.Account = name +} diff --git a/crates/aft/tests/fixtures/callgraph/go_resolution.go b/crates/aft/tests/fixtures/callgraph/go_resolution.go new file mode 100644 index 00000000..1d686513 --- /dev/null +++ b/crates/aft/tests/fixtures/callgraph/go_resolution.go @@ -0,0 +1,53 @@ +package callgraph + +// Case A: bare package-level function called unqualified from the same file. +// `callers` / `trace_to` should surface the call site at line in barePkgCaller. +func barePkgTarget(x int) int { + return x + 1 +} + +func barePkgCaller(x int) int { + return barePkgTarget(x) +} + +// Case B: method on a concrete receiver called via a local var of that type. +// `s.concreteMethod(...)` where `s` is typed `*concreteSvc` should resolve +// to `func (s *concreteSvc) concreteMethod(...)`. +type concreteSvc struct{} + +func (s *concreteSvc) concreteMethod(x int) int { + return x * 2 +} + +func concreteMethodCaller(x int) int { + s := &concreteSvc{} + return s.concreteMethod(x) +} + +// Case C: interface-method dispatch. A variable typed as an interface should +// resolve to every implementation of that interface's method. +type Doer interface { + Do(x int) int +} + +type doerA struct{} + +func (a *doerA) Do(x int) int { return x + 10 } + +type doerB struct{} + +func (b *doerB) Do(x int) int { return x + 100 } + +func interfaceCaller(d Doer, x int) int { + return d.Do(x) +} + +// Case D: field-write origin. Writing to a field of a tracked value should +// register a hop from the RHS into the LHS field. 
+type Message struct { + Account string +} + +func fieldWriteCase(m *Message, name string) { + m.Account = name +} diff --git a/crates/aft/tests/integration/callgraph_test.rs b/crates/aft/tests/integration/callgraph_test.rs index 6d9f7185..014f7bce 100644 --- a/crates/aft/tests/integration/callgraph_test.rs +++ b/crates/aft/tests/integration/callgraph_test.rs @@ -1186,3 +1186,225 @@ fn callgraph_trace_data_approximation() { aft.shutdown(); } + +// --------------------------------------------------------------------------- +// trace_data pattern tests — Go fixtures for the 5 gaps +// (reference arg, field access, struct literal, intrinsic writer, method receiver) +// --------------------------------------------------------------------------- + +fn trace_data_configure_go(aft: &mut AftProcess) -> String { + let fixtures = fixture_path("callgraph"); + let root = fixtures.display().to_string(); + aft.send(&format!( + r#"{{"id":"1","command":"configure","project_root":"{}"}}"#, + root + )); + root +} + +/// Gap 1: a tracked name passed as `&x` should produce a direct (non-approximate) +/// parameter hop into the callee. 
+#[test] +fn callgraph_trace_data_reference_arg() { + let mut aft = AftProcess::spawn(); + let root = trace_data_configure_go(&mut aft); + + let resp = aft.send(&format!( + r#"{{"id":"2","command":"trace_data","file":"{}/data_flow_patterns.go","symbol":"refArgCase","expression":"u","depth":5}}"#, + root + )); + + assert_eq!(resp["success"], true, "trace_data should succeed: {:?}", resp); + let hops = resp["hops"].as_array().expect("hops array"); + + let param_hop = hops.iter().find(|h| { + h["flow_type"] == "parameter" && h["symbol"] == "saveRef" + }); + assert!( + param_hop.is_some(), + "expected parameter hop into saveRef for &u, got hops: {:?}", + hops + ); + let ph = param_hop.unwrap(); + assert_eq!(ph["variable"], "user", "should map to saveRef's param 'user'"); + assert_eq!( + ph["approximate"], false, + "reference arg is a direct flow, not approximate" + ); + + aft.shutdown(); +} + +/// Gap 2: a tracked name used as `x.F` in an argument should produce an +/// approximate parameter hop. 
+#[test] +fn callgraph_trace_data_field_access_arg() { + let mut aft = AftProcess::spawn(); + let root = trace_data_configure_go(&mut aft); + + let resp = aft.send(&format!( + r#"{{"id":"2","command":"trace_data","file":"{}/data_flow_patterns.go","symbol":"fieldArgCase","expression":"u","depth":5}}"#, + root + )); + + assert_eq!(resp["success"], true, "trace_data should succeed: {:?}", resp); + let hops = resp["hops"].as_array().expect("hops array"); + + let param_hop = hops.iter().find(|h| { + h["flow_type"] == "parameter" && h["symbol"] == "consumeString" + }); + assert!( + param_hop.is_some(), + "expected parameter hop into consumeString for u.Name, got hops: {:?}", + hops + ); + let ph = param_hop.unwrap(); + assert_eq!(ph["variable"], "name"); + assert_eq!( + ph["approximate"], true, + "field-access flow is approximate (we know u flowed in, but only via a field)" + ); + + aft.shutdown(); +} + +/// Gap 3: `w := Wrapper{Name: name}` where `name` is tracked should produce an +/// approximate assignment hop binding the struct-literal target, and subsequent +/// uses of the new binding should be tracked. +#[test] +fn callgraph_trace_data_struct_literal_assign() { + let mut aft = AftProcess::spawn(); + let root = trace_data_configure_go(&mut aft); + + let resp = aft.send(&format!( + r#"{{"id":"2","command":"trace_data","file":"{}/data_flow_patterns.go","symbol":"structLitCase","expression":"name","depth":5}}"#, + root + )); + + assert_eq!(resp["success"], true, "trace_data should succeed: {:?}", resp); + let hops = resp["hops"].as_array().expect("hops array"); + + let assign_hop = hops.iter().find(|h| { + h["flow_type"] == "assignment" && h["variable"] == "w" + }); + assert!( + assign_hop.is_some(), + "expected assignment hop binding 'w' from struct literal, got hops: {:?}", + hops + ); + assert_eq!( + assign_hop.unwrap()["approximate"], true, + "struct-literal wrap is approximate" + ); + + // The new binding `w` should propagate to saveWrapper's parameter. 
+ let param_hop = hops.iter().find(|h| { + h["flow_type"] == "parameter" && h["symbol"] == "saveWrapper" + }); + assert!( + param_hop.is_some(), + "struct-lit-bound name should propagate to saveWrapper's param, got hops: {:?}", + hops + ); + assert_eq!(param_hop.unwrap()["variable"], "wr"); + + aft.shutdown(); +} + +/// Gap 4: `json.Unmarshal(raw, &user)` where `raw` is tracked should bind +/// `raw`'s flow into `user` via the known-writer intrinsic, and subsequent +/// uses of `user` should be tracked. +#[test] +fn callgraph_trace_data_pointer_write_intrinsic() { + let mut aft = AftProcess::spawn(); + let root = trace_data_configure_go(&mut aft); + + let resp = aft.send(&format!( + r#"{{"id":"2","command":"trace_data","file":"{}/data_flow_patterns.go","symbol":"pointerWriteCase","expression":"raw","depth":5}}"#, + root + )); + + assert_eq!(resp["success"], true, "trace_data should succeed: {:?}", resp); + let hops = resp["hops"].as_array().expect("hops array"); + + // Should produce a "writer" hop (approximate) binding `user`. + let writer_hop = hops.iter().find(|h| { + h["variable"] == "user" + && (h["flow_type"] == "writer" || h["flow_type"] == "assignment") + }); + assert!( + writer_hop.is_some(), + "expected a writer/assignment hop binding 'user' from json.Unmarshal, got hops: {:?}", + hops + ); + + // Should propagate to consumeUser's parameter as a direct flow. + let param_hop = hops.iter().find(|h| { + h["flow_type"] == "parameter" && h["symbol"] == "consumeUser" + }); + assert!( + param_hop.is_some(), + "writer-bound 'user' should propagate to consumeUser, got hops: {:?}", + hops + ); + assert_eq!(param_hop.unwrap()["variable"], "u"); + + aft.shutdown(); +} + +/// Gap 5: `u.saveMethod(...)` where `u` is tracked should produce a parameter +/// hop binding to the method's receiver name. 
+#[test] +fn callgraph_trace_data_method_receiver() { + let mut aft = AftProcess::spawn(); + let root = trace_data_configure_go(&mut aft); + + let resp = aft.send(&format!( + r#"{{"id":"2","command":"trace_data","file":"{}/data_flow_patterns.go","symbol":"methodReceiverCase","expression":"u","depth":5}}"#, + root + )); + + assert_eq!(resp["success"], true, "trace_data should succeed: {:?}", resp); + let hops = resp["hops"].as_array().expect("hops array"); + + let recv_hop = hops.iter().find(|h| { + h["flow_type"] == "parameter" && h["symbol"] == "saveMethod" + }); + assert!( + recv_hop.is_some(), + "expected parameter hop into saveMethod's receiver, got hops: {:?}", + hops + ); + // The receiver's local name inside saveMethod is `u` (see fixture). + assert_eq!(recv_hop.unwrap()["variable"], "u"); + + aft.shutdown(); +} + +/// Gap 6: `m.Account = name` should produce a `field_write` hop with variable +/// `m.Account`, since the tracked value flowed into a field. +#[test] +fn callgraph_trace_data_field_write() { + let mut aft = AftProcess::spawn(); + let root = trace_data_configure_go(&mut aft); + + let resp = aft.send(&format!( + r#"{{"id":"2","command":"trace_data","file":"{}/data_flow_patterns.go","symbol":"fieldWriteCase","expression":"name","depth":5}}"#, + root + )); + + assert_eq!(resp["success"], true, "trace_data should succeed: {:?}", resp); + let hops = resp["hops"].as_array().expect("hops array"); + + let fw_hop = hops.iter().find(|h| { + h["flow_type"] == "field_write" && h["symbol"] == "fieldWriteCase" + }); + assert!( + fw_hop.is_some(), + "expected field_write hop for m.Account = name, got hops: {:?}", + hops + ); + assert_eq!(fw_hop.unwrap()["variable"], "m.Account"); + + aft.shutdown(); +} diff --git a/scripts/install-claude-hooks.sh b/scripts/install-claude-hooks.sh index 056c6c91..70d86941 100755 --- a/scripts/install-claude-hooks.sh +++ b/scripts/install-claude-hooks.sh @@ -43,15 +43,16 @@ cat > "$HOOKS_DIR/aft" << 'WRAPPER_EOF' # Usage: aft 
[args...] # # Commands: -# outline - Get file/directory structure (symbols, functions, classes) -# zoom [symbol] - Inspect symbol with call-graph annotations -# call_tree - What does this function call? (forward graph) -# callers - Who calls this function? (reverse graph) -# impact - What breaks if this changes? -# trace_to - How does execution reach this function? -# read [start] [limit] - Read file with line numbers -# grep [path] - Search with trigram index -# glob [path] - Find files by pattern +# outline - Get file/directory structure (symbols, functions, classes) +# zoom [symbol] - Inspect symbol with call-graph annotations +# call_tree - What does this function call? (forward graph) +# callers - Who calls this function? (reverse graph) +# impact - What breaks if this changes? +# trace_to - How does execution reach this function? +# trace_data [depth] - How does this value flow through the code? +# read [start] [limit] - Read file with line numbers +# grep [path] - Search with trigram index +# glob [path] - Find files by pattern set -euo pipefail @@ -73,7 +74,10 @@ call_aft() { local config_req=$(jq -cn --arg root "$WORK_DIR" '{id:"cfg",command:"configure",project_root:$root}') local cmd_req=$(echo "$params" | jq -c --arg cmd "$cmd" '{id:"cmd",command:$cmd} + .') - local result=$( (echo "$config_req"; echo "$cmd_req") | "$AFT_BINARY" 2>/dev/null | grep '"id":"cmd"' | head -1) + # `awk '… exit'` drains stdin safely; `grep | head -1` under `set -o pipefail` + # triggers SIGPIPE (exit 141) on the upstream grep once the response exceeds the + # pipe buffer, silently killing the script on large outlines. 
+ local result=$( (echo "$config_req"; echo "$cmd_req") | "$AFT_BINARY" 2>/dev/null | awk '/"id":"cmd"/ {print; found=1; exit} END {exit !found}') # Check success local success=$(echo "$result" | jq -r '.success // false') @@ -102,11 +106,19 @@ case "$CMD" in # Check if directory - discover source files if [ -d "$FILE" ]; then + # `awk 'NR<=100'` caps output without SIGPIPE-ing the upstream find; + # `head -100` would close stdin early and, under `set -o pipefail`, kill the script. FILES=$(find "$FILE" -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" \ -o -name "*.py" -o -name "*.rs" -o -name "*.go" -o -name "*.c" -o -name "*.cpp" -o -name "*.h" \ -o -name "*.java" -o -name "*.rb" -o -name "*.md" \) \ ! -path "*/node_modules/*" ! -path "*/.git/*" ! -path "*/target/*" ! -path "*/dist/*" \ - 2>/dev/null | head -100 | jq -R . | jq -s .) + 2>/dev/null | awk 'NR<=100' | jq -R . | jq -s .) + + FILE_COUNT=$(echo "$FILES" | jq 'length') + if [ "$FILE_COUNT" = "0" ]; then + echo "No supported source files found in '$FILE' (looked for .ts/.tsx/.js/.jsx/.py/.rs/.go/.c/.cpp/.h/.java/.rb/.md, excluding node_modules/target/dist/.git)." >&2 + exit 1 + fi PARAMS=$(jq -cn --argjson files "$FILES" '{files:$files}') else PARAMS=$(jq -cn --arg f "$FILE" '{file:$f}') @@ -163,6 +175,23 @@ case "$CMD" in call_aft "trace_to" "$PARAMS" ;; + trace_data) + FILE="${1:-}" + SYMBOL="${2:-}" + EXPR="${3:-}" + DEPTH="${4:-5}" + if [ -z "$FILE" ] || [ -z "$SYMBOL" ] || [ -z "$EXPR" ]; then + echo "Usage: aft trace_data [depth]" + echo " Traces how flows through assignments and across function boundaries." + echo " is the function containing the expression; [depth] defaults to 5 (max 100)." 
+ exit 1 + fi + + PARAMS=$(jq -cn --arg f "$FILE" --arg s "$SYMBOL" --arg e "$EXPR" --argjson d "$DEPTH" \ + '{file:$f,symbol:$s,expression:$e,depth:$d}') + call_aft "trace_data" "$PARAMS" + ;; + read) FILE="${1:-}" START="${2:-1}" @@ -197,23 +226,26 @@ case "$CMD" in AFT - Agent File Tools (Tree-sitter powered code analysis) SEMANTIC COMMANDS (massive context savings): - aft outline Structure without content (~10% tokens) - aft zoom Symbol + call graph annotations - aft call_tree Forward call graph (what does it call?) - aft callers Reverse call graph (who calls it?) - aft impact What breaks if this changes? - aft trace_to How does execution reach this? + aft outline Structure without content (~10% tokens) + aft zoom Symbol + call graph annotations + aft call_tree Forward call graph (what does it call?) + aft callers Reverse call graph (who calls it?) + aft impact What breaks if this changes? + aft trace_to How does execution reach this? + aft trace_data [depth] + How does a value flow through assignments/calls? 
BASIC COMMANDS: - aft read [start] [limit] Read with line numbers - aft grep [path] Trigram-indexed search - aft glob [path] File pattern matching + aft read [start] [limit] Read with line numbers + aft grep [path] Trigram-indexed search + aft glob [path] File pattern matching EXAMPLES: - aft outline src/ # Get structure of all files in src/ - aft zoom main.go main # Inspect main() with call graph - aft callers api.go HandleRequest # Find all callers - aft call_tree service.go Process # See what Process() calls + aft outline src/ # Get structure of all files in src/ + aft zoom main.go main # Inspect main() with call graph + aft callers api.go HandleRequest # Find all callers + aft call_tree service.go Process # See what Process() calls + aft trace_data svc.go handle userId # Trace where userId came from and where it goes EOF ;; @@ -256,7 +288,8 @@ call_aft() { local config_req=$(jq -cn --arg root "$WORK_DIR" '{id:"cfg",command:"configure",project_root:$root}') local cmd_req=$(echo "$params" | jq -c --arg cmd "$cmd" '{id:"cmd",command:$cmd} + .') - (echo "$config_req"; echo "$cmd_req") | "$AFT_BINARY" 2>/dev/null | grep '"id":"cmd"' | head -1 + # awk avoids the SIGPIPE-from-head-under-pipefail trap that silently killed large responses. + (echo "$config_req"; echo "$cmd_req") | "$AFT_BINARY" 2>/dev/null | awk '/"id":"cmd"/ {print; found=1; exit} END {exit !found}' } case "$TOOL_NAME" in @@ -351,11 +384,24 @@ cat > "$CLAUDE_DIR/AFT.md" << 'INSTRUCTIONS_EOF' Tree-sitter powered code analysis for massive context savings (60-90% token reduction). +## Start With Outline, Escalate From There + +**Outline is the default entry point.** Before reading full files, run `aft outline` to get structure — ~10% the tokens of a full read. This applies to code, markdown, config, and docs. + +**Escalate to semantic commands only when the task needs them:** +- `aft zoom ` — when you need to read a specific function body. 
+- `aft call_tree` / `aft callers` — when you need cross-file call relationships (grep can't infer these). +- `aft impact` — before a refactor, to see what breaks. +- `aft trace_to` — when debugging how execution reaches a point. +- `aft trace_data` — when tracking where a value came from or where it flows next. + +**Don't use semantic commands reflexively.** For verification tasks — "does this symbol still exist?", "is this doc accurate?" — outline alone is usually enough. Reaching for zoom/call_tree on every task inflates work without improving answers. + ## AFT CLI Commands Use `aft` commands via Bash for code navigation. These provide structured output optimized for LLM consumption. -### Semantic Commands (prefer these over raw file reads) +### Semantic Commands ```bash # Get structure without content (~10% of full read tokens) @@ -373,8 +419,11 @@ aft callers # Impact analysis - what breaks if this changes? aft impact -# Trace analysis - how does execution reach this? +# Control flow - how does execution reach this function? aft trace_to + +# Data flow - how does a value flow through assignments and across calls? +aft trace_data [depth] ``` ### Basic Commands @@ -385,6 +434,30 @@ aft grep [path] # Trigram-indexed search aft glob [path] # File pattern matching ``` +## Tracing: control flow vs. data flow + +Two different questions, two commands: +- **"How does execution reach this function?"** → `aft trace_to` (control flow). + Example: `aft trace_to api/handler.go ChargePayment` — shows the call chain that lands on ChargePayment. +- **"Where did this value come from / where does it go next?"** → `aft trace_data` (data flow through assignments and parameter passing). + Example: `aft trace_data api/handler.go ChargePayment merchantID` — traces how `merchantID` propagates within and across function boundaries. + +For a bug like "this field got the wrong value," `trace_data` is usually the right starting point; for "why did this handler run," `trace_to` is. 
+ +### Patterns trace_data handles + +`trace_data` follows values across these constructs — use it confidently on idiomatic code instead of manually reading every caller: + +- **Direct args**: `f(x)` → hop into `f`'s matching parameter. +- **Reference args**: `f(&x)` → hop into `f`'s pointer parameter. +- **Field-access args**: `f(x.Field)` → approximate hop into `f`'s matching parameter (propagation continues). +- **Struct-literal wraps**: `w := Wrapper{Field: x}` → approximate assignment hop to `w`, then tracking continues on `w`. +- **Pointer-write intrinsics** (`json.Unmarshal`, `yaml.Unmarshal`, `xml.Unmarshal`, `toml.Unmarshal`, `proto.Unmarshal`, `bson.Unmarshal`, `msgpack.Unmarshal`): `json.Unmarshal(raw, &out)` binds `raw`'s flow into `out`, and further uses of `out` are tracked. +- **Method receivers**: `x.Method(...)` → hop into the receiver parameter name (Go `func (u *T) Method(...)`, Rust `&self`). +- **Destructuring assigns**: `a, b := f()` and `{a, b} = f()` → tracking splits onto the new bindings. + +Hops marked `"approximate": true` are lossy (field access, struct wraps, writer intrinsics) — the flow exists but the exact subfield is not resolved. + ## When to Use What | Task | Command | Token Savings | @@ -394,14 +467,32 @@ aft glob [path] # File pattern matching | Understanding dependencies | `aft call_tree` | Structured graph | | Finding usage sites | `aft callers` | All call sites | | Planning refactors | `aft impact` | Change propagation | -| Debugging call paths | `aft trace_to` | Execution paths | +| Debugging control flow | `aft trace_to` | Execution paths | +| Debugging data flow | `aft trace_data` | Value propagation | + +## Rules + +Match the command to the task type. Outline is universal; the semantic graph tools pay off for *comprehension* tasks, not for *verification* tasks. + +**Verification** ("does X still exist?", "is this doc accurate?"): +1. Start with `aft outline`. +2. 
Outline is usually enough — don't reach for zoom/call_tree unless you need behavior, not just presence. +3. **Outline before delegating.** When briefing a subagent to explore a repo or directory, run `aft outline ` yourself first and include the output in the subagent prompt. + +**Comprehension** ("how does this flow work?", "what breaks if I change X?", "where did this value come from?"): +4. Use `zoom` for function bodies. +5. Use `call_tree` / `callers` for cross-file call relationships grep cannot see. +6. Use `impact` before a refactor. +7. Use `trace_to` for control flow questions, `trace_data` for data flow questions. + +**When grep is fine.** `aft grep` for a bare identifier is correct when you just need to know "does this string appear, and where." Reach for semantic commands when you need to understand *behavior* behind the name, not every time a name shows up. -## Best Practices +## Context Protection -1. **Start with outline** - Before reading a file, use `aft outline` to understand structure -2. **Zoom to symbols** - Instead of reading full files, use `aft zoom` for specific functions -3. **Use call graphs** - For understanding code flow, `call_tree` and `callers` are more efficient than grep -4. **Impact before refactor** - Run `aft impact` before making changes to understand blast radius +Context is finite. Even when a user explicitly requests "contents" or "read all files": +- For directories with 5+ files, run `aft outline` first and confirm which files are actually needed. +- Never read more than 3-5 files in a single action without confirming user intent. +- "Read all files" is a request, not a command to fill context — propose outline + selective reads instead. ## Supported Languages