feat(correlation): add workflow-aware confidence scoring#2762

Draft
cerencamkiran wants to merge 3 commits into Tracer-Cloud:main from cerencamkiran:feat/workflow-aware-confidence

cerencamkiran (Collaborator) commented Jun 5, 2026

Fixes #1441

Summary

Adds workflow-aware reasoning and shared confidence scoring to the correlation pipeline.

Previously, correlation ranking relied primarily on time-window correlation, topology adjacency, periodicity, and operator hints. This PR introduces a feature/workflow hypothesis layer that allows the system to associate likely initiating features or workflows with correlated candidates and explain why a candidate was ranked highly.

The implementation also adds a lightweight file-based configuration mechanism for endpoint-to-feature mapping, feature-to-service mapping, and optional operator hints such as recently shipped features or scheduled workflows.

What Changed

Shared Confidence Scoring

Added a new shared confidence model that aggregates evidence from:

  • Correlation evidence
  • Topology adjacency evidence
  • Periodicity evidence
  • Feature/workflow hypothesis evidence

Each contribution includes:

  • Source
  • Score
  • Weight
  • Rationale

The final runtime payload now includes:

  • confidence_label
  • evidence_breakdown

allowing downstream consumers to understand what drove the ranking decision.
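A minimal sketch of how such an aggregation could work, assuming a weighted mean normalised by total weight (as the review summary describes for confidence.py). The function name `sketch_shared_confidence`, the label thresholds, and every weight except the 15% feature/workflow weight mentioned in the review are hypothetical and are not taken from the PR's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceContribution:
    source: str
    score: float  # expected to lie in [0, 1]
    weight: float
    rationale: str


def sketch_shared_confidence(contributions):
    """Combine contributions into a score, a label, and a breakdown."""
    total_weight = sum(c.weight for c in contributions)
    if total_weight <= 0:
        score = 0.0
    else:
        # Weighted mean: each score counts in proportion to its weight.
        score = sum(c.score * c.weight for c in contributions) / total_weight

    # Hypothetical thresholds; the PR's actual cut-offs are not shown.
    if score >= 0.75:
        label = "high"
    elif score >= 0.4:
        label = "medium"
    else:
        label = "low"

    breakdown = tuple(
        {
            "source": c.source,
            "score": c.score,
            "weight": c.weight,
            "rationale": c.rationale,
        }
        for c in contributions
    )
    return {
        "score": score,
        "confidence_label": label,
        "evidence_breakdown": breakdown,
    }


# Illustrative inputs only; weights other than 0.15 are assumptions.
example = [
    EvidenceContribution("correlation", 0.8, 0.45, "spikes overlap in window"),
    EvidenceContribution("topology", 1.0, 0.25, "direct upstream dependency"),
    EvidenceContribution("periodicity", 0.0, 0.15, "no periodic pattern"),
    EvidenceContribution("feature_workflow", 1.0, 0.15, "matches operator hint"),
]
result = sketch_shared_confidence(example)
```

Keeping the rationale next to each score and weight is what lets a downstream consumer read `evidence_breakdown` and see which source drove the final label.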

Feature / Workflow Hypothesis Layer

Added feature workflow scoring that:

  • Matches candidate services against workflow-related operator hints
  • Produces feature/workflow evidence
  • Contributes to shared confidence scoring
  • Generates explainable rationale for ranked candidates

File-Based Feature Configuration

Added a lightweight YAML configuration layer supporting:

  • Endpoint → feature tags
  • Feature → service mapping
  • Optional operator hints

This keeps workflow attribution configurable without requiring code changes.
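As an illustration of the two mapping steps, the sketch below resolves an endpoint to feature tags and then to services. It works on an already-parsed dictionary so that it stays dependency-free. The key names (`endpoint_features`, `feature_services`, `operator_hints`), the sample values, and the helper `resolve_services` are all hypothetical; the PR's actual schema is not shown here.

```python
# Shape a parsed YAML file might take (key names are assumptions):
#
#   endpoint_features:
#     /api/checkout: [checkout]
#   feature_services:
#     checkout: [payments-api, orders-db]
#   operator_hints:
#     - "checkout workflow shipped this week"

parsed_config = {
    "endpoint_features": {"/api/checkout": ["checkout"]},
    "feature_services": {"checkout": ["payments-api", "orders-db"]},
    "operator_hints": ["checkout workflow shipped this week"],
}


def resolve_services(config, endpoint):
    """Map an endpoint to its feature tags, then to the services behind them."""
    features = config.get("endpoint_features", {}).get(endpoint, [])
    services = []
    for feature in features:
        for service in config.get("feature_services", {}).get(feature, []):
            if service not in services:  # keep order, drop duplicates
                services.append(service)
    return tuple(features), tuple(services)


features, services = resolve_services(parsed_config, "/api/checkout")
```

An unknown endpoint resolves to two empty tuples, so a missing mapping reduces the available evidence rather than raising an error.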

Tests

Added coverage for:

  • Shared confidence evidence breakdown
  • Feature/workflow hypothesis scoring
  • File-based configuration loading
  • Endpoint-to-feature resolution
  • Feature-to-service resolution
  • Operator hint influence on candidate ranking

Validation

ruff format app/agent/correlation tests/synthetic/rds_postgres/correlation
ruff check app/agent/correlation tests/synthetic/rds_postgres/correlation

pytest tests/synthetic/rds_postgres/correlation \
       tests/synthetic/rds_postgres/test_observation_correlation.py -q

Result:

24 passed
[Screenshots: 2026-06-05 202508, 2026-06-05 214154]

github-actions (Bot, Contributor) commented Jun 5, 2026

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

greptile-apps (Bot, Contributor) commented Jun 5, 2026

Greptile Summary

This PR introduces workflow-aware confidence scoring to the correlation pipeline by adding a SharedConfidence model, a feature/workflow hypothesis scorer, and a YAML-based feature config layer. The divergent weight formula issue from the previous review cycle is resolved: final_confidence now derives directly from shared_confidence.score.

  • Shared confidence model (confidence.py, scoring.py): build_shared_confidence replaces the inline weighted sum, unifying ranking and label computation under a single formula with four named evidence contributions.
  • Feature/workflow layer (feature_config.py, feature_workflow.py, runtime.py): Endpoint-to-feature and feature-to-service mappings are loaded from a YAML file specified via OPENSRE_FEATURE_WORKFLOW_CONFIG; the resulting keywords are merged with metric-name tokens before scoring.
  • Runtime wiring (runtime.py): _runtime_feature_keywords is called once per upstream metric inside the loop, re-reading the config file on every iteration; loading should happen once before the loop. The evidence_breakdown field added to UpstreamCandidate uses tuple[dict[str, object], ...] inside a frozen=True dataclass, which silently breaks hashability when the tuple is non-empty.

Confidence Score: 5/5

Safe to merge; changes are additive and the findings do not affect current runtime behaviour.

All findings are style and performance observations with no current runtime breakage. The redundant YAML re-reads degrade performance but do not produce wrong results. The frozen-dataclass-with-dict issue is latent — nothing in the current call graph hashes an UpstreamCandidate with a populated evidence_breakdown.

app/agent/correlation/runtime.py (config loaded per metric) and app/agent/correlation/models.py (evidence_breakdown field type) are worth a follow-up, but neither blocks merging.
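The latent hashability issue can be reproduced in isolation. The sketch below uses a stand-in `Candidate` dataclass rather than the PR's UpstreamCandidate, and `EvidenceItem` is a hypothetical alternative element type; neither is taken from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Stand-in for a frozen candidate with an evidence tuple."""
    service: str
    evidence_breakdown: tuple = ()


@dataclass(frozen=True)
class EvidenceItem:
    """Hypothetical hashable replacement for a per-source dict."""
    source: str
    score: float


def is_hashable(obj):
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# An empty tuple hashes fine, so the problem stays hidden until populated.
empty = Candidate("orders-db")

# A tuple holding a dict cannot be hashed, so neither can the dataclass.
populated = Candidate("orders-db", ({"source": "topology", "score": 1.0},))

# Frozen dataclass elements keep the whole structure hashable.
safe = Candidate("orders-db", (EvidenceItem("topology", 1.0),))
```

A frozen dataclass derives its hash from its fields, so the `TypeError` appears only when something places a populated instance in a set or uses it as a dict key, which matches the review's note that nothing in the current call graph does so.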

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/agent/correlation/confidence.py | New shared confidence model; correctly normalises weighted scores by total_weight and assigns labels; clean implementation. |
| app/agent/correlation/feature_config.py | New YAML config loader and keyword resolver; logic is sound but FeatureWorkflowConfig is declared frozen=True despite holding mutable dict fields. |
| app/agent/correlation/feature_workflow.py | New feature/workflow hypothesis scorer; binary 0/1 scoring is a known limitation already tracked. |
| app/agent/correlation/models.py | Adds confidence_label and evidence_breakdown to UpstreamCandidate; evidence_breakdown uses tuple[dict], which breaks hashability of the frozen dataclass when populated. |
| app/agent/correlation/runtime.py | Wires feature config and workflow scoring into the correlation pipeline; _runtime_feature_keywords reads and parses YAML once per metric in the inner loop instead of once per build_runtime_correlation call. |
| app/agent/correlation/scoring.py | Replaces separate final_confidence formula with shared_confidence.score, resolving the previously noted weight divergence; operator_hint_score removed cleanly. |
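The per-metric reload noted for runtime.py can be illustrated with a counting stub in place of the YAML loader. All names here (`CountingLoader`, `correlate_reloading`, `correlate_hoisted`) are hypothetical and only show the shape of the change, not the PR's code.

```python
class CountingLoader:
    """Stub for a config loader that records how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return {"keywords": ("checkout",)}


def correlate_reloading(metrics, loader):
    results = []
    for metric in metrics:
        config = loader()  # config is re-read on every iteration
        results.append((metric, config["keywords"]))
    return results


def correlate_hoisted(metrics, loader):
    config = loader()  # config is read once, before the loop
    return [(metric, config["keywords"]) for metric in metrics]


metrics = ["cpu", "connections", "latency"]

reloading_loader = CountingLoader()
reloading_result = correlate_reloading(metrics, reloading_loader)

hoisted_loader = CountingLoader()
hoisted_result = correlate_hoisted(metrics, hoisted_loader)
```

Both variants return the same results; only the number of loader calls differs (one per metric versus one per run), consistent with the review treating this as a performance observation rather than a correctness bug.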

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[build_runtime_correlation] --> B[Extract endpoint hint]
    A --> C[For each upstream metric]
    C --> D[score_time_window_correlation]
    C --> E[score_topology_adjacency]
    C --> F[score_periodic_spikes]
    C --> G[_runtime_feature_keywords\nreads YAML per iteration]
    G --> H{OPENSRE_FEATURE_WORKFLOW_CONFIG set?}
    H -- Yes --> I[load_feature_workflow_config]
    H -- No --> J[return empty tuple]
    I --> K[resolve_feature_keywords]
    K --> L[candidate_keywords]
    L --> M[score_feature_workflow_hypothesis]
    D & E & F & M --> N[score_candidate_correlation]
    N --> O[build_shared_confidence]
    O --> P[UpstreamCandidate with confidence_label and evidence_breakdown]
    P --> Q[rank_upstream_candidates]
    Q --> R[correlation_report_to_payload]

Reviews (4): Last reviewed commit: "feat(correlation): add workflow-aware co..." | Re-trigger Greptile

Comment thread app/agent/correlation/feature_config.py
Comment thread app/agent/correlation/scoring.py Outdated
hint for hint in matched_hints if "workflow" in hint.lower() or "scheduled" in hint.lower()
)

score = 1.0 if matched_hints else 0.0

P2 Binary 0/1 scoring inflates confidence for common metric name tokens

score is set to 1.0 whenever any candidate_keyword appears anywhere in any operator hint string, so a single match earns the full 15% feature_workflow weight. Short or generic tokens extracted from metric names (e.g., "web", "api", "rds") can easily substring-match loosely written hint strings, giving a full 1.0 score for what is effectively a partial match. Consider a proportional score (e.g., len(matched_hints) / len(operator_hints)) or a minimum keyword length guard stricter than the existing len(token) > 2 filter.
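A rough sketch contrasting the binary behaviour with the proportional alternative suggested above. The function names, the `min_len=4` guard, and the whole-token matching are assumptions added for illustration; only the len(matched_hints) / len(operator_hints) ratio comes from the review comment.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def binary_score(keywords, hints):
    # Substring match: any keyword anywhere in any hint yields 1.0.
    matched = [
        h for h in hints if any(k.lower() in h.lower() for k in keywords)
    ]
    return 1.0 if matched else 0.0


def proportional_score(keywords, hints, min_len=4):
    # Whole-token match with a length guard, scaled by the share of hints hit.
    if not hints:
        return 0.0
    usable = {k.lower() for k in keywords if len(k) >= min_len}
    matched = [h for h in hints if usable & tokenize(h)]
    return len(matched) / len(hints)


hints = (
    "rapid rollout of search",
    "checkout workflow shipped",
    "scheduled billing export",
)

# "api" is a substring of "rapid", so the binary scorer reports a full match.
loose_binary = binary_score(("api",), hints)
loose_proportional = proportional_score(("api",), hints)
mixed_proportional = proportional_score(("api", "checkout"), hints)
```

With these inputs the generic token "api" no longer contributes, while "checkout" matches one of the three hints, so the score reflects how much of the hint set supports the candidate.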

@cerencamkiran cerencamkiran marked this pull request as draft June 5, 2026 17:34
Comment thread app/agent/correlation/runtime.py Outdated
Comment thread app/agent/correlation/runtime.py
@cerencamkiran cerencamkiran force-pushed the feat/workflow-aware-confidence branch from dd4bcd0 to f68ce16 Compare June 5, 2026 18:39
cerencamkiran (Collaborator, Author) commented:

@greptile review

Development

Successfully merging this pull request may close these issues.

Feature/workflow-aware reasoning + shared confidence scoring

1 participant