This repository was archived by the owner on May 27, 2026. It is now read-only.

feat: JIRA project-key → work_type mapping in classifier (tier-3 signal) #62

@bob-duetto

Problem

The batch classifier (batch_classifier.py) does not map JIRA project keys found in commit messages to a work_type. When a commit message contains a ticket reference such as ADV-1234 or FD-567, the classifier has no configured mapping from the project key (ADV, FD) to a work type (NPS enhancements, Bug fixes, etc.).

This is the primary cause of over-classification into the catch-all maintenance / KTLO category in the CTO analytics app. Roughly 60–70% of commits that reference known feature-work project keys (ADV = Advance, FD = Forecasting, BI = Business Intelligence, BB = Blockbuster, IS = Integrations) were classified as KTLO because the classifier fell through to heuristics.

Current behavior

The batch classifier extracts ticket_references (JSON array) from commit messages. It stores them in commit_ticket_correlations. However, there is no code path that maps the project key from those references to a work_type category before or after LLM classification.

Desired behavior

Add a tier-3 classification rule that runs after heuristic pattern matching and before the fallback to maintenance:

  1. Extract ticket references matching [A-Z]{2,6}-\d+ from the commit message; the letters before the hyphen are the project key.
  2. Look up the project key in a configurable jira_project_work_type mapping (YAML config section).
  3. If a mapping exists, use it as the work_type, subject to a message-aware veto: if the commit message also matches maintenance patterns (bump, gradle, merge branch, terraform plan, PR-comment noise), do NOT override; keep maintenance.
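
The three steps above could be sketched roughly as follows. This is an illustration under assumptions, not the existing gitflow-analytics code: the function name classify_by_jira_key, the exact veto regexes, and the "first mapped key wins" rule for messages with several tickets are all hypothetical. The "PR-comment noise" pattern is left out because the issue does not define it.

```python
import re
from typing import Optional

# Step 1: ticket references such as ADV-1234; group 1 is the project key.
_TICKET_RE = re.compile(r"\b([A-Z]{2,6})-\d+\b")

# Assumed veto patterns, taken from the examples named in this issue.
_MAINTENANCE_VETO_PATTERNS = tuple(
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"\bbump\b",
        r"\bgradle\b",
        r"\bmerge branch\b",
        r"\bterraform plan\b",
    )
)


def classify_by_jira_key(message: str, mapping: dict[str, str]) -> Optional[str]:
    """Return the mapped work_type, or None when this tier makes no decision.

    None means "fall through", so the caller keeps its maintenance fallback.
    """
    # Step 3 (veto): maintenance-looking messages are never overridden.
    if any(p.search(message) for p in _MAINTENANCE_VETO_PATTERNS):
        return None
    # Steps 1 and 2: the first project key that has a mapping wins (assumption).
    for project_key in _TICKET_RE.findall(message):
        if project_key in mapping:
            return mapping[project_key]
    return None


mapping = {"ADV": "NPS enhancements", "BUG": "Bug fixes"}
print(classify_by_jira_key("ADV-1234 add rate shopping widget", mapping))  # NPS enhancements
print(classify_by_jira_key("ADV-1234 bump gradle wrapper", mapping))       # None (vetoed)
print(classify_by_jira_key("XYZ-9 refactor loader", mapping))              # None (unmapped key)
```

Returning None rather than "maintenance" on a veto keeps the tier free of any knowledge of the fallback label; whether that fits the real classifier's return conventions would need checking.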

Example YAML config:

classification:
  jira_project_work_type:
    ADV: "NPS enhancements"   # Advance product
    FD:  "NPS enhancements"   # Forecasting/Demand
    BI:  "NPS enhancements"   # Business Intelligence
    BB:  "NPS enhancements"   # Blockbuster
    IS:  "Integrations"       # Integrations
    DP:  "Platform work"      # Data Platform
    INF: "Platform work"      # Infrastructure
    SEC: "Platform work"      # Security
    BUG: "Bug fixes"          # Bug project
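
Once parsed, the YAML section above is just a nested mapping, so reading it into a typed config object might look like the sketch below. ClassificationConfig and from_dict are hypothetical names, the key normalization (strip and upper-case) is an assumption, and the raw dict stands in for whatever the project's YAML loader returns.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ClassificationConfig:
    """Hypothetical holder for the classification config section."""

    # Project key (e.g. "ADV") -> work_type label (e.g. "NPS enhancements").
    jira_project_work_type: dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "ClassificationConfig":
        section = raw.get("classification") or {}
        mapping = section.get("jira_project_work_type") or {}
        if not isinstance(mapping, dict):
            raise ValueError("jira_project_work_type must be a mapping")
        # Normalize keys so "adv " and "ADV" refer to the same project.
        normalized = {
            str(key).strip().upper(): str(value).strip()
            for key, value in mapping.items()
        }
        return cls(jira_project_work_type=normalized)


# Stand-in for the parsed YAML document shown above (subset of entries).
raw = {
    "classification": {
        "jira_project_work_type": {
            "ADV": "NPS enhancements",
            "IS": "Integrations",
            "BUG": "Bug fixes",
        }
    }
}
config = ClassificationConfig.from_dict(raw)
print(config.jira_project_work_type["IS"])  # Integrations
```

An absent section yields an empty mapping, which leaves the new tier inactive and the classifier's existing behavior unchanged.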

Impact

This is the single largest source of classification inaccuracy in the CTO analytics app. Correcting the data required a 3-tier manual override system (commit_classification_overrides table + v_fact_commits_merged view) patched directly into the CTO DuckDB. Fixing it at the source in gitflow-analytics would eliminate the need for ongoing manual correction.

Implementation hint

Add a config field to schema.py: jira_project_work_type: dict[str, str] = field(default_factory=dict). In _classify_with_rules() in batch_classifier.py, add a JIRA-key lookup tier after the existing heuristic tiers. The veto patterns already partially exist in the maintenance pattern list; extract them into a shared _MAINTENANCE_VETO_PATTERNS constant reused by both the maintenance classifier and the new tier.
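
To make the tier ordering and the shared constant concrete, here is a rough sketch. It does not reflect the real structure of _classify_with_rules(): the Tier alias, make_jira_tier, looks_like_maintenance, the "maintenance" fallback label, and the toy bug_heuristic are hypothetical stand-ins, and the veto regexes are guesses based on the examples in this issue.

```python
import re
from typing import Callable, Optional, Sequence

# Shared constant: intended for both the maintenance classifier and the JIRA tier.
_MAINTENANCE_VETO_PATTERNS = tuple(
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bbump\b", r"\bgradle\b", r"\bmerge branch\b", r"\bterraform plan\b")
)
_TICKET_RE = re.compile(r"\b([A-Z]{2,6})-\d+\b")

# A tier returns a work_type, or None to pass the commit to the next tier.
Tier = Callable[[str], Optional[str]]


def looks_like_maintenance(message: str) -> bool:
    return any(p.search(message) for p in _MAINTENANCE_VETO_PATTERNS)


def make_jira_tier(mapping: dict[str, str]) -> Tier:
    """Build the JIRA-key tier from the configured mapping."""

    def jira_tier(message: str) -> Optional[str]:
        if looks_like_maintenance(message):  # veto: do not override
            return None
        for project_key in _TICKET_RE.findall(message):
            if project_key in mapping:
                return mapping[project_key]
        return None

    return jira_tier


def classify_with_rules(
    message: str, tiers: Sequence[Tier], fallback: str = "maintenance"
) -> str:
    """Run tiers in order; the first non-None result wins, else the fallback."""
    for tier in tiers:
        work_type = tier(message)
        if work_type is not None:
            return work_type
    return fallback


def bug_heuristic(message: str) -> Optional[str]:
    """Toy stand-in for the existing heuristic tiers."""
    return "Bug fixes" if re.search(r"\bfix(es|ed)?\b", message, re.IGNORECASE) else None


# Existing heuristics first, JIRA-key tier last, maintenance as the fallback.
tiers = [bug_heuristic, make_jira_tier({"ADV": "NPS enhancements"})]
print(classify_with_rules("ADV-12 fix null pointer", tiers))    # Bug fixes
print(classify_with_rules("ADV-12 add pricing panel", tiers))   # NPS enhancements
print(classify_with_rules("ADV-12 bump gradle plugin", tiers))  # maintenance
```

Placing the JIRA tier last in the list mirrors the requested order: heuristics keep priority, and the project-key mapping only rescues commits that would otherwise land in the catch-all.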
