Problem
The batch classifier (batch_classifier.py) does not map JIRA project keys extracted from commit messages to work_type. When a commit message contains a ticket reference like ADV-1234 or FD-567, the classifier has no configured mapping from the project key (ADV, FD) to a work type (NPS enhancements, Bug fixes, etc.).
This is the primary cause of the catch-all maintenance / KTLO over-classification in the CTO analytics app. Roughly 60–70 % of commits that reference known feature-work project keys (ADV = Advance, FD = Forecasting, BI = Business Intelligence, BB = Blockbuster, IS = Integrations) were classified as KTLO because the classifier fell through to heuristics.
Current behavior
The batch classifier extracts ticket_references (JSON array) from commit messages. It stores them in commit_ticket_correlations. However, there is no code path that maps the project key from those references to a work_type category before or after LLM classification.
Desired behavior
Add a tier-3 classification rule that runs after heuristic pattern matching and before falling back to maintenance:
- Extract [A-Z]{2,6}-\d+ from the commit message.
- Look up the project key in a configurable jira_project_work_type mapping (YAML config section).
- If a mapping exists, use it as the work_type, but apply a message-aware veto: if the commit message also matches maintenance patterns (bump, gradle, merge branch, terraform plan, PR-comment noise), do NOT override; keep maintenance.
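The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the existing classifier code: the function name classify_by_jira_key, the TICKET_RE constant, and the exact veto regexes are hypothetical, and the "PR-comment noise" pattern is left out because the issue does not define it. Returning None signals "no override", so the caller keeps its existing maintenance fallback.

```python
import re

# Hypothetical names; the real internals of batch_classifier.py may differ.
# Captures the project key (group 1) from references such as ADV-1234.
TICKET_RE = re.compile(r"\b([A-Z]{2,6})-\d+\b")

# Assumed veto patterns, based only on the examples listed in this issue.
MAINTENANCE_VETO_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"\bbump\b",
        r"\bgradle\b",
        r"\bmerge branch\b",
        r"\bterraform plan\b",
    )
]


def classify_by_jira_key(message, mapping):
    """Return the mapped work_type, or None to leave the fallback in place."""
    # Message-aware veto: maintenance-looking commits are never overridden.
    if any(p.search(message) for p in MAINTENANCE_VETO_PATTERNS):
        return None
    # The first ticket reference whose project key is configured wins.
    for match in TICKET_RE.finditer(message):
        work_type = mapping.get(match.group(1))
        if work_type is not None:
            return work_type
    return None
```

With a mapping of {"ADV": "NPS enhancements"}, a message such as "ADV-1234 add export button" resolves to "NPS enhancements", while "ADV-1234 bump gradle wrapper" is vetoed and falls through to the existing maintenance handling.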
Example YAML config:
classification:
  jira_project_work_type:
    ADV: "NPS enhancements"  # Advance product
    FD: "NPS enhancements"   # Forecasting/Demand
    BI: "NPS enhancements"   # Business Intelligence
    BB: "NPS enhancements"   # Blockbuster
    IS: "Integrations"       # Integrations
    DP: "Platform work"      # Data Platform
    INF: "Platform work"     # Infrastructure
    SEC: "Platform work"     # Security
    BUG: "Bug fixes"         # Bug project
Impact
This is the single largest source of classification inaccuracy in the CTO analytics app. It required a 3-tier manual override system (commit_classification_overrides table + v_fact_commits_merged view) patched directly into the CTO DuckDB to correct the data. Fixing it in gitflow-analytics at source would eliminate the need for ongoing manual correction.
Implementation hint
Add a config field to schema.py (jira_project_work_type: dict[str, str] = field(default_factory=dict)). In _classify_with_rules() in batch_classifier.py, after the existing heuristic tiers, add a JIRA-key lookup tier. Veto patterns already partially exist in the maintenance pattern list; extract them into a shared _MAINTENANCE_VETO_PATTERNS constant reused by both the maintenance classifier and this new tier.
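A minimal sketch of the config side, assuming the YAML has already been parsed into a plain dict. The class name ClassificationConfig and the helper load_classification_config are hypothetical; only the jira_project_work_type field and its default come from this issue. Upper-casing the keys is an added assumption so that lookups match the upper-case project keys extracted from commit messages.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ClassificationConfig:  # hypothetical name; the real class in schema.py may differ
    # Maps a JIRA project key (e.g. "ADV") to a work_type label.
    jira_project_work_type: dict[str, str] = field(default_factory=dict)


def load_classification_config(raw: dict[str, Any]) -> ClassificationConfig:
    """Build the config from an already-parsed YAML document (hypothetical helper)."""
    section = raw.get("classification") or {}
    mapping = section.get("jira_project_work_type") or {}
    # Assumption: normalise keys to upper case so "adv" and "ADV" are equivalent.
    return ClassificationConfig(
        jira_project_work_type={str(k).upper(): str(v) for k, v in mapping.items()}
    )
```

A missing or empty section yields an empty mapping, so the new tier would be a no-op for installations that do not configure it.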