
Conversation


@Parthavi19 Parthavi19 commented Oct 4, 2025

Issue Title:
Fraud Detection System with Explainable AI (XAI) — End-to-End Implementation

What's the goal of the project?
The goal of this project is to build a near-real-time fraud detection system with Explainable AI (XAI) support. The system not only detects fraudulent transactions but also explains why a transaction was flagged, provides counterfactual scenarios, reduces false positives, and equips analysts with monitoring and triage tools.

Name
Parthavi Kurugundla
GSSOC’25 Contributor

GitHub ID
@Parthavi19

Email ID
[email protected]

Closes
Closes: #1796

I have implemented the complete fraud detection system as outlined in the proposal for issue #1796. Key contributions include:

Data Preparation (data_prep.py) → Feature engineering (velocity, aggregations), imbalance handling, scaling, and time-based splitting.
Model Training (train_model.py) → XGBoost classifier with PR-AUC & ROC-AUC evaluation.
Data Ingestion (ingestion_sim.py) → Queue-based simulation of real-time streaming (uses Python's built-in queue.Queue).
Feature Store (feature_store.py) → Redis in-memory store with Postgres fallback.
Explainability (explainability.py) → SHAP (global/local explanations), DiCE (counterfactuals), and human-readable summaries.
Decision Engine (decision_engine.py) → Rule-based actions combined with ML scores and explanations, supporting multi-row batch mode.
API Serving (api_serve.py) → FastAPI backend for real-time/batch scoring with explanations.
UI Dashboard (ui_dashboard.py) → Streamlit interface for analysts to view transaction scores, SHAP plots, counterfactuals, and timelines, plus a feedback loop.
Monitoring (monitoring.py) → Feature & explanation drift detection using KS tests.
Bug Fixes → Fixed 32-feature validation mismatch, resolved SHAP TypeError by using Explanation objects.
Multi-Row Support → Updated modules to process multiple transactions at once for scalability.
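
The tiered policy behind the Decision Engine bullet can be sketched as a simple threshold ladder. The decide_actions name follows the PR description, but the threshold values below are illustrative assumptions, not the ones in decision_engine.py:

```python
# Hypothetical sketch of a tiered action policy; threshold values are
# illustrative assumptions, not those used in decision_engine.py.

def decide_action(score: float,
                  block_at: float = 0.9,
                  escalate_at: float = 0.7,
                  soft_at: float = 0.4) -> str:
    """Map a fraud probability to one of the tiered actions."""
    if score >= block_at:
        return "auto-block"
    if score >= escalate_at:
        return "escalate"
    if score >= soft_at:
        return "soft-action"
    return "auto-clear"

def decide_actions(scores):
    """Batch version: one action per transaction score."""
    return [decide_action(s) for s in scores]

print(decide_actions([0.95, 0.75, 0.5, 0.1]))
# → ['auto-block', 'escalate', 'soft-action', 'auto-clear']
```

In the real module the ML score is combined with rule-based checks before any thresholding; this sketch only shows the final score-to-action mapping.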

How Has This Been Tested?

Trained and validated the XGBoost model on the Kaggle Credit Card Fraud Dataset.
Verified feature schema integrity (strict 32-feature validation).
Tested ingestion → feature store → scoring → explanations pipeline end-to-end.
Checked SHAP plots, counterfactual explanations, and dashboard outputs.
Ran batch scoring on a sample multi-row CSV generated via create_sample_csv.py.
API tested using uvicorn (real-time and batch mode).
Streamlit dashboard tested with interactive file uploads and analyst feedback.
Verified Redis integration and Postgres fallback.
Validated monitoring with simulated drift scenarios.
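
As a concrete illustration of the API test step, a batch request body for the /score_batch endpoint (endpoint name from the PR; the exact payload schema is an assumption) could be assembled and checked against the strict 32-feature contract like this:

```python
# Sketch of assembling a /score_batch request body; the payload shape
# {"transactions": [...]} is an assumption based on the PR description.
import json

def build_batch_payload(rows, n_features=32):
    """Serialize rows to the request body, enforcing the 32-feature contract."""
    for i, row in enumerate(rows):
        if len(row) != n_features:
            raise ValueError(f"row {i} has {len(row)} features, expected {n_features}")
    return json.dumps({"transactions": rows})

payload = build_batch_payload([[0.0] * 32, [1.0] * 32])
print(len(json.loads(payload)["transactions"]))  # → 2

# With the API running under uvicorn, the payload could then be sent with, e.g.:
#   requests.post("http://127.0.0.1:8000/score_batch", data=payload,
#                 headers={"Content-Type": "application/json"}, timeout=5)
```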

Checklist:

My code follows the guidelines of this project.
I have performed a self-review of my own code.
I have commented my code, particularly wherever it was hard to understand.
I have made corresponding changes to the documentation.
My changes generate no new warnings.
I have added tests that prove my fix is effective or that my feature works.

Summary by CodeRabbit

  • New Features
    • Batch fraud scoring API with per-transaction explanations and recommended actions.
    • Decision engine applying tiered policies (auto-block, escalate, soft-action, auto-clear).
    • Analyst dashboard for scoring, visual explanations, counterfactuals, feedback, and optional API testing.
    • Redis-backed feature store for storing and retrieving user features.
    • Ingestion simulator mimicking streaming transactions.
    • Data drift monitoring with per-feature alerts and logfile output.
    • End-to-end data prep and model training scripts, including evaluation metrics and model export.
    • Explainability utilities producing plots and concise contributor summaries.
  • Chores
    • Added dependencies for data processing, modeling, explainability, APIs, storage, and visualization.
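
The "concise contributor summaries" mentioned above boil down to ranking per-feature attributions by absolute magnitude. A minimal stand-in sketch (function names and output format are hypothetical, not the actual explainability.py API):

```python
# Illustrative helpers: rank features by absolute attribution and keep the
# top k; names and formatting are assumptions, not the explainability.py API.

def top_contributors(attributions, feature_names, k=3):
    """Return the k (feature, value) pairs with the largest |attribution|."""
    ranked = sorted(zip(feature_names, attributions),
                    key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]

def summarize(attributions, feature_names, k=3):
    """Human-readable one-liner naming the top contributors."""
    parts = [f"{name} ({val:+.2f})"
             for name, val in top_contributors(attributions, feature_names, k)]
    return "Top contributors: " + ", ".join(parts)

print(summarize([0.05, -0.4, 0.3, 0.1], ["Amount", "V14", "V4", "V10"]))
# → Top contributors: V14 (-0.40), V4 (+0.30), V10 (+0.10)
```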


github-actions bot commented Oct 4, 2025

✅ PR validation passed! Syncing labels and assignees from the linked issue...

@github-actions github-actions bot added the labels Contributor (denotes issues or PRs submitted by contributors to acknowledge their participation), gssoc25, level3, and Status: Review Ongoing 🔄 (PR is currently under review and awaiting feedback from reviewers) on Oct 4, 2025.

github-actions bot commented Oct 4, 2025

👋 Thank you for opening this pull request! We're excited to review your contribution. Please give us a moment, and we'll get back to you shortly!

Feel free to join our community on Discord to discuss more!

@Parthavi19 Parthavi19 changed the title from "Fixed: New Project Proposal On Fraud detection using Explainable AI #1796" to "Completed: New Project Proposal On Fraud detection using Explainable AI #1796" on Oct 4, 2025.


coderabbitai bot commented Oct 4, 2025

Walkthrough

Adds a complete fraud-detection prototype: data prep and model training, explainability workflows, a Streamlit analyst UI, a FastAPI batch scoring API with SHAP-based explanations, a decision engine, Redis-backed feature store, ingestion simulation, and KS-based monitoring. Also introduces a requirements file.

Changes

Cohort / File(s), with summaries:

• API Serving (Batch Scoring + SHAP): Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/api_serve.py
  New FastAPI app that loads the model and SHAP explainer; POST /score_batch validates 32-feature rows and returns a per-transaction score, top-3 contributor summary, and action. Exposes TransactionBatch, app, model, explainer, feature_names.
• Decision Engine: Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/decision_engine.py
  New module that computes scores and SHAP importances and applies a multi-threshold action policy; provides decide_actions(txn_df) and example usage.
• Feature Store (Redis): Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feature_store.py
  Adds a Redis client r, store_features/get_features helpers, connection checks, and optional sample seeding from creditcard.csv with error handling.
• Ingestion Simulation: Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ingestion_sim.py
  Threaded producer-consumer over an in-memory queue; reads the first 100 CSV rows, streams JSON messages, and processes until a sentinel is seen. Provides producer, consumer, dummy_process.
• Monitoring (Drift Checks): Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/monitoring.py
  Loads X_train/X_test; runs a per-feature KS test with logging to monitor.log; prints a completion note; includes a SHAP-drift comment.
• Data Prep and Training: Explainable-AI/Fraud Detection using ExplainableAI/data_prep.py, Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/train_model.py
  Data pipeline with feature engineering, imbalance handling, scaling, and a time-based split; saves CSVs. Training script fits XGBoost, evaluates PR-AUC/ROC-AUC, and saves the model via joblib.
• Explainability and UI: Explainable-AI/Fraud Detection using ExplainableAI/explainability.py, Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py
  SHAP TreeExplainer, waterfall plots, DiCE counterfactuals, and a summary helper; Streamlit dashboard for scoring, explanations, simple counterfactuals, a timeline, feedback, and optional API calls.
• Dependencies: Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/requirements.txt
  Adds libraries for data, modeling, explainability, storage, API, visualization, and utilities.
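
The per-feature KS drift check in the Monitoring row can be sketched with a hand-rolled two-sample KS statistic so the idea is self-contained; monitoring.py presumably relies on scipy.stats.ks_2samp instead, and the 0.2 alert threshold here is an illustrative assumption:

```python
# Self-contained two-sample KS statistic for illustration; monitoring.py
# presumably uses scipy.stats.ks_2samp, and the 0.2 threshold is made up.
import bisect

def ks_statistic(a, b):
    """Maximum gap between the two empirical CDFs (the KS statistic)."""
    a, b = sorted(a), sorted(b)

    def ecdf(xs, x):
        # fraction of samples <= x
        return bisect.bisect_right(xs, x) / len(xs)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a) | set(b)))

def drift_alerts(train_cols, live_cols, threshold=0.2):
    """Return {feature: KS stat} for features drifting beyond the threshold."""
    alerts = {}
    for name, train_vals in train_cols.items():
        stat = ks_statistic(train_vals, live_cols[name])
        if stat > threshold:
            alerts[name] = stat
    return alerts

train = {"Amount": [1, 2, 3, 4, 5], "V14": [0.1, 0.2, 0.3]}
live = {"Amount": [10, 11, 12, 13, 14], "V14": [0.1, 0.2, 0.3]}
print(drift_alerts(train, live))  # → {'Amount': 1.0}
```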

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant A as FastAPI /score_batch
  participant M as Model
  participant S as SHAP Explainer

  C->>A: POST /score_batch { transactions: [[...32 feats...], ...] }
  A->>A: Validate feature count (32 per row)
  A->>M: predict_proba(batch)
  M-->>A: scores (P(fraud))
  A->>S: shap_values(batch)
  S-->>A: per-txn attributions
  A->>A: Build top-3 contributor summaries + actions
  A-->>C: [{txn_id, score, explanation, action}, ...]
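
The validation and response-assembly steps of the diagram above can be sketched without the trained model; scoring and SHAP are stubbed, and these helper names mirror the diagram rather than the actual api_serve.py internals:

```python
# Sketch of the validation and response-assembly steps; helper names are
# illustrative, not the actual api_serve.py internals.

def validate_batch(transactions, n_features=32):
    """Reject any row that does not carry exactly n_features values."""
    for i, row in enumerate(transactions):
        if len(row) != n_features:
            raise ValueError(f"txn {i}: expected {n_features} features, got {len(row)}")

def build_response(scores, summaries, actions):
    """Zip per-transaction outputs into the response records."""
    return [{"txn_id": i, "score": s, "explanation": e, "action": a}
            for i, (s, e, a) in enumerate(zip(scores, summaries, actions))]

validate_batch([[0.0] * 32])  # passes silently
resp = build_response([0.92], ["V14, V4, Amount pushed the score up"], ["auto-block"])
print(resp[0]["action"])  # → auto-block
```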
sequenceDiagram
  autonumber
  participant P as Producer
  participant Q as In-Memory Queue
  participant C as Consumer
  participant F as process_func

  P->>Q: Enqueue txn JSON (loop over CSV rows)
  P->>Q: Enqueue None (end)
  loop until None
    C->>Q: Dequeue item
    alt item is JSON
      C->>F: process_func(txn_dict)
    else item is None
      C-->>C: Exit loop
    end
  end
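
The producer/consumer flow diagrammed above can be made concrete with Python's queue.Queue and the None sentinel the PR describes; the transaction fields here are made up for illustration:

```python
# Minimal runnable version of the diagrammed producer/consumer flow, using
# queue.Queue and a None sentinel; the transaction fields are illustrative.
import json
import queue
import threading

q = queue.Queue()
processed = []

def dummy_process(txn):
    """Stand-in for real downstream processing."""
    processed.append(txn["txn_id"])

def producer(rows):
    for row in rows:
        q.put(json.dumps(row))   # enqueue each transaction as JSON
    q.put(None)                  # sentinel: no more messages

def consumer(process_func):
    while True:
        item = q.get()
        if item is None:         # sentinel reached, exit loop
            break
        process_func(json.loads(item))

rows = [{"txn_id": i, "Amount": 10.0 * i} for i in range(3)]
t = threading.Thread(target=consumer, args=(dummy_process,))
t.start()
producer(rows)
t.join()
print(processed)  # → [0, 1, 2]
```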

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Suggested labels

Advanced

Suggested reviewers

  • TheChaoticor
  • UTSAVS26

Poem

A hop, a skip, I sniff the wire—
Scores and SHAPs light up the spire.
Queues go squeak, the Redis hums,
Drift checks tap on kettle drums.
With carrots cached and features neat,
I thump “All clear!”—or flag the cheat. 🥕🕵️‍♂️

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)

• Title Check ⚠️ Warning: The title carries an issue reference and a "Completed:" prefix instead of succinctly describing the implemented end-to-end fraud detection system and its explainability features. Resolution: rename the PR to a concise summary such as "Implement end-to-end fraud detection system with explainable AI".
• Linked Issues Check ⚠️ Warning: The changes deliver the core components (data preparation, model training, ingestion simulation, feature store, decision engine, API serving, explainability, UI, and monitoring), but omit several per-transaction outputs specified in issue #1796: baseline contributions, full top-5 attributions, counterfactual suggestions, transaction timelines, and provenance metadata. Required fields such as MCC, country, device/IP, and customer metadata are also not addressed. Resolution: extend the API and supporting modules to emit baseline scores, top-5 feature contributions, counterfactual suggestions, transaction timelines, and provenance (model/explainer versions), and update the data pipeline or documentation to meet the transaction-level and customer metadata requirements.
• Docstring Coverage ⚠️ Warning: Docstring coverage is 44.44%, below the required 80.00% threshold. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)

• Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
• Out of Scope Changes Check ✅ Passed: All added and modified modules (ingestion simulation, feature store, data preparation, modeling, explainability, serving, UI, and monitoring) directly align with the objectives of issue #1796; no unrelated code changes are present.


@github-actions
Copy link

github-actions bot commented Oct 4, 2025

✅ PR validation passed! Syncing labels and assignees from the linked issue...


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (1)
Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py (1)

135-140: Add a timeout to the API call.

requests.post on Line 137 has no timeout, so a stalled API leaves the Streamlit session hanging forever. Please specify a sensible timeout (and catch requests.Timeout) so the UI remains responsive.

-        response = requests.post("http://127.0.0.1:8000/score_batch",
-                                json={"transactions": txn_df.values.tolist()})
+        response = requests.post(
+            "http://127.0.0.1:8000/score_batch",
+            json={"transactions": txn_df.values.tolist()},
+            timeout=5,
+        )

Also consider handling requests.Timeout separately to surface a clearer message.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8abd0a0 and 0623ab8.

⛔ Files ignored due to path filters (3)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feedback.log is excluded by !**/*.log
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/fraud_model.pkl is excluded by !**/*.pkl
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/monitor.log is excluded by !**/*.log
📒 Files selected for processing (10)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/api_serve.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/decision_engine.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feature_store.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ingestion_sim.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/monitoring.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/requirements.txt (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/train_model.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/data_prep.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/explainability.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/train_model.py (1)
Machine_Learning/Bitcoin_Price_Prediction/app/routes.py (1)
  • model (61-112)
🪛 Ruff (0.13.3)
Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py

117-117: Consider [*list(txn_df['Amount']), 50, 100] instead of concatenation

Replace with [*list(txn_df['Amount']), 50, 100]

(RUF005)


118-118: Consider [*list(scores), 0.1, 0.05] instead of concatenation

Replace with [*list(scores), 0.1, 0.05]

(RUF005)


137-137: Probable use of requests call without timeout

(S113)


140-140: Do not catch blind exception: Exception

(BLE001)

Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feature_store.py

48-48: Do not catch blind exception: Exception

(BLE001)

@Parthavi19

@UTSAVS26 Please check the issue and merge the PR. Let me know if any changes need to be made.


github-actions bot commented Nov 6, 2025

This PR is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Nov 6, 2025

Labels

• Contributor: denotes issues or PRs submitted by contributors to acknowledge their participation.
• gssoc25
• level3
• stale
• Status: Review Ongoing 🔄: PR is currently under review and awaiting feedback from reviewers.


Development

Successfully merging this pull request may close these issues.

[Custom]: New Project Proposal On Fraud detection using Explainable AI
