
Conversation


@Parthavi19 Parthavi19 commented Oct 4, 2025

Issue Title:
Fraud Detection System with Explainable AI (XAI) — End-to-End Implementation

What's the goal of the project?
The goal of this project is to build a near-real-time fraud detection system with Explainable AI (XAI) support. The system not only detects fraudulent transactions but also explains why a transaction was flagged, provides counterfactual scenarios, reduces false positives, and equips analysts with monitoring and triage tools.

Name
Parthavi Kurugundla
GSSOC’25 Contributor

GitHub ID
@Parthavi19

Email ID
[email protected]

Closes
Closes: #1796

I have implemented the complete fraud detection system as outlined in the proposal for issue #1796. Key contributions include:

Data Preparation (data_prep.py) → Feature engineering (velocity, aggregations), imbalance handling, scaling, and time-based splitting.
Model Training (train_model.py) → XGBoost classifier with PR-AUC & ROC-AUC evaluation.
Data Ingestion (ingestion_sim.py) → Queue-based simulation of real-time streaming (uses Python's built-in queue.Queue).
Feature Store (feature_store.py) → Redis in-memory store with Postgres fallback.
Explainability (explainability.py) → SHAP (global/local explanations), DiCE (counterfactuals), and human-readable summaries.
Decision Engine (decision_engine.py) → Rule-based actions combined with ML scores and explanations, supporting multi-row batch mode.
API Serving (api_serve.py) → FastAPI backend for real-time/batch scoring with explanations.
UI Dashboard (ui_dashboard.py) → Streamlit interface for analysts to view transaction scores, SHAP plots, counterfactuals, and timelines, plus a feedback loop.
Monitoring (monitoring.py) → Feature & explanation drift detection using KS tests.
Bug Fixes → Fixed 32-feature validation mismatch, resolved SHAP TypeError by using Explanation objects.
Multi-Row Support → Updated modules to process multiple transactions at once for scalability.
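
The tiered policy behind the Decision Engine bullet can be sketched as a simple threshold ladder. The decide_actions name follows the PR description, but the threshold values below are illustrative assumptions, not the ones in decision_engine.py:

```python
# Hypothetical sketch of a tiered action policy; threshold values are
# illustrative assumptions, not those used in decision_engine.py.

def decide_action(score: float,
                  block_at: float = 0.9,
                  escalate_at: float = 0.7,
                  soft_at: float = 0.4) -> str:
    """Map a fraud probability to one of the tiered actions."""
    if score >= block_at:
        return "auto-block"
    if score >= escalate_at:
        return "escalate"
    if score >= soft_at:
        return "soft-action"
    return "auto-clear"

def decide_actions(scores):
    """Batch version: one action per transaction score."""
    return [decide_action(s) for s in scores]

print(decide_actions([0.95, 0.75, 0.5, 0.1]))
# → ['auto-block', 'escalate', 'soft-action', 'auto-clear']
```

In the real module the ML score is combined with rule-based checks before any thresholding; this sketch only shows the final score-to-action mapping.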

How Has This Been Tested?

Trained and validated the XGBoost model on the Kaggle Credit Card Fraud Dataset.
Verified feature schema integrity (strict 32-feature validation).
Tested ingestion → feature store → scoring → explanations pipeline end-to-end.
Checked SHAP plots, counterfactual explanations, and dashboard outputs.
Ran batch scoring on a sample multi-row CSV generated via create_sample_csv.py.
API tested using uvicorn (real-time and batch mode).
Streamlit dashboard tested with interactive file uploads and analyst feedback.
Verified Redis integration and Postgres fallback.
Validated monitoring with simulated drift scenarios.
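
As a concrete illustration of the API test step, a batch request body for the /score_batch endpoint (endpoint name from the PR; the exact payload schema is an assumption) could be assembled and checked against the strict 32-feature contract like this:

```python
# Sketch of assembling a /score_batch request body; the payload shape
# {"transactions": [...]} is an assumption based on the PR description.
import json

def build_batch_payload(rows, n_features=32):
    """Serialize rows to the request body, enforcing the 32-feature contract."""
    for i, row in enumerate(rows):
        if len(row) != n_features:
            raise ValueError(f"row {i} has {len(row)} features, expected {n_features}")
    return json.dumps({"transactions": rows})

payload = build_batch_payload([[0.0] * 32, [1.0] * 32])
print(len(json.loads(payload)["transactions"]))  # → 2

# With the API running under uvicorn, the payload could then be sent with, e.g.:
#   requests.post("http://127.0.0.1:8000/score_batch", data=payload,
#                 headers={"Content-Type": "application/json"}, timeout=5)
```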

Checklist:

My code follows the guidelines of this project.
I have performed a self-review of my own code.
I have commented my code, particularly wherever it was hard to understand.
I have made corresponding changes to the documentation.
My changes generate no new warnings.
I have added tests that prove my fix is effective or that my feature works.

Summary by CodeRabbit

  • New Features
    • Batch fraud scoring API with per-transaction explanations and recommended actions.
    • Decision engine applying tiered policies (auto-block, escalate, soft-action, auto-clear).
    • Analyst dashboard for scoring, visual explanations, counterfactuals, feedback, and optional API testing.
    • Redis-backed feature store for storing and retrieving user features.
    • Ingestion simulator mimicking streaming transactions.
    • Data drift monitoring with per-feature alerts and logfile output.
    • End-to-end data prep and model training scripts, including evaluation metrics and model export.
    • Explainability utilities producing plots and concise contributor summaries.
  • Chores
    • Added dependencies for data processing, modeling, explainability, APIs, storage, and visualization.
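
The "concise contributor summaries" mentioned above boil down to ranking per-feature attributions by absolute magnitude. A minimal stand-in sketch (function names and output format are hypothetical, not the actual explainability.py API):

```python
# Illustrative helpers: rank features by absolute attribution and keep the
# top k; names and formatting are assumptions, not the explainability.py API.

def top_contributors(attributions, feature_names, k=3):
    """Return the k (feature, value) pairs with the largest |attribution|."""
    ranked = sorted(zip(feature_names, attributions),
                    key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]

def summarize(attributions, feature_names, k=3):
    """Human-readable one-liner naming the top contributors."""
    parts = [f"{name} ({val:+.2f})"
             for name, val in top_contributors(attributions, feature_names, k)]
    return "Top contributors: " + ", ".join(parts)

print(summarize([0.05, -0.4, 0.3, 0.1], ["Amount", "V14", "V4", "V10"]))
# → Top contributors: V14 (-0.40), V4 (+0.30), V10 (+0.10)
```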


github-actions bot commented Oct 4, 2025

✅ PR validation passed! Syncing labels and assignees from the linked issue...

@github-actions github-actions bot added the labels Contributor (denotes issues or PRs submitted by contributors to acknowledge their participation), gssoc25, level3, and Status: Review Ongoing 🔄 (PR is currently under review and awaiting feedback from reviewers) on Oct 4, 2025.

github-actions bot commented Oct 4, 2025

👋 Thank you for opening this pull request! We're excited to review your contribution. Please give us a moment, and we'll get back to you shortly!

Feel free to join our community on Discord to discuss more!

@Parthavi19 Parthavi19 changed the title from "Fixed: New Project Proposal On Fraud detection using Explainable AI #1796" to "Completed: New Project Proposal On Fraud detection using Explainable AI #1796" on Oct 4, 2025.


coderabbitai bot commented Oct 4, 2025

Walkthrough

Adds a complete fraud-detection prototype: data prep and model training, explainability workflows, a Streamlit analyst UI, a FastAPI batch scoring API with SHAP-based explanations, a decision engine, Redis-backed feature store, ingestion simulation, and KS-based monitoring. Also introduces a requirements file.

Changes

Cohort / File(s), with summaries:

• API Serving (Batch Scoring + SHAP): Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/api_serve.py
  New FastAPI app that loads the model and SHAP explainer; POST /score_batch validates 32-feature rows and returns a per-transaction score, top-3 contributor summary, and action. Exposes TransactionBatch, app, model, explainer, feature_names.
• Decision Engine: Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/decision_engine.py
  New module that computes scores and SHAP importances and applies a multi-threshold action policy; provides decide_actions(txn_df) and example usage.
• Feature Store (Redis): Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feature_store.py
  Adds a Redis client r, store_features/get_features helpers, connection checks, and optional sample seeding from creditcard.csv with error handling.
• Ingestion Simulation: Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ingestion_sim.py
  Threaded producer-consumer over an in-memory queue; reads the first 100 CSV rows, streams JSON messages, and processes until a sentinel is seen. Provides producer, consumer, dummy_process.
• Monitoring (Drift Checks): Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/monitoring.py
  Loads X_train/X_test; runs a per-feature KS test with logging to monitor.log; prints a completion note; includes a SHAP-drift comment.
• Data Prep and Training: Explainable-AI/Fraud Detection using ExplainableAI/data_prep.py, Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/train_model.py
  Data pipeline with feature engineering, imbalance handling, scaling, and a time-based split; saves CSVs. Training script fits XGBoost, evaluates PR-AUC/ROC-AUC, and saves the model via joblib.
• Explainability and UI: Explainable-AI/Fraud Detection using ExplainableAI/explainability.py, Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py
  SHAP TreeExplainer, waterfall plots, DiCE counterfactuals, and a summary helper; Streamlit dashboard for scoring, explanations, simple counterfactuals, a timeline, feedback, and optional API calls.
• Dependencies: Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/requirements.txt
  Adds libraries for data, modeling, explainability, storage, API, visualization, and utilities.
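
The per-feature KS drift check in the Monitoring row can be sketched with a hand-rolled two-sample KS statistic so the idea is self-contained; monitoring.py presumably relies on scipy.stats.ks_2samp instead, and the 0.2 alert threshold here is an illustrative assumption:

```python
# Self-contained two-sample KS statistic for illustration; monitoring.py
# presumably uses scipy.stats.ks_2samp, and the 0.2 threshold is made up.
import bisect

def ks_statistic(a, b):
    """Maximum gap between the two empirical CDFs (the KS statistic)."""
    a, b = sorted(a), sorted(b)

    def ecdf(xs, x):
        # fraction of samples <= x
        return bisect.bisect_right(xs, x) / len(xs)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a) | set(b)))

def drift_alerts(train_cols, live_cols, threshold=0.2):
    """Return {feature: KS stat} for features drifting beyond the threshold."""
    alerts = {}
    for name, train_vals in train_cols.items():
        stat = ks_statistic(train_vals, live_cols[name])
        if stat > threshold:
            alerts[name] = stat
    return alerts

train = {"Amount": [1, 2, 3, 4, 5], "V14": [0.1, 0.2, 0.3]}
live = {"Amount": [10, 11, 12, 13, 14], "V14": [0.1, 0.2, 0.3]}
print(drift_alerts(train, live))  # → {'Amount': 1.0}
```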

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant A as FastAPI /score_batch
  participant M as Model
  participant S as SHAP Explainer

  C->>A: POST /score_batch { transactions: [[...32 feats...], ...] }
  A->>A: Validate feature count (32 per row)
  A->>M: predict_proba(batch)
  M-->>A: scores (P(fraud))
  A->>S: shap_values(batch)
  S-->>A: per-txn attributions
  A->>A: Build top-3 contributor summaries + actions
  A-->>C: [{txn_id, score, explanation, action}, ...]
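
The validation and response-assembly steps of the diagram above can be sketched without the trained model; scoring and SHAP are stubbed, and these helper names mirror the diagram rather than the actual api_serve.py internals:

```python
# Sketch of the validation and response-assembly steps; helper names are
# illustrative, not the actual api_serve.py internals.

def validate_batch(transactions, n_features=32):
    """Reject any row that does not carry exactly n_features values."""
    for i, row in enumerate(transactions):
        if len(row) != n_features:
            raise ValueError(f"txn {i}: expected {n_features} features, got {len(row)}")

def build_response(scores, summaries, actions):
    """Zip per-transaction outputs into the response records."""
    return [{"txn_id": i, "score": s, "explanation": e, "action": a}
            for i, (s, e, a) in enumerate(zip(scores, summaries, actions))]

validate_batch([[0.0] * 32])  # passes silently
resp = build_response([0.92], ["V14, V4, Amount pushed the score up"], ["auto-block"])
print(resp[0]["action"])  # → auto-block
```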
sequenceDiagram
  autonumber
  participant P as Producer
  participant Q as In-Memory Queue
  participant C as Consumer
  participant F as process_func

  P->>Q: Enqueue txn JSON (loop over CSV rows)
  P->>Q: Enqueue None (end)
  loop until None
    C->>Q: Dequeue item
    alt item is JSON
      C->>F: process_func(txn_dict)
    else item is None
      C-->>C: Exit loop
    end
  end
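
The producer/consumer flow diagrammed above can be made concrete with Python's queue.Queue and the None sentinel the PR describes; the transaction fields here are made up for illustration:

```python
# Minimal runnable version of the diagrammed producer/consumer flow, using
# queue.Queue and a None sentinel; the transaction fields are illustrative.
import json
import queue
import threading

q = queue.Queue()
processed = []

def dummy_process(txn):
    """Stand-in for real downstream processing."""
    processed.append(txn["txn_id"])

def producer(rows):
    for row in rows:
        q.put(json.dumps(row))   # enqueue each transaction as JSON
    q.put(None)                  # sentinel: no more messages

def consumer(process_func):
    while True:
        item = q.get()
        if item is None:         # sentinel reached, exit loop
            break
        process_func(json.loads(item))

rows = [{"txn_id": i, "Amount": 10.0 * i} for i in range(3)]
t = threading.Thread(target=consumer, args=(dummy_process,))
t.start()
producer(rows)
t.join()
print(processed)  # → [0, 1, 2]
```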

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Suggested labels

Advanced

Suggested reviewers

  • TheChaoticor
  • UTSAVS26

Poem

A hop, a skip, I sniff the wire—
Scores and SHAPs light up the spire.
Queues go squeak, the Redis hums,
Drift checks tap on kettle drums.
With carrots cached and features neat,
I thump “All clear!”—or flag the cheat. 🥕🕵️‍♂️

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)

• Title Check ⚠️ Warning: The title carries an issue reference and a "Completed:" prefix instead of succinctly describing the implemented end-to-end fraud detection system and its explainability features. Resolution: rename the PR to a concise summary such as "Implement end-to-end fraud detection system with explainable AI".
• Linked Issues Check ⚠️ Warning: The changes deliver the core components (data preparation, model training, ingestion simulation, feature store, decision engine, API serving, explainability, UI, and monitoring), but omit several per-transaction outputs specified in issue #1796: baseline contributions, full top-5 attributions, counterfactual suggestions, transaction timelines, and provenance metadata. Required fields such as MCC, country, device/IP, and customer metadata are also not addressed. Resolution: extend the API and supporting modules to emit baseline scores, top-5 feature contributions, counterfactual suggestions, transaction timelines, and provenance (model/explainer versions), and update the data pipeline or documentation to meet the transaction-level and customer metadata requirements.
• Docstring Coverage ⚠️ Warning: Docstring coverage is 44.44%, below the required 80.00% threshold. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)

• Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
• Out of Scope Changes Check ✅ Passed: All added and modified modules (ingestion simulation, feature store, data preparation, modeling, explainability, serving, UI, and monitoring) directly align with the objectives of issue #1796; no unrelated code changes are present.


@github-actions
Copy link

github-actions bot commented Oct 4, 2025

✅ PR validation passed! Syncing labels and assignees from the linked issue...


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (1)
Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py (1)

135-140: Add a timeout to the API call.

requests.post on Line 137 has no timeout, so a stalled API leaves the Streamlit session hanging forever. Please specify a sensible timeout (and catch requests.Timeout) so the UI remains responsive.

-        response = requests.post("http://127.0.0.1:8000/score_batch",
-                                json={"transactions": txn_df.values.tolist()})
+        response = requests.post(
+            "http://127.0.0.1:8000/score_batch",
+            json={"transactions": txn_df.values.tolist()},
+            timeout=5,
+        )

Also consider handling requests.Timeout separately to surface a clearer message.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8abd0a0 and 0623ab8.

⛔ Files ignored due to path filters (3)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feedback.log is excluded by !**/*.log
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/fraud_model.pkl is excluded by !**/*.pkl
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/monitor.log is excluded by !**/*.log
📒 Files selected for processing (10)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/api_serve.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/decision_engine.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feature_store.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ingestion_sim.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/monitoring.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/requirements.txt (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/train_model.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/data_prep.py (1 hunks)
  • Explainable-AI/Fraud Detection using ExplainableAI/explainability.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/train_model.py (1)
Machine_Learning/Bitcoin_Price_Prediction/app/routes.py (1)
  • model (61-112)
🪛 Ruff (0.13.3)
Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py

117-117: Consider [*list(txn_df['Amount']), 50, 100] instead of concatenation

Replace with [*list(txn_df['Amount']), 50, 100]

(RUF005)


118-118: Consider [*list(scores), 0.1, 0.05] instead of concatenation

Replace with [*list(scores), 0.1, 0.05]

(RUF005)


137-137: Probable use of requests call without timeout

(S113)


140-140: Do not catch blind exception: Exception

(BLE001)

Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feature_store.py

48-48: Do not catch blind exception: Exception

(BLE001)

@Parthavi19

@UTSAVS26 Please check the issue and merge the PR. Let me know if any changes need to be made.


github-actions bot commented Nov 6, 2025

This PR is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Nov 6, 2025

Labels

• Contributor: denotes issues or PRs submitted by contributors to acknowledge their participation.
• gssoc25
• level3
• stale
• Status: Review Ongoing 🔄: PR is currently under review and awaiting feedback from reviewers.


Development

Successfully merging this pull request may close these issues.

[Custom]: New Project Proposal On Fraud detection using Explainable AI
