Completed: New Project Proposal On Fraud detection using Explainable AI #1796 #1808
base: main
Conversation
….py to Explainable-AI/Fraud Detection using Explainable AI
…able-AI/Fraud Detection using ExplainableAI/api_serve.py
✅ PR validation passed! Syncing labels and assignees from the linked issue...

👋 Thank you for opening this pull request! We're excited to review your contribution. Please give us a moment, and we'll get back to you shortly! Feel free to join our community on Discord to discuss more!
Walkthrough

Adds a complete fraud-detection prototype: data prep and model training, explainability workflows, a Streamlit analyst UI, a FastAPI batch scoring API with SHAP-based explanations, a decision engine, a Redis-backed feature store, ingestion simulation, and KS-based monitoring. Also introduces a requirements file.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant A as FastAPI /score_batch
    participant M as Model
    participant S as SHAP Explainer
    C->>A: POST /score_batch { transactions: [[...32 feats...], ...] }
    A->>A: Validate feature count (32 per row)
    A->>M: predict_proba(batch)
    M-->>A: scores (P(fraud))
    A->>S: shap_values(batch)
    S-->>A: per-txn attributions
    A->>A: Build top-3 contributor summaries + actions
    A-->>C: [{txn_id, score, explanation, action}, ...]
```
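A minimal sketch of the per-row logic behind the /score_batch flow above — the function name, feature names, and thresholds here are illustrative stand-ins, not the PR's actual code:

```python
# Illustrative sketch (not the PR's code) of per-row /score_batch logic:
# enforce the strict 32-feature schema, rank SHAP attributions, and emit
# a top-3 contributor summary plus an action.

N_FEATURES = 32  # the strict schema mentioned in the PR

def summarize_row(txn_id, features, score, shap_values, feature_names,
                  threshold=0.5):
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    # Rank features by absolute SHAP attribution; keep the top 3.
    top3 = sorted(zip(feature_names, shap_values),
                  key=lambda kv: abs(kv[1]), reverse=True)[:3]
    explanation = ", ".join(f"{name} ({val:+.2f})" for name, val in top3)
    action = "BLOCK" if score >= threshold else "ALLOW"
    return {"txn_id": txn_id, "score": score,
            "explanation": explanation, "action": action}
```

In the real API this would run once per transaction in the batch, with scores coming from predict_proba and attributions from the SHAP explainer.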
```mermaid
sequenceDiagram
    autonumber
    participant P as Producer
    participant Q as In-Memory Queue
    participant C as Consumer
    participant F as process_func
    P->>Q: Enqueue txn JSON (loop over CSV rows)
    P->>Q: Enqueue None (end)
    loop until None
        C->>Q: Dequeue item
        alt item is JSON
            C->>F: process_func(txn_dict)
        else item is None
            C-->>C: Exit loop
        end
    end
```
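The producer/consumer loop above can be sketched with Python's built-in queue.Queue and a None sentinel; the produce/consume names and sample rows are illustrative, not the PR's actual ingestion code:

```python
# Minimal sketch of the queue-based ingestion simulation: a producer
# enqueues transactions as JSON plus a None sentinel, and a consumer
# calls process_func on each decoded row until it sees the sentinel.
import json
import queue
import threading

def produce(q, rows):
    for row in rows:
        q.put(json.dumps(row))   # enqueue each transaction as JSON
    q.put(None)                  # sentinel marks end of stream

def consume(q, process_func):
    while True:
        item = q.get()
        if item is None:         # sentinel: exit the loop
            break
        process_func(json.loads(item))

processed = []
q = queue.Queue()
t = threading.Thread(target=consume, args=(q, processed.append))
t.start()
produce(q, [{"txn_id": 1, "amount": 42.0}, {"txn_id": 2, "amount": 7.5}])
t.join()
```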
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes
Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 9
🧹 Nitpick comments (1)
Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py (1)
135-140: Add a timeout to the API call.

requests.post on line 137 has no timeout, so a stalled API leaves the Streamlit session hanging forever. Please specify a sensible timeout (and catch requests.Timeout) so the UI remains responsive.

```diff
-        response = requests.post("http://127.0.0.1:8000/score_batch",
-                                 json={"transactions": txn_df.values.tolist()})
+        response = requests.post(
+            "http://127.0.0.1:8000/score_batch",
+            json={"transactions": txn_df.values.tolist()},
+            timeout=5,
+        )
```

Also consider handling requests.Timeout separately to surface a clearer message.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feedback.log is excluded by !**/*.log
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/fraud_model.pkl is excluded by !**/*.pkl
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/monitor.log is excluded by !**/*.log
📒 Files selected for processing (10)
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/api_serve.py (1 hunks)
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/decision_engine.py (1 hunks)
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feature_store.py (1 hunks)
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ingestion_sim.py (1 hunks)
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/monitoring.py (1 hunks)
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/requirements.txt (1 hunks)
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/train_model.py (1 hunks)
- Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py (1 hunks)
- Explainable-AI/Fraud Detection using ExplainableAI/data_prep.py (1 hunks)
- Explainable-AI/Fraud Detection using ExplainableAI/explainability.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/train_model.py (1)
Machine_Learning/Bitcoin_Price_Prediction/app/routes.py (1)
model (61-112)
🪛 Ruff (0.13.3)
Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/ui_dashboard.py
117-117: Consider [*list(txn_df['Amount']), 50, 100] instead of concatenation
Replace with [*list(txn_df['Amount']), 50, 100]
(RUF005)
118-118: Consider [*list(scores), 0.1, 0.05] instead of concatenation
Replace with [*list(scores), 0.1, 0.05]
(RUF005)
137-137: Probable use of requests call without timeout
(S113)
140-140: Do not catch blind exception: Exception
(BLE001)
Explainable-AI/Fraud Detection using ExplainableAI/ExplainableAI/feature_store.py
48-48: Do not catch blind exception: Exception
(BLE001)
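For reference, the RUF005 findings above suggest iterable unpacking over list concatenation; a minimal before/after using stand-in data (not the dashboard's actual values):

```python
# RUF005: prefer iterable unpacking over list concatenation.
amounts = [12.5, 99.0]   # stand-in for txn_df['Amount']
scores = [0.7, 0.3]      # stand-in for model scores

# Before (concatenation, as flagged by Ruff):
padded_amounts = list(amounts) + [50, 100]
padded_scores = list(scores) + [0.1, 0.05]

# After (unpacking, as Ruff suggests):
padded_amounts2 = [*amounts, 50, 100]
padded_scores2 = [*scores, 0.1, 0.05]
```

Both spellings build the same list; the unpacked form avoids an intermediate list() call and reads more uniformly when mixing iterables and literals.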
@UTSAVS26 Please check the issue and merge the PR. Let me know if any changes need to be made.

This PR is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
Issue Title:
Fraud Detection System with Explainable AI (XAI) — End-to-End Implementation
What's the goal of the project?
The goal of this project is to build a near-real-time fraud detection system with Explainable AI (XAI) support. The system not only detects fraudulent transactions but also explains why a transaction was flagged, provides counterfactual scenarios, reduces false positives, and equips analysts with monitoring and triage tools.
Name
Parthavi Kurugundla
GSSOC’25 Contributor
GitHub ID
@Parthavi19
Email ID
[email protected]
Closes: #1796
I have implemented the complete fraud detection system as outlined in the proposal for issue #1796. Key contributions include:
Data Preparation (data_prep.py) → Feature engineering (velocity, aggregations), imbalance handling, scaling, and time-based splitting.
Model Training (train_model.py) → XGBoost classifier with PR-AUC & ROC-AUC evaluation.
Data Ingestion (ingestion_sim.py) → Queue-based simulation of real-time streaming (uses Python's built-in queue.Queue).
Feature Store (feature_store.py) → Redis in-memory store with Postgres fallback.
Explainability (explainability.py) → SHAP (global/local explanations), DiCE (counterfactuals), and human-readable summaries.
Decision Engine (decision_engine.py) → Rule-based actions combined with ML scores and explanations, supporting multi-row batch mode.
API Serving (api_serve.py) → FastAPI backend for real-time/batch scoring with explanations.
UI Dashboard (ui_dashboard.py) → Streamlit interface for analysts to view transaction scores, SHAP plots, counterfactuals, and timelines, plus a feedback loop.
Monitoring (monitoring.py) → Feature & explanation drift detection using KS tests.
Bug Fixes → Fixed 32-feature validation mismatch, resolved SHAP TypeError by using Explanation objects.
Multi-Row Support → Updated modules to process multiple transactions at once for scalability.
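As an illustration of the rule-plus-score decision engine described above — the rule, thresholds, and action names here are hypothetical, not the PR's actual logic:

```python
# Hypothetical sketch of a decision engine combining a hard rule with
# bucketed model scores, including the multi-row batch mode.

def decide(txn, score, block_threshold=0.9, review_threshold=0.5):
    # Hard rule overrides the model: very large amounts go to review.
    if txn.get("amount", 0.0) > 10_000:
        return "REVIEW"
    if score >= block_threshold:
        return "BLOCK"
    if score >= review_threshold:
        return "REVIEW"
    return "ALLOW"

def decide_batch(txns, scores):
    # Multi-row batch mode: one decision per (transaction, score) pair.
    return [decide(t, s) for t, s in zip(txns, scores)]
```

The real engine additionally attaches the SHAP-based explanation to each decision so analysts can see why an action was taken.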
How Has This Been Tested?
Trained and validated the XGBoost model on the Kaggle Credit Card Fraud Dataset.
Verified feature schema integrity (strict 32-feature validation).
Tested ingestion → feature store → scoring → explanations pipeline end-to-end.
Checked SHAP plots, counterfactual explanations, and dashboard outputs.
Ran batch scoring on a sample multi-row CSV generated via create_sample_csv.py.
API tested using uvicorn (real-time and batch mode).
Streamlit dashboard tested with interactive file uploads and analyst feedback.
Verified Redis integration and Postgres fallback.
Validated monitoring with simulated drift scenarios.
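The drift validation in the last step could look roughly like the following dependency-free KS check; monitoring.py presumably relies on scipy.stats.ks_2samp, so this hand-rolled version is only an illustration of the idea:

```python
# Rough illustration of a KS-based drift check: compute the maximum
# vertical distance between two empirical CDFs and alert when it
# exceeds a fixed threshold. Not the PR's actual monitoring code.
import bisect

def ks_statistic(sample_a, sample_b):
    """Max vertical distance between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_xs, x):
        # Fraction of sorted_xs that are <= x.
        return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

def drift_alert(baseline, live, threshold=0.2):
    # Flag drift when the KS statistic exceeds the chosen threshold.
    return ks_statistic(baseline, live) > threshold

baseline = [0.1 * i for i in range(100)]        # reference feature values
shifted = [0.1 * i + 5.0 for i in range(100)]   # clearly drifted values
```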
Checklist:
My code follows the guidelines of this project.
I have performed a self-review of my own code.
I have commented my code, particularly wherever it was hard to understand.
I have made corresponding changes to the documentation.
My changes generate no new warnings.
I have added things that prove my fix is effective or that my feature works.