End-to-end fraud detection pipeline with interactive dashboard. Built for the TECH SAGAR FinTech Fraud Detection Hackathon — YTIET, Bhivpuri (23-24 March 2026)
Build an end-to-end fraud detection system that cleans messy real-world transaction data (100K+ records, inconsistent formats, duplicates, no explicit fraud labels), engineers behavioural features, trains an ML model, and provides explainable fraud predictions.
participant_dataset.csv
│
▼
┌─────────────────────────────┐
│ PYTHON ML PIPELINE │
│ │
│ Stage 1: Data Cleaning │ → Parse 5 amount formats, 7 timestamp formats
│ Stage 2: EDA │ → Statistical analysis, distributions
│ Stage 3: Feature Engineering│ → 12 behavioural features
│ Stage 4: Fraud Detection │ → Isolation Forest + XGBoost + SMOTE
│ Stage 5: Explainability │ → Feature importance, fraud patterns
│ │
│ Output: dashboard.json │
└──────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ REACT DASHBOARD │
│ │
│ • Overview metrics │
│ • Data Quality Report │
│ • EDA visualizations │
│ • Feature Engineering docs │
│ • Model Performance gauges │
│ • Fraud Patterns discovery │
│ • Transaction Explorer │
│ │
│ Deployed on Vercel │
└─────────────────────────────┘
| Layer | Technology |
|---|---|
| Pipeline | Python 3.14, pandas, scikit-learn, XGBoost, imbalanced-learn |
| Frontend | React 19, TypeScript, Vite 8, Tailwind CSS 4 |
| Charts | Recharts 3 |
| Animation | Framer Motion 12 |
| Deploy | Vercel (static site) |
Zero API keys. Zero LLM calls. Pure ML.
# Install Python dependencies
pip install pandas scikit-learn xgboost imbalanced-learn numpy
# Place dataset in pipeline/data/
cp participant_dataset.csv pipeline/data/participant_dataset.csv
# Run pipeline (outputs dashboard.json to frontend/public/data/)
python pipeline/fraud_pipeline.py
cd frontend
npm install
npm run dev
cd frontend
npm run build
# Deploy frontend/dist/ to Vercel
- Parses 5 amount formats (₹3,200 | 3200 INR | Rs 3200 | 3200.0 | null)
- Unifies 7 timestamp formats (ISO 8601, Unix epoch, DD/MM/YYYY, textual, compact, DD-Mon-YYYY)
- Normalizes 70+ city name variants to 14 canonical cities
- Merges shadow `amt` column into `transaction_amount`
- Removes duplicates (exact rows + duplicate transaction_ids)
- Flags 107 invalid/malformed IP addresses
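
The cleaning rules above can be sketched with the standard library alone. This is a minimal illustration, not the code in `fraud_pipeline.py`: the helper names `parse_amount`, `parse_timestamp`, and `is_valid_ip` are hypothetical, and the exact `strptime` patterns are assumptions about what the listed timestamp formats look like in the dataset.

```python
import ipaddress
import re
from datetime import datetime, timezone
from typing import Optional

_NUMBER = re.compile(r"\d+(?:\.\d+)?")
_NULLS = {"", "null", "none", "nan"}

# Assumed patterns for ISO 8601, DD/MM/YYYY, DD-Mon-YYYY and a compact form.
_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%d/%m/%Y %H:%M",
    "%d/%m/%Y",
    "%d-%b-%Y",
    "%Y%m%d%H%M%S",
)


def parse_amount(raw) -> Optional[float]:
    """Turn '₹3,200', '3200 INR', 'Rs 3200' or '3200.0' into a float; null-like values become None."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text.lower() in _NULLS:
        return None
    # Drop thousands separators, then take the first number in the string.
    match = _NUMBER.search(text.replace(",", ""))
    return float(match.group()) if match else None


def parse_timestamp(raw) -> Optional[datetime]:
    """Return a naive datetime, or None when no assumed format matches."""
    text = str(raw).strip()
    if text.isdigit() and len(text) == 10:
        # Ten digits are treated as Unix epoch seconds, interpreted as UTC.
        return datetime.fromtimestamp(int(text), tz=timezone.utc).replace(tzinfo=None)
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def is_valid_ip(raw) -> bool:
    """True only for a structurally valid IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(str(raw).strip())
        return True
    except ValueError:
        return False


print(parse_amount("₹3,200"), parse_timestamp("23-Mar-2026"), is_valid_ip("192.168.1.300"))
```

Returning `None` instead of raising keeps malformed values countable, which is what a data quality report needs. This sketch ignores a leading minus sign on amounts; a real pipeline would have to decide how to treat negative values.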
- Transaction statistics (mean ₹16,348, median ₹3,262, std ₹141K)
- Distributions by category, device type, payment method, location, hour of day
- Top users by transaction volume and fraud rate
| Feature | Description |
|---|---|
| txn_velocity_1h | Transactions by user in 1-hour window |
| amount_zscore | Z-score vs user's historical mean |
| location_mismatch | User city ≠ merchant city |
| new_device_flag | First time user uses this device |
| hour_of_day | Hour extracted from timestamp |
| amt_to_balance_ratio | Amount / account balance |
| ip_is_invalid | IP missing or structurally invalid |
| cross_user_device | Device used by multiple users |
| weekend_flag | Saturday or Sunday |
| category_risk_score | Historical fraud rate for category |
| time_since_last_txn | Seconds since user's previous transaction |
| txn_count_per_user | Total transactions per user |
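
A few of the per-user features in the table can be illustrated in plain Python. The pipeline itself uses pandas, so this is a sketch of the definitions only: the function name `add_user_features` and the input keys `user_id`, `timestamp`, and `amount` are assumptions. The z-score here uses the mean over all of a user's transactions, which is a simplification of "historical mean".

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def add_user_features(txns):
    """txns: list of dicts with 'user_id', 'timestamp' (datetime), 'amount' (float)."""
    by_user = defaultdict(list)
    for txn in sorted(txns, key=lambda t: t["timestamp"]):
        by_user[txn["user_id"]].append(txn)

    enriched = []
    for rows in by_user.values():
        amounts = [r["amount"] for r in rows]
        mu, sigma = mean(amounts), pstdev(amounts)
        for i, row in enumerate(rows):
            window_start = row["timestamp"] - timedelta(hours=1)
            # Count this transaction plus earlier ones inside the trailing hour.
            velocity = sum(1 for p in rows[: i + 1] if p["timestamp"] > window_start)
            gap = (row["timestamp"] - rows[i - 1]["timestamp"]).total_seconds() if i else None
            enriched.append({
                **row,
                "txn_velocity_1h": velocity,
                "amount_zscore": (row["amount"] - mu) / sigma if sigma else 0.0,
                "time_since_last_txn": gap,  # None for a user's first transaction
                "txn_count_per_user": len(rows),
                "hour_of_day": row["timestamp"].hour,
                "weekend_flag": int(row["timestamp"].weekday() >= 5),
            })
    return enriched


sample = [
    {"user_id": "u1", "timestamp": datetime(2026, 3, 21, 10, 0), "amount": 100.0},
    {"user_id": "u1", "timestamp": datetime(2026, 3, 21, 10, 30), "amount": 100.0},
    {"user_id": "u1", "timestamp": datetime(2026, 3, 21, 12, 0), "amount": 400.0},
]
features = add_user_features(sample)
for f in features:
    print(f["txn_velocity_1h"], f["time_since_last_txn"], round(f["amount_zscore"], 3))
```

In this sample the second transaction falls within an hour of the first, so its velocity is 2. The third arrives 90 minutes later and drops back to 1.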
- Isolation Forest (unsupervised) for initial anomaly detection → 8% anomaly rate
- XGBoost classifier trained on anomaly labels with SMOTE balancing
- 80/20 stratified train-test split
- Feature importance ranking from XGBoost
- 7 fraud patterns detected with confidence scores
- Per-transaction fraud reasons (multi-signal reasoning)
| Metric | Value |
|---|---|
| Accuracy | 93.0% |
| Precision | 57.1% |
| Recall | 52.2% |
| F1 Score | 54.5% |
| AUC-ROC | 95.3% |
| Fraud Detected | 112 / 1,426 (7.9%) |
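
As a consistency check on the table above, the reported F1 score should equal the harmonic mean of the reported precision and recall. The short calculation below uses only the published figures; the helper name `f1_from` is illustrative.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.571
reported_recall = 0.522

f1 = f1_from(reported_precision, reported_recall)
flagged_share = 112 / 1426
# Accuracy of a trivial model that never flags anything, if the flagged share were the true rate.
baseline_accuracy = 1 - flagged_share

print(f"F1 = {f1:.3f}")                                  # F1 = 0.545
print(f"Flagged share = {flagged_share:.1%}")            # Flagged share = 7.9%
print(f"Never-flag baseline = {baseline_accuracy:.1%}")  # Never-flag baseline = 92.1%
```

The baseline line shows why accuracy alone says little at this class balance: a model that flags nothing would already score about 92%, so precision, recall, and AUC-ROC carry most of the information.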
- New Device Fraud — 55.4% of fraud from first-time devices
- Velocity Attack — 49.1% of fraud involves rapid successive transactions
- Geographic Anomaly — 34.8% of fraud has location mismatch
- Late-Night Activity — 34.8% of fraud occurs 0-5 AM
- IP Address Anomaly — 33.0% of fraud has invalid IPs
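
Per-transaction fraud reasons can be derived from the same signals. The sketch below is one possible rule-based mapping from engineered features to the pattern names listed above. The function name `fraud_reasons`, the velocity threshold, and the reading of "0-5 AM" as hours 0 to 5 inclusive are assumptions, not values taken from the pipeline.

```python
# Hypothetical threshold: transactions in a 1-hour window that count as a velocity attack.
VELOCITY_THRESHOLD = 3


def fraud_reasons(row: dict) -> list[str]:
    """Return the names of every pattern whose signal fires for this transaction."""
    reasons = []
    if row.get("new_device_flag"):
        reasons.append("New Device Fraud")
    if row.get("txn_velocity_1h", 0) >= VELOCITY_THRESHOLD:
        reasons.append("Velocity Attack")
    if row.get("location_mismatch"):
        reasons.append("Geographic Anomaly")
    hour = row.get("hour_of_day")
    if hour is not None and 0 <= hour <= 5:
        reasons.append("Late-Night Activity")
    if row.get("ip_is_invalid"):
        reasons.append("IP Address Anomaly")
    return reasons


example = {
    "new_device_flag": 1,
    "txn_velocity_1h": 5,
    "location_mismatch": 0,
    "hour_of_day": 2,
    "ip_is_invalid": 0,
}
print(fraud_reasons(example))
```

A transaction can match several patterns at once, which is why the pattern percentages above do not need to sum to 100%.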
CLAUDY_CODERS/
├── pipeline/
│ ├── fraud_pipeline.py # Complete Python ML pipeline (5 stages)
│ ├── run_pipeline.mjs # Node.js fallback pipeline
│ └── data/
│ └── participant_dataset.csv
├── frontend/
│ ├── src/
│ │ ├── App.tsx # Main app with 7-tab navigation
│ │ ├── components/
│ │ │ ├── Overview.tsx # Dashboard overview with key metrics
│ │ │ ├── DataQualityReport.tsx # Data quality issues resolved
│ │ │ ├── EDACharts.tsx # 6 EDA visualizations
│ │ │ ├── FeatureEngineering.tsx # Feature importance ranking
│ │ │ ├── ModelPerformance.tsx # Gauge charts + confusion matrix
│ │ │ ├── FraudPatterns.tsx # Discovered fraud patterns
│ │ │ ├── TransactionTable.tsx # Searchable transaction explorer
│ │ │ ├── MetricsCards.tsx # KPI cards
│ │ │ └── Sidebar.tsx # Navigation sidebar
│ │ ├── types/index.ts # TypeScript interfaces
│ │ └── data/sampleData.ts # Fallback demo data
│ └── public/data/
│ └── dashboard.json # Generated by pipeline
├── docs/
│ ├── prompt_documentation.md # AI usage transparency
│ └── technical_summary.md # One-page technical summary
└── README.md
- Team Name: CLAUDY CODERS
- Domain: FinTech - Fraud Detection
MIT