
SaudSatopay/FraudShield

FraudShield — FinTech Fraud Detection System

End-to-end fraud detection pipeline with interactive dashboard. Built for the TECH SAGAR FinTech Fraud Detection Hackathon — YTIET, Bhivpuri (23-24 March 2026)

Live Demo

Deployed Link — Vercel

Problem Statement

Build an end-to-end fraud detection system that cleans messy real-world transaction data (100K+ records, inconsistent formats, duplicates, no explicit fraud labels), engineers behavioural features, trains an ML model, and provides explainable fraud predictions.

Architecture

participant_dataset.csv
        │
        ▼
┌─────────────────────────────┐
│  PYTHON ML PIPELINE          │
│                              │
│  Stage 1: Data Cleaning      │  → Parse 5 amount formats, 7 timestamp formats
│  Stage 2: EDA                │  → Statistical analysis, distributions
│  Stage 3: Feature Engineering│  → 12 behavioural features
│  Stage 4: Fraud Detection    │  → Isolation Forest + XGBoost + SMOTE
│  Stage 5: Explainability     │  → Feature importance, fraud patterns
│                              │
│  Output: dashboard.json      │
└──────────────┬───────────────┘
               │
               ▼
┌─────────────────────────────┐
│  REACT DASHBOARD             │
│                              │
│  • Overview metrics          │
│  • Data Quality Report       │
│  • EDA visualizations        │
│  • Feature Engineering docs  │
│  • Model Performance gauges  │
│  • Fraud Patterns discovery  │
│  • Transaction Explorer      │
│                              │
│  Deployed on Vercel          │
└─────────────────────────────┘

Tech Stack

Layer       Technology
Pipeline    Python 3.14, pandas, scikit-learn, XGBoost, imbalanced-learn
Frontend    React 19, TypeScript, Vite 8, Tailwind CSS 4
Charts      Recharts 3
Animation   Framer Motion 12
Deploy      Vercel (static site)

Zero API keys. Zero LLM calls. Pure ML.

Quick Start

1. Run the ML Pipeline

# Install Python dependencies
pip install pandas scikit-learn xgboost imbalanced-learn numpy

# Place dataset in pipeline/data/
cp participant_dataset.csv pipeline/data/participant_dataset.csv

# Run pipeline (outputs dashboard.json to frontend/public/data/)
python pipeline/fraud_pipeline.py

2. Start the Dashboard

cd frontend
npm install
npm run dev

Open http://localhost:5173

3. Build for Production

cd frontend
npm run build
# Deploy frontend/dist/ to Vercel

Pipeline Stages

Stage 1: Data Cleaning & Standardisation

  • Parses 5 amount formats (₹3,200 | 3200 INR | Rs 3200 | 3200.0 | null)
  • Unifies 7 timestamp formats (including ISO 8601, Unix epoch, DD/MM/YYYY, textual, compact and DD-Mon-YYYY)
  • Normalizes 70+ city name variants to 14 canonical cities
  • Merges the shadow amt column into transaction_amount
  • Removes duplicates (exact rows + duplicate transaction_ids)
  • Flags 107 invalid/malformed IP addresses

Stage 2: Exploratory Data Analysis

  • Transaction statistics (mean ₹16,348, median ₹3,262, std ₹141K)
  • Distributions by category, device type, payment method, location, hour of day
  • Top users by transaction volume and fraud rate
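The headline statistics above can be reproduced in a few lines of pandas. This sketch assumes the cleaned frame has columns named `category`, `user_id` and a datetime `timestamp`. Those names are assumptions, and only `transaction_amount` appears in this README. The helper `eda_summary` is hypothetical.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame, top_n: int = 5) -> dict:
    """Headline EDA numbers for cleaned transactions (column names assumed)."""
    amounts = df["transaction_amount"].dropna()
    return {
        "mean": float(amounts.mean()),
        "median": float(amounts.median()),
        "std": float(amounts.std()),  # sample std (ddof=1, the pandas default)
        "by_category": df["category"].value_counts().to_dict(),
        "by_hour": df["timestamp"].dt.hour.value_counts().sort_index().to_dict(),
        "top_users_by_volume": df["user_id"].value_counts().head(top_n).to_dict(),
    }
```

A mean roughly five times the median, as reported above, indicates a heavily right-skewed amount distribution. This is why per-user z-scores and ratios are more informative than raw amounts.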

Stage 3: Feature Engineering (12 Features)

Feature                Description
txn_velocity_1h        Transactions by the same user within a 1-hour window
amount_zscore          Z-score of the amount vs the user's historical mean
location_mismatch      User city ≠ merchant city
new_device_flag        First time the user uses this device
hour_of_day            Hour extracted from the timestamp
amt_to_balance_ratio   Transaction amount / account balance
ip_is_invalid          IP address missing or structurally invalid
cross_user_device      Device used by multiple users
weekend_flag           Transaction falls on a Saturday or Sunday
category_risk_score    Historical fraud rate for the category
time_since_last_txn    Seconds since the user's previous transaction
txn_count_per_user     Total transactions per user
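Below is a pandas sketch of several per-user features from the table. It assumes columns named `user_id`, `timestamp` (datetime64), `transaction_amount` and `device_id`, and the names other than `transaction_amount` are assumptions. The velocity definition is also an assumption: it counts the user's transactions in the trailing window [t - 1h, t], including the current one. The helper names are hypothetical and not taken from fraud_pipeline.py.

```python
import numpy as np
import pandas as pd


def _trailing_1h_count(ts: pd.Series) -> pd.Series:
    """Per row, how many timestamps fall in [t - 1h, t]; ts must be sorted ascending."""
    t = ts.to_numpy(dtype="datetime64[ns]")
    window_start = np.searchsorted(t, t - np.timedelta64(1, "h"), side="left")
    return pd.Series(np.arange(len(t)) - window_start + 1, index=ts.index)


def add_behavioural_features(df: pd.DataFrame) -> pd.DataFrame:
    # Sort so every per-user operation sees transactions in time order
    out = df.sort_values(["user_id", "timestamp"]).copy()
    by_user = out.groupby("user_id")

    out["txn_velocity_1h"] = by_user["timestamp"].transform(_trailing_1h_count)

    # Z-score vs the user's own mean; 0 when std is undefined (1 txn) or zero
    mean = by_user["transaction_amount"].transform("mean")
    std = by_user["transaction_amount"].transform("std")
    out["amount_zscore"] = ((out["transaction_amount"] - mean) / std.replace(0, np.nan)).fillna(0.0)

    # NaN for each user's first transaction
    out["time_since_last_txn"] = by_user["timestamp"].diff().dt.total_seconds()

    # 1 on the first (time-ordered) appearance of each (user, device) pair
    out["new_device_flag"] = (~out.duplicated(["user_id", "device_id"])).astype(int)

    out["hour_of_day"] = out["timestamp"].dt.hour
    out["weekend_flag"] = (out["timestamp"].dt.dayofweek >= 5).astype(int)  # Sat=5, Sun=6
    out["txn_count_per_user"] = out["user_id"].map(out["user_id"].value_counts())
    return out
```

Sorting once up front keeps each feature a vectorised group operation rather than a Python loop over users.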

Stage 4: Fraud Detection Model

  • Isolation Forest (unsupervised) for initial anomaly detection → 8% anomaly rate
  • XGBoost classifier trained on anomaly labels with SMOTE balancing
  • 80/20 stratified train-test split

Stage 5: Explainability & Insights

  • Feature importance ranking from XGBoost
  • 7 fraud patterns detected with confidence scores
  • Per-transaction fraud reasons (multi-signal reasoning)
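Multi-signal reasons can be produced by checking each flagged transaction against simple feature rules, then ordering the triggered reasons by the model's global feature importance. The rule thresholds and all names below are hypothetical, and the project's actual reasoning logic is not documented in this README.

```python
from typing import Callable

# Hypothetical rules: (feature, trigger condition, human-readable reason).
# Thresholds are illustrative assumptions, not the project's documented values.
REASON_RULES: list[tuple[str, Callable[[float], bool], str]] = [
    ("new_device_flag", lambda v: v == 1, "First transaction from this device"),
    ("txn_velocity_1h", lambda v: v >= 3, "Burst of transactions within one hour"),
    ("location_mismatch", lambda v: v == 1, "User city differs from merchant city"),
    ("hour_of_day", lambda v: 0 <= v < 5, "Late-night transaction (00:00-04:59)"),
    ("ip_is_invalid", lambda v: v == 1, "Missing or malformed IP address"),
    ("amount_zscore", lambda v: abs(v) >= 3, "Amount far from the user's usual spend"),
]


def rank_importance(feature_names: list[str], scores) -> list[tuple[str, float]]:
    """Global ranking, e.g. from XGBClassifier.feature_importances_."""
    return sorted(zip(feature_names, map(float, scores)), key=lambda p: p[1], reverse=True)


def fraud_reasons(row: dict, importance: dict[str, float]) -> list[str]:
    """Reasons for one flagged transaction, most globally important signal first."""
    hits = [
        (importance.get(feature, 0.0), reason)
        for feature, triggered, reason in REASON_RULES
        if feature in row and triggered(row[feature])
    ]
    hits.sort(key=lambda h: h[0], reverse=True)
    return [reason for _, reason in hits]
```

In practice, `importance` would be built as `dict(rank_importance(feature_names, model.feature_importances_))`.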

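Results (Sample Dataset — 1,447 rows)

The dataset has no explicit fraud labels, so XGBoost is trained and scored against the Isolation Forest anomaly labels. These figures therefore measure agreement with those pseudo-labels, not accuracy against confirmed fraud.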

Metric           Value
Accuracy         93.0%
Precision        57.1%
Recall           52.2%
F1 Score         54.5%
AUC-ROC          95.3%
Fraud Detected   112 / 1,426 (7.9%)
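The reported figures are internally consistent. F1 is the harmonic mean of precision and recall, and the detection rate is a simple ratio. This quick arithmetic check reproduces the table's F1 and fraud-rate rows:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(f"F1:         {f1(0.571, 0.522):.1%}")  # 54.5%
print(f"Fraud rate: {112 / 1426:.1%}")        # 7.9%
```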

Top Fraud Patterns Discovered

  1. New Device Fraud — 55.4% of fraud from first-time devices
  2. Velocity Attack — 49.1% of fraud involves rapid successive transactions
  3. Geographic Anomaly — 34.8% of fraud has location mismatch
  4. Late-Night Activity — 34.8% of fraud occurs 0-5 AM
  5. IP Address Anomaly — 33.0% of fraud has invalid IPs
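Each percentage above is the share of flagged transactions that show a given signal. Below is a minimal pandas sketch of that computation. It assumes a 0/1 label column, hypothetically named `is_fraud`, and uses illustrative signal thresholds. "Velocity" is read as more than one transaction in the trailing hour, and "0-5 AM" as hours 0 to 4. These are assumptions, not the project's exact definitions.

```python
import pandas as pd

# Hypothetical signal definitions over the engineered features
PATTERNS = {
    "New Device Fraud": lambda d: d["new_device_flag"] == 1,
    "Velocity Attack": lambda d: d["txn_velocity_1h"] > 1,
    "Geographic Anomaly": lambda d: d["location_mismatch"] == 1,
    "Late-Night Activity": lambda d: d["hour_of_day"].between(0, 4),
    "IP Address Anomaly": lambda d: d["ip_is_invalid"] == 1,
}


def pattern_shares(df: pd.DataFrame, label_col: str = "is_fraud") -> pd.Series:
    """Fraction of flagged transactions exhibiting each signal, largest first."""
    fraud = df[df[label_col] == 1]
    shares = {name: float(rule(fraud).mean()) for name, rule in PATTERNS.items()}
    return pd.Series(shares).sort_values(ascending=False)
```

The signals are not mutually exclusive, because one transaction can trigger several of them. That is why the reported shares sum to well over 100%.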

Project Structure

CLAUDY_CODERS/
├── pipeline/
│   ├── fraud_pipeline.py               # Complete Python ML pipeline (5 stages)
│   ├── run_pipeline.mjs                # Node.js fallback pipeline
│   └── data/
│       └── participant_dataset.csv
├── frontend/
│   ├── src/
│   │   ├── App.tsx                     # Main app with 7-tab navigation
│   │   ├── components/
│   │   │   ├── Overview.tsx            # Dashboard overview with key metrics
│   │   │   ├── DataQualityReport.tsx   # Data quality issues resolved
│   │   │   ├── EDACharts.tsx           # 6 EDA visualizations
│   │   │   ├── FeatureEngineering.tsx  # Feature importance ranking
│   │   │   ├── ModelPerformance.tsx    # Gauge charts + confusion matrix
│   │   │   ├── FraudPatterns.tsx       # Discovered fraud patterns
│   │   │   ├── TransactionTable.tsx    # Searchable transaction explorer
│   │   │   ├── MetricsCards.tsx        # KPI cards
│   │   │   └── Sidebar.tsx             # Navigation sidebar
│   │   ├── types/index.ts              # TypeScript interfaces
│   │   └── data/sampleData.ts          # Fallback demo data
│   └── public/data/
│       └── dashboard.json              # Generated by pipeline
├── docs/
│   ├── prompt_documentation.md         # AI usage transparency
│   └── technical_summary.md            # One-page technical summary
└── README.md

Team

Team Name: CLAUDY CODERS
Domain: FinTech — Fraud Detection

License

MIT
