DeepShah111/predictive-maintenance-engine

Predictive Maintenance Engine — Enterprise Edition

An end-to-end production ML pipeline that predicts industrial machine failures before they happen.
Optimizes for total business cost in dollars — not accuracy, not F1.


🚀 Live Demo

Try it now — no setup, no API key required:

👉 https://predictive-maintenance-deep-shah.streamlit.app/

The live application allows you to:

  • Adjust real-time sensor sliders and watch the failure probability gauge update instantly
  • Upload a CSV of machine readings and get a full fleet risk assessment in seconds
  • Explore the business dashboard — compare reactive vs preventive vs AI-driven maintenance costs
  • Drag the decision threshold slider and watch FP/FN counts and total cost update live

📸 Application Screenshots

Live Prediction — Safe Machine (H-Type, Fresh Tool)

Safe Machine Prediction

18.9% failure probability — SAFE. H-type machine with 25 minutes of tool wear, 2000 RPM, and 30 Nm torque. The gauge, risk badge, and cost analysis update in real time as sliders are adjusted. Cost if ignored: $1,891. Preventive maintenance: $500. Model recommendation: Save $1,391.


Cost Impact Analysis — Safe Machine

Cost Analysis Safe

Physics-derived features visible at the bottom — Temp Differential: 9.50 K (above the 8.6 K Heat Dissipation threshold), Mechanical Power: 60,842 W (within safe operating range), Force Ratio: 0.01509 (well below the 0.035 Overstrain threshold). All three engineered features confirm the machine is operating within healthy parameters.


Live Prediction — Critical Danger (L-Type, Multiple Failure Modes)

Critical Danger Prediction

81.5% failure probability — DANGER. L-type machine with Temp_Diff of 7.40 K (below the 8.6 K Heat Dissipation threshold), 1,208 RPM (low), and 30 Nm torque. DANGER badge fires immediately. The model identifies Heat Dissipation Failure (HDF) as the active failure mode — the thermal gradient has collapsed, signaling imminent thermal failure.


Cost Impact Analysis — Critical Machine

Cost Analysis Danger

Expected cost if ignored: $8,151. Preventive maintenance cost: $500. Model recommendation: Save $7,651 — act immediately. Physics features confirm the failure signal: Temp Differential at 7.40 K (below the 8.6 K threshold), Mechanical Power at 36,602 W. The 20:1 cost asymmetry ($10,000 failure vs $500 inspection) makes the maintenance decision unambiguous.


Batch Analysis — Fleet-Wide Risk Assessment

Batch Analysis Summary

12 machines analyzed in one CSV upload. Fleet summary: 2 CRITICAL (16.7%), 3 MONITOR, 7 SAFE — $39,965 total cost at risk. The failure probability distribution chart separates the healthy cluster (left, below MONITOR threshold) from the at-risk machines (right, past the DANGER threshold line). Maintenance teams get an immediate prioritized action list.


Batch Analysis — Risk-Ranked Machine Table

Batch Analysis Table

Machines sorted by failure probability descending. MACHINE-003 and MACHINE-004 flagged DANGER in red (81.2% and 80.9% — both L-type with tool wear 240+ minutes). Three MONITOR machines follow in orange. Color-coded Risk_Level column and Expected_Cost_$ give maintenance teams an immediate dollar-ranked action list. Full results downloadable as CSV.


Business Dashboard — Annual Cost Comparison

Business Dashboard

1,000-machine fleet simulation: Reactive maintenance costs $340,000/year. Full preventive costs $500,000/year. This model costs $79,000/year — catching 32 of 34 failures (94% recall). Savings vs reactive: $261,000 (76.8%). Savings vs full preventive: $421,000 (84.2%). Fleet size, failure rate, and cost parameters are all adjustable via the Adjust Assumptions panel.


Model Leaderboard — 9-Model Benchmark

Model Leaderboard

LightGBM selected as champion via 5-fold cross-validated F1 mean (0.7857) — not by test-set score. CatBoost ranks second with lower CV std (0.051 vs 0.063), indicating more stable folds. Champion selection by CV score prevents the model selection bias that occurs when the test set is used to pick between models. All 9 models benchmarked under identical CV conditions.


Table of Contents

  1. The Business Problem
  2. What Makes This Different
  3. System Architecture
  4. Technical Decisions & Rationale
  5. Results
  6. Business Impact
  7. Repository Structure
  8. Quickstart
  9. Streamlit App
  10. FastAPI — REST Endpoints
  11. Docker Deployment
  12. Drift Detection & Monitoring
  13. Running Tests
  14. Dataset

1. The Business Problem

An hour of unplanned downtime in heavy manufacturing can cost anywhere from $10,000 to $250,000, depending on the industry. Yet the two standard maintenance strategies each have a built-in weakness:

| Strategy | What Goes Wrong | Hidden Cost |
| --- | --- | --- |
| Reactive | Wait for failure, then fix it | Emergency repair + full production halt |
| Preventive (fixed schedule) | Service everything on a calendar | Replacing healthy components, unnecessary labor |

Predictive maintenance aims to avoid both problems. It uses real-time sensor data to raise a maintenance alert only when a specific machine shows signs of imminent failure, so failures are caught before they happen and healthy machines are left alone.

This project builds a full production-structured ML pipeline on the AI4I 2020 Predictive Maintenance Dataset (UCI / Kaggle), a realistic simulation of CNC machine sensor telemetry: 10,000 operating cycles with a roughly 97:3 healthy-to-failure class ratio (96.6% healthy, 3.4% failures).


2. What Makes This Different

The majority of ML classification projects optimize for accuracy. Accuracy is the wrong metric for this problem. On a factory floor, errors are not symmetric:

  • A missed failure (False Negative) = unplanned downtime, possible safety incident → $10,000
  • A false alarm (False Positive) = a technician dispatched unnecessarily → $500

That is a 20:1 cost asymmetry. Every decision in this pipeline flows from that single insight.
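As a rough illustration of why that asymmetry matters, the sketch below prices two policies on the 2,000-cycle test set reported in the Results section: never raising an alert, and the tuned model. The helper `total_cost` and its signature are illustrative and not taken from the repository. The unit costs and counts are the ones quoted in this README.

```python
FN_COST = 10_000  # cost of a missed failure, as stated above
FP_COST = 500     # cost of an unnecessary inspection, as stated above


def total_cost(false_positives: int, false_negatives: int) -> int:
    """Total dollar cost of a set of predictions (hypothetical helper)."""
    return false_positives * FP_COST + false_negatives * FN_COST


# Policy 1: never raise an alert. All 68 test-set failures are missed.
never_alert = total_cost(false_positives=0, false_negatives=68)

# Policy 2: the tuned model, with 181 false alarms and 4 missed failures.
tuned_model = total_cost(false_positives=181, false_negatives=4)

print(f"Never alert: ${never_alert:,}")
print(f"Tuned model: ${tuned_model:,}")
```

The never-alert policy would score 96.6% accuracy on this split (1,932 of 2,000 correct) yet cost $680,000, against $130,500 for the tuned model at 90.75% accuracy.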

Side-by-side comparison

| What a standard ML project does | What this pipeline does |
| --- | --- |
| Optimize accuracy or generic F1 | Optimize total dollar cost: (FP × $500) + (FN × $10,000) |
| Single train/test split | 3-way stratified split: train (60%) / val (20%) / test (20%) |
| Decision threshold fixed at 0.5 | Threshold searched on validation set, reported on test set |
| GridSearchCV on F1 | GridSearchCV on a custom business-cost scorer |
| SMOTE applied to the full dataset | SMOTE inside CV folds only (no synthetic leakage) |
| Pick champion by test-set F1 | Pick champion by 5-fold cross-validated F1 mean |
| No unit tests | 14 pytest unit tests covering all core functions |
| Notebook only | Streamlit app + FastAPI + Docker + drift monitoring |

3. System Architecture

3.1 ML Training Pipeline

Raw CSV (Google Drive / local cache)
        │
        ▼
┌─────────────────────────────────────┐
│         data_ingestion.py           │
│  Download → Schema validation       │
│  Deduplication → Null audit         │
│  Target column sanity check         │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│       feature_engineering.py        │
│  Physics feature creation           │
│  Drop leakage columns               │
│  3-way stratified split (60/20/20)  │
└───────┬─────────────┬───────────────┘
        │             │
   X_train        X_val, X_test
   y_train        y_val, y_test
        │             │
        ▼             │
┌─────────────────────────────────────┐
│           modeling.py               │
│  9-model zoo benchmarked via        │
│  5-fold StratifiedKFold CV          │
│                                     │
│  Each fold pipeline:                │
│    preprocessor (fit on fold only)  │
│    → SMOTE (train fold only)        │
│    → classifier                     │
│                                     │
│  Champion = highest CV_F1_Mean      │
│  GridSearchCV on business-cost      │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│          evaluation.py              │
│  optimize_threshold(X_val, y_val)   │  ← val set ONLY
│  Final report on (X_test, y_test)   │  ← test set, first touch here
│  Confusion matrix · ROC · Features  │
│  Save model → artifacts/models/     │
└─────────────────────────────────────┘

3.2 Full Production Stack

┌──────────────────────────────────────────────────────────────────┐
│                     PRODUCTION SYSTEM                            │
│                                                                  │
│  ┌─────────────────────┐     ┌──────────────────────────────┐   │
│  │   streamlit_app.py  │     │      api/main.py (FastAPI)   │   │
│  │                     │     │                              │   │
│  │  Tab 1: Live Pred.  │     │  POST /predict               │   │
│  │  Tab 2: Batch       │     │  POST /predict-batch         │   │
│  │  Tab 3: Dashboard   │     │  GET  /health                │   │
│  └──────────┬──────────┘     └──────────────┬───────────────┘   │
│             └─────────────┬─────────────────┘                   │
│                           ▼                                      │
│              ┌────────────────────────┐                         │
│              │  lightgbm_champion.pkl │                         │
│              └────────────────────────┘                         │
│                           ▼                                      │
│              ┌────────────────────────┐                         │
│              │     monitoring.py      │                         │
│              │  KS drift detection    │                         │
│              │  → drift_alerts.csv    │                         │
│              └────────────────────────┘                         │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │            Docker / docker-compose (port 8000)            │  │
│  └───────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘

4. Technical Decisions & Rationale

4.1 Physics-Based Feature Engineering

Three features were engineered from first principles of thermodynamics and rotational mechanics rather than feeding raw sensor readings directly into the model.

| Feature | Formula | Physical Interpretation |
| --- | --- | --- |
| Temp_Diff | Process Temp − Air Temp | Thermal gradient: a low value (below roughly 8.6 K) signals poor heat dissipation preceding thermal failure |
| Power | Torque [Nm] × RPM | Proxy for mechanical power input to the spindle (physical watts would also need the 2π/60 factor that converts RPM to rad/s): sustained peaks accelerate tool wear |
| Force_Ratio | Torque / (RPM + ε) | Load per revolution: high ratio at low speed indicates heavy cutting conditions |

The ε = 1e-5 guard in Force_Ratio prevents division by zero. In the feature importance chart, Power ranks 2nd and Temp_Diff 3rd, behind only Tool wear [min] and above every other raw sensor reading. The domain-driven features carry more signal than most of the raw inputs they are built from.
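The three formulas can be sketched in a few lines of pandas. This is a minimal illustration, not the repository's feature_engineering.py: the function name `add_physics_features` is hypothetical, and the input column names are assumed to follow the AI4I CSV headers.

```python
import pandas as pd

EPSILON = 1e-5  # division-by-zero guard described above


def add_physics_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the three engineered columns added."""
    out = df.copy()
    out["Temp_Diff"] = out["Process temperature [K]"] - out["Air temperature [K]"]
    out["Power"] = out["Torque [Nm]"] * out["Rotational speed [rpm]"]
    out["Force_Ratio"] = out["Torque [Nm]"] / (out["Rotational speed [rpm]"] + EPSILON)
    return out


# One illustrative reading (assumed column names, made-up machine).
readings = pd.DataFrame(
    {
        "Air temperature [K]": [302.0],
        "Process temperature [K]": [309.0],
        "Rotational speed [rpm]": [1200],
        "Torque [Nm]": [65.0],
    }
)
features = add_physics_features(readings)
print(features[["Temp_Diff", "Power", "Force_Ratio"]].round(6))
```

For this reading the gradient is 7.0 K and the torque-speed product is 78,000.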

4.2 Class Imbalance — SMOTE in the Right Place

The dataset is 96.6% healthy machines and 3.4% failures. Three decisions handle this correctly:

  • Stratified splits preserve the 3.4% failure rate across all three subsets.
  • SMOTE inside CV folds via imblearn.Pipeline ensures synthetic minority samples are generated from training data only. The common mistake of applying SMOTE before CV inflates CV metrics, because synthetic samples interpolated from validation rows leak into the training folds.
  • The business-cost scorer explicitly encodes the 20:1 cost asymmetry into the hyperparameter search.

4.3 Why Three Splits (Train / Val / Test)?

If the decision threshold were optimised on the test set and then reported on the same set, the reported cost would be the minimum achievable on that specific sample — overly optimistic and non-generalising. The validation set is used exclusively for threshold search. The test set is touched exactly once — in evaluation.py — for the final unbiased report.
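One common way to get a 60/20/20 stratified split is two chained calls to scikit-learn's `train_test_split`. The sketch below uses synthetic data with the same size and failure rate as the real dataset. It is an illustration of the idea, not the code in feature_engineering.py.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 10,000 rows, 3.4% failures.
rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 4))
y = np.zeros(10_000, dtype=int)
y[:340] = 1

# Step 1: carve off the 20% test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
# Step 2: split the remaining 80% into train and val (0.25 of 80% = 20%).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(f"{name}: {len(labels)} rows, failure rate {labels.mean():.3f}")
```

Stratifying both calls keeps the failure rate at 3.4% in every subset, which is why the test set ends up with 68 failures out of 2,000 rows.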

4.4 Champion Selection by CV F1, Not Test F1

Selecting the champion model by test-set score introduces model selection bias: once the test set is used to make a decision, it no longer gives a clean estimate of generalisation. All 9 models are ranked by 5-fold cross-validated F1 mean. The test set is only used for the final report after both champion and threshold are locked in.
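The ranking step can be sketched as follows. The three scikit-learn models and the synthetic data are stand-ins chosen to keep the example dependency-light; the real benchmark covers 9 models, and modeling.py may be organised differently.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced training data (stand-in for X_train / y_train).
X_train, y_train = make_classification(
    n_samples=2_000, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Small stand-in model zoo.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1_000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
leaderboard = []
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1")
    leaderboard.append((name, scores.mean(), scores.std()))

# Rank by mean CV F1; the test set is never consulted here.
leaderboard.sort(key=lambda row: row[1], reverse=True)
champion = leaderboard[0][0]
for name, mean_f1, std_f1 in leaderboard:
    print(f"{name:<20} F1 mean={mean_f1:.4f} std={std_f1:.4f}")
print("Champion:", champion)
```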

4.5 Hyperparameter Tuning Objective

GridSearchCV minimizes (FP × $500) + (FN × $10,000) via a custom make_scorer with greater_is_better=False. The tuner directly searches for the configuration that saves the most money — not the one that maximises an abstract metric.
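A minimal sketch of such a scorer is shown below. The function name `business_cost`, the parameter grid, and the decision-tree stand-in for the LightGBM champion are assumptions made for illustration. The repository's own metric may handle edge cases differently.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def business_cost(y_true, y_pred, fp_cost=500, fn_cost=10_000):
    """Dollar cost of a prediction set: FP x $500 + FN x $10,000."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fp * fp_cost + fn * fn_cost


# greater_is_better=False makes scikit-learn negate the cost internally,
# so maximising the score is the same as minimising dollars.
cost_scorer = make_scorer(business_cost, greater_is_better=False)

X, y = make_classification(
    n_samples=1_000, n_features=8, weights=[0.9, 0.1], random_state=0
)
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),  # stand-in for the real champion
    param_grid={"max_depth": [2, 4, 8], "class_weight": [None, "balanced"]},
    scoring=cost_scorer,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print("Best params:", search.best_params_)
print("Mean cost per validation fold ($):", -search.best_score_)
```

Because the scorer is negated, `best_score_` is reported as a negative number and has to be flipped back to read it as dollars.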

4.6 OrdinalEncoder for Machine Type

Type encodes a genuine quality tier: L (Low) < M (Medium) < H (High). OrdinalEncoder with categories=[['L', 'M', 'H']] preserves this ordering as integers (0, 1, 2). OneHotEncoder would discard the ordinal structure. The handle_unknown='use_encoded_value', unknown_value=-1 guard ensures the pipeline never crashes on unseen categories at inference time.
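The encoder configuration described above behaves as follows on a toy input. The category "X" is a made-up value used only to show the unseen-category guard.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

encoder = OrdinalEncoder(
    categories=[["L", "M", "H"]],
    handle_unknown="use_encoded_value",
    unknown_value=-1,
)
encoder.fit(np.array([["L"], ["M"], ["H"]], dtype=object))

# "X" is a made-up category that was never seen during fitting.
encoded = encoder.transform(np.array([["H"], ["L"], ["M"], ["X"]], dtype=object))
print(encoded.ravel())
```

H maps to 2, L to 0, M to 1, and the unseen value to -1 instead of raising an error.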


5. Results

5.1 Model Leaderboard — 5-Fold Stratified CV

| Rank | Model | CV F1 Mean | CV F1 Std | CV AUC | Test F1 | Test AUC |
| --- | --- | --- | --- | --- | --- | --- |
| 🥇 | LightGBM | 0.7857 | 0.0626 | 0.9707 | 0.7808 | 0.9847 |
| 🥈 | CatBoost | 0.7758 | 0.0512 | 0.9709 | 0.7200 | 0.9782 |
| 🥉 | XGBoost | 0.7543 | 0.0615 | 0.9638 | 0.7125 | 0.9799 |
| 4 | Random Forest | 0.7346 | 0.0522 | 0.9698 | 0.7355 | 0.9727 |
| 5 | Gradient Boosting | 0.6227 | 0.0217 | 0.9726 | 0.5957 | 0.9794 |
| 6 | Decision Tree | 0.5953 | 0.0370 | 0.8653 | 0.6067 | 0.8826 |
| 7 | SVC | 0.4972 | 0.0263 | 0.9621 | 0.4917 | 0.9731 |
| 8 | Logistic Regression | 0.2857 | 0.0147 | 0.9191 | 0.3021 | 0.9316 |
| 9 | Gaussian NB | 0.2654 | 0.0200 | 0.9075 | 0.2821 | 0.9038 |

LightGBM vs CatBoost: LightGBM wins on CV F1 mean (0.786 vs 0.776). CatBoost has lower CV std (0.051 vs 0.063) — more stable across folds. In production, an ensemble of both would be the natural next step.

5.2 Champion: LightGBM — Final Test-Set Report

Threshold optimized on validation set: 0.32

              precision    recall  f1-score   support

           0     0.9977    0.9063    0.9498      1932
           1     0.2612    0.9412    0.4089        68

    accuracy                         0.9075      2000
   macro avg     0.6295    0.9238    0.6794      2000
weighted avg     0.9652    0.9075    0.9320      2000

The model catches 64 of 68 actual failures (94.1% recall). 4 failures missed. 181 false alarms — a deliberate trade-off given a missed failure costs 20× more than a false alarm.
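A threshold like 0.32 comes out of a sweep over candidate cut-offs on the validation set, keeping the one with the lowest dollar cost. The sketch below shows the idea on five toy predictions. The function name `best_cost_threshold` and the 0.01 grid step are assumptions, not the signature of the repository's `optimize_threshold`.

```python
import numpy as np

FP_COST, FN_COST = 500, 10_000  # unit costs used throughout this README


def best_cost_threshold(y_true, proba):
    """Return (threshold, cost) minimising FP x $500 + FN x $10,000."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba)
    thresholds = np.arange(1, 100) / 100  # 0.01, 0.02, ..., 0.99
    costs = []
    for t in thresholds:
        pred = proba >= t
        fp = int(np.sum(pred & (y_true == 0)))
        fn = int(np.sum(~pred & (y_true == 1)))
        costs.append(fp * FP_COST + fn * FN_COST)
    best = int(np.argmin(costs))  # lowest threshold wins on ties
    return float(thresholds[best]), costs[best]


# Toy validation labels and predicted failure probabilities.
y_val = [0, 0, 0, 1, 1]
p_val = [0.12, 0.42, 0.61, 0.37, 0.83]
threshold, cost = best_cost_threshold(y_val, p_val)
print(threshold, cost)
```

On this toy input the search settles on 0.13 at a cost of $1,000: it accepts two false alarms rather than risk missing the failure scored at 0.37, which is the same trade-off the 20:1 asymmetry produces at full scale.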

5.3 Diagnostic Plots

Confusion Matrix

Confusion Matrix

64 failures correctly flagged. 4 missed at $10,000 each ($40,000). 181 false alarms at $500 each ($90,500). Total projected test-set cost: $130,500.


ROC Curve

ROC Curve

AUC = 0.9847. The curve immediately reaches ~80% True Positive Rate at near-zero False Positive Rate.


Feature Importance

Feature Importance

Tool wear [min] ranks first. Power and Temp_Diff, both engineered features, rank 2nd and 3rd, above every other raw sensor reading. This supports the physics-based feature engineering.


6. Business Impact

Cost breakdown on the test set (2,000 machine cycles)

| Outcome | Count | Unit Cost | Total |
| --- | --- | --- | --- |
| False Negatives (missed failures) | 4 | $10,000 | $40,000 |
| False Positives (unnecessary inspections) | 181 | $500 | $90,500 |
| Total projected cost | | | $130,500 |

Comparison against standard maintenance strategies (1,000-machine fleet)

| Strategy | Failures Caught | Annual Cost | Saving vs Reactive |
| --- | --- | --- | --- |
| Reactive (wait for breakdown) | 0% | $340,000 | (baseline) |
| Preventive (fixed schedule) | 100% | $500,000 | −$160,000 |
| This Model (LightGBM, threshold 0.32) | 94% | $79,000 | $261,000 (76.8%) |
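The savings figures can be checked with a few lines of arithmetic. The sketch assumes the reactive cost is 34 expected failures at $10,000 each and the preventive cost is one $500 service per machine, which reproduces the $340,000 and $500,000 figures. The model's $79,000 is taken from the table as given rather than recomputed.

```python
FLEET_SIZE = 1_000
EXPECTED_FAILURES = 34        # 3.4% failure rate
FAILURE_COST = 10_000
INSPECTION_COST = 500
MODEL_ANNUAL_COST = 79_000    # taken from the table above, not recomputed

reactive_cost = EXPECTED_FAILURES * FAILURE_COST   # every failure hits production
preventive_cost = FLEET_SIZE * INSPECTION_COST     # every machine gets serviced


def saving(baseline, model):
    """Absolute and percentage saving of the model against a baseline."""
    diff = baseline - model
    return diff, round(100 * diff / baseline, 1)


vs_reactive = saving(reactive_cost, MODEL_ANNUAL_COST)
vs_preventive = saving(preventive_cost, MODEL_ANNUAL_COST)
print("vs reactive:  ", vs_reactive)
print("vs preventive:", vs_preventive)
```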

7. Repository Structure

predictive-maintenance-engine/
│
├── assets/
│   └── screenshots/
│       ├── 01_live_prediction_safe.png
│       ├── 02_cost_analysis_safe.png
│       ├── 03_live_prediction_danger.png
│       ├── 04_cost_analysis_danger.png
│       ├── 05_batch_analysis_summary.png
│       ├── 06_batch_analysis_table.png
│       ├── 07_business_dashboard.png
│       └── 08_model_leaderboard.png
│
├── artifacts/                         # Auto-generated — gitignored
│   ├── graphs/
│   │   ├── confusion_matrix.png
│   │   ├── roc_curve.png
│   │   └── feature_importance.png
│   └── model_leaderboard.csv
│
├── api/
│   ├── __init__.py
│   └── main.py                        # /predict, /predict-batch, /health
│
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── data_ingestion.py
│   ├── feature_engineering.py
│   ├── modeling.py
│   └── evaluation.py
│
├── tests/
│   └── test_pipeline.py               # 14 pytest unit tests
│
├── main_execution.ipynb               # Training pipeline (Colab)
├── run_pipeline.py                    # Training pipeline (local)
├── streamlit_app.py                   # Streamlit dashboard
├── monitoring.py                      # KS drift detection
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md

8. Quickstart

Option A — Live App (No Installation)

Visit https://predictive-maintenance-deep-shah.streamlit.app/ directly in your browser.


Option B — Google Colab (Recommended for Training)

1. Upload the project to Google Drive:

MyDrive/
└── predictive-maintenance-engine/
    ├── src/
    ├── api/
    ├── tests/
    ├── streamlit_app.py
    ├── monitoring.py
    └── requirements.txt

2. Open main_execution.ipynb in Google Colab and run all cells.

The pipeline mounts Drive, downloads the dataset automatically via gdown, trains all 9 models, tunes the champion, and saves every artifact back to Drive.


Option C — Local

git clone https://github.com/DeepShah111/predictive-maintenance-engine.git
cd predictive-maintenance-engine

python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Mac/Linux

pip install -r requirements.txt
python run_pipeline.py

9. Streamlit App

streamlit run streamlit_app.py
# → http://localhost:8501
| Tab | What it does |
| --- | --- |
| ⚡ Live Prediction | Sensor sliders → real-time failure probability gauge + risk level + cost impact |
| 📂 Batch Analysis | Upload CSV → ranked fleet risk table + distribution chart + downloadable results |
| 📊 Business Dashboard | Strategy cost comparison + live threshold slider with FP/FN/cost update |

Live deployment: https://predictive-maintenance-deep-shah.streamlit.app/


10. FastAPI — REST Endpoints

uvicorn api.main:app --reload --port 8000
# → http://localhost:8000/docs
| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Model loaded status, threshold, version |
| POST | /predict | Single reading → probability + risk level + recommended action |
| POST | /predict-batch | List of readings → predictions + fleet summary |

Example — Single Prediction:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "machine_type": "L",
    "air_temperature_K": 302.0,
    "process_temperature_K": 309.0,
    "rotational_speed_rpm": 1200,
    "torque_Nm": 65.0,
    "tool_wear_min": 240,
    "machine_id": "MACHINE-001"
  }'

Expected response:

{
  "machine_id": "MACHINE-001",
  "failure_probability": 0.812,
  "failure_probability_pct": 81.2,
  "risk_level": "DANGER",
  "recommended_action": "IMMEDIATE maintenance required. Take machine offline.",
  "expected_cost_if_ignored": 8120.0,
  "physics_features": {
    "Temp_Diff": 7.0,
    "Power": 78000.0,
    "Force_Ratio": 0.054167
  },
  "model_name": "Lightgbm",
  "threshold_used": 0.32
}

11. Docker Deployment

# Build and run
docker compose up --build
# → API available at http://localhost:8000

# Stop
docker compose down

The artifacts/ directory is mounted as a read-only volume so the container always uses the latest trained model without a rebuild.


12. Drift Detection & Monitoring

The monitoring.py module detects covariate shift between training and production data using the Kolmogorov-Smirnov test (α = 0.05).

from monitoring import DriftMonitor
import pandas as pd

monitor = DriftMonitor()
alerts = monitor.check_drift(pd.read_csv("new_readings.csv"), tag="production_batch_1")

if alerts:
    for a in alerts:
        print(f"DRIFT: {a['feature']} — shift {a['mean_shift_pct']:.1f}%")

CLI usage:

python monitoring.py --csv new_sensor_data.csv --tag production_jan_2025

All alerts logged to artifacts/drift_alerts.csv with timestamp, KS statistic, p-value, and mean shift percentage.
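The core of a KS-based drift check can be sketched with `scipy.stats.ks_2samp`. This is not the `DriftMonitor` implementation: the function `detect_drift`, the alert dictionary keys other than `feature` and `mean_shift_pct`, and the synthetic column data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def detect_drift(reference: pd.DataFrame, current: pd.DataFrame, alpha: float = 0.05):
    """Flag numeric columns whose distribution differs at significance alpha."""
    alerts = []
    for column in reference.select_dtypes("number").columns:
        result = ks_2samp(reference[column], current[column])
        if result.pvalue < alpha:
            ref_mean = reference[column].mean()
            shift_pct = 100 * (current[column].mean() - ref_mean) / abs(ref_mean)
            alerts.append(
                {
                    "feature": column,
                    "ks_statistic": float(result.statistic),
                    "p_value": float(result.pvalue),
                    "mean_shift_pct": float(shift_pct),
                }
            )
    return alerts


# Synthetic reference data; only the air temperature is shifted afterwards.
rng = np.random.default_rng(0)
reference = pd.DataFrame(
    {
        "Air temperature [K]": rng.normal(300, 2, 2_000),
        "Torque [Nm]": rng.normal(40, 10, 2_000),
    }
)
current = reference.copy()
current["Air temperature [K]"] += 3.0  # simulate a hotter shop floor

for alert in detect_drift(reference, current):
    print(f"DRIFT: {alert['feature']} shift {alert['mean_shift_pct']:.1f}%")
```

Only the shifted column is flagged. The torque column is unchanged, so its KS statistic is 0 and no alert is raised.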


13. Running Tests

python -m pytest tests/ -v
collected 14 items

tests/test_pipeline.py::test_physics_features_columns_created             PASSED
tests/test_pipeline.py::test_physics_features_temp_diff_value             PASSED
tests/test_pipeline.py::test_physics_features_power_value                 PASSED
tests/test_pipeline.py::test_physics_features_no_infinities               PASSED
tests/test_pipeline.py::test_leakage_cols_dropped_after_split             PASSED
tests/test_pipeline.py::test_get_preprocessor_returns_column_transformer  PASSED
tests/test_pipeline.py::test_clean_data_removes_duplicates                PASSED
tests/test_pipeline.py::test_clean_data_index_is_contiguous               PASSED
tests/test_pipeline.py::test_build_features_and_split_returns_six_objects PASSED
tests/test_pipeline.py::test_build_features_and_split_sizes               PASSED
tests/test_pipeline.py::test_build_features_and_split_class_balance       PASSED
tests/test_pipeline.py::test_total_cost_metric_correct_value              PASSED
tests/test_pipeline.py::test_total_cost_metric_degenerate_returns_inf     PASSED
tests/test_pipeline.py::test_schema_validation_raises_on_missing_columns  PASSED

14 passed in ~18s

14. Dataset

AI4I 2020 Predictive Maintenance Dataset

| Property | Value |
| --- | --- |
| Source | UCI ML Repository · Kaggle |
| Rows | 10,000 |
| Features used | 9 (5 raw numerical + 1 categorical + 3 physics-derived) |
| Target | Machine failure (binary: 0 = healthy, 1 = failure) |
| Class distribution | 96.6% healthy / 3.4% failure |
| Leakage columns dropped | UDI, Product ID, TWF, HDF, PWF, OSF, RNF |

The leakage columns (TWF through RNF) are individual failure-mode sub-flags set to 1 only when Machine failure is also 1. Keeping them would let the model read the answer directly — they are dropped before any modelling step. The dataset downloads automatically on first run via gdown.


Built as a portfolio project demonstrating production ML engineering practices.
Structured for clarity, correctness, and interview-readiness.

🚀 Live Demo  |  📁 GitHub

About

Cost-optimized ML pipeline for industrial failure prediction. LightGBM: AUC 0.9847, 94.1% recall. Business-cost objective, leakage-proof 3-way split, 14 unit tests.
