Predictive Maintenance Engine — Enterprise Edition

An end-to-end production ML pipeline that predicts industrial machine failures before they happen.
Optimizes for total business cost in dollars — not accuracy, not F1.

🚀 Live Demo

Try it now — no setup, no API key required:

👉 https://predictive-maintenance-deep-shah.streamlit.app/

The live application allows you to:

Adjust real-time sensor sliders and watch the failure probability gauge update instantly
Upload a CSV of machine readings and get a full fleet risk assessment in seconds
Explore the business dashboard — compare reactive vs preventive vs AI-driven maintenance costs
Drag the decision threshold slider and watch FP/FN counts and total cost update live

📸 Application Screenshots

Live Prediction — Safe Machine (H-Type, Fresh Tool)

18.9% failure probability — SAFE. H-type machine with 25 minutes of tool wear, 2000 RPM, and 30 Nm torque. The gauge, risk badge, and cost analysis update in real time as sliders are adjusted. Cost if ignored: $1,891. Preventive maintenance: $500. Model recommendation: Save $1,391.

Cost Impact Analysis — Safe Machine

Physics-derived features visible at the bottom: Temp Differential 9.50 K (above the 8.6 K Heat Dissipation threshold), Mechanical Power 60,842 (Torque × RPM; about 6.4 kW once RPM is converted to rad/s, within the safe operating range), and Force Ratio 0.01509 (well below the 0.035 Overstrain threshold). All three engineered features indicate the machine is operating within healthy parameters.

Live Prediction — Critical Danger (L-Type, Multiple Failure Modes)

81.5% failure probability — DANGER. L-type machine with a Temp_Diff of 7.40 K (below the 8.6 K Heat Dissipation threshold), 1,208 RPM (low), and 30 Nm torque. The DANGER badge fires immediately. The readings match the Heat Dissipation Failure (HDF) pattern: the thermal gradient has collapsed while the spindle turns slowly, so heat is no longer being carried away.

Cost Impact Analysis — Critical Machine

Expected cost if ignored: $8,151. Preventive maintenance cost: $500. Model recommendation: Save $7,651, act immediately. Physics features confirm the failure signal: Temp Differential at 7.40 K (below the 8.6 K threshold) and Mechanical Power at 36,602 (Torque × RPM, about 3.8 kW). The 20:1 cost asymmetry ($10,000 failure vs $500 inspection) makes the maintenance decision unambiguous.
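The per-machine figures in the cost cards follow from the 20:1 cost model: the expected cost of ignoring a machine is its failure probability times $10,000, and the saving is that amount minus the $500 inspection. A minimal sketch, assuming this is how the app derives them; `cost_impact` is a hypothetical name, not the app's code:

```python
# Hypothetical helper: per-machine cost figures under this README's cost model.
FAILURE_COST = 10_000   # $ per missed failure (false negative)
INSPECTION_COST = 500   # $ per preventive inspection (false positive)


def cost_impact(failure_probability: float) -> dict:
    """Expected cost of ignoring a machine vs. servicing it now."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    expected_if_ignored = failure_probability * FAILURE_COST
    return {
        "expected_cost_if_ignored": round(expected_if_ignored, 2),
        "preventive_cost": INSPECTION_COST,
        # Positive: servicing now is cheaper than the expected failure cost.
        "expected_saving": round(expected_if_ignored - INSPECTION_COST, 2),
    }


print(cost_impact(0.815))  # critical machine from the screenshot above
```

With the rounded 81.5% this yields $8,150 and $7,650; the screenshot's $8,151 and $7,651 presumably come from the unrounded probability.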

Batch Analysis — Fleet-Wide Risk Assessment

12 machines analyzed in one CSV upload. Fleet summary: 2 CRITICAL (16.7%), 3 MONITOR, 7 SAFE — $39,965 total cost at risk. The failure probability distribution chart separates the healthy cluster (left, below MONITOR threshold) from the at-risk machines (right, past the DANGER threshold line). Maintenance teams get an immediate prioritized action list.

Batch Analysis — Risk-Ranked Machine Table

Machines sorted by failure probability descending. MACHINE-003 and MACHINE-004 flagged DANGER in red (81.2% and 80.9% — both L-type with tool wear 240+ minutes). Three MONITOR machines follow in orange. Color-coded Risk_Level column and Expected_Cost_$ give maintenance teams an immediate dollar-ranked action list. Full results downloadable as CSV.

Business Dashboard — Annual Cost Comparison

1,000-machine fleet simulation: Reactive maintenance costs $340,000/year. Full preventive costs $500,000/year. This model costs $79,000/year — catching 32 of 34 failures (94% recall). Savings vs reactive: $261,000 (76.8%). Savings vs full preventive: $421,000 (84.2%). Fleet size, failure rate, and cost parameters are all adjustable via the Adjust Assumptions panel.

Model Leaderboard — 9-Model Benchmark

LightGBM selected as champion via 5-fold cross-validated F1 mean (0.7857) — not by test-set score. CatBoost ranks second with lower CV std (0.051 vs 0.063), indicating more stable folds. Champion selection by CV score prevents the model selection bias that occurs when the test set is used to pick between models. All 9 models benchmarked under identical CV conditions.

1. The Business Problem

Every hour of unplanned downtime in heavy manufacturing costs between $10,000 and $250,000 depending on the industry. Yet the two standard maintenance strategies are both fundamentally broken:

Strategy	Approach	Hidden Cost
Reactive	Wait for failure, then fix it	Emergency repair + full production halt
Preventive (fixed schedule)	Service everything on a calendar	Replacing healthy components, unnecessary labor

Predictive maintenance avoids both problems. It uses real-time sensor data to raise a maintenance alert only when a specific machine shows signs of imminent failure, with the aim of catching the failure before it happens while leaving healthy machines alone.

This project builds a full production-structured ML pipeline on the AI4I 2020 Predictive Maintenance Dataset (UCI / Kaggle), a realistic simulation of CNC machine sensor telemetry across 10,000 operating cycles with a roughly 97:3 healthy-to-failure class ratio (96.6% / 3.4%).

2. What Makes This Different

The majority of ML classification projects optimize for accuracy. Accuracy is the wrong metric for this problem. On a factory floor, errors are not symmetric:

A missed failure (False Negative) = unplanned downtime, possible safety incident → $10,000
A false alarm (False Positive) = a technician dispatched unnecessarily → $500

That is a 20:1 cost asymmetry. Every decision in this pipeline flows from that single insight.

Side-by-side comparison

What a standard ML project does	What this pipeline does
Optimize accuracy or generic F1	Optimize total dollar cost: `(FP × $500) + (FN × $10,000)`
Single train/test split	3-way stratified split — train (60%) / val (20%) / test (20%)
Decision threshold fixed at 0.5	Threshold searched on validation set, reported on test set
`GridSearchCV` on F1	`GridSearchCV` on a custom business-cost scorer
SMOTE applied to the full dataset	SMOTE inside CV folds only — no synthetic leakage
Pick champion by test-set F1	Pick champion by 5-fold cross-validated F1 mean
No unit tests	14 pytest unit tests covering all core functions
Notebook only	Streamlit app + FastAPI + Docker + drift monitoring
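The cost objective in the first row of the table can be written as an ordinary metric function. A minimal NumPy sketch; `total_business_cost` is a hypothetical name and not necessarily the repository's implementation:

```python
import numpy as np

FP_COST = 500      # $ per false alarm (unnecessary inspection)
FN_COST = 10_000   # $ per missed failure (unplanned downtime)


def total_business_cost(y_true, y_pred, fp_cost=FP_COST, fn_cost=FN_COST) -> float:
    """Total dollar cost of binary predictions: FP x fp_cost + FN x fn_cost."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    false_positives = int(np.sum((y_pred == 1) & (y_true == 0)))
    false_negatives = int(np.sum((y_pred == 0) & (y_true == 1)))
    return float(false_positives * fp_cost + false_negatives * fn_cost)


# Rebuild the test-set outcome counts from Section 5: 64 TP, 4 FN, 181 FP, 1,751 TN.
y_true = np.array([1] * 68 + [0] * 1932)
y_pred = np.array([1] * 64 + [0] * 4 + [1] * 181 + [0] * 1751)
print(total_business_cost(y_true, y_pred))  # → 130500.0
```

True positives and true negatives cost nothing here, which is why the metric only needs the two error counts.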

3. System Architecture

3.1 ML Training Pipeline

Raw CSV (Google Drive / local cache)
        │
        ▼
┌─────────────────────────────────────┐
│         data_ingestion.py           │
│  Download → Schema validation       │
│  Deduplication → Null audit         │
│  Target column sanity check         │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│       feature_engineering.py        │
│  Physics feature creation           │
│  Drop leakage columns               │
│  3-way stratified split (60/20/20)  │
└───────┬─────────────┬───────────────┘
        │             │
   X_train        X_val, X_test
   y_train        y_val, y_test
        │             │
        ▼             │
┌─────────────────────────────────────┐
│           modeling.py               │
│  9-model zoo benchmarked via        │
│  5-fold StratifiedKFold CV          │
│                                     │
│  Each fold pipeline:                │
│    preprocessor (fit on fold only)  │
│    → SMOTE (train fold only)        │
│    → classifier                     │
│                                     │
│  Champion = highest CV_F1_Mean      │
│  GridSearchCV on business-cost      │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│          evaluation.py              │
│  optimize_threshold(X_val, y_val)   │  ← val set ONLY
│  Final report on (X_test, y_test)   │  ← test set, first touch here
│  Confusion matrix · ROC · Features  │
│  Save model → artifacts/models/     │
└─────────────────────────────────────┘
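The 60/20/20 stratified split in feature_engineering.py can be done as two successive `train_test_split` calls. A minimal sketch on toy data with the AI4I size and failure rate; `three_way_split` is a hypothetical name and the return order is an assumption:

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, seed: int = 42):
    """Stratified 60/20/20 train/val/test split done as two successive cuts."""
    # Cut 1: hold out 20% of all rows as the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed
    )
    # Cut 2: 25% of the remaining 80% equals 20% of the original data.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=seed
    )
    return X_train, X_val, X_test, y_train, y_val, y_test


# Toy data: 10,000 rows with a 3.4% positive rate.
X = np.arange(10_000).reshape(-1, 1)
y = np.array([1] * 340 + [0] * 9_660)
X_train, X_val, X_test, y_train, y_val, y_test = three_way_split(X, y)
print(len(X_train), len(X_val), len(X_test))  # → 6000 2000 2000
```

Stratifying both cuts keeps the failure rate close to 3.4% in every subset.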

3.2 Full Production Stack

┌──────────────────────────────────────────────────────────────────┐
│                     PRODUCTION SYSTEM                            │
│                                                                  │
│  ┌─────────────────────┐     ┌──────────────────────────────┐    │
│  │   streamlit_app.py  │     │      api/main.py (FastAPI)   │    │
│  │                     │     │                              │    │
│  │  Tab 1: Live Pred.  │     │  POST /predict               │    │
│  │  Tab 2: Batch       │     │  POST /predict-batch         │    │
│  │  Tab 3: Dashboard   │     │  GET  /health                │    │
│  └──────────┬──────────┘     └──────────────┬───────────────┘    │
│             └─────────────┬─────────────────┘                    │
│                           ▼                                      │
│              ┌────────────────────────┐                          │
│              │  lightgbm_champion.pkl │                          │
│              └────────────────────────┘                          │
│                           ▼                                      │
│              ┌────────────────────────┐                          │
│              │     monitoring.py      │                          │
│              │  KS drift detection    │                          │
│              │  → drift_alerts.csv    │                          │
│              └────────────────────────┘                          │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │            Docker / docker-compose (port 8000)             │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘

4. Technical Decisions & Rationale

4.1 Physics-Based Feature Engineering

Three features were engineered from first principles of thermodynamics and rotational mechanics rather than feeding raw sensor readings directly into the model.

Feature	Formula	Physical Interpretation
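`Temp_Diff`	Process Temp − Air Temp	Thermal gradient: a shrinking gap means heat is no longer being carried away; in AI4I, heat dissipation failure occurs when it falls below 8.6 K at speeds under 1,380 RPM
`Power`	Torque [Nm] × RPM	Proportional to the mechanical power delivered to the spindle (multiply by 2π/60 for watts): sustained peaks accelerate tool wear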
`Force_Ratio`	Torque / (RPM + ε)	Load per revolution: high ratio at low speed indicates heavy cutting conditions

The ε = 1e-5 guard in Force_Ratio prevents division by zero. The feature importance chart ranks Power 2nd and Temp_Diff 3rd, behind only tool wear and ahead of every other raw sensor reading: the domain-driven features outrank most of the raw data they are built from.
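The three formulas in the table translate directly into pandas column arithmetic. A minimal sketch, assuming the raw AI4I column names; `add_physics_features` is a hypothetical name, not necessarily the function in feature_engineering.py. Power is kept in Nm × rpm as the table's formula states:

```python
import math

import pandas as pd

EPS = 1e-5  # division-by-zero guard quoted in this README


def add_physics_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Temp_Diff, Power and Force_Ratio added."""
    out = df.copy()
    rpm = out["Rotational speed [rpm]"]
    torque = out["Torque [Nm]"]
    out["Temp_Diff"] = out["Process temperature [K]"] - out["Air temperature [K]"]
    out["Power"] = torque * rpm                # Nm x rpm, proportional to power
    out["Force_Ratio"] = torque / (rpm + EPS)  # load per revolution
    return out


readings = pd.DataFrame({
    "Air temperature [K]": [302.0],
    "Process temperature [K]": [309.0],
    "Rotational speed [rpm]": [1200],
    "Torque [Nm]": [65.0],
})
features = add_physics_features(readings)
print(features[["Temp_Diff", "Power", "Force_Ratio"]].iloc[0].to_dict())
# Power in watts needs rpm converted to rad/s: Nm x rpm x 2*pi/60.
print(round(features["Power"].iloc[0] * 2 * math.pi / 60))  # → 8168
```

Working on a copy keeps the function free of side effects, which makes it easy to unit-test.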

4.2 Class Imbalance — SMOTE in the Right Place

The dataset is 96.6% healthy machines and 3.4% failures. Three decisions handle this:

Stratified splits preserve the 3.4% failure rate across all three subsets.
SMOTE runs inside the CV folds via imblearn.Pipeline, so synthetic minority samples are generated from each training fold only. Applying SMOTE before CV is a common mistake: synthetic points interpolated from validation-fold samples end up in the training folds, so the model is scored on near-copies of what it trained on and CV metrics are inflated.
The business-cost scorer encodes the 20:1 error-cost asymmetry directly into the hyperparameter search.

4.3 Why Three Splits (Train / Val / Test)?

If the decision threshold were optimized on the test set and then reported on that same set, the reported cost would be the minimum achievable on that specific sample: optimistically biased and unlikely to generalize. The validation set is therefore used exclusively for threshold search, and the test set is touched exactly once, in evaluation.py, for the final unbiased report.
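A threshold search of this kind scans candidate cut-offs over validation-set probabilities and keeps the one with the lowest dollar cost. A minimal grid-search sketch; `search_cost_threshold` and the 0.01 grid step are assumptions, not the signature of the repository's `optimize_threshold`:

```python
import numpy as np


def search_cost_threshold(y_val, val_proba, fp_cost=500, fn_cost=10_000, grid=None):
    """Return (threshold, cost) minimizing FP*fp_cost + FN*fn_cost on validation data."""
    y_val = np.asarray(y_val).astype(int)
    val_proba = np.asarray(val_proba, dtype=float)
    if grid is None:
        grid = np.round(np.linspace(0.01, 0.99, 99), 2)  # 0.01, 0.02, ..., 0.99
    best_threshold, best_cost = None, np.inf
    for threshold in grid:
        pred = (val_proba >= threshold).astype(int)
        fp = np.sum((pred == 1) & (y_val == 0))
        fn = np.sum((pred == 0) & (y_val == 1))
        cost = fp * fp_cost + fn * fn_cost
        if cost < best_cost:  # strict comparison: ties keep the lowest threshold
            best_threshold, best_cost = float(threshold), float(cost)
    return best_threshold, best_cost


# Toy validation scores; in the pipeline these would be predict_proba(X_val)[:, 1].
y_val = [0, 0, 0, 1, 1]
val_proba = [0.10, 0.35, 0.60, 0.30, 0.90]
threshold, cost = search_cost_threshold(y_val, val_proba)
print(threshold, cost)  # → 0.11 1000.0
```

The chosen threshold is then frozen and applied once to the test-set probabilities for the final report.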

4.4 Champion Selection by CV F1, Not Test F1

Selecting the champion model by test-set score introduces model selection bias: once the test set has been used to make a decision, it is no longer a clean estimate of generalization. All 9 models are instead ranked by 5-fold cross-validated F1 mean, and the test set is used only for the final report after both the champion and the threshold are locked in.
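Ranking by cross-validated F1 needs nothing but the training split. A minimal sketch with three stand-in models (not the full 9-model zoo, and with SMOTE omitted for brevity) on toy imbalanced data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the training split only; no test data is touched here.
X_train, y_train = make_classification(
    n_samples=1_500, n_features=8, weights=[0.9], random_state=1
)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1_000),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=1),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

leaderboard = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1")
    leaderboard[name] = (scores.mean(), scores.std())

# Champion = highest mean CV F1; the std is reported for stability, not used to rank.
champion = max(leaderboard, key=lambda name: leaderboard[name][0])
for name, (mean, std) in sorted(leaderboard.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{name:<20} CV F1 {mean:.4f} +/- {std:.4f}")
print("Champion:", champion)
```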

4.5 Hyperparameter Tuning Objective

GridSearchCV minimizes (FP × $500) + (FN × $10,000) through a custom make_scorer with greater_is_better=False, which negates the cost so that GridSearchCV's score maximization becomes cost minimization. The tuner therefore searches directly for the configuration that saves the most money, rather than the one that maximizes an abstract metric.
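A cost scorer of this shape can be built from `confusion_matrix` and `make_scorer`. A minimal sketch; the `DecisionTreeClassifier`, its parameter grid, and the toy data are stand-ins, not the champion's actual search space. By default `make_scorer` scores the output of `predict()`, so during tuning the cost is measured at the estimator's default decision rule; the cost-optimal threshold is searched separately on the validation set:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def business_cost(y_true, y_pred):
    """Dollar cost of hard predictions: $500 per false alarm, $10,000 per missed failure."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fp * 500 + fn * 10_000


# greater_is_better=False makes the scorer return -cost, so GridSearchCV's
# "pick the highest score" rule selects the lowest-cost configuration.
cost_scorer = make_scorer(business_cost, greater_is_better=False)

X, y = make_classification(n_samples=1_500, n_features=8, weights=[0.9], random_state=2)
search = GridSearchCV(
    DecisionTreeClassifier(random_state=2),
    param_grid={"max_depth": [2, 4, 8], "class_weight": [None, "balanced"]},
    scoring=cost_scorer,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=2),
)
search.fit(X, y)
print(search.best_params_, "mean cost per fold: $", -search.best_score_)
```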

4.6 OrdinalEncoder for Machine Type

Type encodes a genuine quality tier: L (Low) < M (Medium) < H (High). OrdinalEncoder with categories=[['L', 'M', 'H']] preserves this ordering as integers (0, 1, 2). OneHotEncoder would discard the ordinal structure. The handle_unknown='use_encoded_value', unknown_value=-1 guard ensures the pipeline never crashes on unseen categories at inference time.
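The encoder configuration described above behaves as follows on known and unseen machine types. A minimal sketch; in the pipeline the encoder would sit inside the preprocessor rather than be fitted standalone:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

encoder = OrdinalEncoder(
    categories=[["L", "M", "H"]],     # explicit order: L=0, M=1, H=2
    handle_unknown="use_encoded_value",
    unknown_value=-1,                 # unseen types map to -1 instead of raising
)
encoder.fit(np.array([["L"], ["M"], ["H"]]))

encoded = encoder.transform(np.array([["L"], ["H"], ["M"], ["X"]]))
print(encoded.ravel().tolist())  # → [0.0, 2.0, 1.0, -1.0]
```

Passing `categories` explicitly fixes the ordering; without it, scikit-learn sorts categories alphabetically (H, L, M), which would break the quality tier.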

5. Results

5.1 Model Leaderboard — 5-Fold Stratified CV

Rank	Model	CV F1 Mean	CV F1 Std	CV AUC	Test F1	Test AUC
🥇	LightGBM	0.7857	0.0626	0.9707	0.7808	0.9847
🥈	CatBoost	0.7758	0.0512	0.9709	0.7200	0.9782
🥉	XGBoost	0.7543	0.0615	0.9638	0.7125	0.9799
4	Random Forest	0.7346	0.0522	0.9698	0.7355	0.9727
5	Gradient Boosting	0.6227	0.0217	0.9726	0.5957	0.9794
6	Decision Tree	0.5953	0.0370	0.8653	0.6067	0.8826
7	SVC	0.4972	0.0263	0.9621	0.4917	0.9731
8	Logistic Regression	0.2857	0.0147	0.9191	0.3021	0.9316
9	Gaussian NB	0.2654	0.0200	0.9075	0.2821	0.9038

LightGBM vs CatBoost: LightGBM wins on CV F1 mean (0.786 vs 0.776). CatBoost has lower CV std (0.051 vs 0.063) — more stable across folds. In production, an ensemble of both would be the natural next step.

5.2 Champion: LightGBM — Final Test-Set Report

Threshold optimized on validation set: 0.32

              precision    recall  f1-score   support

           0     0.9977    0.9063    0.9498      1932
           1     0.2612    0.9412    0.4089        68

    accuracy                         0.9075      2000
   macro avg     0.6295    0.9238    0.6794      2000
weighted avg     0.9652    0.9075    0.9320      2000

The model catches 64 of 68 actual failures (94.1% recall). 4 failures missed. 181 false alarms — a deliberate trade-off given a missed failure costs 20× more than a false alarm.

5.3 Diagnostic Plots

Confusion Matrix

64 failures correctly flagged. 4 missed at $10,000 each ($40,000). 181 false alarms at $500 each ($90,500). Total projected test-set cost: $130,500.

ROC Curve

AUC = 0.9847. The curve immediately reaches ~80% True Positive Rate at near-zero False Positive Rate.

Feature Importance

Tool wear [min] ranks first. Power and Temp_Diff, both engineered features, rank 2nd and 3rd, ahead of every other raw sensor reading. Domain engineering validated.

6. Business Impact

Cost breakdown on the test set (2,000 machine cycles)

Outcome	Count	Unit Cost	Total
False Negatives — missed failures	4	$10,000	$40,000
False Positives — unnecessary inspections	181	$500	$90,500
Total projected cost			$130,500

Comparison against standard maintenance strategies (1,000-machine fleet)

Strategy	Failures Caught	Annual Cost	Saving vs Reactive
Reactive — wait for breakdown	0%	$340,000	—
Preventive — fixed schedule	100%	$500,000	−$160,000
This Model — LightGBM, threshold 0.32	94%	$79,000	$261,000 (76.8%)
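The strategy figures follow from the same unit costs. A sketch, assuming the reactive strategy pays for every expected failure and the preventive strategy inspects every machine once a year; the model's $79,000 annual cost is passed in as given because the README does not spell out how the dashboard derives it. `fleet_comparison` is a hypothetical name:

```python
def fleet_comparison(n_machines=1_000, failure_rate=0.034, model_annual_cost=79_000,
                     failure_cost=10_000, inspection_cost=500):
    """Annual reactive vs. preventive cost, compared with a supplied model cost."""
    expected_failures = round(n_machines * failure_rate)  # 34 for the defaults
    reactive = expected_failures * failure_cost            # pay for every failure
    preventive = n_machines * inspection_cost              # inspect every machine
    return {
        "reactive": reactive,
        "preventive": preventive,
        "model": model_annual_cost,
        "saving_vs_reactive": reactive - model_annual_cost,
        "saving_vs_reactive_pct": round(100 * (reactive - model_annual_cost) / reactive, 1),
        "saving_vs_preventive": preventive - model_annual_cost,
        "saving_vs_preventive_pct": round(100 * (preventive - model_annual_cost) / preventive, 1),
    }


for key, value in fleet_comparison().items():
    print(f"{key:<26} {value:,}")
```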

7. Repository Structure

predictive-maintenance-engine/
│
├── assets/
│   └── screenshots/
│       ├── 01_live_prediction_safe.png
│       ├── 02_cost_analysis_safe.png
│       ├── 03_live_prediction_danger.png
│       ├── 04_cost_analysis_danger.png
│       ├── 05_batch_analysis_summary.png
│       ├── 06_batch_analysis_table.png
│       ├── 07_business_dashboard.png
│       └── 08_model_leaderboard.png
│
├── artifacts/                         # Auto-generated — gitignored
│   ├── graphs/
│   │   ├── confusion_matrix.png
│   │   ├── roc_curve.png
│   │   └── feature_importance.png
│   └── model_leaderboard.csv
│
├── api/
│   ├── __init__.py
│   └── main.py                        # /predict, /predict-batch, /health
│
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── data_ingestion.py
│   ├── feature_engineering.py
│   ├── modeling.py
│   └── evaluation.py
│
├── tests/
│   └── test_pipeline.py               # 14 pytest unit tests
│
├── main_execution.ipynb               # Training pipeline (Colab)
├── run_pipeline.py                    # Training pipeline (local)
├── streamlit_app.py                   # Streamlit dashboard
├── monitoring.py                      # KS drift detection
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md

8. Quickstart

Option A — Live App (No Installation)

Visit https://predictive-maintenance-deep-shah.streamlit.app/ directly in your browser.

Option B — Google Colab (Recommended for Training)

1. Upload the project to Google Drive:

MyDrive/
└── predictive-maintenance-engine/
    ├── src/
    ├── api/
    ├── tests/
    ├── streamlit_app.py
    ├── monitoring.py
    └── requirements.txt

2. Open main_execution.ipynb in Google Colab and run all cells.

The pipeline mounts Drive, downloads the dataset automatically via gdown, trains all 9 models, tunes the champion, and saves every artifact back to Drive.

Option C — Local

git clone https://github.com/DeepShah111/predictive-maintenance-engine.git
cd predictive-maintenance-engine

python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Mac/Linux

pip install -r requirements.txt
python run_pipeline.py

9. Streamlit App

streamlit run streamlit_app.py
# → http://localhost:8501

Tab	What it does
⚡ Live Prediction	Sensor sliders → real-time failure probability gauge + risk level + cost impact
📂 Batch Analysis	Upload CSV → ranked fleet risk table + distribution chart + downloadable results
📊 Business Dashboard	Strategy cost comparison + live threshold slider with FP/FN/cost update

Live deployment: https://predictive-maintenance-deep-shah.streamlit.app/

10. FastAPI — REST Endpoints

uvicorn api.main:app --reload --port 8000
# → http://localhost:8000/docs

Method	Endpoint	Description
`GET`	`/health`	Model loaded status, threshold, version
`POST`	`/predict`	Single reading → probability + risk level + recommended action
`POST`	`/predict-batch`	List of readings → predictions + fleet summary

Example — Single Prediction:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "machine_type": "L",
    "air_temperature_K": 302.0,
    "process_temperature_K": 309.0,
    "rotational_speed_rpm": 1200,
    "torque_Nm": 65.0,
    "tool_wear_min": 240,
    "machine_id": "MACHINE-001"
  }'

Example response (exact probability and cost depend on the trained model):

{
  "machine_id": "MACHINE-001",
  "failure_probability": 0.812,
  "failure_probability_pct": 81.2,
  "risk_level": "DANGER",
  "recommended_action": "IMMEDIATE maintenance required. Take machine offline.",
  "expected_cost_if_ignored": 8120.0,
  "physics_features": {
    "Temp_Diff": 7.0,
    "Power": 78000.0,
    "Force_Ratio": 0.054167
  },
  "model_name": "Lightgbm",
  "threshold_used": 0.32
}
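For scripted clients, the same call can be made from Python's standard library. A minimal sketch, assuming the API is running locally on port 8000 and that the request fields match the curl example; `build_predict_request` and `predict` are hypothetical helper names:

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # local uvicorn / docker compose address from this README


def build_predict_request(reading: dict, base_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /predict."""
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(reading).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(reading: dict, base_url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (requires the API to be running)."""
    with urllib.request.urlopen(build_predict_request(reading, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


reading = {
    "machine_type": "L",
    "air_temperature_K": 302.0,
    "process_temperature_K": 309.0,
    "rotational_speed_rpm": 1200,
    "torque_Nm": 65.0,
    "tool_wear_min": 240,
    "machine_id": "MACHINE-001",
}
request = build_predict_request(reading)
print(request.get_method(), request.full_url)
# With the API up: print(predict(reading)["risk_level"])
```

Separating request construction from sending keeps the payload logic testable without a live server.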

11. Docker Deployment

# Build and run
docker compose up --build
# → API available at http://localhost:8000

# Stop
docker compose down

The artifacts/ directory is mounted as a read-only volume so the container always uses the latest trained model without a rebuild.

12. Drift Detection & Monitoring

The monitoring.py module detects covariate shift between training and production data using the Kolmogorov-Smirnov test (α = 0.05).

from monitoring import DriftMonitor
import pandas as pd

monitor = DriftMonitor()
alerts = monitor.check_drift(pd.read_csv("new_readings.csv"), tag="production_batch_1")

if alerts:
    for a in alerts:
        print(f"DRIFT: {a['feature']} — shift {a['mean_shift_pct']:.1f}%")

CLI usage:

python monitoring.py --csv new_sensor_data.csv --tag production_jan_2025

All alerts logged to artifacts/drift_alerts.csv with timestamp, KS statistic, p-value, and mean shift percentage.
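The KS check described above can be sketched with SciPy's `ks_2samp`. This is not the `DriftMonitor` implementation; `ks_drift_report` is a hypothetical function that applies the same α = 0.05 test per numeric feature and returns the KS statistic, p-value, and mean shift percentage that drift_alerts.csv records (the timestamp is omitted):

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

ALPHA = 0.05  # significance level quoted in this README


def ks_drift_report(reference: pd.DataFrame, current: pd.DataFrame, alpha: float = ALPHA) -> list:
    """Two-sample KS test per shared numeric column; one dict per drifted feature."""
    alerts = []
    for col in reference.select_dtypes(include="number").columns:
        if col not in current.columns:
            continue
        ref = reference[col].dropna().to_numpy()
        cur = current[col].dropna().to_numpy()
        statistic, p_value = ks_2samp(ref, cur)
        if p_value < alpha:
            ref_mean = ref.mean()
            shift_pct = 100 * (cur.mean() - ref_mean) / ref_mean if ref_mean != 0 else float("nan")
            alerts.append({
                "feature": col,
                "ks_statistic": float(statistic),
                "p_value": float(p_value),
                "mean_shift_pct": float(shift_pct),
            })
    return alerts


rng = np.random.default_rng(0)
reference = pd.DataFrame({
    "Torque [Nm]": rng.normal(40, 10, 2_000),
    "Tool wear [min]": rng.uniform(0, 250, 2_000),
})
current = pd.DataFrame({
    "Torque [Nm]": rng.normal(55, 10, 500),       # deliberately shifted
    "Tool wear [min]": rng.uniform(0, 250, 500),  # same distribution
})
for alert in ks_drift_report(reference, current):
    print(f"DRIFT: {alert['feature']}: shift {alert['mean_shift_pct']:.1f}%")
```

With one test per feature at α = 0.05, occasional false alarms are expected across many batches, and with large batches KS also flags shifts too small to matter; the mean-shift column helps triage.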

13. Running Tests

python -m pytest tests/ -v

collected 14 items

tests/test_pipeline.py::test_physics_features_columns_created             PASSED
tests/test_pipeline.py::test_physics_features_temp_diff_value             PASSED
tests/test_pipeline.py::test_physics_features_power_value                 PASSED
tests/test_pipeline.py::test_physics_features_no_infinities               PASSED
tests/test_pipeline.py::test_leakage_cols_dropped_after_split             PASSED
tests/test_pipeline.py::test_get_preprocessor_returns_column_transformer  PASSED
tests/test_pipeline.py::test_clean_data_removes_duplicates                PASSED
tests/test_pipeline.py::test_clean_data_index_is_contiguous               PASSED
tests/test_pipeline.py::test_build_features_and_split_returns_six_objects PASSED
tests/test_pipeline.py::test_build_features_and_split_sizes               PASSED
tests/test_pipeline.py::test_build_features_and_split_class_balance       PASSED
tests/test_pipeline.py::test_total_cost_metric_correct_value              PASSED
tests/test_pipeline.py::test_total_cost_metric_degenerate_returns_inf     PASSED
tests/test_pipeline.py::test_schema_validation_raises_on_missing_columns  PASSED

14 passed in ~18s

14. Dataset

AI4I 2020 Predictive Maintenance Dataset

Property	Value
Source	UCI ML Repository · Kaggle
Rows	10,000
Features used	9 (5 raw numerical + 3 physics-derived + 1 categorical `Type`)
Target	`Machine failure` (binary: 0 = healthy, 1 = failure)
Class distribution	96.6% healthy / 3.4% failure
Identifier and leakage columns dropped	`UDI`, `Product ID`, `TWF`, `HDF`, `PWF`, `OSF`, `RNF`

The leakage columns (TWF through RNF) are individual failure-mode sub-flags set to 1 only when Machine failure is also 1. Keeping them would let the model read the answer directly, so they are dropped before any modeling step. The dataset downloads automatically on first run via gdown.
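Dropping the identifier and leakage columns before any feature work keeps the target's sub-flags out of the model's reach. A minimal pandas sketch, assuming the raw AI4I column names; `split_features_target` is a hypothetical helper and the row values are illustrative:

```python
import pandas as pd

TARGET = "Machine failure"
ID_COLUMNS = ["UDI", "Product ID"]                     # row identifiers
LEAKAGE_COLUMNS = ["TWF", "HDF", "PWF", "OSF", "RNF"]  # failure-mode sub-flags


def split_features_target(df: pd.DataFrame):
    """Separate features from the target, dropping identifiers and leakage flags."""
    # drop() raises KeyError if any listed column is missing, surfacing schema drift early.
    X = df.drop(columns=ID_COLUMNS + LEAKAGE_COLUMNS + [TARGET])
    y = df[TARGET].astype(int)
    return X, y


row = {
    "UDI": 1, "Product ID": "M14860", "Type": "M",
    "Air temperature [K]": 298.1, "Process temperature [K]": 308.6,
    "Rotational speed [rpm]": 1551, "Torque [Nm]": 42.8, "Tool wear [min]": 0,
    "Machine failure": 0, "TWF": 0, "HDF": 0, "PWF": 0, "OSF": 0, "RNF": 0,
}
X, y = split_features_target(pd.DataFrame([row]))
print(list(X.columns))
```

The six remaining columns, plus the three physics features, would form the model's inputs.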

Built as a portfolio project demonstrating production ML engineering practices.
Structured for clarity, correctness, and interview-readiness.

🚀 Live Demo | 📁 GitHub
