- Overview
- ✨ Key Features
- 🛠️ Tech Stack
- 🏗️ Project Architecture
- 📁 Project Structure
- 🚀 Quick Start
- 📦 Installation
- 💻 Usage
- 📚 API Documentation
- 🐳 Docker Deployment
- 🔧 Configuration
- 📊 Model Details
- 🧪 Testing
- 📈 Results
- 🤝 Contributing
- 📄 License
- 🙏 Acknowledgments
- 📧 Contact
KidneyScan AI is an end-to-end deep learning pipeline for classifying kidney CT scan images as either "Tumor" or "Normal". The project leverages transfer learning with the VGG16 pre-trained model to achieve accurate medical image classification.
This production-ready application includes:
- 🔄 Automated ML pipeline orchestration with DVC
- 📊 Experiment tracking with MLflow
- 🌐 RESTful API with Flask
- 🎨 Modern, responsive web interface
- 🐳 Docker containerization support

⚠️ Disclaimer: This model is for educational and research purposes only. It should NOT be used as a substitute for professional medical diagnosis.
| Feature | Description |
|---|---|
| 🔬 Medical Image Classification | Classify kidney CT scans as Tumor or Normal using deep learning |
| 🧠 Transfer Learning | Utilizes VGG16 pre-trained on ImageNet for feature extraction |
| 📊 Experiment Tracking | Full MLflow integration for metrics, parameters, and model versioning |
| 🔄 Pipeline Automation | DVC-powered reproducible ML pipeline from data ingestion to evaluation |
| 🌐 REST API | Flask-based API for seamless integration with other applications |
| 💫 Modern UI | Responsive web interface with drag-and-drop functionality |
| 🐳 Containerization | Docker support for easy deployment and scaling |
| ⚡ Lazy Model Loading | Efficient memory usage with on-demand model loading |
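The lazy-loading idea in the last row can be sketched in a few lines. This is a minimal illustration of the pattern, not the project's actual prediction code — the class and attribute names here are hypothetical, and the string stands in for a real `keras.models.load_model` call:

```python
# Sketch of lazy (on-demand) model loading: nothing is loaded at startup,
# and the first access triggers a one-time load that is then cached.

class LazyModel:
    def __init__(self, model_path):
        self.model_path = model_path
        self._model = None      # no memory used until first prediction
        self.load_count = 0     # counter for demonstration only

    @property
    def model(self):
        if self._model is None:
            self.load_count += 1
            # Placeholder for the real load, e.g. keras.models.load_model(...)
            self._model = f"model loaded from {self.model_path}"
        return self._model


clf = LazyModel("model/model.h5")
assert clf._model is None   # still unloaded
first = clf.model           # triggers the load
second = clf.model          # reuses the cached model
assert clf.load_count == 1
```

The same pattern lets a Flask worker start quickly and only pay the model-loading cost on the first `/predict` request.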
- `python-box` - Configuration management
- `pyYAML` - YAML file handling
- `gdown` - Google Drive file downloads
- `tqdm` - Progress bars
- `joblib` - Model serialization
```
┌────────────────────────────────────────────────────────────────┐
│                    KIDNEY CT SCAN CLASSIFIER                   │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────┐      ┌──────────────┐      ┌──────────────┐  │
│  │ DATA SOURCE  │      │ DATA SOURCE  │      │ DATA SOURCE  │  │
│  │(Google Drive)│─────▶│ (DVC Cache)  │─────▶│  (Training)  │  │
│  └──────────────┘      └──────────────┘      └──────────────┘  │
│                               │                                │
│                               ▼                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                       DVC PIPELINE                       │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  │  │
│  │  │ Stage 01 │─▶│ Stage 02 │─▶│ Stage 03 │─▶│ Stage 04 │  │  │
│  │  │   Data   │  │ Prepare  │  │ Training │  │   Eval   │  │  │
│  │  │Ingestion │  │Base Model│  │          │  │          │  │  │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘  │  │
│  └──────────────────────────────────────────────────────────┘  │
│                               │                                │
│                               ▼                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                      MLflow Tracking                     │  │
│  │  📊 Metrics │ 📈 Parameters │ 📦 Model Registry │ 🔄 Runs  │  │
│  └──────────────────────────────────────────────────────────┘  │
│                               │                                │
│                               ▼                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                   FLASK WEB APPLICATION                  │  │
│  │                                                          │  │
│  │  ┌───────────┐  ┌───────────┐  ┌──────────────────────┐  │  │
│  │  │  /train   │  │ /predict  │  │       / (Home)       │  │  │
│  │  │   (API)   │  │   (API)   │  │   (Web Interface)    │  │  │
│  │  └───────────┘  └───────────┘  └──────────────────────┘  │  │
│  └──────────────────────────────────────────────────────────┘  │
│                               │                                │
│                               ▼                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                   FRONTEND (HTML/CSS/JS)                 │  │
│  │  🎨 Drag & Drop │ 📷 Image Preview │ 📊 Results Display    │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```
1. **Data Ingestion** (`stage_01_data_ingestion.py`)
   - Downloads dataset from Google Drive
   - Extracts and organizes CT scan images
   - Version control with DVC
2. **Prepare Base Model** (`stage_02_prepare_base_model.py`)
   - Loads VGG16 pre-trained on ImageNet
   - Removes top classification layers
   - Configures for transfer learning
3. **Model Training** (`stage_03_model_training.py`)
   - Trains custom classification head
   - Applies data augmentation
   - Logs parameters and metrics to MLflow
   - Saves trained model
4. **Model Evaluation** (`stage_04_model_evaluation.py`)
   - Evaluates model on test set
   - Calculates loss and accuracy
   - Logs metrics to MLflow
   - Saves evaluation scores
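The four stages above are chained together in `dvc.yaml` so that `dvc repro` re-runs only what changed. A sketch of what such a pipeline definition typically looks like — the stage names, `deps`, and `outs` here are illustrative, not copied from the repository's actual `dvc.yaml`:

```yaml
stages:
  data_ingestion:
    cmd: python src/cnnClassifer/pipeline/stage_01_data_ingestion.py
    outs:
      - artifacts/data_ingestion
  prepare_base_model:
    cmd: python src/cnnClassifer/pipeline/stage_02_prepare_base_model.py
    params:
      - IMAGE_SIZE
      - INCLUDE_TOP
    outs:
      - artifacts/prepare_base_model
  training:
    cmd: python src/cnnClassifer/pipeline/stage_03_model_training.py
    deps:
      - artifacts/data_ingestion
      - artifacts/prepare_base_model
    outs:
      - artifacts/training/model.h5
  evaluation:
    cmd: python src/cnnClassifer/pipeline/stage_04_model_evaluation.py
    deps:
      - artifacts/training/model.h5
    metrics:
      - scores.json
```

Because each stage declares its dependencies and outputs, DVC can skip any stage whose inputs are unchanged.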
```
kidney-prediction-deep-learning/
│
├── 📄 app.py                       # Flask web application entry point
├── 📄 main.py                      # Direct pipeline execution script
├── 📄 dvc.yaml                     # DVC pipeline definition
├── 📄 params.yaml                  # Hyperparameters and configuration
├── 📁 config/
│   └── config.yaml                 # Application configuration
│
├── 📄 requirements.txt             # Python dependencies
├── 📄 setup.py                     # Package setup configuration
├── 📄 Dockerfile                   # Docker container configuration
│
├── 📁 src/
│   └── cnnClassifer/
│       ├── 📁 components/          # ML pipeline components
│       │   ├── data_ingestion.py           # Data download & extraction
│       │   ├── prepare_base_model.py       # VGG16 base model setup
│       │   ├── model_training.py           # Model training logic
│       │   └── model_evaluation_mlflow.py  # Evaluation with MLflow
│       │
│       ├── 📁 pipeline/            # Pipeline stage executors
│       │   ├── stage_01_data_ingestion.py
│       │   ├── stage_02_prepare_base_model.py
│       │   ├── stage_03_model_training.py
│       │   ├── stage_04_model_evaluation.py
│       │   └── prediction.py       # Prediction pipeline
│       │
│       ├── 📁 config/              # Configuration management
│       │   └── configuration.py    # Config manager class
│       │
│       ├── 📁 entity/              # Data entities
│       │   └── config_entity.py    # Configuration entities
│       │
│       ├── 📁 utils/               # Utility functions
│       │   └── Common.py           # Common utilities
│       │
│       ├── 📁 constants/           # Constants
│       │   └── __init__.py
│       │
│       └── 📄 __init__.py          # Package initializer
│
├── 📁 model/
│   └── model.h5                    # Trained Keras model
│
├── 📁 templates/
│   └── index.html                  # Frontend web interface
│
├── 📁 research/
│   ├── 01_data_ingestion.ipynb               # Data exploration notebook
│   ├── 02_prepare_base_model.ipynb           # Base model notebook
│   ├── 03_model_training.ipynb               # Training notebook
│   ├── 04_model_evaluation_with_mlflow.ipynb # Evaluation notebook
│   └── trials.ipynb                          # Experiment trials
│
├── 📁 artifacts/                   # ML pipeline outputs
│   ├── data_ingestion/             # Downloaded dataset
│   ├── prepare_base_model/         # Base model files
│   └── training/                   # Trained model files
│
├── 📁 logs/                        # Application logs
│   └── (log files)
│
├── 📁 mlruns/                      # MLflow tracking data
│   └── 0/
│       └── meta.yaml
│
├── 📄 scores.json                  # Evaluation metrics
├── 📄 dvc.lock                     # DVC lock file
├── 📄 LICENSE                      # MIT License
└── 📄 .gitignore                   # Git ignore file
```
| Requirement | Version | Description |
|---|---|---|
| 🐍 Python | 3.8+ | Programming language |
| 🐳 Docker | 20.10+ | Containerization (optional) |
| 💾 GPU | CUDA 11.8+ | For faster training (optional) |
```bash
# Clone the repository
git clone https://github.com/vams2krish/Kidney-Disease-Classification-MLflow-DVC.git
cd Kidney-Disease-Classification-MLflow-DVC

# Install dependencies
pip install -r requirements.txt

# Start the web application
python app.py
```

Then open your browser and navigate to: http://localhost:8080
```bash
# Clone the repository
git clone https://github.com/vams2krish/Kidney-Disease-Classification-MLflow-DVC.git
cd Kidney-Disease-Classification-MLflow-DVC

# Create virtual environment
python -m venv venv

# Activate on Windows
venv\Scripts\activate

# Activate on Linux/Mac
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Verify the installation
python -c "import tensorflow; import flask; import mlflow; print('All packages installed successfully!')"
```

```bash
# Run the complete ML pipeline
dvc repro

# Run pipeline directly
python main.py
```

```bash
# Stage 1: Data Ingestion
python src/cnnClassifer/pipeline/stage_01_data_ingestion.py

# Stage 2: Prepare Base Model
python src/cnnClassifer/pipeline/stage_02_prepare_base_model.py

# Stage 3: Model Training
python src/cnnClassifer/pipeline/stage_03_model_training.py

# Stage 4: Model Evaluation
python src/cnnClassifer/pipeline/stage_04_model_evaluation.py
```

```bash
# Start Flask app (default port: 8080)
python app.py

# Or specify custom port
python app.py --port 5000
```

The web interface will be available at: http://localhost:8080
```bash
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"image": "<base64_encoded_image>"}'
```

```python
import base64
import requests

# Read and encode image
with open("test_image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

# Make prediction
response = requests.post(
    "http://localhost:8080/predict",
    json={"image": image_data}
)
print(response.json())
```

Using the web interface:

- Open http://localhost:8080 in your browser
- Drag and drop a CT scan image or click to browse
- Click "Analyze CT Scan"
- View the prediction results

```bash
# Trigger training pipeline via API
curl -X GET http://localhost:8080/train
```

```python
# Or using Python
import requests
response = requests.get("http://localhost:8080/train")
print(response.text)  # "Training done successfully!"
```

Base URL: http://localhost:8080
| Method | Endpoint | Description | Request Body | Response |
|---|---|---|---|---|
| GET | `/` | Home page | - | HTML page |
| GET | `/train` | Trigger training | - | String |
| POST | `/predict` | Make prediction | `{"image": "<base64>"}` | JSON |
```json
[
  {
    "image": "Normal"
  }
]
```

Or with error:

```json
{
  "error": "Error message here"
}
```

| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - No image provided |
| 500 | Internal Server Error |
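Client code has to handle both response shapes shown above: a list containing the prediction on success, and an object with an `error` key on failure. A small stdlib-only sketch of that branching (the payload strings are hypothetical examples):

```python
import json


def parse_prediction(body: str) -> str:
    """Return the predicted label from an API response body,
    or raise if the API returned its error shape."""
    data = json.loads(body)
    if isinstance(data, dict) and "error" in data:
        # 400/500 responses: {"error": "..."}
        raise RuntimeError(data["error"])
    # 200 responses: [{"image": "<label>"}]
    return data[0]["image"]


label = parse_prediction('[{"image": "Normal"}]')
print(label)  # Normal
```

In a real client this would wrap the `requests.post(...)` call shown earlier, with the HTTP status code checked before parsing.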
```bash
# Build the image
docker build -t kidney-scan-classifier:latest .

# Run the container
docker run -d -p 8080:8080 --name kidney-classifier kidney-scan-classifier:latest

# View logs
docker logs -f kidney-classifier

# Stop the container
docker stop kidney-classifier
```

```yaml
# docker-compose.yml
version: '3.8'
services:
  kidney-classifier:
    build: .
    ports:
      - "8080:8080"
    environment:
      - PYTHONUNBUFFERED=1
    volumes:
      - ./model:/app/model
      - ./artifacts:/app/artifacts
```

```bash
# Start with docker-compose
docker-compose up -d
```

```bash
# Build and push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account>.dkr.ecr.us-east-1.amazonaws.com
docker build -t kidney-classifier .
docker tag kidney-classifier:latest <account>.dkr.ecr.us-east-1.amazonaws.com/kidney-classifier:latest
docker push <account>.dkr.ecr.us-east-1.amazonaws.com/kidney-classifier:latest
```

```bash
# Using Heroku Container Registry
heroku login
heroku create kidney-classifier
heroku container:push web -a kidney-classifier
heroku container:release web -a kidney-classifier
```
```yaml
# Model Configuration (params.yaml)
IMAGE_SIZE: [224, 224, 3]   # VGG16 standard input size
BATCH_SIZE: 16              # Training batch size
EPOCHS: 10                  # Number of training epochs
CLASSES: 2                  # Number of output classes
WEIGHTS: imagenet           # Pre-trained weights
LEARNING_RATE: 0.01         # Initial learning rate
INCLUDE_TOP: False          # Exclude top layers
AUGMENTATION: True          # Enable data augmentation
```

```yaml
# config/config.yaml
artifacts_root: artifacts

data_ingestion:
  root_dir: artifacts/data_ingestion
  source_URL: https://drive.google.com/uc?id=<FILE_ID>
  local_data_file: artifacts/data_ingestion/data.zip
  unzip_dir: artifacts/data_ingestion

prepare_base_model:
  root_dir: artifacts/prepare_base_model
  base_model_path: artifacts/prepare_base_model/base_model.h5
  updated_base_model_path: artifacts/prepare_base_model/base_model_updated.h5

training:
  root_dir: artifacts/training
  trained_model_path: artifacts/training/model.h5
```

| Variable | Default | Description |
|---|---|---|
| `PYTHONUNBUFFERED` | 1 | Unbuffered Python output |
| `LANG` | en_US.UTF-8 | Language setting |
| `LC_ALL` | en_US.UTF-8 | Locale setting |
| `MLFLOW_TRACKING_URI` | - | MLflow tracking server URI |
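The tech stack lists `python-box` for configuration handling; conceptually, loading `params.yaml` yields attribute-style (dot) access to the values above. A stdlib-only sketch of that idea — `SimpleNamespace` stands in here for the real `pyYAML` + `ConfigBox` combination, and the dict is a hand-copied subset of `params.yaml`:

```python
from types import SimpleNamespace

# In the real project: yaml.safe_load(open("params.yaml")) wrapped in a
# python-box ConfigBox. Here a plain dict + SimpleNamespace illustrates
# the dot-access style the config manager relies on.
raw = {
    "IMAGE_SIZE": [224, 224, 3],
    "BATCH_SIZE": 16,
    "EPOCHS": 10,
    "LEARNING_RATE": 0.01,
}
params = SimpleNamespace(**raw)

assert params.BATCH_SIZE == 16        # dot access instead of raw["BATCH_SIZE"]
assert params.IMAGE_SIZE[0] == 224
```

Dot access keeps the pipeline-stage code terse compared to chained dictionary lookups.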
```
┌────────────────────────────────────────────────────┐
│                   VGG16 Backbone                   │
│      (Pre-trained on ImageNet, weights frozen)     │
├────────────────────────────────────────────────────┤
│  Input: 224 x 224 x 3 (RGB Image)                  │
│                                                    │
│  Block 1: Conv2D(64)  × 2 → MaxPool                │
│  Block 2: Conv2D(128) × 2 → MaxPool                │
│  Block 3: Conv2D(256) × 3 → MaxPool                │
│  Block 4: Conv2D(512) × 3 → MaxPool                │
│  Block 5: Conv2D(512) × 3 → MaxPool                │
│                                                    │
│  Output: 7 × 7 × 512                               │
├────────────────────────────────────────────────────┤
│             Custom Classification Head             │
│  ┌──────────────────────────────────────────────┐  │
│  │  GlobalAveragePooling2D                      │  │
│  │  Dense(512, activation='relu')               │  │
│  │  Dropout(0.5)                                │  │
│  │  Dense(2, activation='softmax')              │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  Output: [Probability_Tumor, Probability_Normal]   │
└────────────────────────────────────────────────────┘
```
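As a sanity check on the head shown above, only the custom layers are trainable (the backbone is frozen), so the trainable parameter count is small. Assuming the standard 7 × 7 × 512 VGG16 feature map, the arithmetic works out as:

```python
# Trainable parameter count of the custom classification head.
features = 512                  # GlobalAveragePooling2D over 7x7x512 -> 512 values

dense1 = features * 512 + 512   # Dense(512): weights + biases = 262,656
dense2 = 512 * 2 + 2            # Dense(2):   weights + biases = 1,026
total = dense1 + dense2         # Dropout contributes no parameters

print(total)                    # 263682
```

So only ~264k of the network's parameters are updated during training, which is what makes transfer learning cheap compared to training all ~14.7M VGG16 convolutional weights.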
| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Initial Learning Rate | 0.01 |
| Loss Function | Categorical Crossentropy |
| Batch Size | 16 |
| Epochs | 10 |
| Image Size | 224 × 224 × 3 |
| Data Augmentation | Yes (rotation, flip, zoom) |
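The categorical cross-entropy loss in the table reduces, for one-hot labels, to the negative log of the probability assigned to the true class. A tiny worked example (the probabilities are made up for illustration):

```python
import math


def categorical_crossentropy(y_true, y_pred):
    # Standard definition: -sum(t_i * log(p_i)). With one-hot labels this is
    # simply -log(probability assigned to the true class).
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


# Model output [P_tumor, P_normal] = [0.9, 0.1], true class = Tumor
loss = categorical_crossentropy([1, 0], [0.9, 0.1])
print(round(loss, 4))  # 0.1054
```

A confident correct prediction gives a loss near 0, while a confident wrong one (e.g. 0.1 on the true class, loss ≈ 2.30) is penalized heavily.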
- Source: Kidney CT Scan Image Dataset
- Classes: 2 (Tumor, Normal)
- Split: Training, Validation, Test
- Augmentation: Random horizontal flip, rotation, zoom
```bash
# Run all tests
pytest tests/

# Run specific test file
pytest tests/test_prediction.py -v

# Run with coverage
pytest --cov=src tests/
```

```bash
# Test data ingestion
python -c "from cnnClassifer.components.data_ingestion import DataIngestion; print('DataIngestion imported successfully')"

# Test configuration
python -c "from cnnClassifer.config.configuration import ConfigurationManager; print('Config imported successfully')"

# Test prediction pipeline
python -c "from cnnClassifer.pipeline.prediction import PredictionPipeline; print('PredictionPipeline imported successfully')"
```

| Metric | Value |
|---|---|
| Loss | 22.63 |
| Accuracy | 48.2% |
⚠️ Note: The current model performance can be improved with:
- More training data
- Extended training epochs
- Hyperparameter tuning
- Advanced architectures (ResNet, EfficientNet)
- Learning rate scheduling
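Of the improvements listed, learning-rate scheduling is among the cheapest to try, especially since the initial rate of 0.01 is high. A simple step-decay schedule — the drop factor and interval below are illustrative, not tuned values from this project:

```python
def step_decay(epoch, initial_lr=0.01, drop=0.5, epochs_per_drop=3):
    # Halve the learning rate every `epochs_per_drop` epochs.
    return initial_lr * (drop ** (epoch // epochs_per_drop))


# Schedule over the project's 10 epochs:
print([round(step_decay(e), 5) for e in range(10)])
```

In Keras this function would typically be plugged in via a `LearningRateScheduler` callback passed to `model.fit`.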
To view experiment tracking:
```bash
# Start MLflow UI
mlflow ui

# Open browser at http://localhost:5000
```

Contributions are welcome! Please feel free to submit a Pull Request.
```bash
# Fork the repository

# Create your feature branch
git checkout -b feature/AmazingFeature

# Make your changes
git commit -m 'Add some AmazingFeature'
git push origin feature/AmazingFeature

# Open a Pull Request
```

- Follow PEP 8 style guide
- Write docstrings for all functions
- Add type hints where applicable
- Include unit tests for new features
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 KidneyScan AI
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
- TensorFlow - Deep learning framework
- Keras - High-level neural networks API
- MLflow - ML lifecycle management
- DVC - Data version control
- Flask - Web framework
- VGG16 - Pre-trained model
- Google Drive - Data hosting
- Community Contributors
| Contact | Details |
|---|---|
| 👤 Author | Adam, Vamshi Krishna |
| 📧 Email | adam.vamshikrishna@gmail.com |
| 🐙 GitHub | vams2krish |
| 💼 LinkedIn | Adam, Vamshi Krishna |

If you found this project helpful, please give it a ⭐ on GitHub!

Made with ❤️ by Adam Vamshi Krishna
Last updated: January 2024