
This project implements a federated learning approach for binary classification of ECG signals from the MIT-BIH dataset. The goal is to simulate a decentralized healthcare scenario where multiple hospitals contribute to training a shared model without sharing patient data.


PrajwalAmte/Federated_Learning

Federated Learning for ECG Classification

This project implements a privacy-preserving machine learning approach for ECG anomaly detection using the MIT-BIH Arrhythmia Database. The system demonstrates how multiple hospitals can collaboratively train a Convolutional Neural Network (CNN) model without sharing sensitive patient data, addressing critical privacy concerns in healthcare AI.

Understanding Federated Learning

Federated Learning is an approach to machine learning that lets multiple organizations train a shared model while each keeps its raw data on its own infrastructure. Instead of centralizing data, the model travels to where the data lives, trains locally, and sends only the learned parameters back to a central server. This is particularly valuable in healthcare, where privacy regulations such as HIPAA make traditional data sharing difficult.

Dataset Overview

This project uses the renowned MIT-BIH Arrhythmia Database, which contains high-quality ECG recordings from patients with various cardiac conditions. The dataset has been preprocessed for binary classification to distinguish between normal and abnormal heartbeat patterns.

Dataset Statistics:

  • Training samples: 87,554 ECG beats
  • Test samples: 21,892 ECG beats
  • Signal length: 187 data points per ECG beat
  • Classification task: Binary (Normal vs Abnormal ECG patterns)
  • Class distribution: Approximately 83% normal, 17% abnormal cases

The dataset can be downloaded here: MIT-BIH Arrhythmia Dataset. The project uses a version preprocessed into train and test CSV files:

  • mitbih_train.csv - Training data
  • mitbih_test.csv - Test data
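
The following is a minimal sketch of how these CSV files might be turned into binary labels and divided among simulated hospitals. It assumes each row holds the signal values followed by a class label in the last column, with 0 meaning normal; that layout, the function names `load_binary_beats` and `split_across_hospitals`, and the round-robin split are illustrative assumptions, not taken from the notebook.

```python
import csv
import io


def load_binary_beats(csv_file):
    """Read rows of signal values plus a trailing label; map label 0 -> 0, else -> 1."""
    signals, labels = [], []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        signals.append(values[:-1])
        # Assumption: class 0 is "normal"; every other class counts as "abnormal".
        labels.append(0 if values[-1] == 0 else 1)
    return signals, labels


def split_across_hospitals(signals, labels, num_hospitals=3):
    """Deal samples round-robin into num_hospitals local datasets."""
    shards = [([], []) for _ in range(num_hospitals)]
    for i, (x, y) in enumerate(zip(signals, labels)):
        shards[i % num_hospitals][0].append(x)
        shards[i % num_hospitals][1].append(y)
    return shards


# Tiny synthetic stand-in for mitbih_train.csv (3 values per beat instead of 187).
demo = io.StringIO(
    "0.1,0.2,0.3,0.0\n0.4,0.5,0.6,2.0\n0.7,0.8,0.9,0.0\n0.2,0.1,0.0,4.0\n"
)
signals, labels = load_binary_beats(demo)
shards = split_across_hospitals(signals, labels, num_hospitals=3)
print(labels)                       # [0, 1, 0, 1]
print([len(x) for x, _ in shards])  # [2, 1, 1]
```

With the real files, `open("Dataset/mitbih_train.csv", newline="")` would replace the in-memory `demo` object.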

System Architecture

The federated learning system follows a distributed training paradigm where a global model coordinates learning across multiple healthcare institutions. Each hospital maintains complete control over their patient data while contributing to a collectively trained model.

graph TD
    A[Global Model] --> B[Hospital 1]
    A --> C[Hospital 2] 
    A --> D[Hospital 3]
    
    B --> E[Local Training<br/>+ Differential Privacy]
    C --> F[Local Training<br/>+ Differential Privacy]
    D --> G[Local Training<br/>+ Differential Privacy]
    
    E --> H[Model Parameters]
    F --> I[Model Parameters]
    G --> J[Model Parameters]
    
    H --> K[Federated Averaging]
    I --> K
    J --> K
    
    K --> L[Updated Global Model]
    L --> A
    
    style A fill:#e1f5fe,color:#000000
    style L fill:#e8f5e8,color:#000000
    style E fill:#fff3e0,color:#000000
    style F fill:#fff3e0,color:#000000
    style G fill:#fff3e0,color:#000000

The architecture ensures that raw patient data never leaves the hospital premises. Only model parameters (weights and biases) are shared. Shared parameters can still leak some information about the data they were trained on, which is why local training is combined with differential privacy.

Technical Implementation

Model Architecture: The system employs a 1D Convolutional Neural Network specifically designed for time-series ECG analysis. The CNN architecture includes two convolutional layers for feature extraction, max-pooling for dimensionality reduction, and fully connected layers for final classification.
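
A network of this shape could be sketched in PyTorch as follows. The class name `ECGNet`, the channel counts (16 and 32), the kernel size, and the hidden width of 64 are assumptions chosen for illustration; the notebook's actual hyperparameters may differ.

```python
import torch
from torch import nn


class ECGNet(nn.Module):
    """Hypothetical 1D CNN: two conv layers, max-pooling, two dense layers."""

    def __init__(self, signal_length=187, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),   # length unchanged (187)
            nn.ReLU(),
            nn.MaxPool1d(2),                              # 187 -> 93
            nn.Conv1d(16, 32, kernel_size=5, padding=2),  # length unchanged (93)
            nn.ReLU(),
            nn.MaxPool1d(2),                              # 93 -> 46
        )
        # Two pooling steps each halve the length (rounding down).
        flat = 32 * (signal_length // 2 // 2)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        # x has shape (batch, 1, signal_length)
        return self.classifier(self.features(x))


model = ECGNet()
logits = model(torch.randn(4, 1, 187))
print(logits.shape)  # torch.Size([4, 2])
```

Each beat is fed in as a single-channel sequence, so a batch of CSV rows would need reshaping from `(batch, 187)` to `(batch, 1, 187)` before the forward pass.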

Privacy Protection: Differential privacy is implemented using the Opacus library, which clips each sample's gradient and adds calibrated noise during local training. This gives a mathematical bound on how much any single patient record can influence the shared model parameters.
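
To make the clip-and-noise idea concrete, here is a plain-Python illustration of one differentially private gradient step. It is not Opacus code and performs no privacy accounting; the parameter names `max_grad_norm` and `noise_multiplier` simply echo the ones Opacus uses, and the function names are hypothetical.

```python
import math
import random


def clip_to_norm(grad, max_grad_norm):
    """Scale grad down so its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_sample_grads, max_grad_norm=1.0,
                          noise_multiplier=1.0, rng=None):
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    rng = rng or random.Random()
    clipped = [clip_to_norm(g, max_grad_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise scale is tied to the clipping bound, which caps any
    # single sample's contribution to the sum.
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
clipped_first = clip_to_norm(grads[0], 1.0)           # rescaled to norm 1.0
noiseless = private_mean_gradient(grads, noise_multiplier=0.0)
noisy = private_mean_gradient(grads, noise_multiplier=1.0, rng=random.Random(0))
print(clipped_first, noiseless, noisy)
```

A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, which is the trade-off behind any accuracy gap between private and non-private training.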

Federated Averaging: The central server aggregates model updates from all participating hospitals using the Federated Averaging algorithm, which computes weighted averages of model parameters to create an improved global model.
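
The weighted average can be written as w = Σ_k (n_k / n) · w_k, where n_k is the number of training samples at hospital k and n is the total. Below is a framework-agnostic sketch of that aggregation step, with parameters held as flat lists of floats; the function name `federated_average` and the parameter names in the example are hypothetical.

```python
def federated_average(client_params, client_sizes):
    """Weighted average of per-client parameters.

    client_params: list of dicts mapping a parameter name to a flat list of floats.
    client_sizes:  number of local training samples behind each client's update.
    """
    total = sum(client_sizes)
    averaged = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(params[name][i] * size
                for params, size in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


hospital_updates = [
    {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]},
    {"fc.weight": [3.0, 4.0], "fc.bias": [1.0]},
    {"fc.weight": [5.0, 6.0], "fc.bias": [2.0]},
]
hospital_sizes = [100, 100, 200]
global_params = federated_average(hospital_updates, hospital_sizes)
print(global_params)  # {'fc.weight': [3.5, 4.5], 'fc.bias': [1.25]}
```

The third hospital holds half of the samples, so its parameters pull the average toward its values. With equal dataset sizes the result reduces to a plain mean.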

Dependencies and Installation

The project requires several Python libraries for deep learning, data processing, and privacy-preserving computations. The main dependencies include PyTorch for neural network implementation, Opacus for differential privacy, and PySyft for federated learning simulation.

pip install torch pandas matplotlib seaborn scikit-learn opacus syft

📁 Project Structure

├── Dataset/
│   ├── mitbih_train.csv
│   └── mitbih_test.csv
├── federated_learning_notebook.ipynb
└── README.md

To get started with the project, download the MIT-BIH Arrhythmia Database CSV files and place them in the Dataset folder. The Jupyter notebook contains comprehensive documentation and can be run cell by cell to understand each component of the federated learning process.

Performance and Results

The federated learning approach aims for performance competitive with centralized training while keeping the privacy protections described above. The trained model is evaluated on the binary normal-versus-abnormal task with precision, recall, F1-score, and AUC-ROC analysis.

Expected Performance Metrics:

  • Test accuracy typically exceeds 95%
  • High sensitivity for detecting abnormal ECG patterns
  • Balanced precision and recall across both classes
  • Strong AUC-ROC scores indicating good discriminative ability
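
Because roughly 83% of beats are normal, a model that always predicts "normal" would already reach about 83% accuracy, so precision and recall on the abnormal class matter more than accuracy alone. The sketch below computes these metrics from raw predictions; the function name `binary_metrics` and the toy labels are illustrative, not project results.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with class 1 (abnormal) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,  # also called sensitivity
        "f1": f1,
    }


# Toy labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```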

The system includes detailed visualizations showing training progress across hospitals, confusion matrices, ROC curves, and comparative analysis of federated versus centralized learning approaches.

Real-World Applications and Impact

Healthcare Networks: Hospital systems can collaboratively improve diagnostic models for rare diseases without compromising patient privacy. This approach enables smaller hospitals to benefit from the collective knowledge of larger medical centers.

Multi-Institutional Research: Medical researchers can conduct large-scale studies across multiple institutions while complying with strict privacy regulations like HIPAA and GDPR.

Global Health Initiatives: International health organizations can develop diagnostic tools that work across different populations and healthcare systems without requiring sensitive data to cross borders.

Regulatory Compliance: The system provides a framework for AI development that inherently respects patient privacy rights and regulatory requirements, making it suitable for production healthcare environments.

Privacy and Security Considerations

The implementation combines two layers of privacy protection. Federated training keeps raw records local, and differential privacy injects calibrated noise during training, which provably limits how much can be inferred about any individual patient from the shared parameters.

The system is designed with major healthcare privacy regulations in mind, including HIPAA in the United States and GDPR in Europe. By design, no raw patient data ever leaves the hospital's local environment, which reduces the risk of data breaches and unauthorized access.

Getting Started

To begin exploring federated learning with this project:

  1. Clone or download the project repository to your local machine
  2. Install the required Python dependencies with the pip command listed under Dependencies and Installation
  3. Download the MIT-BIH Arrhythmia Database and place the CSV files in the Dataset directory
  4. Open the Jupyter notebook and execute the cells sequentially
  5. Observe how the model trains across multiple simulated hospitals
  6. Analyze the comprehensive visualizations and performance metrics

License

This project is open-source and available for research and educational purposes.

Acknowledgments

  • MIT-BIH Arrhythmia Database
  • PyTorch for deep learning implementation
