Federated Learning for ECG Classification

This project implements a privacy-preserving machine learning approach for ECG anomaly detection using the MIT-BIH Arrhythmia Database. The system demonstrates how multiple hospitals can collaboratively train a Convolutional Neural Network (CNN) model without sharing sensitive patient data, addressing critical privacy concerns in healthcare AI.

Understanding Federated Learning

Federated learning is an approach to machine learning that lets multiple organizations train a shared model while each keeps its raw data on its own infrastructure. Instead of centralizing data, the model travels to where the data lives, trains locally, and sends only the learned parameters back to a central server. This is particularly valuable in healthcare, where patient privacy regulations such as HIPAA make traditional data sharing difficult.

Dataset Overview

This project uses the renowned MIT-BIH Arrhythmia Database, which contains high-quality ECG recordings from patients with various cardiac conditions. The dataset has been preprocessed for binary classification to distinguish between normal and abnormal heartbeat patterns.

Dataset Statistics:

Training samples: 87,554 ECG beats
Test samples: 21,892 ECG beats
Signal length: 187 data points per ECG beat
Classification task: Binary (Normal vs Abnormal ECG patterns)
Class distribution: Approximately 83% normal, 17% abnormal cases

The dataset can be downloaded from the following link: MIT-BIH Arrhythmia Dataset. The project uses a version that has been preprocessed into train and test CSV files:

mitbih_train.csv - Training data
mitbih_test.csv - Test data
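
As a minimal sketch of how files in this layout might be read, the snippet below assumes each CSV row holds 187 signal values followed by one numeric class label, and that label 0 marks a normal beat while any other label is treated as abnormal. Both assumptions, and the helper name `load_beats`, are illustrative and should be checked against the downloaded files.

```python
import csv


def load_beats(csv_file, signal_length=187):
    """Parse beats from an open CSV file into (signals, binary_labels)."""
    signals, labels = [], []
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        values = [float(v) for v in row]
        if len(values) != signal_length + 1:
            raise ValueError(
                f"expected {signal_length + 1} columns, got {len(values)}"
            )
        signals.append(values[:signal_length])
        # Assumption: class 0 is "normal"; every other class is "abnormal".
        labels.append(0 if int(values[-1]) == 0 else 1)
    return signals, labels
```

A typical call would open `Dataset/mitbih_train.csv` with `open(path, newline="")` and pass the file object to `load_beats`.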

System Architecture

The federated learning system follows a distributed training paradigm in which a global model coordinates learning across multiple healthcare institutions. Each hospital keeps full control of its patient data while contributing to a collectively trained model.

graph TD
    A[Global Model] --> B[Hospital 1]
    A --> C[Hospital 2] 
    A --> D[Hospital 3]
    
    B --> E[Local Training<br/>+ Differential Privacy]
    C --> F[Local Training<br/>+ Differential Privacy]
    D --> G[Local Training<br/>+ Differential Privacy]
    
    E --> H[Model Parameters]
    F --> I[Model Parameters]
    G --> J[Model Parameters]
    
    H --> K[Federated Averaging]
    I --> K
    J --> K
    
    K --> L[Updated Global Model]
    L --> A
    
    style A fill:#e1f5fe,color:#000000
    style L fill:#e8f5e8,color:#000000
    style E fill:#fff3e0,color:#000000
    style F fill:#fff3e0,color:#000000
    style G fill:#fff3e0,color:#000000

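The architecture keeps raw patient data on hospital premises. Only model parameters (weights and biases) are shared. Shared parameters can still leak information about the training data, for example through model inversion or membership inference attacks, which is why each hospital also applies differential privacy during local training.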
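
When the hospitals are simulated on one machine, the training set has to be divided into per-hospital shards first. The sketch below shows one simple way to do that. The function name `split_across_hospitals` and the uniform random split are assumptions for illustration; real hospitals usually hold differently distributed data, which a random split does not capture.

```python
import random


def split_across_hospitals(n_samples, n_hospitals=3, seed=0):
    """Shuffle sample indices and deal them into one disjoint shard per hospital."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding deals indices round-robin, so shard sizes differ by at most one.
    return [indices[h::n_hospitals] for h in range(n_hospitals)]
```

Each returned list can then be used to select the rows that a given simulated hospital trains on.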

Technical Implementation

Model Architecture: The system employs a 1D Convolutional Neural Network specifically designed for time-series ECG analysis. The CNN architecture includes two convolutional layers for feature extraction, max-pooling for dimensionality reduction, and fully connected layers for final classification.
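
One practical detail when wiring such a network is the input size of the first fully connected layer, which depends on how the 187-point signal shrinks through the convolution and pooling stages. The sketch below works through that arithmetic using the standard output-length formula for 1D convolution and pooling without dilation. The kernel sizes, pooling sizes, and channel count are hypothetical placeholders, not values taken from the notebook.

```python
def out_length(length, kernel_size, stride=1, padding=0):
    """Output length of a 1D convolution or pooling layer (dilation 1)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def flattened_features(signal_length=187, channels_out=64):
    """Feature count entering the first fully connected layer."""
    # Hypothetical layer sizes: conv(k=5) -> pool(2) -> conv(k=5) -> pool(2).
    length = out_length(signal_length, kernel_size=5)
    length = out_length(length, kernel_size=2, stride=2)
    length = out_length(length, kernel_size=5)
    length = out_length(length, kernel_size=2, stride=2)
    return channels_out * length
```

Substituting the notebook's actual layer settings into `flattened_features` gives the size to use for the first linear layer.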

Privacy Protection: Differential privacy is implemented using the Opacus library, which clips per-sample gradients and adds calibrated Gaussian noise during local training. This gives a quantifiable bound, expressed as a privacy budget, on how much any single patient record can influence the shared model parameters.
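
To make the noise mechanism concrete, the sketch below illustrates the clip-then-add-noise step at the core of differentially private SGD in plain Python. It is not the Opacus API, and the names `clip_to_norm`, `private_mean_gradient`, `max_norm`, and `noise_multiplier` are chosen here for illustration only.

```python
import math
import random


def clip_to_norm(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng=None):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    rng = rng or random.Random()
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise scale is tied to the clipping bound, which caps any one example's influence.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]
```

Clipping bounds the contribution of each record, and the noise then masks whatever contribution remains. A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates.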

Federated Averaging: The central server aggregates model updates from all participating hospitals using the Federated Averaging algorithm, which computes weighted averages of model parameters to create an improved global model.
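
The aggregation step can be sketched in a few lines. The example below assumes each hospital reports its parameters as a dictionary of flat float lists along with its local sample count, and that the weights in the average are those sample counts, as in the usual formulation of Federated Averaging. The function name and data layout are illustrative, not taken from the notebook.

```python
def federated_average(client_params, client_sizes):
    """Average parameters across clients, weighted by local sample count.

    client_params: one dict per client, mapping a parameter name to a flat list of floats.
    client_sizes: number of local training samples for each client, in the same order.
    """
    total = sum(client_sizes)
    averaged = {}
    for name in client_params[0]:
        n_values = len(client_params[0][name])
        averaged[name] = [
            sum(params[name][i] * size
                for params, size in zip(client_params, client_sizes)) / total
            for i in range(n_values)
        ]
    return averaged
```

With this weighting, a hospital holding three times as many beats as another pulls the global model three times as strongly toward its local update.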

Dependencies and Installation

The project requires several Python libraries for deep learning, data processing, and privacy-preserving computations. The main dependencies include PyTorch for neural network implementation, Opacus for differential privacy, and PySyft for federated learning simulation.

pip install torch pandas matplotlib seaborn scikit-learn opacus syft

📁 Project Structure

├── Dataset/
│   ├── mitbih_train.csv
│   └── mitbih_test.csv
├── federated_learning_notebook.ipynb
└── README.md

To get started with the project, download the MIT-BIH Arrhythmia Database CSV files and place them in the Dataset folder. The Jupyter notebook contains comprehensive documentation and can be run cell by cell to understand each component of the federated learning process.

Performance and Results

The federated learning approach aims for performance competitive with centralized training while keeping the privacy protections described above. The trained model is evaluated on the binary normal-versus-abnormal task using precision, recall, F1-score, and AUC-ROC in addition to accuracy.

Expected Performance Metrics:

Test accuracy typically exceeds 95%
High sensitivity for detecting abnormal ECG patterns
Balanced precision and recall across both classes
Strong AUC-ROC scores indicating good discriminative ability

The system includes detailed visualizations showing training progress across hospitals, confusion matrices, ROC curves, and comparative analysis of federated versus centralized learning approaches.
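
Because roughly 83% of beats are normal, a classifier that always predicts "normal" would already score about 83% accuracy while detecting no abnormal beats at all, so per-class metrics matter here. The sketch below computes accuracy, precision, recall, and F1 for the abnormal class from predicted labels. The function name `binary_metrics` and the convention that 1 means abnormal are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy plus precision, recall, and F1 for the positive (abnormal = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Recall on the abnormal class corresponds to the sensitivity figure listed in the expected metrics.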

Real-World Applications and Impact

Healthcare Networks: Hospital systems can collaboratively improve diagnostic models for rare diseases without compromising patient privacy. This approach enables smaller hospitals to benefit from the collective knowledge of larger medical centers.

Multi-Institutional Research: Medical researchers can conduct large-scale studies across multiple institutions while complying with strict privacy regulations like HIPAA and GDPR.

Global Health Initiatives: International health organizations can develop diagnostic tools that work across different populations and healthcare systems without requiring sensitive data to cross borders.

Regulatory Compliance: The system provides a framework for AI development that inherently respects patient privacy rights and regulatory requirements, making it suitable for production healthcare environments.

Privacy and Security Considerations

The implementation combines two layers of privacy protection throughout the training process. Raw data stays at each hospital, and differential privacy injects calibrated noise into model updates, which bounds how much an observer can learn about any individual patient from the shared parameters.

The system is designed with major healthcare privacy regulations in mind, including HIPAA in the United States and GDPR in Europe. By design, no raw patient data leaves the hospital's local environment, which reduces the risk of data breaches and unauthorized access.

Getting Started

To begin exploring federated learning with this project:

1. Clone or download the project repository to your local machine.
2. Install the required Python dependencies using the pip command listed under Dependencies and Installation.
3. Download the MIT-BIH Arrhythmia Database and place the CSV files in the Dataset directory.
4. Open the Jupyter notebook and execute the cells sequentially.
5. Observe how the model trains across multiple simulated hospitals.
6. Analyze the visualizations and performance metrics.

License

This project is open-source and available for research and educational purposes.

Acknowledgments

MIT-BIH Arrhythmia Database
PyTorch for deep learning implementation
