
AlzWELL: Bridging AI and Healthcare for Alzheimer's and Readmission Challenges

1️⃣ Alzheimer's Detection Using Blood Biomarkers

🔍 Objective

Predict whether a patient has Alzheimer's disease using structured clinical biomarkers such as age, gender, FDG, PIB, MMSE score, APOE4 allele count, and marital status.

🏗️ What We Did

  • Preprocessed clinical tabular data by handling missing values, encoding categorical variables, and mapping diagnosis outcomes to binary classes (0 = Normal, 1 = Dementia).
  • Built a Stacking Classifier ensemble with three base learners:
    • Random Forest
    • Gradient Boosting
    • Support Vector Machine (SVM)
  • Used Logistic Regression as the meta-learner.

🎯 Achievements

  • Achieved 92% accuracy on the test dataset.
  • Provides an efficient early detection tool for Alzheimer's using only blood biomarkers.
  • Integrated a user-friendly prediction interface for real-time risk assessment.

Dependencies

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

Data Preprocessing

# Load the dataset
file_path = 'BIOM.csv'
data = pd.read_csv(file_path)

# Select relevant columns
selected_columns = ["AGE", "PTGENDER", "FDG", "PIB", "MMSE", "PTMARRY", "APOE4", "DX"]
data_selected = data[selected_columns]

# Clean and preprocess data
data_cleaned = data_selected.replace("NA", pd.NA)

# Fill missing values
data_cleaned['AGE'] = data_cleaned['AGE'].fillna(data_cleaned['AGE'].mean())
data_cleaned['FDG'] = data_cleaned['FDG'].fillna(data_cleaned['FDG'].mean())
data_cleaned['PIB'] = data_cleaned['PIB'].fillna(data_cleaned['PIB'].mean())
data_cleaned['MMSE'] = data_cleaned['MMSE'].fillna(data_cleaned['MMSE'].mean())
data_cleaned['APOE4'] = data_cleaned['APOE4'].fillna(data_cleaned['APOE4'].mode()[0])
data_cleaned['PTGENDER'] = data_cleaned['PTGENDER'].fillna(data_cleaned['PTGENDER'].mode()[0])
data_cleaned['PTMARRY'] = data_cleaned['PTMARRY'].fillna(data_cleaned['PTMARRY'].mode()[0])

# Convert categorical variables to numerical
data_cleaned['PTGENDER'] = data_cleaned['PTGENDER'].map({'Male': 0, 'Female': 1})
data_cleaned['PTMARRY'] = data_cleaned['PTMARRY'].map({
    'Married': 0, 'Divorced': 1, 'Widowed': 2, 'Never married': 3, 'Unknown': 4
})

# Map DX column to binary classification
dx_mapping = {
    'NL': 0, 'NL to MCI': 0, 'MCI to NL': 0, 'MCI': 0,
    'Dementia': 1, 'MCI to Dementia': 1, 'NL to Dementia': 1
}
data_cleaned['DX'] = data_cleaned['DX'].map(dx_mapping)

# Drop rows whose diagnosis did not map to a binary class
# (calling .dropna() on the mapped series alone would be a no-op here)
data_cleaned = data_cleaned.dropna(subset=['DX'])

# Separate features and target variable
X = data_cleaned.drop(columns=['DX'])
y = data_cleaned['DX']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

Model Training

# Define base learners
base_learners = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42)),
    ('svm', SVC(kernel='linear', probability=True))
]

# Create the stacking ensemble
stacking_model = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())

# Train the stacking model
stacking_model.fit(X_train, y_train)


Model Evaluation

# Make predictions
y_pred = stacking_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the stacking model: {accuracy * 100:.2f}%")


Making Predictions

def get_user_input():
    age = float(input("Enter Age: "))
    ptgender = int(input("Enter Gender (0 = Male, 1 = Female): "))
    fdg = float(input("Enter FDG value: "))
    pib = float(input("Enter PIB value: "))
    mmse = float(input("Enter MMSE score: "))
    ptmarry = int(input("Enter Marital Status (0=Married, 1=Divorced, 2=Widowed, 3=Never married, 4=Unknown): "))
    apoe4 = int(input("Enter APOE4 allele count (0, 1, or 2): "))

    # Build a one-row DataFrame with the same column order as the training
    # features, so scikit-learn does not warn about missing feature names
    input_data = pd.DataFrame(
        [[age, ptgender, fdg, pib, mmse, ptmarry, apoe4]], columns=X.columns
    )
    return input_data

# Get user input
user_input = get_user_input()

# Make prediction
prediction = stacking_model.predict(user_input)
probabilities = stacking_model.predict_proba(user_input)

# Display results
print(f"Predicted class (0 = Normal, 1 = Dementia): {prediction[0]}")
print(f"Probability for each class: {probabilities[0]}")

2️⃣ Alzheimer's Detection Using MRI Scans

🔍 Objective

Detect Alzheimer's disease based on MRI scans using a deep learning image classification model.

🏗️ What We Did

  • Preprocessed MRI image datasets:
    • Resized to (176x176).
    • Label encoding.
    • Class balancing using SMOTE to handle imbalanced data.
  • Built a CNN model with:
    • Multiple convolutional layers.
    • Pooling layers and dropout for regularization.
  • Applied EarlyStopping and ModelCheckpoint for training stability.

🎯 Achievements

  • Achieved over 90% test accuracy on MRI-based classification.
  • Provides a robust image-based diagnostic support tool.
  • Visualized predictions and performance metrics, including confusion matrices and learning curves.

Dependencies

import os
import cv2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
import tensorflow as tf
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from tqdm import tqdm
from imblearn.over_sampling import SMOTE

Reading the Dataset

# Directory containing the images
base_dir = 'ad'

# Initialize lists to store image paths and their corresponding labels
images = []
labels = []

# Iterate over each category folder
for label in os.listdir(base_dir):
    label_dir = os.path.join(base_dir, label)
    
    # Ensure it's a directory and not a file
    if os.path.isdir(label_dir):
        # Iterate over each image in the folder
        for image_filename in os.listdir(label_dir):
            image_path = os.path.join(label_dir, image_filename)
            images.append(image_path)
            labels.append(label)

# Create a DataFrame with the image paths and corresponding labels
df = pd.DataFrame({'image': images, 'label': labels})

# Display the DataFrame
df

Displaying the Dataset

plt.figure(figsize=(50,50))
for n,i in enumerate(np.random.randint(0,len(df),50)):
    plt.subplot(10,5,n+1)
    img=cv2.imread(df.image[i])
    img=cv2.resize(img,(224,224))
    img=cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
    plt.imshow(img)
    plt.axis('off')
    plt.title(df.label[i],fontsize=25)


Data Augmentation

Size = (176, 176)
work_dr = ImageDataGenerator(rescale=1./255)

# Load the entire dataset in one batch (batch_size is set above the number of images)
train_data_gen = work_dr.flow_from_dataframe(df, x_col='image', y_col='label', target_size=Size, batch_size=6500, shuffle=False)

train_data, train_labels = next(train_data_gen)

# Ordered list of class names (index matches the one-hot position)
class_num = list(train_data_gen.class_indices.keys())
class_num

# Oversample minority classes with SMOTE (imblearn accepts one-hot targets);
# images are flattened for resampling, then reshaped back
sm = SMOTE(random_state=42)
train_data, train_labels = sm.fit_resample(train_data.reshape(-1, 176 * 176 * 3), train_labels)
train_data = train_data.reshape(-1, 176, 176, 3)
print(train_data.shape, train_labels.shape)

labels=[class_num[i] for i in np.argmax(train_labels,axis=1) ]
plt.figure(figsize=(15,8))
ax = sns.countplot(x=labels,palette='Set1')
ax.set_xlabel("Class",fontsize=20)
ax.set_ylabel("Count",fontsize=20)
plt.title('The Number Of Samples For Each Class',fontsize=20)
plt.grid(True)
plt.xticks(rotation=45)
plt.show()


Data Splitting for Training, Validation, and Testing

X_train, X_test1, y_train, y_test1 = train_test_split(train_data,train_labels, test_size=0.3, random_state=42,shuffle=True,stratify=train_labels)
X_val, X_test, y_val, y_test = train_test_split(X_test1,y_test1, test_size=0.5, random_state=42,shuffle=True,stratify=y_test1)
print('X_train shape is ' , X_train.shape)
print('X_test shape is ' , X_test.shape)
print('X_val shape is ' , X_val.shape)
print('y_train shape is ' , y_train.shape)
print('y_test shape is ' , y_test.shape)
print('y_val shape is ' , y_val.shape)

Model Definition

model=keras.models.Sequential()
model.add(keras.layers.Conv2D(32,kernel_size=(3,3),strides=2,padding='same',activation='relu',input_shape=(176,176,3)))
model.add(keras.layers.MaxPool2D(pool_size=(2,2),strides=2,padding='same'))
model.add(keras.layers.Conv2D(64,kernel_size=(3,3),strides=2,activation='relu',padding='same'))
model.add(keras.layers.MaxPool2D((2,2),2,padding='same'))
model.add(keras.layers.Conv2D(128,kernel_size=(3,3),strides=2,activation='relu',padding='same'))
model.add(keras.layers.MaxPool2D((2,2),2,padding='same'))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(1024,activation='relu'))
model.add(keras.layers.Dropout(0.3))
model.add(keras.layers.Dense(4,activation='softmax'))
model.summary()


Model Architecture

tf.keras.utils.plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True,show_dtype=True,dpi=120)


Model Training

checkpoint_cb = ModelCheckpoint("CNN_model.h5", save_best_only=True)
early_stopping_cb = EarlyStopping(patience=10, restore_best_weights=True)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
hist = model.fit(X_train,y_train, epochs=50, validation_data=(X_val,y_val), callbacks=[checkpoint_cb, early_stopping_cb])

hist_=pd.DataFrame(hist.history)
hist_

plt.figure(figsize=(15,10))
plt.subplot(1,2,1)
plt.plot(hist_['loss'],label='Train_Loss')
plt.plot(hist_['val_loss'],label='Validation_Loss')
plt.title('Train_Loss & Validation_Loss',fontsize=20)
plt.legend()
plt.subplot(1,2,2)
plt.plot(hist_['accuracy'],label='Train_Accuracy')
plt.plot(hist_['val_accuracy'],label='Validation_Accuracy')
plt.title('Train_Accuracy & Validation_Accuracy',fontsize=20)
plt.legend()
plt.show()


Model Evaluation and Predictions

score, acc = model.evaluate(X_test, y_test)
print('Test Loss =', score)
print('Test Accuracy =', acc)

predictions = model.predict(X_test)
y_pred = np.argmax(predictions, axis=1)
y_test_ = np.argmax(y_test, axis=1)

# Use a new name so the image DataFrame `df` built earlier is not overwritten
results = pd.DataFrame({'Actual': y_test_, 'Prediction': y_pred})
results
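
The confusion_matrix and classification_report imports above are otherwise unused, and the Achievements section mentions confusion matrices, so here is a minimal sketch (not from the original notebook) applying them to the predictions above:

# Confusion matrix over the four classes
cm = confusion_matrix(y_test_, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=class_num, yticklabels=class_num, cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Per-class precision, recall, and F1
print(classification_report(y_test_, y_pred, target_names=class_num))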

plt.figure(figsize=(30,70))
for n,i in enumerate(np.random.randint(0,len(X_test),50)):
    plt.subplot(10,5,n+1)
    plt.imshow(X_test[i])
    plt.axis('off')
    # Show the softmax probability of the predicted class as a percentage
    # (the original rounded a 0-1 probability to zero decimals before appending '%')
    plt.title(f"Actual: {class_num[y_test_[i]]}\nPredicted: {class_num[y_pred[i]]}\nConfidence: {predictions[i][y_pred[i]] * 100:.1f}%", fontsize=20)



3️⃣ Hospital Readmission Prediction

🔍 Objective

Predict whether an Alzheimer's patient will be readmitted within 30 days after discharge to assist healthcare providers in risk management.

🏗️ What We Did

  • Utilized features like:
    • Age, length of stay, prior admissions, MMSE, medications, caregiver support, discharge type.
  • Applied:
    • One-hot encoding, feature scaling, and RFE (Recursive Feature Elimination) for feature selection.
  • Trained a Support Vector Machine (SVM) with GridSearchCV for hyperparameter optimization (a sketch of this pipeline follows below).
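
The readmission notebook is not included in this repository, so the sketch below illustrates the described pipeline rather than the project's actual code. The file readmission.csv, all column names (age, length_of_stay, prior_admissions, mmse, num_medications, caregiver_support, discharge_type, readmitted_30d), and n_features_to_select are hypothetical placeholders.

from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import RFE
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
import pandas as pd

# Hypothetical schema; the real notebook's file and column names may differ
df = pd.read_csv('readmission.csv')
numeric = ['age', 'length_of_stay', 'prior_admissions', 'mmse', 'num_medications']
categorical = ['caregiver_support', 'discharge_type']
X = df[numeric + categorical]
y = df['readmitted_30d']  # 1 = readmitted within 30 days, 0 = not

preprocess = ColumnTransformer([
    ('num', StandardScaler(), numeric),                            # feature scaling
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical),  # one-hot encoding
])

pipe = Pipeline([
    ('prep', preprocess),
    ('rfe', RFE(SVC(kernel='linear'), n_features_to_select=6)),    # RFE needs a linear SVM for coef_
    ('svm', SVC(probability=True)),
])

# Hyperparameter search over the SVM, as described above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
grid = GridSearchCV(pipe, {'svm__C': [0.1, 1, 10], 'svm__kernel': ['linear', 'rbf']}, scoring='roc_auc', cv=5)
grid.fit(X_train, y_train)

y_pred = grid.predict(X_test)
print('Best params:', grid.best_params_)
print('F1-Score:', f1_score(y_test, y_pred))
print('ROC-AUC:', roc_auc_score(y_test, grid.decision_function(X_test)))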

🎯 Achievements

  • Achieved:
    • Accuracy: 89%
    • F1-Score: 0.88
    • ROC-AUC: 0.90
  • Helps hospitals reduce readmission rates and personalize patient care strategies.

🚀 Overall Impact

  • A comprehensive AI-powered healthcare pipeline combining:
    • Tabular data ML models
    • MRI-based CNN image models
    • Readmission risk prediction models
  • Supports clinicians with early diagnosis, efficient resource management, and preventive care planning.

🛠️ Technologies Used

  • Machine Learning: scikit-learn, pandas, numpy
  • Deep Learning: TensorFlow, Keras
  • Data Augmentation: OpenCV, ImageDataGenerator, SMOTE
  • Visualization: Matplotlib, Seaborn
  • Model Deployment (future scope): can be extended with Streamlit, FastAPI, or Spring AI

🤖 Future Improvements

  • Integration with web or cloud-based applications (AWS, Spring Boot Microservices).
  • Expand to multi-modal models combining text-based EMR data and images.
  • Deploy as a full-stack AI healthcare assistant.
