arashabadi/cm4ai_codefest2025

2025 CM4AI Hackathon at UAB

This repo documents our team's participation in the 2025 CM4AI Hackathon at UAB.

UPDATE - SEP 9, 2025:

We have successfully completed the SubCell embeddings on the segmented and cropped IF images. This represents a key milestone in our project.

The integrated code and outputs are available here (see README.md for details).

Along the way, we strengthened three core components of the workflow, building on the earlier notebooks and contributions from the team:

  • Segmentation: We identified and resolved inconsistencies in per-channel cropping, updating to the latest Cellpose (v4.0.6) and tuning parameters to yield consistent and reliable masks across channels. While there is still room to further optimize for tightly clustered cells, the segmentation pipeline is now robust and reproducible.
  • Cropping & Preprocessing: We established a standardized per-cell cropping workflow that preserves cell counts across channels, introduces neighbor-masked variants, and outputs structured path_list.csv files fully compatible with SubCellPortable. This provides a clean foundation for downstream inference.
  • SubCell Inference & Analysis: We successfully deployed SubCell locally, resolved technical issues in environment setup and CSV parsing, and now generate embeddings, attention maps, and class probabilities.
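The per-cell cropping idea described above can be sketched roughly as follows. This is only an illustration, not the code in the linked notebooks: it uses plain nested lists in place of image arrays, the function names `bounding_boxes` and `crop_cells` are hypothetical, and it assumes that "neighbor-masked" means zeroing pixels that belong to a different labeled cell inside the crop. Applying one shared label mask to every channel is what keeps the cell count identical across channels in this sketch.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # 2D grid of values (stand-in for a NumPy array)


def bounding_boxes(mask: Image) -> Dict[int, Tuple[int, int, int, int]]:
    """Return {label: (row_min, row_max, col_min, col_max)} for each non-zero label."""
    boxes: Dict[int, Tuple[int, int, int, int]] = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label == 0:  # 0 is treated as background
                continue
            if label not in boxes:
                boxes[label] = (r, r, c, c)
            else:
                r0, r1, c0, c1 = boxes[label]
                boxes[label] = (min(r0, r), max(r1, r), min(c0, c), max(c1, c))
    return boxes


def crop_cells(image: Image, mask: Image, mask_neighbors: bool = False) -> Dict[int, Image]:
    """Crop one channel image per labeled cell using the cell's bounding box.

    With mask_neighbors=True, pixels that belong to another labeled cell are
    set to 0 (assumed meaning of "neighbor-masked"); background pixels are kept.
    """
    crops: Dict[int, Image] = {}
    for label, (r0, r1, c0, c1) in bounding_boxes(mask).items():
        crop = []
        for r in range(r0, r1 + 1):
            out_row = []
            for c in range(c0, c1 + 1):
                pixel = image[r][c]
                if mask_neighbors and mask[r][c] not in (0, label):
                    pixel = 0
                out_row.append(pixel)
            crop.append(out_row)
        crops[label] = crop
    return crops


# Toy example: two touching cells in a 3x3 mask, one toy channel image.
mask = [[1, 1, 0],
        [1, 2, 2],
        [0, 2, 2]]
blue = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

plain = crop_cells(blue, mask)
masked = crop_cells(blue, mask, mask_neighbors=True)
```

Because the crops are keyed by cell label, running `crop_cells` on each channel with the same mask yields the same set of keys, which is one simple way to check that no cell was dropped in a channel.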

With these foundations in place, the next step is to proceed with comparative analysis using the cm4ai pipeline, after which we plan to draft the preprint.

UPDATE - SEP 2, 2025:

  1. We are currently finishing the SubCell embedding of our segmented and cropped images and hope to complete it by the end of next week. (Arash Abadi and Jedediah Smith)

  2. We are finalizing the full cm4ai pipeline run on the subset of images we used in the SubCell process. Once it is complete, we will begin our comparative analysis. (Morgan Smith and Rebecca Bernal)

  3. We aim to write up these results as a preprint submission to bioRxiv (hacker) with CM4AI team approval and guidance.


Day 1 (9AM-5PM) - August 14th, 2025

Some suggested projects:

  1. Embedding
  2. Building a community network and hierarchy > classic Louvain algorithm
  3. Visible Neural Networks (VNN)

Provided compute power (TACC by The University of Texas at Austin):

$ ssh [email protected]

Our Team (Embedding Mafia)

  • Arash Abadi (UAB)
  • Morgan Smith (UT Health San Antonio)
  • Rebecca Bernal (UT Health San Antonio)
  • Jedediah Smith (UAB)
  • Mona Shabana (UAB)

Selected Project:

We are going to implement SubCell, an alternate image embedding method, for immunofluorescence images:

Let's run the IF tutorial and then SubCell.

  1. First, try cm4ai-tutorial-immunofluorescence/. This requires downloading 11 GB of IF image data in RO-Crate format.

We downloaded the data with python src/download.py

  2. SubCell requires segmented images of cells, so we are going to perform cell segmentation with the same tool the SubCell authors used to prepare their training data: HPA Cell Segmentation.
  • Try HPA Cell Segmentation > requires the CUDA toolkit (NVIDIA GPU) >> run on TACC or Cheaha

  • From the HPA Cell Segmentation GitHub: clone > conda env create -f environment.yml > sh install.sh

Hpacellseg should be run as a Python script.


Day 2 (9AM-5PM) - August 15th, 2025

To get the prepared code onto TACC, I can connect my GitHub account to TACC, switch to Cheaha, or use wget or curl to download the code from GitHub. I will go with wget on the raw file (the Python script that runs hpacellseg).

Prepare testing data to run hpacellseg

  1. connect to TACC via ssh
  2. idev > 3 (default)
  3. cd $WORK
  4. cd ./analysis
  5. bash data_transfer.sh to copy 10 images from "cm4ai-tutorial-immunofluorescence-main/data/raw/paclitaxel/blue" to "./data/"
  6. conda activate hpacellseg

Run hpacellseg > crop images > run subcell

  1. python ./run_hpa_segmentation.py

This generates a directory called segmentation_results inside the analysis directory.

  1. Transfer the results to my local machine (MacBook) with the scp command:
# Transfer the results to my local machine (MacBook) via scp
hostname  # should print the name of the local machine
cd ~
scp -r [email protected]:/work2/10900/USERNAME/frontera/analysis/segmentation_results ~/tacc  # tacc is a testing directory in my local machine
  1. Now let's prepare the data for input of subcell

    We selected the first 10 images from the paclitaxel channels (from cm4ai-tutorial-immunofluorescence/) as hpacellseg input.

  2. Run hpacellseg (Rebecca)

    https://github.com/Bayes-Student1/CM4AI-Group-Project-

  3. Prepare cropped images for SubCell input (Morgan)

    https://github.com/morgansmith27

  4. Run SubCell (Jedediah)

    https://github.com/OriginalBrick/cm4ai-codefest

What we'll be working on for the next few days:

  • Morgan: Cropping and subcellular visualization on the stacked images (all colors on the new dataset) via Google Colab (possibly Visual Studio Code)
  • Jedediah: Working through the SubCell tutorial to adapt it to our data
  • Rebecca: Currently rerunning the segmentation and renaming the files in VS Code
  • Arash: Project management and GitHub maintenance
  • Mona: Background/Significance for PowerPoint
  • Everyone: Editing Google Slides/PowerPoint

Acknowledgments

We would like to thank the following people who provided significant assistance and support throughout these two days:


My extra notes:

Project Theme - Data Embedding

Data embedding involves transforming high-dimensional biological data (e.g., imaging, proteomics, gene expression) into lower-dimensional representations that preserve meaningful patterns or relationships. The Cell Map pipeline starts by generating embeddings for biological entities such as proteins or genes from each input data source (IF image, AP-MS, and/or perturb-seq). After source-specific embedding, a joint embedding is created and used to generate the protein-protein interaction (PPI) network, which is then used to create hierarchical cell maps. In the current pipeline, a DenseNet model pre-trained on images from the Human Protein Atlas is used to generate image embeddings and node2vec is used to generate embeddings for AP-MS data. Joint/co-embeddings have been implemented using muse and proteingps in the current pipeline. Additional background is provided in Schaffer et al. (2025) and Lenkiewicz et al. (2025).
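As a rough illustration of how embeddings can be turned into a network, the sketch below links two proteins when the cosine similarity of their embedding vectors passes a threshold. This is not the Cell Map pipeline's co-embedding or edge-scoring method (muse and proteingps are not shown); the protein names, vectors, threshold, and function names are all made up for the example.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_edges(
    embeddings: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return (name_a, name_b, score) for every pair with cosine similarity >= threshold."""
    edges = []
    for a, b in combinations(sorted(embeddings), 2):
        score = cosine(embeddings[a], embeddings[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges


# Hypothetical 3-dimensional embeddings; real image embeddings are much longer.
embeddings = {
    "protein_A": [1.0, 0.0, 0.2],
    "protein_B": [0.9, 0.1, 0.2],
    "protein_C": [0.0, 1.0, 0.0],
}

edges = similarity_edges(embeddings, threshold=0.9)
```

In this toy data only protein_A and protein_B point in nearly the same direction, so they are the only pair that would be connected at a 0.9 threshold.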

  • CM4AI preprint:

    Input data streams are integrated via the multi-scale integrated cell (MuSIC) software pipeline employing deep learning models and community detection algorithm.

  • In MuSIC paper:

    For image embedding we used DenseNet, a convolutional neural network with superior performance in capturing protein locations relative to counter-stained cellular landmarks

  • In U2OS Multi-Modal Cell Map paper:

    For the IF data, we applied DenseNet-121

  • U2OS Cell Map data to visualize via cytoscape: https://musicmaps.ai/u2os-cellmap/


Cell Mapping Publications & Background Reading

CM4AI Pipeline and Tools

The official CM4AI Cell Map Pipeline code and documentation are available at:

In addition to these repositories, development forks and environment setup instructions that may be more easily adapted to CodeFest projects are available at:

This development environment can be used to easily make changes to individual steps in the cell map AI/ML pipeline and log training parameters/metrics to MLFlow to assess the impact of new methods or pipeline configurations on generated cell maps.


Perturbation Correlation Network

  1. For each perturbation, compute the mean of all cells (perturbation mean)
  2. Compute the pairwise Pearson correlation matrix of perturbation means
  3. Use UMAP on the correlation matrix to visualize which perturbations correlate similarly

Perturbations that cluster together have similar perturbation means -> they result in similar cell phenotypes.
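Steps 1 and 2 above can be sketched with the standard library alone. The data layout (a dict mapping each perturbation name to a list of per-cell feature vectors), the perturbation names, and the function names are assumptions for illustration; with real data, NumPy or pandas would be the usual choice, and the resulting matrix would then be passed to a UMAP implementation for step 3, which is not shown here.

```python
from math import sqrt
from typing import Dict, List, Tuple


def perturbation_means(cells: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Step 1: average all cells of each perturbation, feature by feature."""
    means = {}
    for name, rows in cells.items():
        n = len(rows)
        means[name] = [sum(column) / n for column in zip(*rows)]
    return means


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (undefined for constant vectors)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(means: Dict[str, List[float]]) -> Tuple[List[str], List[List[float]]]:
    """Step 2: pairwise Pearson correlation between perturbation means."""
    names = sorted(means)
    matrix = [[pearson(means[a], means[b]) for b in names] for a in names]
    return names, matrix


# Toy data: two cells per perturbation, three features per cell.
cells = {
    "pertA": [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    "pertB": [[2.0, 4.0, 6.0], [4.0, 6.0, 8.0]],
    "pertC": [[5.0, 3.0, 1.0], [3.0, 1.0, -1.0]],
}

means = perturbation_means(cells)
names, matrix = correlation_matrix(means)
```

In the toy data, pertA and pertB have mean profiles that rise together, so their correlation is close to 1, while pertC falls where the others rise and correlates close to -1 with both.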

To install CellMaps Pipeline:

conda create -n cm4ai python=3.8
conda activate cm4ai
pip install cellmaps_pipeline
