This repository contains scripts for compressing datasets from the CELLxGENE Census using the scquill package.
- compression/: Contains the main compression scripts and utilities.
  - cellxgene_census_compression.py: Main script for compressing cellxgene census datasets.
  - utils/: Utility functions for the compression process.
    - ensembl_to_gene.py: Converts Ensembl IDs to gene names.
    - guess_normalisation.py: Determines the normalization method used in the dataset.
    - write_to_file.py: Handles writing metadata (dataset title, collections, etc.) to files.
- static/: Contains static data files used in the compression process.
  - human_gene_pairs.csv: Gene pair information for human.
  - mouse_gene_pairs.csv: Gene pair information for mouse.
  - multi_condition_datasets.csv: Contains dataset IDs that have multiple experimental conditions.
- requirements.txt: Lists all Python dependencies for the project.
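
To make the role of the two data-facing helpers in utils/ more concrete, here is a minimal sketch of what an Ensembl-to-gene-name lookup and a normalisation guess might look like. It is not the code in this repository: the function names, the CSV column names (`ensembl_id`, `gene_name`), the returned labels, and the threshold of 20 are all assumptions chosen for illustration.

```python
import csv
import io


def load_gene_pairs(csv_text):
    """Build an Ensembl ID -> gene name lookup from CSV text.

    Assumption: the columns are called 'ensembl_id' and 'gene_name'.
    The real gene pair files may use different headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ensembl_id"]: row["gene_name"] for row in reader}


def ensembl_to_gene_names(ensembl_ids, lookup):
    """Translate IDs to names, keeping the original ID when no name is known."""
    return [lookup.get(eid, eid) for eid in ensembl_ids]


def guess_normalisation(values, log_max=20.0):
    """Heuristically label a sample of expression values.

    Assumed rules (hypothetical, for illustration only):
      - every value is a whole number        -> "raw" counts
      - fractional values, all below log_max -> "log" transformed
      - otherwise                            -> "linear" (e.g. counts per million)
    """
    values = list(values)
    if not values:
        raise ValueError("need at least one value to guess the normalisation")
    if all(float(v).is_integer() for v in values):
        return "raw"
    if max(values) < log_max:
        return "log"
    return "linear"


# Hypothetical two-row gene pair table used only for this example.
EXAMPLE_CSV = "ensembl_id,gene_name\nENSG00000141510,TP53\nENSG00000012048,BRCA1\n"

lookup = load_gene_pairs(EXAMPLE_CSV)
names = ensembl_to_gene_names(["ENSG00000141510", "ENSG_UNKNOWN"], lookup)
```

Falling back to the original ID for unmapped entries keeps the feature list the same length as the input, which matters when the names are later used to label matrix columns.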
These instructions will guide you through setting up your project environment and running the compression scripts.
- Python 3.11
- pip (Python package installer)
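
If you are unsure which interpreter is active, a small check like the one below can confirm it matches the prerequisite. This is only a convenience sketch; the helper name `is_python_311` is hypothetical and not part of this repository.

```python
import sys


def is_python_311(version_info=sys.version_info):
    """Return True when the (major, minor) version is exactly 3.11."""
    return tuple(version_info[:2]) == (3, 11)


if __name__ == "__main__":
    if is_python_311():
        print("Python 3.11 detected.")
    else:
        # Report the version actually found so the mismatch is easy to spot.
        print(f"Expected Python 3.11, found {sys.version_info.major}.{sys.version_info.minor}.")
```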
First, clone this repository to your local machine using git:
git clone https://github.com/YingX97/cell_atlas_approximations_disease_compression
cd cell_atlas_approximations_disease_compression

It is recommended to use a virtual environment to avoid conflicts with other projects or system packages:
python3.11 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`

Install all required packages using pip:
pip install -r requirements.txt

Then change into the compression directory and run the main script:
cd compression
python3 cellxgene_census_compression.py