This repository contains scripts for analyzing light modulation simulations and generating figures for publication.
```bash
# Install all dependencies
pip install -r requirements.txt

# Or install manually
pip install huggingface_hub pandas numpy matplotlib scipy scikit-learn plotly tqdm pyyaml shap
```

Note: The `shap` package is required for running `shap_feature_ranking.py`. All other dependencies are required for the core analysis pipeline.
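A quick optional way to confirm the installation is to try importing each package (note that scikit-learn imports as `sklearn` and pyyaml as `yaml`):

```python
# Optional sanity check: shap is only needed for shap_feature_ranking.py.
import importlib

for pkg in ["huggingface_hub", "pandas", "numpy", "matplotlib", "scipy",
            "sklearn", "plotly", "tqdm", "yaml", "shap"]:
    try:
        importlib.import_module(pkg)
        print(f"{pkg}: OK")
    except ImportError:
        print(f"{pkg}: MISSING")
```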
The data folder (containing JT curves, features, and analysis outputs) is hosted on Hugging Face Hub. This allows for version-controlled, accessible data storage and easy sharing of the dataset.
For public repositories, no authentication is required. For private repositories, you need to authenticate:
```bash
# Option 1: Use the Hugging Face CLI (recommended)
huggingface-cli login

# Option 2: Set an environment variable
export HF_TOKEN=your_token_here

# Option 3: Pass the token directly to the script
python download_data_from_hf.py --repo-id <username/dataset-name> --token your_token_here
```
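The same authentication can also be done programmatically; `huggingface_hub` exposes a `login` helper (the token value below is a placeholder):

```python
# Programmatic equivalent of `huggingface-cli login`; token is a placeholder.
from huggingface_hub import login

login(token="your_token_here")  # caches the token for subsequent Hub calls
```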
```bash
# Download data from Hugging Face
python download_data_from_hf.py --repo-id <username/dataset-name>

# Example:
# python download_data_from_hf.py --repo-id your-username/light-modulation-data

# Additional options:
# --data-dir: Specify target directory (default: 'data')
# --filename: Specify archive filename (default: 'data.tar.gz')
# --force: Overwrite existing data directory without prompting
# --keep-archive: Keep the downloaded archive file after extraction
```

The script will:
- Download the compressed data archive (`data.tar.gz`) from Hugging Face Hub
- Extract it to the `data/` folder in the git root
- Verify the extraction and check for expected subdirectories
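For reference, the download-and-extract flow corresponds to a couple of `huggingface_hub` and `tarfile` calls. A minimal sketch (the repo id is a placeholder, and the script adds the verification and prompting described above):

```python
# Minimal download-and-extract sketch; repo id is a placeholder.
import tarfile
from huggingface_hub import hf_hub_download

archive = hf_hub_download(
    repo_id="your-username/light-modulation-data",
    filename="data.tar.gz",
    repo_type="dataset",
)
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(".")  # assumes the archive unpacks to a top-level data/ folder
```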
Note: The data folder should be located at the git root level (`data/`) for all scripts to work correctly. The download script automatically places it in the correct location.
The downloaded data archive typically contains:
- `modulation_profile/`: Light modulation waveform files (sinusoidal, square)
- `all_jt_curves/`: Complete set of JT (current-time) curve files
- `jt_features_out/`: Extracted JT cycle features (peak values, rise/fall times, etc.)
- `fft_features_out/`: FFT frequency analysis features (fundamental frequency, harmonics, THD, etc.)
- `figures/`: Pre-generated analysis plots and figures
- `graspi_features.csv`: Morphology features extracted from simulation images
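The tabular outputs can be inspected directly with pandas. The path below assumes the default `data/` layout; the column names are not documented here, so print them rather than assuming any:

```python
# Peek at the morphology feature table (path assumes the default data/ layout).
import pandas as pd

graspi = pd.read_csv("data/graspi_features.csv")
print(graspi.shape)
print(graspi.columns.tolist())
```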
To upload the data folder to Hugging Face Hub for sharing and version control:
- Create a Hugging Face account at huggingface.co
- Create a dataset repository on Hugging Face Hub:
  - Go to your profile → "New dataset"
  - Choose a name and visibility (public/private)
  - The repository type should be "dataset"
- Authenticate using one of the methods described in the Downloading section above
```bash
# Upload data to Hugging Face
python upload_data_to_hf.py --repo-id <username/dataset-name>

# Example:
# python upload_data_to_hf.py --repo-id your-username/light-modulation-data

# Additional options:
# --data-dir: Specify source data directory (default: 'data')
# --output-archive: Specify output archive path (default: 'data.tar.gz' in git root)
# --skip-archive: Use an existing archive file instead of creating a new one
# --keep-archive: Keep the archive file after upload
# --token: Hugging Face token (or set the HF_TOKEN env var)
```

The script will:
- Create a compressed tar.gz archive of the `data/` folder
- Upload the archive to the specified Hugging Face dataset repository
- Create the repository if it doesn't exist (requires appropriate permissions)
Note: The repository will be created automatically if it doesn't exist. Make sure you have the necessary permissions on your Hugging Face account.
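Under the hood this maps onto a couple of `huggingface_hub` calls. A minimal sketch (the repo id is a placeholder, and `upload_data_to_hf.py` additionally handles archiving and the options above):

```python
# Minimal upload sketch using huggingface_hub; repo id is a placeholder.
from huggingface_hub import HfApi

api = HfApi()  # uses HF_TOKEN or the cached CLI login
api.create_repo("your-username/light-modulation-data",
                repo_type="dataset", exist_ok=True)
api.upload_file(
    path_or_fileobj="data.tar.gz",
    path_in_repo="data.tar.gz",
    repo_id="your-username/light-modulation-data",
    repo_type="dataset",
)
```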
- Version control: Each upload creates a new version. Use descriptive commit messages when uploading new versions.
- File size: Large datasets (>5GB) may require Git LFS. Hugging Face Hub handles this automatically for large files.
- Privacy: Use private repositories for sensitive data. Public repositories are accessible to everyone.
- Documentation: Add a README.md in your Hugging Face dataset repository to document the data structure and usage.
Scripts for generating publication-quality figures and analysis:
- `ablation_study_and_parity_plot.py`: Generates ablation study plots showing feature importance, and parity plots comparing model predictions against actual values for various target variables (interfacial area, domain size, connectivity, donor fraction).
- `jt_fft_single_plot.py`: Creates annotated JT (current-time) cycle plots and FFT spectrum plots for both sinusoidal and square light modulation waveforms, showing peak detection, rise/fall times, and harmonic analysis.
- `shap_feature_ranking.py`: Performs SHAP (SHapley Additive exPlanations) feature ranking to interpret machine learning model predictions and identify the most important features for each target variable; a minimal sketch appears after this list.
- `config.yaml`: Configuration file containing paths, JT analysis parameters, FFT analysis parameters, and plotting settings.
- `utils.py`: Utility functions for data loading, JT cycle analysis, FFT analysis, plotting, and feature extraction.
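For orientation, SHAP feature ranking typically boils down to fitting a model, computing SHAP values, and sorting features by mean absolute attribution. The sketch below uses synthetic data and a random forest as placeholders; the real pipeline lives in `shap_feature_ranking.py`:

```python
# Minimal SHAP ranking sketch (synthetic data and model are assumptions).
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                     # stand-in feature matrix
y = 2 * X[:, 0] - X[:, 3] + rng.normal(size=200)  # stand-in target

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)  # (n_samples, n_features)

ranking = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]
print("features ranked by mean |SHAP|:", ranking)
```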
Scripts for running simulations and collecting data:
- `coreset_selection.py`: Selects a representative coreset (subset) of the full dataset using PCA dimensionality reduction and distance-based selection to minimize redundancy while preserving the data distribution; see the sketch after this list.
- `setup_run.py`: Sets up simulation run directories from coreset-selected files, creating the necessary directory structure (checkpoint_creation, sinusoidal, square) and copying the required input files for each run.
- `submit_jobs.sh`: Submits simulation jobs to a cluster scheduler (e.g., SLURM) for batch processing. Supports specifying job ranges and manages job submission limits.
- `collect_jt_curves.sh`: Collects JT curve data files from all simulation run directories and organizes them into a central location for analysis. Can process a specified range of runs.
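The coreset idea can be illustrated with a short, self-contained sketch: PCA compresses the feature space, then a greedy farthest-point pass picks samples that are maximally spread out. This shows the general technique, not `coreset_selection.py`'s exact logic:

```python
# Illustrative PCA + farthest-point coreset selection (a sketch, not the
# repository's exact algorithm; data here is synthetic).
import numpy as np
from sklearn.decomposition import PCA

def select_coreset(X, k):
    """Greedily pick k rows of X that are well spread out in PCA space."""
    Z = PCA(n_components=min(10, X.shape[1])).fit_transform(X)
    chosen = [0]                             # seed with the first sample
    dist = np.linalg.norm(Z - Z[0], axis=1)  # distance to nearest chosen point
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))           # farthest remaining sample
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(Z - Z[nxt], axis=1))
    return chosen

rng = np.random.default_rng(0)
print(select_coreset(rng.normal(size=(100, 20)), k=10))
```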