
Light Modulation Analysis

This repository contains scripts for analyzing light modulation simulations and generating figures for publication.

Setup

Installing Dependencies

# Install all dependencies
pip install -r requirements.txt

# Or install manually
pip install huggingface_hub pandas numpy matplotlib scipy scikit-learn plotly tqdm pyyaml shap

Note: The shap package is needed only for running shap_feature_ranking.py; all other dependencies are required for the core analysis pipeline.

Downloading Data from Hugging Face

The data folder (containing JT curves, features, and analysis outputs) is hosted on Hugging Face Hub, which provides version-controlled, easily shared storage for the dataset.

Authentication

For public repositories, no authentication is required. For private repositories, you need to authenticate:

# Option 1: Use Hugging Face CLI (recommended)
huggingface-cli login

# Option 2: Set environment variable
export HF_TOKEN=your_token_here

# Option 3: Pass token directly to script
python download_data_from_hf.py --repo-id <username/dataset-name> --token your_token_here
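
If you are already in a Python session, huggingface_hub also exposes a programmatic login (a minimal sketch; the token value is a placeholder):

# Option 4: Authenticate from Python (equivalent to huggingface-cli login)
from huggingface_hub import login

login(token="your_token_here")  # placeholder token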

Downloading the Data

# Download data from Hugging Face
python download_data_from_hf.py --repo-id <username/dataset-name>

# Example:
# python download_data_from_hf.py --repo-id your-username/light-modulation-data

# Additional options:
# --data-dir: Specify target directory (default: 'data')
# --filename: Specify archive filename (default: 'data.tar.gz')
# --force: Overwrite existing data directory without prompting
# --keep-archive: Keep the downloaded archive file after extraction

The script will:

  1. Download the compressed data archive (data.tar.gz) from Hugging Face Hub
  2. Extract it to the data/ folder in the git root
  3. Verify the extraction and check for expected subdirectories

Note: The data folder should be located at the git root level (data/) for all scripts to work correctly. The download script automatically places it in the correct location.
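
For reference, the core download-and-extract step looks roughly like the following (a minimal sketch using the huggingface_hub API; download_data_from_hf.py adds the prompting, verification, and cleanup options listed above, and the repo id shown is a placeholder):

import tarfile
from huggingface_hub import hf_hub_download

# Fetch data.tar.gz from the dataset repo (cached locally by huggingface_hub)
archive = hf_hub_download(
    repo_id="your-username/light-modulation-data",  # placeholder repo id
    filename="data.tar.gz",
    repo_type="dataset",
)

# Extract at the git root so data/ ends up where the analysis scripts expect it
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(path=".")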

What's Included in the Data

The downloaded data archive typically contains:

  • modulation_profile/: Light modulation waveform files (sinusoidal, square)
  • all_jt_curves/: Complete set of JT (current-time) curve files
  • jt_features_out/: Extracted JT cycle features (peak values, rise/fall times, etc.)
  • fft_features_out/: FFT frequency analysis features (fundamental frequency, harmonics, THD, etc.)
  • figures/: Pre-generated analysis plots and figures
  • graspi_features.csv: Morphology features extracted from simulation images
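
Once extracted, the tabular outputs load directly with pandas (a minimal sketch; only graspi_features.csv is named above, so any other filename would need to be checked against the extracted folders):

import pandas as pd

# Morphology features extracted from simulation images
graspi = pd.read_csv("data/graspi_features.csv")
print(graspi.shape)
print(graspi.columns.tolist())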

Uploading Data to Hugging Face (for maintainers)

To upload the data folder to Hugging Face Hub for sharing and version control:

Prerequisites

  1. Create a Hugging Face account at huggingface.co
  2. Create a dataset repository on Hugging Face Hub:
    • Go to your profile → "New dataset"
    • Choose a name and visibility (public/private)
    • The repository type should be "dataset"
  3. Authenticate using one of the methods described in the Downloading section above

Uploading the Data

# Upload data to Hugging Face
python upload_data_to_hf.py --repo-id <username/dataset-name>

# Example:
# python upload_data_to_hf.py --repo-id your-username/light-modulation-data

# Additional options:
# --data-dir: Specify source data directory (default: 'data')
# --output-archive: Specify output archive path (default: 'data.tar.gz' in git root)
# --skip-archive: Use existing archive file instead of creating new one
# --keep-archive: Keep the archive file after upload
# --token: Hugging Face token (or set HF_TOKEN env var)

The script will:

  1. Create a compressed tar.gz archive of the data/ folder
  2. Upload the archive to the specified Hugging Face dataset repository
  3. Create the repository if it doesn't exist (requires appropriate permissions)

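Under the hood, the upload amounts to archiving data/ and pushing the archive with the huggingface_hub API (a minimal sketch with a placeholder repo id; the real script adds the archive-reuse and cleanup options listed above):

import tarfile
from huggingface_hub import HfApi

# Archive the data/ folder as data.tar.gz
with tarfile.open("data.tar.gz", "w:gz") as tar:
    tar.add("data", arcname="data")

api = HfApi()  # picks up HF_TOKEN or the cached CLI login

# Create the dataset repo if needed, then upload the archive
api.create_repo("your-username/light-modulation-data", repo_type="dataset", exist_ok=True)
api.upload_file(
    path_or_fileobj="data.tar.gz",
    path_in_repo="data.tar.gz",
    repo_id="your-username/light-modulation-data",
    repo_type="dataset",
    commit_message="Upload data archive",
)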

Best Practices

  • Version control: Each upload creates a new revision of the dataset repository. Use descriptive commit messages when uploading.
  • File size: Hugging Face Hub stores large files via Git LFS automatically when you upload through the Hub API, but very large archives can hit per-file size limits, so consider splitting multi-gigabyte datasets.
  • Privacy: Use private repositories for sensitive data. Public repositories are accessible to everyone.
  • Documentation: Add a README.md in your Hugging Face dataset repository to document the data structure and usage.

Paper Figures Scripts (paper_figures_scripts/)

Scripts for generating publication-quality figures and analysis:

  • ablation_study_and_parity_plot.py: Generates ablation study plots showing feature importance and parity plots comparing model predictions vs. actual values for various target variables (interfacial area, domain size, connectivity, donor fraction).

  • jt_fft_single_plot.py: Creates annotated JT (current-time) cycle plots and FFT spectra plots for both sinusoidal and square light modulation waveforms, showing peak detection, rise/fall times, and harmonic analysis.

  • shap_feature_ranking.py: Performs SHAP (SHapley Additive exPlanations) feature ranking analysis to interpret machine learning model predictions and identify the most important features for each target variable (see the sketch after this list).

  • config.yaml: Configuration file containing paths, JT analysis parameters, FFT analysis parameters, and plotting settings.

  • utils.py: Utility functions for data loading, JT cycle analysis, FFT analysis, plotting, and feature extraction.
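
To illustrate the SHAP ranking step in shap_feature_ranking.py, here is a minimal self-contained sketch with synthetic data and a generic tree model (the actual script reads its models, features, and targets from config.yaml and the data folder):

import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the extracted feature tables
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"feat_{i}" for i in range(5)])
y = 2 * X["feat_0"] - X["feat_3"] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Exact SHAP values for tree ensembles, ranked by mean |SHAP| per feature
shap_values = shap.TreeExplainer(model).shap_values(X)
ranking = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
print(ranking.sort_values(ascending=False))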

Pipeline Scripts (pipeline_scripts/)

Scripts for running simulations and collecting data:

  • coreset_selection.py: Selects a representative coreset (subset) of data from the full dataset using PCA dimensionality reduction and distance-based selection to minimize redundancy while preserving the data distribution (see the sketch after this list).

  • setup_run.py: Sets up simulation run directories from coreset-selected files, creating the necessary directory structure (checkpoint_creation, sinusoidal, square) and copying required input files for each run.

  • submit_jobs.sh: Submits simulation jobs to a cluster/scheduler (e.g., SLURM) for batch processing. Supports specifying job ranges and manages job submission limits.

  • collect_jt_curves.sh: Collects JT curve data files from all simulation run directories and organizes them into a central location for analysis. Can process a specified range of runs.
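
A rough illustration of the PCA-plus-distance-based idea behind coreset_selection.py (a sketch using greedy farthest-point selection; the script's actual criterion and parameters may differ):

import numpy as np
from sklearn.decomposition import PCA

def select_coreset(features, k):
    """Greedy farthest-point selection in a PCA-reduced space."""
    z = PCA(n_components=min(10, features.shape[1])).fit_transform(features)
    selected = [0]  # seed with an arbitrary first sample
    dists = np.linalg.norm(z - z[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dists))  # farthest point from the current coreset
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(z - z[idx], axis=1))
    return selected

# Example: pick 20 representative rows from 500 random 32-d feature vectors
rng = np.random.default_rng(0)
print(select_coreset(rng.normal(size=(500, 32)), k=20))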
