
Add masked diffusion language model #13

Merged
gonzalobenegas merged 1 commit into main from masked-diffusion
Dec 2, 2025

Conversation

@gonzalobenegas (Member)

No description provided.

@gonzalobenegas (Member, Author)

@claude please review this PR, be concise

@claude

claude bot commented Dec 2, 2025

Claude finished @gonzalobenegas's task.


PR Review: Add Masked Diffusion Language Model

This PR cleanly implements a Diffusion Language Model (DLM) alongside the existing MLM/CLM objectives. The implementation follows the project's patterns well.

Summary

  • Adds DLM training objective with variable masking ratio r ~ Uniform(0, 1)
  • New classes: DLM, DLMDataModule, DLMLitModule
  • Two model configs: dlm_transformer_small (128d, 6L) and dlm_transformer_base (768d, 12L)

Code Quality

  • Proper docstrings throughout
  • Type hints present
  • Good code reuse via GeneralMaskedLM base class
  • Consistent PyTorch Lightning patterns

Architecture

  • DLM inherits from GeneralMaskedLM (shares prepare_for_loss with MLM)
  • DLMDataModule implements variable masking in apply_dlm_masking()
  • DLMLitModule uses MaskedLMAdapter (same as MLM, both bidirectional)
  • Proper separation: masking in data module, model logic in components

Key Implementation Details

  • Masking: Per-sequence random ratio (glm_experiments/data/lm_datamodule.py:108-144)
    • Samples r ~ Uniform(0,1) per sequence
    • 100% of selected tokens → [MASK] (no random-token/keep split, unlike MLM's 80/10/10 scheme)
  • Loss: Filters to masked positions only (labels != -100), same as MLM
  • Eval: Uses compute_llr_mlm (correct for bidirectional models)
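A minimal sketch of this masking scheme (token ids, array shapes, and the NumPy formulation are assumptions for illustration; the actual `apply_dlm_masking()` in `glm_experiments/data/lm_datamodule.py` presumably operates on PyTorch tensors):

```python
import numpy as np

MASK_ID = 4        # hypothetical [MASK] token id (repo value unknown)
IGNORE_INDEX = -100

def apply_dlm_masking(input_ids: np.ndarray, rng: np.random.Generator):
    """Per-sequence variable-ratio masking as described above: sample
    r ~ Uniform(0, 1) for each sequence and send every selected token
    to [MASK] (no 80/10/10 replacement as in MLM)."""
    input_ids = input_ids.copy()
    labels = np.full_like(input_ids, IGNORE_INDEX)  # loss ignores -100
    for i in range(input_ids.shape[0]):
        r = rng.uniform()                                  # masking ratio for this sequence
        selected = rng.uniform(size=input_ids.shape[1]) < r
        labels[i, selected] = input_ids[i, selected]       # supervise only masked positions
        input_ids[i, selected] = MASK_ID                   # 100% of selected tokens -> [MASK]
    return input_ids, labels
```

Because the labels are `-100` everywhere except the masked positions, the same masked-position loss filtering used for MLM applies unchanged.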

Minor Issues

  1. Workflow files: Trailing newline removals (.github/workflows/*.yml) - cosmetic formatting change
  2. Import order: Fixed in experiments/training_data/shard_dataset.py (numpy before pandas) ✅
  3. Nosec comments: Added nosec B615 for HF Hub downloads ✅

Small Suggestions (non-blocking)

  • Consider adding a comment in apply_dlm_masking() explaining why no token replacement (vs MLM's 80/10/10)
  • Experiment config has compile: false - consider documenting why (overhead vs benefit tradeoff)

Tests
⚠️ No new tests added. Per CLAUDE.md testing approach, DLM should have tests for:

  • apply_dlm_masking() logic (easy, no GPU needed)
  • Config instantiation (easy, no GPU needed)
  • Skip expensive training tests
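A pytest-style sketch of what such CPU-only tests could look like. The function body below is a stand-in with the semantics described in this review; a real test would instead import `apply_dlm_masking` from `glm_experiments.data.lm_datamodule`, and the token ids are assumptions:

```python
import numpy as np

MASK_ID, IGNORE_INDEX = 4, -100  # hypothetical ids

def apply_dlm_masking(ids, rng):
    # Stand-in mirroring the reviewed behavior: per-sequence r ~ U(0,1),
    # selected tokens -> [MASK], labels only at masked positions.
    ids = ids.copy()
    labels = np.full_like(ids, IGNORE_INDEX)
    for i in range(ids.shape[0]):
        selected = rng.uniform(size=ids.shape[1]) < rng.uniform()
        labels[i, selected] = ids[i, selected]
        ids[i, selected] = MASK_ID
    return ids, labels

def test_masked_positions_become_mask_token():
    rng = np.random.default_rng(0)
    orig = np.arange(40).reshape(4, 10)
    ids, labels = apply_dlm_masking(orig, rng)
    selected = labels != IGNORE_INDEX
    assert np.all(ids[selected] == MASK_ID)
    assert np.all(labels[selected] == orig[selected])

def test_unselected_positions_untouched():
    rng = np.random.default_rng(1)
    orig = np.arange(40).reshape(4, 10)
    ids, labels = apply_dlm_masking(orig, rng)
    selected = labels != IGNORE_INDEX
    assert np.all(ids[~selected] == orig[~selected])
```

Config instantiation can be covered similarly cheaply by calling `hydra.utils.instantiate` on each new config and asserting the expected class comes back, without running any training steps.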

Config Validation

  • Uses _target_ for Hydra instantiation
  • Proper config composition with defaults
  • Small model for debugging (100 steps, 10 warmup)
  • Base model follows CS336 defaults (AdamW, beta=[0.9, 0.98])

Overall Assessment
Clean implementation following existing patterns. The main gap is test coverage for the new masking function and configs.

Recommendation: Approve with suggestion to add basic tests for apply_dlm_masking().


@gonzalobenegas gonzalobenegas merged commit 8ccef02 into main Dec 2, 2025
5 of 6 checks passed
@gonzalobenegas gonzalobenegas deleted the masked-diffusion branch December 2, 2025 01:40