Add plants training dataset and eval#10

Merged
gonzalobenegas merged 7 commits into main from plants on Dec 1, 2025
Conversation

@gonzalobenegas
Member

No description provided.

@gonzalobenegas
Member Author

@claude please review for critical bugs. be concise.

@claude

claude bot commented Dec 1, 2025

Claude finished @gonzalobenegas's task

Code Review for Critical Bugs

Critical Issues Found

1. CRITICAL: Race condition in eval metric ordering (lm_lit_module.py:227-236)

At lm_lit_module.py:227-236, the eval dataset index mapping relies on dictionary iteration order:

eval_idx = dataloader_idx - 1

if eval_idx >= len(self._eval_names):
    raise IndexError(...)

eval_name = self._eval_names[eval_idx]

But _eval_names is set from evals.keys() at line 97, which depends on dict insertion order. While Python 3.7+ dicts are ordered, the DataLoader iteration order is determined by the DataModule's val_dataloader() method (lm_datamodule.py:390), which iterates over self.eval_datasets.items().

Problem: If eval_datasets insertion order differs from the evals config order passed to LitModule, indices won't match.

Location:

  • lm_datamodule.py:390 - DataLoader ordering
  • lm_lit_module.py:97 - Metric name ordering
  • lm_lit_module.py:227 - Index mapping

Impact: Wrong metrics logged to wrong eval datasets (e.g., TraitGym AUPRC shown for PlantCAD data).

Fix: Ensure both use the same iteration order, or pass explicit ordering from DataModule to LitModule.
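A minimal sketch of the explicit-ordering fix. The class and attribute names below are simplified stand-ins (the real DataModule/LitModule carry much more machinery); the point is that the DataModule exposes a single source of truth for eval ordering and the LitModule consumes it rather than re-deriving it from config dict keys:

```python
# Hypothetical sketch: DataModule owns the eval ordering; LitModule receives it.
class DataModule:
    def __init__(self, eval_datasets: dict):
        self.eval_datasets = eval_datasets

    @property
    def eval_names(self) -> list[str]:
        # Single source of truth for ordering
        return list(self.eval_datasets.keys())

    def val_dataloaders(self):
        # Dataloaders are built in exactly the eval_names order
        return [self.eval_datasets[name] for name in self.eval_names]


class LitModule:
    def __init__(self, eval_names: list[str]):
        # Ordering comes from the DataModule, not from evals.keys() in config
        self._eval_names = eval_names
```

With this shape, `dataloader_idx - 1` indexes into the same list the dataloaders were built from, so a reordering in either place cannot silently misattribute metrics.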


2. Potential type annotation error (lm_datamodule.py:156)

self.eval_datasets: dict[str, any] = {}

Should be Any (capitalized, imported from typing), not the builtin any. Python won't raise at runtime because the annotation is never evaluated, but it is invalid typing syntax and static checkers such as mypy will reject it.

Location: lm_datamodule.py:156
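The corrected annotation would look like the following (the surrounding class is a minimal stand-in for illustration):

```python
from typing import Any


class EvalHolder:  # hypothetical minimal class, not the real DataModule
    def __init__(self) -> None:
        # `Any` from typing is the correct type; lowercase `any` is a builtin function
        self.eval_datasets: dict[str, Any] = {}
```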


Minor Issues

3. Missing validation for metric/label compatibility (lm_lit_module.py:283-284)

At line 283-284, metrics are computed without checking if the metric type matches the label type:

  • Binary classification metrics (AUPRC, AUROC) require binary labels
  • Regression metrics (Pearson, Spearman) work with continuous labels

If config specifies metrics: [auprc] but label_column: AF (continuous allele frequency), sklearn will error.

Suggestion: Add validation in _setup_eval_metrics or catch exceptions with better error messages.
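One possible shape for such a validation helper, as a sketch. The function name, the `BINARY_METRICS` set, and the metric-name strings are assumptions for illustration, not names from the actual metrics.py registry:

```python
import numpy as np

# Hypothetical: metric names that require binary (0/1) labels
BINARY_METRICS = {"auprc", "auroc"}


def validate_metric_labels(metric_name: str, labels: np.ndarray) -> None:
    """Fail fast with a clear message when a binary metric gets non-binary labels."""
    if metric_name in BINARY_METRICS:
        unique = np.unique(labels)
        if not np.isin(unique, [0, 1]).all():
            raise ValueError(
                f"Metric '{metric_name}' requires binary labels (0/1), "
                f"but label column contains values like {unique[:5].tolist()}"
            )
```

Calling this once per configured (metric, label_column) pair at setup time surfaces the mismatch with a readable error instead of a deep sklearn traceback mid-validation.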


4. Security note acknowledged (evals.py:69, 141)

# nosec B310 and # nosec B615 comments acknowledge urllib and dataset loading security concerns - acceptable for scientific code with trusted sources.


Summary

Critical bug: Eval dataset index mismatch could cause incorrect metric attribution.

Action needed: Verify eval_datasets dict ordering matches evals config ordering, or implement explicit index mapping.


@claude

claude bot commented Dec 1, 2025

Code Review for PR #10: Add plants training dataset and eval

I've conducted a comprehensive review of this PR following the CLAUDE.md standards. Overall, this is well-structured code that successfully generalizes the evaluation system from a TraitGym-specific implementation to a flexible, registry-based architecture. Here's my detailed feedback:


Strengths

1. Excellent Generalization & Architecture

  • Registry pattern: The EVAL_FILTERS and METRICS/TRANSFORMS registries are clean and extensible
  • Separation of concerns: evals.py handles data loading, metrics.py handles scoring, lm_lit_module.py orchestrates
  • Deletion of dead code: Removed traitgym.py and replaced hardcoded logic with configurable system ✅

2. Strong Test Coverage

  • Real tests, not fakes: Tests use actual HuggingFace datasets and models (no mocks!) ✅
  • Comprehensive coverage: Tests for transforms, metrics, filters, dataset loading, and Lightning module integration
  • Good test structure: Clear test names, proper fixtures, appropriate assertions

3. Good Documentation

  • Comprehensive docstrings with Google-style format
  • Type hints throughout
  • Config examples in YAML comments

4. Configuration Design

  • Clean migration from dict-based to list-based config for evals
  • Sensible defaults (label_column: "label", transform: "identity")
  • Good use of Hydra interpolation (evals: ${data.evals})

🔍 Issues Found

CRITICAL: Test Configuration Inconsistency ⚠️

Location: tests/test_evals.py:166-275

The test test_hydra_instantiation_with_evals expects the old dict-based config format:

assert "traitgym_mendelian_promoter" in dm.hparams.evals  # Line 269
assert dm.hparams.evals["traitgym_mendelian_promoter"]["dataset_name"] == "songlab/TraitGym"

But the codebase now uses list-based config (configs/data/default.yaml and configs/data/gpn_animal_promoter.yaml):

evals:
  - name: traitgym_mendelian_promoter
    dataset_name: songlab/TraitGym

Impact: This test will fail when run with the new config format.

Fix Required: Update lines 266-274 in tests/test_evals.py:

# Check evals config is present (list-based format)
assert dm.hparams.evals is not None
assert isinstance(dm.hparams.evals, list)
assert len(dm.hparams.evals) > 0

first_eval = dm.hparams.evals[0]
assert first_eval["name"] == "traitgym_mendelian_promoter"
assert first_eval["dataset_name"] == "songlab/TraitGym"
assert first_eval["window_size"] == 512

MODERATE: Edge Case in Dataloader Mapping

Location: glm_experiments/models/lm_lit_module.py:231-235

if eval_idx >= len(self._eval_names):
    raise IndexError(
        f"dataloader_idx={dataloader_idx} exceeds configured evals. "
        f"Expected indices 0-{len(self._eval_names)} for {len(self._eval_names)} eval(s)."
    )

Issue: The error message says "Expected indices 0-{len}" but should clarify that 0 is LM validation, 1+ are evals.

Suggested Fix:

raise IndexError(
    f"dataloader_idx={dataloader_idx} exceeds configured evals. "
    f"Expected: 0 (LM val) or 1-{len(self._eval_names)} (evals). "
    f"Configured {len(self._eval_names)} eval dataset(s)."
)

MINOR: Missing Default in Config Documentation

Location: configs/data/default.yaml:27-39

Suggestion: Add data_dir default to comment for completeness:

#     genome_url: https://ftp.ensembl.org/...
#     data_dir: data # Directory for genome downloads (default: "data")
#     filter_name: traitgym_promoter # Filter from EVAL_FILTERS registry (default: "none")

📋 CLAUDE.md Compliance Checklist

  • Issue Compliance: No issue linked (feature PR)
  • Code Formatting: Follows Black/isort conventions
  • Documentation: Comprehensive docstrings with type hints
  • Clean Code: Simple, DRY, no unnecessary complexity
  • Antipattern Check: No over-engineering, no premature abstractions
  • Design Compliance: Proper Hydra instantiation, PyTorch Lightning patterns
  • Test Implementation: Real tests (no fakes!) ✓
  • ⚠️ Test Suite: One test needs fixing for new config format


🎯 Recommendations

Required Before Merge:

  1. Fix test_hydra_instantiation_with_evals to use list-based config format

Nice to Have:

  1. Improve error message clarity in lm_lit_module.py:231-235
  2. Add data_dir default comment in configs/data/default.yaml

Future Considerations:

  1. Consider adding integration test that runs load_eval_dataset end-to-end with a small synthetic genome (currently tests only check component functions separately)
  2. The cache_dir for evals defaults to data/evals_cache but is not configurable via YAML - consider adding to config if users need different cache locations
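If the cache location were exposed, a config addition along these lines could work (the key name is an assumption, not the actual schema):

```yaml
# Hypothetical addition to configs/data/default.yaml
evals_cache_dir: data/evals_cache  # current hardcoded default; override for shared or scratch storage
```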

📊 Summary

Overall Assessment: ⭐⭐⭐⭐ (4/5 stars)

This PR successfully generalizes the evaluation system and adds plants dataset support. The architecture is clean, tests are real (not faked!), and documentation is strong. The one critical issue is a test expecting old config format, which must be fixed before merge.

Merge Recommendation: ✅ Approve after fixing test_hydra_instantiation_with_evals

Great work on this refactoring! The registry-based design is much more maintainable than the previous hardcoded approach.

@gonzalobenegas gonzalobenegas merged commit 331add1 into main Dec 1, 2025
6 checks passed
@gonzalobenegas gonzalobenegas deleted the plants branch December 1, 2025 22:19