Conversation

@buntingj-vt

Successfully integrated MLflow experiment tracking into the Flux fine-tuner project, providing an alternative/complementary tracking solution to Weights & Biases.

Files Created:

  • mlflow_client.py: New module handling MLflow tracking operations (sketched after this list)
    • MLflowClient class for experiment and run management
    • Logs training parameters, metrics, and artifacts
    • Error handling to prevent training interruption
  • MLFLOW_INTEGRATION.md: Comprehensive documentation covering setup, usage, features, and troubleshooting
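
The mlflow_client.py source is not reproduced in this summary. As a rough sketch of the shape such a wrapper can take, assuming MLflow's standard fluent API (mlflow.set_tracking_uri, mlflow.set_experiment, mlflow.start_run, mlflow.log_params, mlflow.log_metric, mlflow.log_artifact) and with illustrative method names that may not match the actual class:

  # Illustrative sketch only; the real MLflowClient in mlflow_client.py may
  # differ in method names and signatures.
  import mlflow

  class MLflowClient:
      def __init__(self, tracking_uri, experiment_name, run_name=None):
          mlflow.set_tracking_uri(tracking_uri)
          mlflow.set_experiment(experiment_name)
          self.run = mlflow.start_run(run_name=run_name)

      def log_params(self, params):
          # Tracking failures are swallowed so they never interrupt training.
          try:
              mlflow.log_params(params)
          except Exception as e:
              print(f"MLflow param logging failed: {e}")

      def log_metric(self, key, value, step):
          try:
              mlflow.log_metric(key, value, step=step)
          except Exception as e:
              print(f"MLflow metric logging failed: {e}")

      def log_artifact(self, path):
          try:
              mlflow.log_artifact(str(path))
          except Exception as e:
              print(f"MLflow artifact logging failed: {e}")

      def finish(self):
          mlflow.end_run()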

Files Modified:

  • train.py:
    • Added MLflowClient import
    • Updated CustomSDTrainer to support MLflow logging in:
      • hook_train_loop(): Loss logging
      • sample(): Sample image logging
      • post_save_hook(): Logging saved LoRA weights as artifacts
    • Updated CustomJob to accept mlflow_client parameter
    • Added three new input parameters:
      • mlflow_tracking_uri: MLflow server URI
      • mlflow_experiment_name: Experiment name (default: flux-lora-training)
      • mlflow_run_name: Optional run name
    • Created unified tracking_config dict for both W&B and MLflow (see the wiring sketch after this list)
    • Added MLflow client initialization and cleanup
  • cog.yaml: Added mlflow==2.18.0 dependency
  • README.md: Added MLflow integration to features list
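
The train.py diff itself is not shown above; the sketch below is only an approximation of the wiring described in that list, reusing the MLflowClient sketch from the previous section. The helper name setup_tracking and the exact signatures are assumptions, not the actual code:

  from mlflow_client import MLflowClient  # module added by this change

  def setup_tracking(tracking_config,
                     mlflow_tracking_uri=None,
                     mlflow_experiment_name="flux-lora-training",
                     mlflow_run_name=None):
      """Create an MLflow client only when a tracking URI is provided."""
      if not mlflow_tracking_uri:
          return None  # MLflow stays inactive; W&B behaviour is unchanged
      client = MLflowClient(mlflow_tracking_uri, mlflow_experiment_name, mlflow_run_name)
      client.log_params(tracking_config)  # same dict that is logged to W&B
      return client

Inside CustomSDTrainer, hook_train_loop() would then call something along the lines of mlflow_client.log_metric("loss", loss, step=step) after each step, sample() would pass the generated images to log_artifact(), and post_save_hook() would do the same for the saved LoRA weights.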

Integration Architecture:

  • Non-invasive: Works alongside existing W&B integration
  • Optional: Only activated when mlflow_tracking_uri is provided
  • Comprehensive tracking: hyperparameters, loss, samples, weights
  • All logging wrapped in try-except so a tracking failure never interrupts training (cleanup pattern sketched below)
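
The initialization-and-cleanup side is likewise easiest to read as a sketch; the function and variable names below are illustrative, not the actual train.py code:

  def run_training_with_tracking(job, mlflow_client):
      # Each logging call is already try-excepted inside the client, so a
      # tracking outage cannot abort training; this wrapper only guarantees
      # that the MLflow run is closed whether training succeeds or fails.
      try:
          job.run()
      finally:
          if mlflow_client is not None:
              mlflow_client.finish()  # ends the MLflow run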

Key Features:

  • Track all training hyperparameters
  • Log training loss at each step
  • Save sample images during training
  • Archive final LoRA weights as artifacts
  • Compatible with concurrent W&B usage

Usage Example:
  train(
    input_images=Path("images.zip"),
    mlflow_tracking_uri="http://localhost:5000",
    mlflow_experiment_name="flux-lora-experiment",
    mlflow_run_name="baseline-run",
    ...
  )

buntingj-vt and others added 2 commits November 21, 2025 01:43
Successfully integrated MLflow experiment tracking into the Flux fine-tuner
project, providing an alternative/complementary tracking solution to Weights
& Biases.

Remove unused artifact_name variable to fix ruff check failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>