MLFlow issue: Every single run is marked as FINISHED never FAILED #20827
Closed
Unanswered
Saya47
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment
-
I figured out that my own code had a bug. Sorry if I took up anyone's time: I struggled with this for too long before posting, and only realized what the bug was about 12 hours afterwards.
-
Hello, good day to everybody.
I track my experiments using MLflow. My issue is that even when my code has a bug and raises an exception during a run, Lightning marks the run as FINISHED in MLflow; none of my runs is ever marked FAILED. Below I asked an LLM to demonstrate this:
The run ends up with status 3 (FINISHED) even though an exception was raised, which is really bad.
I rely on the run status to filter out bad runs, because I use MLflow to aggregate metrics and losses across epochs and runs and to resume from checkpoints.
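For readers who want the status to be set explicitly rather than relying on the logger, here is a minimal sketch of one way to do it. It is not taken from this thread: the helper name `run_and_record_status` and its parameters are hypothetical, and it assumes `client` behaves like `mlflow.tracking.MlflowClient`, whose `set_terminated(run_id, status=...)` method accepts MLflow's status strings such as `"FINISHED"` and `"FAILED"`.

```python
from typing import Any, Callable


def run_and_record_status(client: Any, run_id: str, train_fn: Callable[[], None]) -> str:
    """Run train_fn and record the outcome on the MLflow run.

    Hypothetical helper. `client` is assumed to expose
    set_terminated(run_id, status=...), as mlflow.tracking.MlflowClient does.
    """
    try:
        train_fn()
    except Exception:
        # Training raised: mark the run FAILED, then re-raise so the
        # original traceback is not swallowed.
        client.set_terminated(run_id, status="FAILED")
        raise
    # Training returned normally: mark the run FINISHED.
    client.set_terminated(run_id, status="FINISHED")
    return "FINISHED"


# Possible usage with Lightning (assumption: `logger` is an MLFlowLogger,
# whose `experiment` property is an MlflowClient and whose `run_id`
# property is the current run's ID):
#
#   run_and_record_status(logger.experiment, logger.run_id,
#                         lambda: trainer.fit(model, datamodule=dm))
```

The helper catches `Exception` only, so a `KeyboardInterrupt` is left for the caller to handle. Because the status is written after `train_fn` returns or raises, it overrides whatever status was recorded earlier for that run, which may or may not be what you want.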