Replies: 3 comments 2 replies
-
@maximsch2 I like the idea. The only thing I'm a bit afraid of is that if we don't check on each update, we will silently calculate wrong values. Therefore I like having the method explicitly, but I'd probably do it on an opt-out basis rather than opt-in (i.e. having a flag for that which defaults to true and can be set to false). What do you think? @SkafteNicki @Borda thoughts?
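As a rough sketch of that opt-out idea: a toy metric whose constructor takes a hypothetical `validate_args` flag that defaults to true and can be switched off (the flag name and the checks are illustrative, not an existing torchmetrics API):

```python
import torch
from torchmetrics import Metric


class MyAccuracy(Metric):
    """Toy metric: input validation is on by default and can be opted out of."""

    def __init__(self, validate_args: bool = True, **kwargs):
        super().__init__(**kwargs)
        self.validate_args = validate_args  # opt-out: defaults to True

        self.add_state("correct", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        if self.validate_args:  # skipped entirely when the user opts out
            if preds.shape != target.shape:
                raise ValueError("preds and target must have the same shape")
        self.correct += (preds == target).sum()
        self.total += target.numel()

    def compute(self) -> torch.Tensor:
        return self.correct.float() / self.total
```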
-
I would argue that the first step should be trying to lower the computational time of our implementations (they may not be optimal).
I think it is important to remember that torchmetrics is not intended to be used only with Lightning but also with native PyTorch. Also, the concept of a task is more related to Flash than to Lightning, right? If this really boils down to our implementations being too slow because we make sure that the user input is correct, I would argue that we should have some kind of flag:

```python
import torchmetrics
torchmetrics.performance_mode = True
```

that turns off all checking (meant for users who know what they are doing).
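A minimal sketch of how such a package-level switch could gate input checks, in the spirit of the flag proposed above (the names here are hypothetical, not an existing torchmetrics setting):

```python
import torch

# Hypothetical module-level switch, as proposed above.
PERFORMANCE_MODE = False


def _check_same_shape(preds: torch.Tensor, target: torch.Tensor) -> None:
    """Input validation that becomes a no-op when PERFORMANCE_MODE is enabled."""
    if PERFORMANCE_MODE:
        return
    if preds.shape != target.shape:
        raise ValueError(f"shape mismatch: {preds.shape} vs {target.shape}")


def accuracy(preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    _check_same_shape(preds, target)  # skipped in performance mode
    return (preds == target).float().mean()
```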
-
Performance optimization is not the main goal (as that can be addressed by other implementations anyway). The connection between the training task and metrics is the key. Assume you are building a framework that allows people to train various tasks of different shapes, and you want to add metric configurability to it. For simplicity, you can say that task == LightningModule, but of course this doesn't have to be the case. Now you need a way to know how to pipe the output from an arbitrary model to a set of metrics. There are two ways:
In Lightning terms:

```python
def training_step(self, batch):
    loss, outputs = self.model.get_loss_and_outputs()
    # outputs is Dict[TaskType, TTaskTypeOutput]
    for task_type, output in outputs.items():
        for metric_name, metric in self.metrics_collection.items():
            if metric.supports(task_type):
                self.log(metric_name, metric(*output))
    return loss
```

Why would the same model output different types? This can happen in various ways:
Right, Flash is a more appropriate analogy.
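For illustration, a minimal sketch of the `metric.supports(task_type)` dispatch used in the snippet above; the `TaskType` enum and `MetricWrapper` are hypothetical, not an existing torchmetrics or Flash API:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Set

import torch


class TaskType(Enum):
    BINARY = auto()
    MULTICLASS = auto()
    MULTILABEL = auto()


@dataclass
class MetricWrapper:
    """Pairs a metric function with the task types it knows how to consume."""

    fn: Callable[[torch.Tensor, torch.Tensor], torch.Tensor]
    supported_tasks: Set[TaskType] = field(default_factory=set)

    def supports(self, task_type: TaskType) -> bool:
        return task_type in self.supported_tasks

    def __call__(self, preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        return self.fn(preds, target)


# Example: an accuracy wrapper that only accepts binary outputs.
accuracy = MetricWrapper(
    fn=lambda preds, target: (preds.round() == target).float().mean(),
    supported_tasks={TaskType.BINARY},
)
assert accuracy.supports(TaskType.BINARY)
assert not accuracy.supports(TaskType.MULTILABEL)
```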
-
🚀 Feature
Let's have a formal system of task types. Things like `BinaryClassificationTask`, `MultiClassClassificationTask`, `MultilabelClassificationTask`, etc.

Motivation

Pitch
Add a type hierarchy of possible task types. Each task is defined by the type signature of the (predictions, labels) tuple and the semantics inside it (e.g. multiclass and multilabel have the same shape but different semantics).
Then, each metric takes a task_type and can assume that predictions/labels conform to it. If we want to add checking at run time, each type can provide a class method (e.g. `BinaryClassificationTask.validate_input`) that can be enabled for checking on an opt-in basis.
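For illustration, a minimal sketch of the proposed hierarchy; only the class names and `validate_input` come from the pitch above, the concrete checks and the toy `accuracy` function are assumptions:

```python
import torch


class TaskType:
    """Base task type: fixes the expected (predictions, labels) signature."""

    @classmethod
    def validate_input(cls, preds: torch.Tensor, target: torch.Tensor) -> None:
        raise NotImplementedError


class BinaryClassificationTask(TaskType):
    @classmethod
    def validate_input(cls, preds: torch.Tensor, target: torch.Tensor) -> None:
        if preds.shape != target.shape:
            raise ValueError("binary task: preds and target must have the same shape")
        if not torch.all((target == 0) | (target == 1)):
            raise ValueError("binary task: targets must be 0 or 1")


class MultilabelClassificationTask(TaskType):
    @classmethod
    def validate_input(cls, preds: torch.Tensor, target: torch.Tensor) -> None:
        # same (N, C) shape as a one-hot multiclass target, different semantics:
        # each of the C labels is an independent binary decision
        if preds.ndim != 2 or preds.shape != target.shape:
            raise ValueError("multilabel task: expected (N, C) preds and target")


def accuracy(preds, target, task_type=BinaryClassificationTask, validate=False):
    """Toy metric: validation against the declared task type is opt-in."""
    if validate:
        task_type.validate_input(preds, target)
    return (preds.round() == target).float().mean()
```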
Alternatives