Add metrics #31

Open
jopetty opened this issue Nov 24, 2020 · 4 comments

jopetty commented Nov 24, 2020

Training, evaluation, and testing all require a robust internal metrics framework. This needs to have several components:

  • A Metric superclass which represents an abstract measurement of model performance on a particular set of data. This should define a template which takes an (input, target) pair and returns some numerical representation of how accurate the model is (see the sketch after this list).
  • A collection of specific metrics which inherit from the Metric superclass and provide concrete implementations of model accuracy. Examples might be FullSequenceAccuracy, TokenAccuracy, ClauseAccuracy, and so on.
  • A logging framework which handles (1) the computation of these metrics when called, (2) the saving of these logs to disk, and (3) the reporting of these values to other parts of the code to handle things like early stopping, TQDM post-fixing, etc.
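
A minimal sketch of what the first two pieces could look like (the method names and signatures here are assumptions for illustration, not the actual implementation):

from abc import ABC, abstractmethod

class Metric(ABC):
    """Abstract measurement of model performance on a set of data."""

    @abstractmethod
    def compute(self, prediction, target) -> float:
        """Return a numerical score for how well `prediction` matches `target`."""

class TokenAccuracy(Metric):
    """Fraction of output positions where the predicted token matches the target."""

    def compute(self, prediction, target) -> float:
        if not target:
            return 0.0
        correct = sum(p == t for p, t in zip(prediction, target))
        return correct / len(target)

class FullSequenceAccuracy(Metric):
    """1.0 if the entire predicted sequence matches the target, else 0.0."""

    def compute(self, prediction, target) -> float:
        return float(list(prediction) == list(target))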
@jopetty jopetty self-assigned this Nov 24, 2020
@jopetty jopetty added the v2 (Version 2 (with Hydra)) and enhancement (New feature or request) labels Nov 24, 2020

jopetty commented Jan 1, 2021

This is partially done. The basic infrastructure is there, but it seems like metrics are being calculated incorrectly on some iterators. For example, consider this training output:

[2020-12-31 18:10:10,907][core.trainer][INFO] - EPOCH 100 / 100
[2020-12-31 18:10:10,907][core.trainer][INFO] - Computing metrics for 'train' dataset
100%|███████████████████████████████████████████████████████| 821/821 [00:16<00:00, 49.63it/s, trn_loss=0.855]
[2020-12-31 18:10:27,451][core.metrics.meter][INFO] - TokenAccuracy:	0.761
Average Loss:	0.867
[2020-12-31 18:10:27,612][core.trainer][INFO] - Computing metrics for 'val' dataset
100%|████████████████████████████████████████████████████████████████████████| 99/99 [00:00<00:00, 183.42it/s]
[2020-12-31 18:10:28,153][core.metrics.meter][INFO] - TokenAccuracy:	0.755
Average Loss:	0.873
[2020-12-31 18:10:28,153][core.trainer][INFO] - Computing metrics for 'test' dataset
100%|██████████████████████████████████████████████████████████████████████| 104/104 [00:00<00:00, 181.37it/s]
[2020-12-31 18:10:28,727][core.metrics.meter][INFO] - TokenAccuracy:	0.000
Average Loss:	0.000
[2020-12-31 18:10:28,727][core.trainer][INFO] - Computing metrics for 'gen' dataset
100%|██████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 208.29it/s]
[2020-12-31 18:10:28,737][core.metrics.meter][INFO] - TokenAccuracy:	0.000
Average Loss:	0.000
[2020-12-31 18:10:28,737][core.trainer][INFO] - Computing metrics for 'alice' dataset
100%|████████████████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 186.79it/s]
[2020-12-31 18:10:28,952][core.metrics.meter][INFO] - TokenAccuracy:	0.000
Average Loss:	0.000

The alice, gen, and test sets all score 0.000 on both token-level accuracy and loss. A loss of 0.000 is clearly wrong, and the model's output on the alice set shows that, while it is still not scoring well, it should at least be getting credit for the transitive verbs and for the (, ,, and ) tokens (see the rough check after the table):

source	target	prediction
alice sees grace	see ( alice , grace )	see ( oswald , grace )
alice knows zelda	know ( alice , zelda )	know ( bob , grace )
alice sees zelda	see ( alice , zelda )	see ( winnifred , grace )
alice notices henry	notice ( alice , henry )	notice ( oswald , grace )
alice likes daniel	like ( alice , daniel )	like ( samuel , grace )
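
As a rough sanity check (assuming TokenAccuracy is the fraction of position-wise token matches, which is an assumption about its exact definition), the first row alone should already score well above zero:

# Hypothetical check; splits on whitespace as in the examples above.
target     = "see ( alice , grace )".split()
prediction = "see ( oswald , grace )".split()
matches = sum(p == t for p, t in zip(prediction, target))
print(matches / len(target))  # 5 of 6 tokens match -> ~0.833, so 0.000 cannot be right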


jopetty commented Jan 1, 2021

Another thing that needs to be done: the metrics should be configurable in the YAML conf files; right now they are hard-coded into the trainer.
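
One possible shape for this, using Hydra's instantiate utility (the config keys, class paths, and helper name below are assumptions, not the current layout):

# Sketch only. Assumes a conf entry along the lines of:
#
#   metrics:
#     - _target_: core.metrics.TokenAccuracy
#     - _target_: core.metrics.FullSequenceAccuracy
#
# The trainer could then build its metric objects from the config
# instead of hard-coding them:
from hydra.utils import instantiate

def build_metrics(cfg):
    return [instantiate(metric_cfg) for metric_cfg in cfg.metrics]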


jopetty commented Jan 1, 2021

Okay, the first issue (metrics being incorrectly calculated on the non-train/val sets) has been solved: I just forgot to compute them 🙄. Fixed in 2dccc75.

@jopetty jopetty added this to the Stable 1.0 milestone Jan 4, 2021

jopetty commented Jan 4, 2021

Okay, the main issue is solved. Still need to make the metrics configurable in the YAML files.
