From 71dd0ca353ce7a2373177eb1c798cda05db36ff8 Mon Sep 17 00:00:00 2001
From: Sagar Vinodababu
Date: Thu, 13 Feb 2020 10:32:02 +0530
Subject: [PATCH] Update README.md

---
 README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 23d71f287..3caf9fe98 100644
--- a/README.md
+++ b/README.md
@@ -360,15 +360,15 @@ An important distinction to make here is that I'm still supplying the ground-tru
 Since I'm teacher-forcing during validation, the BLEU score measured above on the resulting captions _does not_ reflect real performance. In fact, the BLEU score is a metric designed for comparing naturally generated captions to ground-truth captions of differing length. Once batched inference is implemented, i.e. no Teacher Forcing, early-stopping with the BLEU score will be truly 'proper'.
 
- With this in mind, I used [`eval.py`](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/master/eval.py) to compute the correct BLEU-4 scores of this model checkpoint on the validation set _without_ Teacher Forcing, at different beam sizes –
+With this in mind, I used [`eval.py`](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/master/eval.py) to compute the correct BLEU-4 scores of this model checkpoint on the validation and test sets _without_ Teacher Forcing, at different beam sizes –
 
-Beam Size | Validation BLEU-4
-:---: | :---:
-1 | 29.98
-3 | 32.95
-5 | 33.17
+Beam Size | Validation BLEU-4 | Test BLEU-4 |
+:---: | :---: | :---: |
+1 | 29.98 | 30.28 |
+3 | 32.95 | 33.06 |
+5 | 33.17 | 33.29 |
 
-This is higher than the result in the paper, and could be because of how our BLEU calculators are parameterized, the fact that I used a ResNet encoder, and actually fine-tuned the encoder – even if just a little.
+The test score is higher than the result in the paper, and could be because of how our BLEU calculators are parameterized, the fact that I used a ResNet encoder, and actually fine-tuned the encoder – even if just a little.
 
 Also, remember – when fine-tuning during Transfer Learning, it's always better to use a learning rate considerably smaller than what was originally used to train the borrowed model. This is because the model is already quite optimized, and we don't want to change anything too quickly. I used `Adam()` for the Encoder as well, but with a learning rate of `1e-4`, which is a tenth of the default value for this optimizer.
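The fine-tuning advice in the final context paragraph can be sketched in PyTorch as below. This is a minimal illustration, not the repository's actual code: the `nn.Linear` module is a hypothetical stand-in for the real ResNet encoder, used only to keep the snippet self-contained.

```python
import torch
from torch import nn

# Hypothetical stand-in for the pretrained CNN encoder
# (a ResNet in the tutorial itself).
encoder = nn.Linear(2048, 512)

# Adam's default learning rate is 1e-3. When fine-tuning a pretrained
# (already well-optimized) encoder, use a considerably smaller rate --
# here a tenth of the default, matching the 1e-4 quoted in the patch.
# Only parameters with requires_grad=True are handed to the optimizer.
encoder_optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, encoder.parameters()),
    lr=1e-4,
)

print(encoder_optimizer.param_groups[0]["lr"])  # 0.0001
```

The decoder, which is trained from scratch, would keep its own separate optimizer with a larger learning rate; only the borrowed encoder needs the reduced one.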