The official repository of the ICLR 2023 conference paper, "A Non-monotonic Self-terminating Language Model".
Authors: Eugene Choi, Kyunghyun Cho, Cheolhyoung Lee
[arXiv] [OpenReview]
The repository is organized as follows.
.
├── eos.png
├── LICENSE
├── README.md
├── requirements.txt
└── src
    ├── gpt2
    │   ├── data.py
    │   ├── eval.py
    │   ├── metrics.py
    │   ├── nmst.py
    │   ├── preprocess.py
    │   ├── st.py
    │   ├── train.py
    │   └── utils.py
    └── wiki2
        ├── data.py
        ├── decoding_utils.py
        ├── evaluate.py
        ├── model_utils.py
        ├── train.py
        └── utils.py
The code was written in Python 3.9.12, and all experiments in the paper were conducted on a single NVIDIA Quadro RTX 8000. Please install the dependencies and set the environment variable using the commands below; this ensures the training and evaluation scripts run without issues:
pip install -r requirements.txt
export NMST_DIR=/path/to/non-monotonic-self-terminating-lm
For the WikiText-2 experiments, make sure that you are in the following directory:
cd ${NMST_DIR}/src/wiki2
Note: The default argparse configuration for train.py in ${NMST_DIR}/src/wiki2 corresponds to the hyperparameters used in the NMST+LSTM (1e-4) experiment. The epsilon value for the NMST and ST losses can be set to any value in (0, 1).
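For intuition about what the epsilon flag controls: roughly speaking, the self-terminating parameterizations bound the <eos> probability from below by a curve that approaches 1 as the sequence length grows, so decoding is guaranteed to stop, and epsilon sets how quickly that bound tightens. The snippet below is only an illustrative sketch of this idea, with an assumed bound of the form 1 - (1 - epsilon)^t; it is not the repository's exact ST/NMST formulation, which is implemented in the source (e.g., st.py and nmst.py under src/gpt2).

```python
import torch

def eos_probability_with_floor(eos_logit: torch.Tensor, t: int, epsilon: float) -> torch.Tensor:
    """Illustrative sketch only (not the repository's exact ST/NMST formulation):
    squash the <eos> logit into [floor_t, 1), where the floor 1 - (1 - epsilon)^t
    rises toward 1 as the time step t grows, so generation must eventually terminate."""
    floor_t = 1.0 - (1.0 - epsilon) ** t          # approaches 1 as t -> infinity
    p_free = torch.sigmoid(eos_logit)             # unconstrained probability in (0, 1)
    return floor_t + (1.0 - floor_t) * p_free     # may move non-monotonically above the floor
```

With epsilon = 1e-5, as in the example commands below, the floor tightens very slowly, so the constraint barely distorts the model at typical sequence lengths.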
Please refer to the following example commands, which compare the VA/ST/NMST parameterizations with an epsilon of 1e-5, as a guide for running the WikiText-2 experiments. (A sketch of the corresponding model shape follows the commands.)
LSTM:
- VA+LSTM
python train.py --loss mle --rnn-type nn.LSTM --dropout 0.5 --embedding-dim 512 --num-layers 2 --hidden-size 512 --rnn-dropout 0.0 --batch-size 32 --expr-name lstm_lm
- ST+LSTM (1e-5)
python train.py --loss st --epsilon 1e-5 --rnn-type nn.LSTM --dropout 0.5 --embedding-dim 512 --num-layers 2 --hidden-size 512 --rnn-dropout 0.0 --batch-size 32 --expr-name lstm_st-1e-5
- NMST+LSTM (1e-5)
python train.py --loss nmst --epsilon 1e-5 --rnn-type nn.LSTM --dropout 0.5 --embedding-dim 512 --num-layers 2 --hidden-size 512 --rnn-dropout 0.0 --batch-size 32 --expr-name lstm_nmst-1e-5
RNN:
- VA+RNN
python train.py --loss mle --rnn-type nn.RNN --dropout 0.3 --embedding-dim 256 --num-layers 2 --hidden-size 256 --rnn-dropout 0.0 --batch-size 32 --expr-name rnn_lm
- ST+RNN (1e-5)
python train.py --loss st --epsilon 1e-5 --rnn-type nn.RNN --dropout 0.3 --embedding-dim 256 --num-layers 2 --hidden-size 256 --rnn-dropout 0.0 --batch-size 32 --expr-name rnn_st-1e-5
- NMST+RNN (1e-5)
python train.py --loss nmst --epsilon 1e-5 --rnn-type nn.RNN --dropout 0.3 --embedding-dim 256 --num-layers 2 --hidden-size 256 --rnn-dropout 0.0 --batch-size 32 --expr-name rnn_nmst-1e-5
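For reference, the flags in the commands above correspond to a model of roughly the following shape. This is a minimal sketch assuming a standard embedding → (LSTM or RNN) → projection language model, not the repository's actual implementation in model_utils.py; vocab_size stands for whatever vocabulary data.py builds from WikiText-2.

```python
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """Minimal sketch matching the CLI flags above; see model_utils.py for the real model."""
    def __init__(self, vocab_size, embedding_dim=512, hidden_size=512,
                 num_layers=2, dropout=0.5, rnn_dropout=0.0, rnn_type=nn.LSTM):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)   # --embedding-dim
        self.drop = nn.Dropout(dropout)                            # --dropout
        self.rnn = rnn_type(embedding_dim, hidden_size, num_layers,
                            dropout=rnn_dropout, batch_first=True) # --rnn-type / --hidden-size / --num-layers / --rnn-dropout
        self.proj = nn.Linear(hidden_size, vocab_size)             # logits over the vocabulary, including <eos>

    def forward(self, tokens, hidden=None):
        emb = self.drop(self.embedding(tokens))
        output, hidden = self.rnn(emb, hidden)
        return self.proj(self.drop(output)), hidden
```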
After training, evaluate a saved checkpoint with the evaluate.py script:
python evaluate.py --model-load-dir ${NMST_DIR}/checkpoint/wiki2/MODEL_DIR --DECODING_METHOD 1
The evaluate.py script supports multiple decoding methods, including greedy decoding, ancestral sampling, top-k sampling, nucleus sampling, and beam search, as well as consistent top-k and consistent nucleus sampling as proposed in the paper "Consistency of a Recurrent Language Model with Respect to Incomplete Decoding." Please make sure to choose the appropriate decoding hyperparameters (such as k in top-k sampling or the beam size in beam search) before running inference.
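For intuition, nucleus (top-p) sampling keeps the smallest set of highest-probability tokens whose cumulative mass exceeds p, and the consistent variants additionally keep <eos> in the candidate set at every step so that termination is never ruled out. Below is a minimal sketch of both ideas, not the repository's decoding_utils.py; eos_id is a placeholder for the corpus's <eos> index.

```python
from typing import Optional
import torch

def nucleus_candidates(logits: torch.Tensor, top_p: float,
                       eos_id: Optional[int] = None) -> torch.Tensor:
    """Boolean mask over the vocabulary of tokens allowed at this step.
    If eos_id is given, <eos> is always kept (the 'consistent' variant)."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Keep the smallest prefix whose cumulative mass reaches top_p (the top token is always kept).
    keep_sorted = cumulative - sorted_probs < top_p
    keep = torch.zeros_like(probs, dtype=torch.bool).scatter(-1, sorted_idx, keep_sorted)
    if eos_id is not None:
        keep[..., eos_id] = True
    return keep

def sample_step(logits: torch.Tensor, keep: torch.Tensor) -> torch.Tensor:
    """Sample one token id from the renormalized distribution over allowed tokens."""
    masked = logits.masked_fill(~keep, float("-inf"))
    return torch.multinomial(torch.softmax(masked, dim=-1), num_samples=1)
```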
For the GPT-2 experiments, first prepare the dataset by running the following commands:
cd ${NMST_DIR}/src/gpt2
python preprocess.py
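If you want a rough picture of what this step amounts to, the sketch below tokenizes a WikiText-style corpus with the Hugging Face GPT-2 tokenizer. The dataset name, the use of the datasets library, and the output path are assumptions for illustration only; the repository's actual logic lives in preprocess.py.

```python
from datasets import load_dataset             # assumption: Hugging Face `datasets` is installed
from transformers import GPT2TokenizerFast

# Assumptions for illustration only; see preprocess.py for the repository's actual preprocessing.
raw = load_dataset("wikitext", "wikitext-103-raw-v1")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def tokenize(batch):
    # Append GPT-2's end-of-text token so every example carries an explicit terminal symbol.
    return tokenizer([text + tokenizer.eos_token for text in batch["text"]])

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
tokenized.save_to_disk("wikitext103_gpt2_tokenized")  # placeholder output location
```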
Once preprocessing is complete, finetune GPT-2 (124M) using the train.py script (a sketch of loading a finetuned checkpoint follows these commands):
- VA+GPT2
python train.py --loss mle --expr-name lm --model-name gpt2 --bucketing 1 --eval 1 --decode 1
- ST+GPT2 (1e-5)
python train.py --loss st --epsilon 1e-5 --expr-name st_1e-5 --model-name gpt2 --bucketing 1 --eval 1 --decode 1
- NMST+GPT2 (1e-5)
python train.py --loss nmst --epsilon 1e-5 --expr-name nmst_1e-5 --model-name gpt2 --bucketing 1 --eval 1 --decode 1
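As an illustration of what a finetuned checkpoint gives you, the sketch below loads one and decodes greedily with the transformers API. It assumes the checkpoint directory can be read by from_pretrained, which may not match how train.py actually saves weights; eval.py below is the supported way to evaluate.

```python
import os
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Assumption: the checkpoint is readable by from_pretrained; MODEL_DIR is the same placeholder
# used in the eval.py command below.
model_dir = os.path.join(os.environ["NMST_DIR"], "checkpoint", "gpt2", "MODEL_DIR")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained(model_dir).eval()

inputs = tokenizer("The history of natural language processing", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)  # greedy decoding
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```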
After finetuning, evaluate a saved checkpoint with the eval.py script:
python eval.py --model-load-dir ${NMST_DIR}/checkpoint/gpt2/MODEL_DIR --DECODING_METHOD 1 --expr-name EVAL_RUN_NAME
The eval.py script supports greedy decoding, ancestral sampling, top-k sampling, nucleus sampling, and beam search. Please make sure to choose the appropriate decoding hyperparameters (such as k in top-k sampling or the beam size in beam search) before running inference.
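Because the point of the ST and NMST parameterizations is that decoding terminates, one quick external check you can run on any decoder's outputs is the fraction of continuations that never emit <eos> within a length budget. The helper below is a hypothetical convenience for such a check, not part of the repository's metrics.py.

```python
def non_termination_ratio(generated_ids, eos_id, max_length):
    """Fraction of generated sequences with no <eos> token within max_length steps.
    `generated_ids` is any iterable of token-id sequences; a rough external check only."""
    sequences = list(generated_ids)
    non_terminated = sum(1 for ids in sequences if eos_id not in list(ids)[:max_length])
    return non_terminated / len(sequences)

# e.g., non_termination_ratio(decoded_sequences, tokenizer.eos_token_id, max_length=1000)
```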
The Weights & Biases logs for all experimental results can be accessed through the links below:
- WikiText-2 (RNN/LSTM) experiments: https://wandb.ai/eugenechoi/nmst-rnn/workspace?workspace=user-eugenechoi (Note: you can filter the logs to view only the RNN or LSTM runs by selecting rnn_type from the filter dropdown menu.)
- GPT-2 experiments: https://wandb.ai/eugenechoi/nmst-gpt2/workspace?workspace=user-eugenechoi
  - Perplexity & Greedy: https://wandb.ai/eugenechoi/nmst-gpt2-greedy/workspace?workspace=user-eugenechoi
  - Nucleus: https://wandb.ai/eugenechoi/nmst-gpt2-topp/workspace?workspace=user-eugenechoi
  - Top-k: https://wandb.ai/eugenechoi/nmst-gpt2-topk/workspace?workspace=user-eugenechoi
  - Beam Search: https://wandb.ai/eugenechoi/nmst-gpt2-beam/workspace?workspace=user-eugenechoi
Please use the following BibTeX entry to cite our work:
@inproceedings{
choi2023a,
title={A Non-monotonic Self-terminating Language Model},
author={Eugene Choi and Kyunghyun Cho and Cheolhyoung Lee},
booktitle={The Eleventh International Conference on Learning Representations },
year={2023},
url={https://openreview.net/forum?id=vw-5EgYbJZr}
}
