ParaLS: Paraphraser-based Lexical Substitution

Lexical substitution (LS) aims to find appropriate substitutes for a target word in a sentence. Recently, LS methods based on pretrained language models have made remarkable progress, generating potential substitutes for a target word by analyzing its surrounding context. However, these methods tend to overlook whether the substitutes preserve the sentence's meaning. This study explores how to generate substitute candidates from a paraphraser, since the paraphrases it generates vary in word choice while preserving the sentence's meaning. Because commonly used decoding strategies cannot generate substitutes directly, we propose two simple decoding strategies that focus on the variations of the target word during decoding. Experimental results show that our methods outperform state-of-the-art LS methods based on pretrained language models on three benchmarks.

Requirements and Installation

  • Our code is based on Fairseq version 0.10.2
  • PyTorch version = 1.7.1
  • Python version >= 3.7
  • For training new models, you'll also need an NVIDIA GPU and NCCL
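
The requirements above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it only reports what is installed so that the versions can be compared against the list by eye.

```python
import importlib
import sys


def check_environment(min_python=(3, 7), packages=("fairseq", "torch")):
    """Report whether Python is recent enough and which package versions are importable.

    Hypothetical helper: it does not install or pin anything.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is missing from the current environment.
            report[name] = None
        else:
            # Fall back to "unknown" if the package exposes no __version__ attribute.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A value of `None` for `fairseq` or `torch` means the package could not be imported in the current environment.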

Step 1: Download the pretrained paraphraser model

You need to download the paraphraser from here, and put it into the folder "checkpoints/para/transformer/".
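
As a rough sketch of preparing that folder, assuming the path is relative to the repository root: the helper below (a hypothetical name, not part of the repository) creates the directory if needed and lists the files already in it. It does not download the model, which still has to be copied in manually.

```python
from pathlib import Path

# Folder named in Step 1, assumed to be relative to the repository root.
CHECKPOINT_DIR = Path("checkpoints/para/transformer")


def prepare_checkpoint_dir(repo_root="."):
    """Create the checkpoint folder if it is missing and list the files inside it."""
    target = Path(repo_root) / CHECKPOINT_DIR
    target.mkdir(parents=True, exist_ok=True)
    files = sorted(p.name for p in target.iterdir() if p.is_file())
    return target, files


if __name__ == "__main__":
    folder, found = prepare_checkpoint_dir()
    if found:
        print(f"Files in {folder}: {found}")
    else:
        print(f"{folder} is empty; place the downloaded paraphraser there.")
```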

Step 2: Run our code

(1) Run ParaLS on the LS07 lexical substitution dataset

Run the script "run_LS_Paraphraser.multi.ls07.sh"

(2) Run ParaLS on the LS14 lexical substitution dataset

Run the script "run_LS_Paraphraser.multi.ls14.sh"
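
The two scripts can also be launched from Python. This is only a sketch under stated assumptions: that the scripts sit in the repository root, that they are run with `bash` from that directory, and that they take no arguments. The names `SCRIPTS`, `build_command`, and `run_benchmark` are hypothetical.

```python
import subprocess
from pathlib import Path

# Script names as given in Step 2; their location in the repository root is an assumption.
SCRIPTS = {
    "ls07": "run_LS_Paraphraser.multi.ls07.sh",
    "ls14": "run_LS_Paraphraser.multi.ls14.sh",
}


def build_command(dataset):
    """Return the command line for one benchmark, assuming the script runs under bash."""
    return ["bash", SCRIPTS[dataset]]


def run_benchmark(dataset, repo_root="."):
    """Run one benchmark script from the repository root and fail loudly on errors."""
    script = Path(repo_root) / SCRIPTS[dataset]
    if not script.is_file():
        raise FileNotFoundError(f"Expected script not found: {script}")
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(build_command(dataset), cwd=repo_root, check=True)


if __name__ == "__main__":
    for name in ("ls07", "ls14"):
        run_benchmark(name)
```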

Citation

Please cite as: