Update tokenizer: do the safety check before inserting EOL
sergei-mironov committed Mar 11, 2020
1 parent e0eaa1e commit 30579e0
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions official/nlp/transformer/utils/tokenizer.py
@@ -140,6 +140,8 @@ def encode(self, raw_string, add_eos=False):
     for token in tokens:
       ret.extend(self._token_to_subtoken_ids(token))
     if add_eos:
+      assert EOS in self.subtoken_list, \
+          "Can't append 'EOS' because it is not in list of known subtokens."
       ret.append(EOS_ID)
     return ret
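
To illustrate what the added check does, here is a minimal, self-contained sketch. `TinyTokenizer`, its whitespace splitting, and the values chosen for `EOS` and `EOS_ID` are hypothetical stand-ins for this example only; they are not the classes or constants defined in tokenizer.py. Only the guard inside `encode` mirrors the lines added by this commit.

```python
# Hypothetical stand-ins for illustration; not taken from tokenizer.py.
EOS = "<EOS>"
EOS_ID = 1


class TinyTokenizer:
  """Toy tokenizer: one id per whitespace-separated token."""

  def __init__(self, subtoken_list):
    self.subtoken_list = subtoken_list
    self._ids = {tok: i for i, tok in enumerate(subtoken_list)}

  def encode(self, raw_string, add_eos=False):
    ret = [self._ids[token] for token in raw_string.split()]
    if add_eos:
      # Same guard as in the commit: refuse to append EOS_ID when the
      # vocabulary has no EOS entry.
      assert EOS in self.subtoken_list, \
          "Can't append 'EOS' because it is not in list of known subtokens."
      ret.append(EOS_ID)
    return ret


# Vocabulary that contains EOS: the id is appended as usual.
with_eos = TinyTokenizer(["<pad>", "<EOS>", "hello", "world"])
print(with_eos.encode("hello world", add_eos=True))

# Vocabulary without EOS: the assert fires instead of silently
# appending an id that would point at an unrelated subtoken.
without_eos = TinyTokenizer(["hello", "world"])
try:
  without_eos.encode("hello world", add_eos=True)
except AssertionError as err:
  print("rejected:", err)
```

Because the guard is an `assert` statement, it is skipped when Python runs with the `-O` flag; a caller that needs the check in optimized mode would have to raise an exception explicitly.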

