Thanks for releasing this model. I hope the authors can provide more information regarding the following questions:
- Was this model trained in the same way as in the original BERT paper, i.e. with the masked-LM and next-sentence-prediction (NSP) objectives?
- What was the format of the input sequences used in training? Were they complete sentences (e.g. couplets)?
- What is the meaning of the tokens `*` and `#` in the vocabulary?
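For context on the last question, this is roughly how I am inspecting the released vocabulary. The snippet assumes a BERT-style `vocab.txt` (one token per line); the sample entries here are hypothetical stand-ins, not taken from the actual file:

```python
def find_marker_tokens(vocab_lines):
    """Return vocabulary entries containing '*' or '#'.

    In standard BERT, '##' marks a WordPiece continuation subword,
    but it is unclear what the bare '*' and '#' entries mean here.
    """
    return [tok for tok in vocab_lines if "*" in tok or "#" in tok]

# Hypothetical sample of vocab.txt contents (one token per line).
sample_vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "##ing", "*", "#"]
print(find_marker_tokens(sample_vocab))  # ['##ing', '*', '#']
```

Both `*` and `#` show up as standalone entries when I filter the file this way, which is why I am asking whether they are special markers or just literal characters.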