[Suggested Change] Chapter 5.3: SFTDataset -> generate_loss_mask function #119

@chengyuZou

Description

1. Affected Chapter

Chapter 5.3 -> SFTDataset -> generate_loss_mask function

2. Problem Description

Would the code be more concise with the following change? Add these lines to def __init__():
self.bos_token_id = self.tokenizer('<|im_start|>assistant', add_special_tokens=False)['input_ids']
self.eos_token_id = self.tokenizer('<|im_end|>', add_special_tokens=False)['input_ids']
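A minimal sketch of how the two marker sequences proposed above might drive the mask. Both attributes hold lists of token ids rather than single ids, so they are passed here as lists. It assumes the tokenizer produces the same ids for each marker in isolation as it does inside the full chat template, which is worth verifying for the specific tokenizer. The names `find_subsequence`, `generate_loss_mask_sketch`, `start_ids` and `end_ids` are hypothetical, not the tutorial's code. The convention (1 = compute loss, 0 = ignore) and the choice to include the `<|im_end|>` tokens in the loss are also assumptions.

```python
from typing import List


def find_subsequence(seq: List[int], pattern: List[int], start: int = 0) -> int:
    """Return the first index >= start where `pattern` occurs in `seq`, or -1.

    Naive scan: compares a slice of len(pattern) at every position, O(n * m).
    """
    m = len(pattern)
    for i in range(start, len(seq) - m + 1):
        if seq[i:i + m] == pattern:
            return i
    return -1


def generate_loss_mask_sketch(input_ids: List[int],
                              start_ids: List[int],
                              end_ids: List[int]) -> List[int]:
    """Hypothetical helper: 1 = compute loss on this token, 0 = ignore it.

    start_ids: ids of '<|im_start|>assistant' (e.g. self.bos_token_id)
    end_ids:   ids of '<|im_end|>'            (e.g. self.eos_token_id)
    Every token after a start marker, up to and including the next end
    marker, is marked 1.
    """
    if not start_ids or not end_ids:
        raise ValueError("marker sequences must be non-empty")
    mask = [0] * len(input_ids)
    i = 0
    while True:
        s = find_subsequence(input_ids, start_ids, i)
        if s == -1:
            break
        body = s + len(start_ids)  # first token of the assistant reply
        e = find_subsequence(input_ids, end_ids, body)
        # No end marker (e.g. a truncated reply): mask to the end of the sequence.
        stop = len(input_ids) if e == -1 else e + len(end_ids)
        for pos in range(body, stop):
            mask[pos] = 1
        i = stop  # resume the search after this reply
    return mask
```

For example, with `start_ids=[1, 2]` and `end_ids=[9]`, the ids `[5, 1, 2, 7, 8, 9, 4, 1, 2, 6, 9]` give the mask `[0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1]`.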

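Also, could the KMP algorithm be used to reduce the time complexity from O(n^2)? KMP matching runs in linear time, O(n + m) for n input ids and a pattern of m ids, rather than O(n * log n).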
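The KMP idea can be sketched as follows. This is a standalone sketch under stated assumptions, not a drop-in replacement for the tutorial's method; all function names are hypothetical. The masking convention is assumed to be 1 = compute loss and 0 = ignore, with reply tokens marked from just after each `<|im_start|>assistant` marker up to and including the next `<|im_end|>` marker. The sketch builds the prefix function of each marker once and finds every occurrence of each marker in a single left-to-right pass. It then pairs each start marker with the next end marker using two forward-only pointers. Total work is O(n + m) for n ids and markers of total length m, versus O(n * m) when a slice is compared at every position.

```python
from typing import List


def prefix_function(pattern: List[int]) -> List[int]:
    """pi[i] = length of the longest proper prefix of pattern[:i + 1] that is also a suffix of it."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_find_all(seq: List[int], pattern: List[int]) -> List[int]:
    """Start indices of every occurrence of a non-empty `pattern` in `seq`, in O(len(seq) + len(pattern))."""
    pi = prefix_function(pattern)
    hits = []
    k = 0  # number of pattern tokens currently matched
    for i, tok in enumerate(seq):
        while k > 0 and tok != pattern[k]:
            k = pi[k - 1]
        if tok == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = pi[k - 1]  # keep scanning; overlapping matches are reported too
    return hits


def generate_loss_mask_kmp(input_ids: List[int],
                           start_ids: List[int],
                           end_ids: List[int]) -> List[int]:
    """Hypothetical helper: 1 = compute loss, 0 = ignore. Tokens after each start
    marker, up to and including the next end marker, are marked 1."""
    if not start_ids or not end_ids:
        raise ValueError("marker sequences must be non-empty")
    mask = [0] * len(input_ids)
    starts = kmp_find_all(input_ids, start_ids)
    ends = kmp_find_all(input_ids, end_ids)
    e = 0    # pointer into `ends`; only moves forward
    pos = 0  # first index not covered by an earlier reply
    for s in starts:
        if s < pos:
            continue  # start marker lies inside a reply that is already masked
        body = s + len(start_ids)
        while e < len(ends) and ends[e] < body:
            e += 1
        # No end marker left (e.g. a truncated reply): mask to the end.
        stop = ends[e] + len(end_ids) if e < len(ends) else len(input_ids)
        for p in range(body, stop):
            mask[p] = 1
        pos = stop
    return mask
```

Each token is written to the mask at most once because masked replies never overlap, and both pointers only move forward, so the whole function stays linear. A start marker that happens to appear inside an already-masked reply is skipped.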

3. Reproduction Materials

Verification

Metadata

Labels

documentation: Improvements or additions to documentation
