add support for Qwen #129

cjw-d · 2024-06-03T05:33:40Z

convert

python3 scripts/convert_qwen_from_huggingface_to_tencentpretrain.py --input_model_path $Qwen_1_8B_FOLDER --output_model_path models/qwen-1_8b.bin --layers_num 24

test

python3 scripts/generate_lm.py --load_model_path models/qwen-1_8b.bin \
    --tokenizer qwen --vocab_path $Qwen_1_8B_FOLDER \
    --test_path beginning.txt --prediction_path generated_sentence.txt \
    --config_path models/qwen/1_8b_config.json

ydli-ai · 2024-06-03T12:06:49Z

tencentpretrain/layers/multi_headed_attn.py

        if freqs_cis is not None:
-            query, key = apply_rotary_emb(query.transpose(1,2), key.transpose(1,2), freqs_cis=freqs_cis)
+            if use_dynamic_ntk:


I'd suggest encapsulating this logic (for example, in a helper function) rather than branching inline here.
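One way to read the suggestion above is to move the dynamic-NTK branch out of the attention forward pass into a small helper that returns the rotary frequencies. The following is a minimal sketch, not the code in this PR: the names `ntk_alpha` and `rotary_inv_freq` are hypothetical, the default base of 10000 is an assumption, and the alpha formula follows the publicly released Qwen reference implementation as I understand it.

```python
import math


def ntk_alpha(seq_len, max_seq_length):
    """Hypothetical helper: NTK scaling factor for a given sequence length.

    The factor stays at 1 while seq_len <= max_seq_length and grows in
    steps (3, 7, 15, ...) once the sequence exceeds the training length.
    """
    context_value = math.log2(seq_len / max_seq_length) + 1
    return max(2 ** math.ceil(context_value) - 1, 1)


def rotary_inv_freq(head_dim, seq_len, max_seq_length,
                    base=10000.0, use_dynamic_ntk=False):
    """Hypothetical helper: inverse rotary frequencies for one attention head.

    All NTK-specific branching lives here, so the caller only has to
    build freqs_cis from the returned list and apply the rotary embedding.
    """
    if use_dynamic_ntk:
        # Enlarge the rotary base; this is a no-op when the factor is 1.
        alpha = ntk_alpha(seq_len, max_seq_length)
        base = base * alpha ** (head_dim / (head_dim - 2))
    return [1.0 / base ** (i / head_dim) for i in range(0, head_dim, 2)]
```

With a helper of this shape, the attention layer would keep a single call to `apply_rotary_emb` and would not need to know whether NTK scaling is active.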

ydli-ai · 2024-06-03T12:12:43Z

tencentpretrain/layers/transformer.py

@@ -40,8 +49,8 @@ def __init__(self, args, layer_number=None):
            lora_params = args.lora_params

        self.self_attn = MultiHeadedAttention(
-            args.hidden_size, args.heads_num, attention_head_size, local_kv_heads_num, args.dropout, has_bias=has_bias,
-            with_scale=with_scale, lora_params=lora_params, layer_number=layer_number
+            args.hidden_size, args.heads_num, attention_head_size, local_kv_heads_num, args.dropout, self.max_seq_length, has_bias=has_bias, has_attention_bias = has_attention_bias,


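Previously, has_bias also covered the attention bias. Now that it has been split out under a new name, has backward compatibility with existing models been considered? T5 is one example.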

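cjw-d · 2024-06-08

> Previously, has_bias also covered the attention bias. Now that it has been split out under a new name, has backward compatibility with existing models been considered? T5 is one example.

When the q, k, v linear_layers are created, the value of has_bias is used if attention_bias is not passed in, so this should remain compatible with the earlier models.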
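The fallback described in this reply can be sketched as follows. This is an illustration of the pattern only, not the PR's code: `resolve_attention_bias` is a hypothetical name, and the two example configurations are assumptions chosen to show both cases.

```python
def resolve_attention_bias(has_bias, has_attention_bias=None):
    """Hypothetical helper: decide whether the q, k, v projections get a bias.

    Older configurations never set has_attention_bias, so they fall back
    to has_bias and behave exactly as before the parameter was split out.
    """
    if has_attention_bias is None:
        return has_bias
    return has_attention_bias


# Older-style config (assumed): only has_bias is known, so q, k, v follow it.
legacy = resolve_attention_bias(has_bias=False)

# Qwen-style config (assumed): bias on q, k, v but not on the other linear layers.
qwen_style = resolve_attention_bias(has_bias=False, has_attention_bias=True)
```

Using `None` as the "not provided" marker, rather than `False`, is what keeps an explicit `has_attention_bias=False` distinguishable from an older config that does not mention the option at all.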

ydli-ai · 2024-06-03T12:14:01Z

tencentpretrain/layers/transformer.py

@@ -16,6 +17,13 @@ def __init__(self, args, layer_number=None):
        self.relative_position_embedding = args.relative_position_embedding
        self.rotary_position_embedding = args.rotary_position_embedding
        self.has_residual_attention = args.has_residual_attention
+        self.use_logn_attn = args.use_logn_attn
+        self.max_seq_length = args.max_seq_length
+        self.use_dynamic_ntk = args.use_dynamic_ntk


Dynamic NTK is not needed for training; only inference needs it. If we only consider training, could this be simplified?
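To illustrate the point behind this question: both length-extrapolation factors are neutral as long as the sequence fits within the training length, so a training-only path would never change the result. The sketch below is an assumption-laden illustration, not the PR's code; the function names are hypothetical and the formulas follow the publicly released Qwen reference implementation as I understand it.

```python
import math


def ntk_alpha(seq_len, max_seq_length):
    """Hypothetical helper: dynamic-NTK factor, 1 within the training length."""
    context_value = math.log2(seq_len / max_seq_length) + 1
    return max(2 ** math.ceil(context_value) - 1, 1)


def logn_factor(position, max_seq_length):
    """Hypothetical helper: logN attention scale for a 1-based position.

    Positions inside the training length are left unscaled.
    """
    if position > max_seq_length:
        return math.log(position, max_seq_length)
    return 1.0


def needs_inference_scaling(seq_len, max_seq_length):
    """True only when the sequence is longer than the training length."""
    return (ntk_alpha(seq_len, max_seq_length) != 1
            or logn_factor(seq_len, max_seq_length) != 1.0)
```

Under these assumptions, `needs_inference_scaling` is false for every `seq_len` up to `max_seq_length`, which is the case the reviewer has in mind when asking whether the training code can skip these options.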

wmpscc · 2024-06-21T06:26:00Z

scripts/convert_qwen_from_huggingface_to_tencentpretrain.py

I'd suggest also providing the conversion script for the reverse direction:
convert_qwen_from_tencentpretrain_to_huggingface.py
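For the renaming part of such a reverse script, one common approach is to derive the inverse from the same key mapping the forward script uses, so the two directions cannot drift apart. The sketch below shows only that idea: every key name in it is a made-up placeholder, not a real Hugging Face or TencentPretrain parameter name, and any tensor splitting, merging, or reshaping done by the forward conversion would need its own explicit inverse, which is not covered here.

```python
def invert_key_map(key_map):
    """Build the reverse mapping, refusing ambiguous (many-to-one) entries."""
    inverse = {}
    for src, dst in key_map.items():
        if dst in inverse:
            raise ValueError(f"cannot invert: {dst!r} has more than one source")
        inverse[dst] = src
    return inverse


def rename_keys(state_dict, key_map):
    """Return a copy of state_dict with keys renamed; unknown keys are kept."""
    return {key_map.get(name, name): value for name, value in state_dict.items()}


# Placeholder names for illustration only.
FORWARD_MAP = {
    "source.embed.weight": "target.embedding.weight",
    "source.final_norm.weight": "target.layer_norm.weight",
}

original = {"source.embed.weight": [1.0, 2.0], "source.final_norm.weight": [3.0]}
converted = rename_keys(original, FORWARD_MAP)
restored = rename_keys(converted, invert_key_map(FORWARD_MAP))
```

A round trip like `original -> converted -> restored` is also a cheap sanity check for the pair of scripts.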

cjw-d added 14 commits April 12, 2024 12:23

add logN-scaling

a08a34e

add dynamic ntk

7f22312

add qwen config

7240c2d

add qwen conversion script

da9d7b4

modify conversion script

24e00f0

remove old conversion script

af5f1ce

add QwenTokenizer

e261047

modify logN-scaling

97cdb6a

fix bug

ede088f

modify logN-scaling

98508ac

modify QwenTokenizer

a80884e

Merge branch 'main' of https://github.com/Tencent/TencentPretrain

121eb3a

add qwen_special_tokens_map

6b4eaff

add qwen config

68c4ad5

ydli-ai reviewed Jun 4, 2024

refactor

c2527db

wmpscc reviewed Jun 21, 2024

cjw-d added 2 commits August 4, 2024 19:46

refact rope

d999642

fix file format

7258fa2
