Description
System Info
CPU: x86_64
GPU: NVIDIA L40
TensorRT-LLM branch: v0.10.0
CUDA: 12.4 (nvidia-smi 535.161.07, driver version 535.161.07)
Reproduction
I have a model based on deepseek_coder_6.7b to which I added some special tokens, such as <filename> and <reponame>, for better performance.
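A hedged sanity check for the added special tokens, assuming the standard Hugging Face files in /data/deepseek-6.7b/: confirm that each token is registered in tokenizer.json and that its id is below the vocab_size in config.json, since an id at or beyond vocab_size has no row in the embedding table. The helper names below are hypothetical.

```python
import json
from pathlib import Path


def load_json(path):
    """Read a JSON file such as tokenizer.json or config.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def check_special_tokens(tokenizer_json, config_json, tokens):
    """Map each token to its added-token id and list the suspicious ones.

    Assumes the Hugging Face layout: tokenizer.json has a top-level
    "added_tokens" list of {"id": ..., "content": ...} entries, and
    config.json has a "vocab_size" field.
    """
    vocab_size = config_json["vocab_size"]
    added = {t["content"]: t["id"] for t in tokenizer_json.get("added_tokens", [])}
    ids = {tok: added.get(tok) for tok in tokens}
    # Suspicious: not registered as an added token, or id outside the vocab.
    suspicious = [tok for tok, i in ids.items() if i is None or i >= vocab_size]
    return ids, suspicious
```

For example, pass load_json("/data/deepseek-6.7b/tokenizer.json"), load_json("/data/deepseek-6.7b/config.json") and the tokens used in the request below (<reponame>, <neighbor>, <filename>, <codeblock>, <codecontent>). A token that lives in the base vocabulary rather than in added_tokens is also reported, so treat the result as a pointer for a closer look, not proof of a bug. If the ensemble serving the requests tokenizes with its own copy of the tokenizer, the same check applies to that copy.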
I sent the same requests to TensorRT-LLM (trt), vLLM, and transformers.generate.
The results from vLLM and transformers.generate are very good, but the trt result is a bad case, which is pretty weird.
Here are the TensorRT-LLM commands I used to convert the checkpoint and build the engine:
python /data/tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir /data/deepseek-6.7b/ \
--output_dir /data/trt-v10-deepseek-6.7b-tp2-bs8 \
--dtype float16 \
--tp_size 2 \
--workers 2
trtllm-build --checkpoint_dir /data/trt-v10-deepseek-6.7b-tp2-bs8 \
--output_dir /data/trt-v10-engines-deepseek-6.7b-bs8/2-gpu/ \
--gemm_plugin float16 \
--paged_kv_cache enable \
--max_input_len 8192 \
--max_output_len 1024 \
--gpt_attention_plugin float16 \
--max_batch_size 8
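Because only the long prompt misbehaves, one hedged check on the conversion step is to compare the position-encoding and vocabulary settings in the Hugging Face config.json with the config.json that convert_checkpoint.py writes to --output_dir. The TensorRT-LLM key names used below (rotary_base, rotary_scaling) are assumptions about the v0.10.0 checkpoint format; adjust them to whatever the converted file actually contains. The helper names are hypothetical.

```python
import json
from pathlib import Path

# Hugging Face config.json key -> key ASSUMED to hold the same setting in the
# config.json written by convert_checkpoint.py (verify against the real file).
KEY_MAP = {
    "rope_theta": "rotary_base",
    "rope_scaling": "rotary_scaling",
    "max_position_embeddings": "max_position_embeddings",
    "vocab_size": "vocab_size",
}


def load_json(path):
    """Read a JSON config file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def compare_converted_config(hf_config, trt_config, key_map=KEY_MAP):
    """Return {hf_key: (hf_value, trt_value)} for every mapped setting that differs.

    A key missing on one side shows up as None and is reported;
    a key missing on both sides compares equal and is not.
    """
    diffs = {}
    for hf_key, trt_key in key_map.items():
        hf_val = hf_config.get(hf_key)
        trt_val = trt_config.get(trt_key)
        if hf_val != trt_val:
            diffs[hf_key] = (hf_val, trt_val)
    return diffs
```

For example, compare load_json("/data/deepseek-6.7b/config.json") with load_json("/data/trt-v10-deepseek-6.7b-tp2-bs8/config.json"); an empty dict means the mapped settings match. A reported difference is not necessarily a bug, since the converter may store a setting in a different shape.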
Here is one of the requests:
curl -X POST localhost:8820/v2/models/ensemble/generate_stream -d '{"text_input": "\u003creponame\u003eprogramming-language-demo\n\u003cneighbor\u003e\u003cfilename\u003eprime-number.go\u003ccodeblock\u003e// }\n// func isPrime(n int) bool {\n// \tif n \u003c 2 {\n// \t\treturn false\n// \t} else {\n// \t\tfor i := 2; i \u003c= n/2; i++ {\n// \t\t\tif n%i == 0 {\n// \t\t\t\treturn false\n// \t\t\t}\n// \t\t}\n// \t}\n// \treturn true\n// }\n// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError()\n// func main()\n// func isPrime(n int) bool\n// Compare this snippet from go/prime-number.go:\n// package main\n// \n// import (\n// \t\"fmt\"\n// \t\"os\"\n// \t\"strconv\"\n// )\n// \n// func isPrime(n int) bool {\n// \tif n \u003c 2 {\n// \t\treturn false\n// \t} else {\n// \t\tfor i := 2; i \u003c= n/2; i++ {\n// \t\t\tif n%i == 0 {\n// \t\t\t\treturn false\n// \t\t\t}\n// \t\t}\n// \t}\n// \treturn true\n// }\n// \n// func exitWithError() {\n// \tfmt.Println(\"Usage: please input a non-negative integer\")\n// \tos.Exit(1)\n// }\n// \n// func main() {\n// \tif len(os.Args) != 2 {\n// \t\texitWithError()\n// \t}\n// \n// \tn, err := strconv.Atoi(os.Args[1])\n// \tif err != nil || n \u003c 0 {\n// \t\texitWithError()\n// \t}\n// \n// \tif isPrime(n) {\n// \t\tfmt.Println(\"Prime\")\n// \t} else {\n// \t fmt.Println(\"Composite\")\n// \t}\n// }\u003cneighbor\u003e\u003cfilename\u003eprime-number.go\u003ccodeblock\u003e// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError() {\n// \tfmt.Println(\"Usage: please input a non-negative integer\")\n// \tos.Exit(1)\n// }\n// func main() {\n// \tif len(os.Args) != 2 {\n// \t\texitWithError()\n// \t}\n// \n// \tn, err := strconv.Atoi(os.Args[1])\n// \tif err != nil || n \u003c 0 {\n// \t\texitWithError()\n// \t}\n// \n// \tif isPrime(n) {\n// \t\tfmt.Println(\"Prime\")\n// \t} else {\n// \t fmt.Println(\"Composite\")\n// \t}\n// }\n// func isPrime(n int) bool {\n// \tif n \u003c 2 {\n// \t\treturn false\n// \t} else {\n// \t\tfor i := 2; i \u003c= n/2; i++ {\n// \t\t\tif n%i == 0 {\n// \t\t\t\treturn false\n// \t\t\t}\n// \t\t}\n// \t}\n// \treturn true\n// }\n// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError()\n// func main()\n// func isPrime(n int) bool\n// Compare this snippet from go/prime-number.go:\n// package main\n// \n// import (\n// \t\"fmt\"\n// \t\"os\"\n// \t\"strconv\"\n// )\n// \n// func isPrime(n int) bool {\n// \tif n \u003c 2 {\n// \t\treturn false\n// \t} else {\n// \t\tfor i := 2; i \u003c= n/2; i++ {\n// \t\t\tif n%i == 0 {\n// \t\t\t\treturn false\n// \t\t\t}\n// \t\t}\n// \t}\n// \treturn true\n// }\n// \n// func exitWithError() {\u003cneighbor\u003e\u003cfilename\u003elongest-word.go\u003ccodeblock\u003e// Variables from import file go/longest-word.go can be referenced:\n// errorMessage = \"Usage: please provide a string\"\n// Functions from import file go/longest-word.go can be referenced:\n// func longestWordLength(str string) int {\n// \twords := strings.FieldsFunc(str, isLimitedWhitespace)\n// \treturn longestStringLength(words)\n// }\n// func isLimitedWhitespace(r rune) bool {\n// \treturn strings.ContainsRune(\" \\t\\n\\r\", r)\n// }\n// func longestStringLength(strs []string) (longest int) {\n// \tfor _, str := range strs {\n// \t\tif len(str) \u003e longest {\n// \t\t\tlongest = len(str)\n// \t\t}\n// \t}\n// \treturn\n// }\n// Functions from import file go/longest-word.go can be referenced:\n// func longestWordLength(str string) int\n// func isLimitedWhitespace(r rune) bool\n// func longestStringLength(strs []string) (longest int)\u003cneighbor\u003e\u003cfilename\u003efactorial.go\u003ccodeblock\u003e// Functions from import file go/factorial.go can be referenced:\n// func exitWithError(msg string) {\n// \tfmt.Println(msg)\n// \tos.Exit(1)\n// }\n// func factorial(n uint64) uint64 {\n// \tif n \u003c= 0 {\n// \t\treturn 1\n// \t}\n// \treturn n * factorial(n-1)\n// }\n// Functions from import file go/factorial.go can be referenced:\n// func exitWithError(msg string)\n// func factorial(n uint64) uint64\u003cfilename\u003elongest-common-subsequence.go\n\u003ccodecontent\u003epackage main\nimport (\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"os\"\n\t\"regexp\"\n\t\"strconv\"\n\t\"strings\"\n)\n//exitWithError\n, ", "max_tokens": 50, "bad_words": "", "stop_words": "", "stream": false, "temperature": 0.2, "top_p": 0.95, "return_log_probs": true, "generation_logits": true}'
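Because the prompt is long and full of escaped newlines, tabs, and quotes, it may be less error-prone to build the request body in code so that byte-identical strings reach every backend. A minimal sketch using only the Python standard library, assuming the same endpoint and sampling parameters as the curl command above; build_payload and send are hypothetical helper names. Note that json.dumps writes < literally rather than as \u003c, which is equivalent JSON.

```python
import json
import urllib.request

# Endpoint copied from the curl command; adjust host/port if needed.
URL = "http://localhost:8820/v2/models/ensemble/generate_stream"


def build_payload(prompt, max_tokens=50, temperature=0.2, top_p=0.95):
    """Build the same JSON body as the curl request; json.dumps handles escaping."""
    body = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
        "stream": False,
        "temperature": temperature,
        "top_p": top_p,
        "return_log_probs": True,
        "generation_logits": True,
    }
    return json.dumps(body).encode("utf-8")


def send(prompt, url=URL):
    """POST the payload and return the raw response text."""
    req = urllib.request.Request(
        url,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

For example, calling send() once with the full prompt string and once with only its tail from <codecontent> onward reproduces the two requests in this report.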
Expected behavior
The expected result is:
func exitWithError(msg string) {
	fmt.Println(msg)
	os.Exit(1)
}
Both vLLM and transformers.generate produce exactly this result.
Actual behavior
The trt result is:
data: {"context_logits":0.0,"cum_log_probs":-1.76106858253479,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.0000066757424974639438,-0.10143566876649857,-0.1650305688381195,-0.00022062112111598253,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-0.0000067949526965094268,-0.0000010728841743912199,-9.536747711536009e-7,-9.536747711536009e-7,-0.00007355483830906451,-9.536747711536009e-7,-0.0000020265599687263604,-0.0000010728841743912199,-9.536747711536009e-7,-0.000012636264727916569,-9.536747711536009e-7,-9.536747711536009e-7,-0.0000010728841743912199,-0.0001179049359052442,-9.536747711536009e-7,-0.0005595461116172373,-0.0000011920935776288389,-9.536747711536009e-7,-0.000048638572479831058,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-0.17364458739757539,-9.536747711536009e-7,-0.0004099851648788899,-9.536747711536009e-7,-0.000002861027041944908,-0.0005539401317946613,-0.0008925008587539196,-9.536747711536009e-7,-9.536747711536009e-7,-0.000003933914285880746,-0.0258316770195961,-9.536747711536009e-7,-0.022926615551114084,-9.536747711536009e-7,-9.536747711536009e-7,-0.000002145769485650817,-1.0269726514816285,-0.24228566884994508],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"\n//isLimitedWhitespace\n, \n//longestStringLength\n, \n//longestWordLength\n, \n//longestCommonSubsequence\n, \n//main\n, \n//parseArgs"}
Decoded, the text_output field reads:
//isLimitedWhitespace
,
//longestStringLength
,
//longestWordLength
,
//longestCommonSubsequence
,
//main
,
//parseArgs
However, if I send only the last part of the prompt (everything from <codecontent> onward), the result is normal.
Here is the request:
curl -X POST localhost:8820/v2/models/ensemble/generate_stream -d '{"text_input": "\u003ccodecontent\u003epackage main\nimport (\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"os\"\n\t\"regexp\"\n\t\"strconv\"\n\t\"strings\"\n)\n//exitWithError\n, ", "max_tokens": 50, "bad_words": "", "stop_words": "", "stream": false, "temperature": 0.2, "top_p": 0.95, "return_log_probs": true, "generation_logits": true}'
And here is the result:
data: {"context_logits":0.0,"cum_log_probs":-2.383721351623535,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.000052334245992824438,-0.3010028600692749,-0.0016516967443749309,-9.536747711536009e-7,-0.0000010728841743912199,-9.536747711536009e-7,-0.0057563441805541519,-0.0000027418175250204514,-0.000046373490476980808,-0.0000019073504518019037,-9.536747711536009e-7,-0.008396431803703308,-9.536747711536009e-7,-9.536747711536009e-7,-0.21918922662734986,-0.0002970540663227439,-0.06785676628351212,-9.536747711536009e-7,-0.00040557264583185315,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-0.0000020265599687263604,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7,-0.6207338571548462,-0.15082910656929017,-0.4851605296134949,-0.39718568325042727,-0.0005019970703870058,-0.0011182717280462385,-0.0000017881409348774469,-9.536747711536009e-7,-0.0000010728841743912199,-9.536747711536009e-7,-9.536747711536009e-7,-0.0000015497220147153712,-0.00020500138634815812,-0.12325640767812729,-0.000039816695789340886,-9.536747711536009e-7,-0.0000013113030945532956,-9.536747711536009e-7,-9.536747711536009e-7,-9.536747711536009e-7],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"\nfunc exitWithError(err error) {\n\tfmt.Println(err)\n\tos.Exit(1)\n}\n\n//getEnv\n, \nfunc getEnv(key, fallback string) string {"}
Decoded, the text_output field reads:
func exitWithError(err error) {
	fmt.Println(err)
	os.Exit(1)
}

//getEnv
,
func getEnv(key, fallback string) string {
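To compare the two runs side by side, the data: line returned by the endpoint can be decoded with the standard library. A minimal sketch; parse_sse_line and summarize are hypothetical helper names, and the field names (text_output, cum_log_probs, output_log_probs) are the ones visible in the responses above.

```python
import json


def parse_sse_line(line):
    """Parse one 'data: {...}' line returned by generate_stream into a dict."""
    prefix = "data:"
    line = line.strip()
    if not line.startswith(prefix):
        raise ValueError("expected a line starting with 'data:'")
    return json.loads(line[len(prefix):])


def summarize(line):
    """Return (generated text, cumulative log-prob, lowest per-token log-prob)."""
    event = parse_sse_line(line)
    logprobs = event["output_log_probs"]
    return event["text_output"], event["cum_log_probs"], min(logprobs)
```

Printing text.splitlines() for each response gives the decoded outputs shown above, and the lowest per-token log-prob is a quick pointer to the least likely token the model sampled.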
Additional notes
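This is very strange: the full prompt works in vLLM and transformers.generate, and its tail alone works in TensorRT-LLM, but the full prompt fails in TensorRT-LLM.
I have spent a long time investigating but still don't know what is causing it.
Please help me. Thank you.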