guided decoding parameters for tensorrt_llm backend must be present even if not needed #5099

Open
@InCogNiTo124

Description

System Info

NVIDIA RTX 3090 Ti
nvcr.io/nvidia/tritonserver:25.05-trtllm-python-py3

Who can help?

@ncomly-nvidia

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce the behavior:

  1. Take any compiled tensorrt_llm plan.
  2. Delete the tokenizer_dir, xgrammar_tokenizer_info_path, or guided_decoding_backend parameter from config.pbtxt (illustrative entries are shown below).
  3. Start Triton with the model.
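
For reference, these are the sort of config.pbtxt entries involved; the ${...} values are illustrative template placeholders, not my actual configuration:

# Guided decoding parameters in the tensorrt_llm model's config.pbtxt (illustrative)
parameters: {
  key: "tokenizer_dir"
  value: {
    string_value: "${tokenizer_dir}"
  }
}
parameters: {
  key: "xgrammar_tokenizer_info_path"
  value: {
    string_value: "${xgrammar_tokenizer_info_path}"
  }
}
parameters: {
  key: "guided_decoding_backend"
  value: {
    string_value: "${guided_decoding_backend}"
  }
}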

Expected behavior

tensorrtllm_backend should start normally, since guided decoding is not being used and these parameters are not needed.

Actual behavior

tensorrtllm_backend crashes with a message that a parameter is missing, even though the parameter is not actually used.

Additional notes

The three parameters are read unconditionally at the top of ModelInstanceState::getGuidedDecodingConfigFromParams(), before the early-return check that decides whether guided decoding is configured at all:

std::optional<executor::GuidedDecodingConfig> ModelInstanceState::getGuidedDecodingConfigFromParams()
{
    std::optional<executor::GuidedDecodingConfig> guidedDecodingConfig = std::nullopt;
    std::string tokenizerDir = model_state_->GetParameter<std::string>("tokenizer_dir");
    std::string tokenizerInfoPath = model_state_->GetParameter<std::string>("xgrammar_tokenizer_info_path");
    std::string guidedDecodingBackendStr = model_state_->GetParameter<std::string>("guided_decoding_backend");
    if (!tokenizerDir.empty() && tokenizerDir != "${tokenizer_dir}")
    {
        TLLM_LOG_INFO(
            "Guided decoding C++ workflow does not use tokenizer_dir, this parameter will "
            "be ignored.");
    }
    if (guidedDecodingBackendStr.empty() || guidedDecodingBackendStr == "${guided_decoding_backend}"
        || tokenizerInfoPath.empty() || tokenizerInfoPath == "${xgrammar_tokenizer_info_path}")
    {
        return guidedDecodingConfig;
    }
    TLLM_CHECK_WITH_INFO(std::filesystem::exists(tokenizerInfoPath),
        "Xgrammar's tokenizer info path at %s does not exist.", tokenizerInfoPath.c_str());
    auto const tokenizerInfo = nlohmann::json::parse(std::ifstream{std::filesystem::path(tokenizerInfoPath)});
    auto const encodedVocab = tokenizerInfo["encoded_vocab"].template get<std::vector<std::string>>();
    auto const tokenizerStr = tokenizerInfo["tokenizer_str"].template get<std::string>();
    auto const stopTokenIds
        = tokenizerInfo["stop_token_ids"].template get<std::vector<tensorrt_llm::runtime::TokenIdType>>();
    executor::GuidedDecodingConfig::GuidedDecodingBackend guidedDecodingBackend;
    if (guidedDecodingBackendStr == "xgrammar")
    {
        guidedDecodingBackend = executor::GuidedDecodingConfig::GuidedDecodingBackend::kXGRAMMAR;
    }
    else
    {
        TLLM_THROW(
            "Guided decoding is currently supported with 'xgrammar' backend. Invalid guided_decoding_backend parameter "
            "provided.");
    }
    guidedDecodingConfig
        = executor::GuidedDecodingConfig(guidedDecodingBackend, encodedVocab, tokenizerStr, stopTokenIds);
    return guidedDecodingConfig;
}
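
The early return only covers the case where the parameters are present but empty or left as ${...} template placeholders; when a key is deleted from config.pbtxt entirely, model loading aborts in the GetParameter<std::string> calls before that check is reached (that failure mode is my reading of the error message, not something I have traced). A minimal sketch of one possible direction for a fix, assuming GetParameter throws when the key is absent and that ModelState is the type behind model_state_:

// Hypothetical helper (not part of the backend): read a string parameter,
// falling back to an empty string when the key is absent from config.pbtxt,
// so the existing empty/placeholder check can take the early-return path.
// Assumes GetParameter<std::string> throws when the key is missing.
static std::string getOptionalStringParam(ModelState* modelState, std::string const& name)
{
    try
    {
        return modelState->GetParameter<std::string>(name);
    }
    catch (std::exception const&)
    {
        return std::string{};
    }
}

// getGuidedDecodingConfigFromParams() could then read the parameters like this:
// std::string tokenizerDir = getOptionalStringParam(model_state_, "tokenizer_dir");
// std::string tokenizerInfoPath = getOptionalStringParam(model_state_, "xgrammar_tokenizer_info_path");
// std::string guidedDecodingBackendStr = getOptionalStringParam(model_state_, "guided_decoding_backend");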

Labels

Investigating, Triton Backend, bug, triaged
