feat: add support for florence2 #4383


Draft · wants to merge 3 commits into base: main
Conversation

@ducviet00 ducviet00 commented May 16, 2025

What does this PR do?

This PR adds support for Florence-2.
Fixes: #2221

@juney-nvidia juney-nvidia added Community want to contribute PRs initiated from Community Community Engagement help/insights needed from community labels May 16, 2025
@poweiw poweiw added feature request New feature or request. This includes new model, dtype, functionality support new model Request to add a new model Generic Runtime General operational aspects of TRTLLM execution not in other categories. triaged Issue has been triaged by maintainers labels Jun 5, 2025
@poweiw poweiw requested a review from schetlur-nv June 5, 2025 20:22
JinkeJ commented Jun 23, 2025

Thanks for your contribution.
When I run the converted model, it shows

python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 963, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.Florence-2-large.configuration_florence2.Florence2Config'> to build an AutoTokenizer.

How do you solve that?

@ducviet00
Author

@JinkeJ
Hi, this PR is not complete yet.
I've been busy and haven't had much time to work on it.
Florence-2 is a combination of DaViT and BART, so you can convert DaViT to ONNX and then to TensorRT, and convert BART with TensorRT-LLM. It works fine for me with some tweaks.
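For reference, the two-track conversion described above could look roughly like the sketch below. The script names, paths, and `convert_checkpoint.py` options are assumptions based on TensorRT-LLM's enc-dec example layout, not part of this PR:

```shell
# Sketch only; script names and paths are hypothetical.
# 1) Export the DaViT vision tower to ONNX (a custom export script is assumed).
python export_davit_onnx.py --output tmp/onnx/davit.onnx

# 2) Build a TensorRT engine from the ONNX graph with trtexec.
trtexec --onnx=tmp/onnx/davit.onnx \
        --saveEngine=tmp/trt_engines/florence-2/vision/model.engine \
        --fp16

# 3) Convert the BART language model with TensorRT-LLM's enc-dec example flow.
python examples/enc_dec/convert_checkpoint.py --model_type bart \
        --model_dir tmp/hf_models/Florence-2-large \
        --output_dir tmp/trt_models/florence-2 \
        --dtype float16
trtllm-build --checkpoint_dir tmp/trt_models/florence-2/encoder \
             --output_dir tmp/trt_engines/florence-2/encoder
```

The build step would be repeated for the decoder checkpoint; see the enc-dec example in the TensorRT-LLM repo for the exact flags.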


JinkeJ commented Jun 23, 2025

@ducviet00 I already got the "model.engine" and "config.json" under tmp/trt_engines, but how do I load the engine? I followed the instructions in your commit and hit the error I mentioned above:

python run.py \
    --max_new_tokens 30 \
    --input_text "<OD>" \
    --hf_model_dir tmp/hf_models/${MODEL_NAME} \
    --engine_dir tmp/trt_engines/${MODEL_NAME}/${INFERENCE_PRECISION}

Did you use any other method or code to load that?

@ducviet00
Author

@JinkeJ
You should tweak run.py to make it work. Here is my example:

# Imports inferred from the snippet; exact module paths may differ in your tree.
import json
import os
import time

import torch
from transformers import AutoProcessor

from tensorrt_llm.runtime import Session, TensorInfo
from tensorrt_llm._utils import torch_dtype_to_str, torch_dtype_to_trt, trt_dtype_to_torch
# EncDecModelRunner comes from examples/enc_dec/run.py in the TensorRT-LLM repo.

# `hf_dir` and `config` refer to the Hugging Face checkpoint/config loaded elsewhere.
processor = AutoProcessor.from_pretrained(hf_dir, trust_remote_code=True, config=config)
inputs = processor(text=prompt, images=image, return_tensors="pt")
input_ids = inputs["input_ids"]
pixel_values = inputs["pixel_values"]
engine_name = "florence-2"
engine_dir = f"/workspace/models/trt_engines/{model_name}/{torch_dtype_to_str(torch_dtype)}/"
tllm_model = EncDecModelRunner.from_engine(engine_name, engine_dir, debug_mode=debug_mode, enable_context_fmha_fp32_acc=True)
# tllm_model.encoder_model_config
vocab_size = tllm_model.encoder_model_config.vocab_size
vision_encoder_path = f"{engine_dir}vision/model.engine"
with open(vision_encoder_path, 'rb') as f:
    engine_buffer = f.read()
visual_encoder_session = Session.from_serialized_engine(engine_buffer)

vision_config_path = f"{engine_dir}vision/config.json"
with open(vision_config_path, "r") as f:
    config = json.load(f)
# This example hardcodes 577 image features from DaViT; 588 is the full
# encoder input length (image features plus text tokens) for this prompt.
image_features_len = 577
task_vocab_size = torch.tensor([image_features_len], dtype=torch.int32).cuda()
tasks = torch.zeros([588], dtype=torch.int32).cuda()
# Ids past the end of the vocabulary act as indices into the prompt table.
fake_visual_ids = torch.arange(vocab_size, vocab_size + image_features_len)
fake_visual_ids = fake_visual_ids.reshape(1, image_features_len).to("cuda")
encoder_input_ids = (
    torch.cat([fake_visual_ids, input_ids], dim=1).contiguous().to(torch.int32)
)
attention_mask = torch.ones(
    encoder_input_ids.shape, device=encoder_input_ids.device, dtype=torch.int32
)

tik = time.perf_counter()
visual_features = {
    "input": pixel_values.to(torch_dtype),
}
tensor_info = [
    TensorInfo("input", torch_dtype_to_trt(torch_dtype), pixel_values.shape),
]
visual_output_info = visual_encoder_session.infer_shapes(tensor_info)
visual_outputs = {
    t.name: torch.empty(
        tuple(t.shape), dtype=trt_dtype_to_torch(t.dtype), device=pixel_values.device
    )
    for t in visual_output_info
}
stream = torch.cuda.Stream(torch.cuda.current_device())
ok = visual_encoder_session.run(visual_features, visual_outputs, stream.cuda_stream)
stream.synchronize()
assert ok
image_features = visual_outputs["image_features"]
prompt_table = image_features
prompt_table = prompt_table.view((prompt_table.shape[0] * prompt_table.shape[1], prompt_table.shape[2]))
# `model` is the original Hugging Face Florence-2 model loaded elsewhere;
# its BART decoder's start token id seeds the decoder input.
decoder_start_token_id = model.language_model.config.decoder_start_token_id
decoder_input_ids = torch.IntTensor(
    [[decoder_start_token_id]] * encoder_input_ids.shape[0]
).to("cuda")
tllm_output = tllm_model.generate(
    encoder_input_ids=encoder_input_ids,
    decoder_input_ids=decoder_input_ids,
    # attention_mask=attention_mask,
    max_new_tokens=300,
    num_beams=1,
    bos_token_id=processor.tokenizer.bos_token_id,
    pad_token_id=1,
    eos_token_id=processor.tokenizer.eos_token_id,
    debug_mode=debug_mode,
    return_dict=True,
    time_encoder=True,
    prompt_embedding_table=prompt_table,
    prompt_tasks=tasks,
    prompt_vocab_size=task_vocab_size,
    return_encoder_output=True
)
tok = time.perf_counter()
print("=" * 100)
print(tok - tik)
print("="*100)
print("TensorRT-LLM Output Text:")
for generated_ids in tllm_output["output_ids"]:
    print(generated_ids)
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))
    print(parsed_answer)
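For anyone puzzled by the fake_visual_ids trick above: TensorRT-LLM's prompt-embedding-table (p-tuning) path treats token ids at or beyond vocab_size as indices into prompt_embedding_table rather than the word-embedding matrix, which is how the DaViT image features get injected into the BART encoder. A minimal pure-Python sketch of that lookup (function and variable names are illustrative, not the runtime's API):

```python
def resolve_embeddings(input_ids, word_embeddings, prompt_table, vocab_size):
    """Map each token id to a word embedding or, for ids >= vocab_size,
    to the corresponding row of the prompt table."""
    out = []
    for tok in input_ids:
        if tok >= vocab_size:
            # Fake visual id: row (tok - vocab_size) of the prompt table.
            out.append(prompt_table[tok - vocab_size])
        else:
            out.append(word_embeddings[tok])
    return out

# Toy example: vocab of 5 words, 3 image-feature rows appended after it.
vocab_size = 5
word_embeddings = [[float(i)] for i in range(vocab_size)]  # ids 0..4
prompt_table = [[10.0], [11.0], [12.0]]                    # ids 5..7
ids = [5, 6, 7, 2]  # three visual tokens followed by one text token
print(resolve_embeddings(ids, word_embeddings, prompt_table, vocab_size))
# -> [[10.0], [11.0], [12.0], [2.0]]
```

This is why task_vocab_size above is set to the number of image features: it tells the runtime how many rows of the prompt table are valid.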
