1. In an industrial production environment running Triton Inference Server, suppose a model service runs on GPU 0, which has 24 GB of memory. The model is deployed with four instances that together occupy 18 GB. If I need to update the model to version V2, how can I perform the update and switch traffic over while avoiding out-of-memory (OOM) errors?
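For context, here is a hedged sketch of the kind of `config.pbtxt` such a deployment might use. The model name `my_model`, the backend, and the batch size are assumptions for illustration; only the instance count and GPU placement come from the question:

```
name: "my_model"
platform: "tensorrt_plan"   # assumed backend, not stated in the question
max_batch_size: 8           # assumed value

# Four instances of the model, all on GPU 0 (the ~18 GB footprint in the
# scenario above is the combined memory of these four instances).
instance_group [
  {
    count: 4
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

# Make only the newest version directory in the model repository
# (e.g. 2/ for V2) available for inference.
version_policy: { latest { num_versions: 1 } }
```

Note that loading V2 while all four V1 instances are still resident would require roughly double the memory, which is why the OOM concern in the question arises on a 24 GB card.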
2. In an industrial production environment where the model service is distributed across GPU 0 and GPU 1 (or more GPUs), how can an update be deployed without disrupting normal use?
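For reference, a multi-GPU placement in Triton is expressed in the same `instance_group` block; a minimal sketch, assuming two instances per card, might look like this (in Triton, `count` is the number of instances created on *each* GPU listed in `gpus`):

```
# Two instances on GPU 0 and two on GPU 1 (four instances in total).
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```

With this layout, the rolling-update question above becomes one of draining and reloading instances card by card rather than all at once.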