Triton Inference Server model service update strategy #8413

@fly2skyToEnjoy

Description

1. In an industrial production environment running Triton Inference Server, suppose a model service is deployed on GPU 0, which has 24 GB of memory. The service runs four instances that together occupy 18 GB. If I need to update the model to version V2, how can I perform the update and the traffic switch while avoiding out-of-memory (OOM) errors? (A sketch of the kind of staged update I have in mind follows below.)
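For reference, here is a minimal sketch of the staged update I am considering, using the model repository extension's load API with a config override. It assumes Triton is started with `--model-control-mode=explicit`, that the repository already contains a `2/` version directory, and that `my_model`, the URL, and the instance counts are placeholders. The idea is to shrink V1's footprint first so V2 fits alongside it, then retire V1:

```python
# Staged in-place update on a single 24 GB GPU: shrink V1, load V2
# alongside it, then retire V1 and scale V2 up. Placeholder names.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Step 1: drop V1 from 4 instances (~18 GB) to 2 (~9 GB). The config
# override travels with the load request (model repository extension);
# recent Triton releases can adjust instance counts without a full
# model reload.
V1_SHRUNK = """{
  "instance_group": [{"count": 2, "kind": "KIND_GPU", "gpus": [0]}],
  "version_policy": {"specific": {"versions": [1]}}
}"""
client.load_model("my_model", config=V1_SHRUNK)

# Step 2: serve V1 and V2 side by side. instance_group applies per
# loaded version, so this is 2 + 2 instances (~18 GB total), which
# still fits on the 24 GB card.
BOTH = """{
  "instance_group": [{"count": 2, "kind": "KIND_GPU", "gpus": [0]}],
  "version_policy": {"specific": {"versions": [1, 2]}}
}"""
client.load_model("my_model", config=BOTH)

# Step 3: shift traffic only after V2 reports ready.
assert client.is_model_ready("my_model", model_version="2")

# Step 4: once clients target V2, drop V1 and restore 4 instances.
V2_FULL = """{
  "instance_group": [{"count": 4, "kind": "KIND_GPU", "gpus": [0]}],
  "version_policy": {"specific": {"versions": [2]}}
}"""
client.load_model("my_model", config=V2_FULL)
```

At roughly 4.5 GB per instance, the peak during the switchover should stay near 18 GB, but since a reload may briefly hold extra copies on some Triton versions, I am not sure how much headroom to budget.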

2. In an industrial production environment where the model service is spread across GPU 0 and GPU 1 (or more cards), how can an updated deployment be rolled out without affecting normal use? (See the rolling-update sketch below for what I am imagining.)
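For the multi-card case, if each card (or node) is served by its own Triton endpoint behind a load balancer, I imagine a rolling drain along these lines. The endpoints, the model name, and the load-balancer hooks are placeholders, and it assumes explicit model control mode with a default (latest-version) policy that picks V2 after the swap:

```python
# Rolling update when the service spans several Triton endpoints
# (e.g., one per GPU or per node) behind a load balancer. Endpoints,
# model name, and the LB steps below are placeholders.
import time
import tritonclient.http as httpclient

SERVERS = ["triton-gpu0:8000", "triton-gpu1:8000"]
MODEL = "my_model"

for url in SERVERS:
    client = httpclient.InferenceServerClient(url=url)

    # 1. Remove this endpoint from the load balancer pool first
    #    (mechanism is LB-specific, e.g., failing its health check).

    # 2. Swap versions; Triton lets in-flight requests finish before
    #    the old version becomes unavailable.
    client.unload_model(MODEL)
    client.load_model(MODEL)  # default policy picks the latest (V2)

    # 3. Do not rejoin the pool until V2 is ready on this endpoint.
    while not client.is_model_ready(MODEL, model_version="2"):
        time.sleep(1)

    # 4. Re-add the endpoint, then move to the next one, so the
    #    remaining endpoints keep serving throughout.
```

If instead this is a single Triton server with `instance_group { gpus: [0, 1] }`, does the staged per-version approach from question 1 remain the recommended path?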
