1. In an industrial production environment running Triton Inference Server, suppose a model service runs on GPU 0, which has 24 GB of memory. The model is deployed with four instances that together occupy 18 GB. If I need to update the model to version V2, how can I perform the update and switch traffic over while avoiding out-of-memory (OOM) errors?
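For context, here is a hedged sketch of the kind of `config.pbtxt` such a deployment might use. The model name `my_model`, the backend, and the batch size are assumptions for illustration; only the instance count and GPU placement come from the question:

```
name: "my_model"
platform: "tensorrt_plan"   # assumed backend, not stated in the question
max_batch_size: 8           # assumed value

# Four instances of the model, all on GPU 0 (the ~18 GB footprint in the
# scenario above is the combined memory of these four instances).
instance_group [
  {
    count: 4
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

# Make only the newest version directory in the model repository
# (e.g. 2/ for V2) available for inference.
version_policy: { latest { num_versions: 1 } }
```

Note that loading V2 while all four V1 instances are still resident would require roughly double the memory, which is why the OOM concern in the question arises on a 24 GB card.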
2. In an industrial production environment where the model service is distributed across GPU 0 and GPU 1 (or more GPUs), how can an update be deployed without disrupting normal use?
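For reference, a multi-GPU placement in Triton is expressed in the same `instance_group` block; a minimal sketch, assuming two instances per card, might look like this (in Triton, `count` is the number of instances created on *each* GPU listed in `gpus`):

```
# Two instances on GPU 0 and two on GPU 1 (four instances in total).
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```

With this layout, the rolling-update question above becomes one of draining and reloading instances card by card rather than all at once.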