-
Hello! I use nginx to provide multiple remote model inference interfaces. How can I set the inference config to accelerate the inference process?
eval = dict(
-
Yes, `max_num_workers` can be used for parallel inference. But I suggest a round-robin over the urls in the `Model` class, which is easier to implement and more intuitive in concept. You may find this document helpful.
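A minimal sketch of that round-robin idea, assuming each backend behind nginx exposes a simple HTTP generation endpoint; the class name, `generate` method, request payload, and response schema below are all hypothetical and not the actual `Model` class API:

```python
from itertools import cycle
from typing import List

import requests


class RoundRobinModel:
    """Hypothetical wrapper that spreads requests across several inference
    endpoints (e.g. the individual backends behind an nginx front end)."""

    def __init__(self, urls: List[str], timeout: float = 60.0):
        self.urls = urls
        self._next_url = cycle(urls)  # round-robin iterator over the endpoints
        self.timeout = timeout

    def generate(self, prompt: str) -> str:
        # Each call goes to the next endpoint in round-robin order.
        url = next(self._next_url)
        resp = requests.post(url, json={"prompt": prompt}, timeout=self.timeout)
        resp.raise_for_status()
        # Assumes the server returns JSON like {"text": "..."}; adapt this
        # to the actual response format of your inference server.
        return resp.json()["text"]


# Example: rotate over two backends (hypothetical URLs).
model = RoundRobinModel([
    "http://127.0.0.1:8001/generate",
    "http://127.0.0.1:8002/generate",
])
# print(model.generate("Hello!"))
```

Because the rotation happens inside the model wrapper, it should also compose with `max_num_workers`: each parallel worker independently spreads its requests over all endpoints.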
-
Thanks! Another question for your help!