TritonServer inference service shows steadily increasing RSS memory over requests without reclamation #8457

@zhangzju

Description

I built TritonServer based on the 2.50.0 tag and ran a local model inference service. After starting the service, I launched another script that repeatedly sends a fixed prompt to this service and monitored TritonServer’s RSS memory. As shown in the chart, the RSS keeps increasing over time. Even after I stop sending prompt requests, the memory is not reclaimed. In a real production environment, this leads to my container running out of memory.

[Figure: tritonserver RSS (reported by ps) rising steadily over time while requests are sent]

I’d like to know whether TritonServer’s continuously increasing RSS is expected behavior. If it is, there should in theory be an upper bound beyond which memory is reclaimed. How is that limit determined? Is it based on the operating system’s MemTotal?

I also captured jeprof SVG files for the same environment under both low-request and high-request conditions.

[Figure: jeprof heap profile under low request load]

[Figure: jeprof heap profile under high request load]

Triton Information

2.50.0; I also tried the latest version, which shows the same behavior.

Are you using the Triton container or did you build it yourself?

I built the executables directly from the unmodified source code using the Dockerfile.

To Reproduce

Steps to reproduce the behavior.

  1. Build tritonserver from the 2.50.0 tag and start a model inference service (locally we launch a proprietary, non-open-source model).
  2. Start a client and send prompt requests to the service.
  3. Use the command ps -C tritonserver -o pid,comm,lstart,rss to obtain the tritonserver process’s RSS memory usage (a minimal sketch of steps 2 and 3 follows this list).
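
For reference, here is a minimal sketch of the client loop and RSS sampling described in steps 2 and 3. The model name, input tensor name, and data type are placeholders, since the real model is proprietary; the endpoint assumes Triton's default HTTP port.

```python
import subprocess
import time

import numpy as np
import tritonclient.http as httpclient

MODEL_NAME = "my_model"    # placeholder: the actual model is proprietary
INPUT_NAME = "text_input"  # placeholder input tensor name and type
PROMPT = "a fixed prompt"

client = httpclient.InferenceServerClient(url="localhost:8000")


def tritonserver_rss_kb() -> int:
    # Same information as: ps -C tritonserver -o rss
    out = subprocess.check_output(["ps", "-C", "tritonserver", "-o", "rss="])
    return sum(int(v) for v in out.split())


for i in range(100_000):
    inp = httpclient.InferInput(INPUT_NAME, [1], "BYTES")
    inp.set_data_from_numpy(np.array([PROMPT.encode()], dtype=np.object_))
    client.infer(model_name=MODEL_NAME, inputs=[inp])
    if i % 100 == 0:
        print(f"request {i}: tritonserver RSS = {tritonserver_rss_kb()} kB")
    time.sleep(0.01)
```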

Expected behavior

The RSS should stabilize at a safe level, or there should be a way to reclaim tritonserver’s RSS memory.
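
One experiment that may help distinguish allocator page retention from a genuine leak, sketched below under the assumption that the build links jemalloc and honors the standard MALLOC_CONF environment variable (which the jeprof profiles above suggest): start tritonserver with more aggressive decay settings and check whether RSS drops after the load stops. The decay values and model repository path are illustrative only.

```python
import os
import subprocess

# dirty_decay_ms / muzzy_decay_ms control how quickly jemalloc returns freed
# pages to the OS; background_thread lets that purging happen even while the
# process is idle. These values are for the experiment only, not a tuning
# recommendation.
env = dict(os.environ)
env["MALLOC_CONF"] = "background_thread:true,dirty_decay_ms:1000,muzzy_decay_ms:1000"

# "/models" is a placeholder model repository path.
subprocess.run(["tritonserver", "--model-repository=/models"], env=env)
```

If RSS still does not come down after requests stop, that points more strongly toward objects actually being retained rather than freed pages the allocator is holding on to.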
