TritonServer inference service shows steadily increasing RSS memory over requests without reclamation

### Description

I built TritonServer from the 2.50.0 tag and ran a local model inference service. After starting the service, I launched a separate script that repeatedly sends a fixed prompt to it, and I monitored TritonServer's RSS memory. As the chart shows, RSS keeps increasing over time, and even after I stop sending requests the memory is not reclaimed. In our production environment, this eventually causes the container to run out of memory.

<img width="790" height="388" alt="Image" src="https://github.com/user-attachments/assets/676fbbd5-3797-4142-8a0f-5653c8090779" />

I'd like to know whether this continuously increasing RSS is expected behavior for TritonServer. If it is, there should presumably be an upper bound beyond which memory is reclaimed. How is that limit determined? Is it based on the operating system's MemTotal?
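To put an observed RSS value next to the limits in question, here is a minimal sketch, assuming Linux with `/proc/meminfo` and, optionally, a cgroup v2 memory limit at `/sys/fs/cgroup/memory.max`. The helper names are hypothetical, and the sketch does not reveal any reclamation threshold inside TritonServer. It also reads the cgroup limit because a container's memory cap is usually set there, while `/proc/meminfo` inside a container typically still shows the host's MemTotal.

```python
from typing import Optional


def read_meminfo_kib(key: str, path: str = "/proc/meminfo") -> int:
    """Return the value (in KiB) of a 'Key:   123 kB' line in /proc/meminfo."""
    with open(path) as f:
        for line in f:
            if line.startswith(key + ":"):
                return int(line.split()[1])
    raise KeyError(f"{key} not found in {path}")


def cgroup_v2_limit_kib(path: str = "/sys/fs/cgroup/memory.max") -> Optional[int]:
    """Return the cgroup v2 memory limit in KiB, or None if unlimited or unavailable."""
    try:
        with open(path) as f:
            value = f.read().strip()
    except OSError:  # no cgroup v2 file (e.g. cgroup v1 host)
        return None
    return None if value == "max" else int(value) // 1024


def memory_context(rss_kib: int) -> dict:
    """Express an RSS value (KiB, as printed by `ps -o rss`) relative to MemTotal and the cgroup limit."""
    mem_total_kib = read_meminfo_kib("MemTotal")
    limit_kib = cgroup_v2_limit_kib()
    return {
        "mem_total_kib": mem_total_kib,
        "rss_pct_of_memtotal": 100.0 * rss_kib / mem_total_kib,
        "cgroup_limit_kib": limit_kib,
        "rss_pct_of_cgroup_limit": None if limit_kib is None else 100.0 * rss_kib / limit_kib,
    }
```

For example, `memory_context(<rss column from ps>)` shows whether the growing RSS is approaching the container's cgroup limit long before it gets anywhere near MemTotal.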

I also captured jeprof call graphs (SVG) in the same environment under both low and high request load.

![Image](https://github.com/user-attachments/assets/dec97689-71f4-4f2f-adfc-b0fc077dddc0)

![Image](https://github.com/user-attachments/assets/05f36e2e-d3da-4d72-a7f3-90940613ab46)

### Triton Information
2.50.0. I also tried the latest version and observed the same behavior.

### Are you using the Triton container or did you build it yourself?
I built the executables from unmodified source code inside a Dockerfile.

### To Reproduce
Steps to reproduce the behavior.

1. Build tritonserver from the 2.50.0 tag and start a model inference service (locally we serve a proprietary, closed-source model).
2. Start a client that repeatedly sends a fixed prompt to the service.
3. Run `ps -C tritonserver -o pid,comm,lstart,rss` to obtain the RSS of the tritonserver process (`ps` reports RSS in KiB).
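Step 3 only gives a single reading. To record RSS as a time series (for example, to confirm it stays flat after the client stops), the `ps` command can be replaced by a small polling loop. This is a minimal sketch, assuming Linux `/proc`. It matches the process by `/proc/<pid>/comm`, which is the command name `ps -C` selects on, and reads `VmRSS` from `/proc/<pid>/status` (also in KiB). The function names and sampling parameters are hypothetical.

```python
import os
import time
from typing import List, Tuple


def find_pids(name: str) -> List[int]:
    """Return PIDs whose /proc/<pid>/comm equals `name`, similar to `ps -C name`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:  # process exited or is not readable
            continue
    return sorted(pids)


def rss_kib(pid: int) -> int:
    """Read the resident set size (VmRSS, in KiB) of `pid` from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise ValueError(f"no VmRSS entry for pid {pid}")


def sample_rss(pid: int, interval_s: float, count: int) -> List[Tuple[float, int]]:
    """Take `count` RSS samples `interval_s` seconds apart; returns (elapsed_s, rss_kib) pairs."""
    start = time.monotonic()
    samples = []
    for i in range(count):
        samples.append((time.monotonic() - start, rss_kib(pid)))
        if i + 1 < count:
            time.sleep(interval_s)
    return samples
```

For example, `sample_rss(find_pids("tritonserver")[0], 10.0, 360)` records one hour of readings at 10-second intervals. Stopping the client partway through shows whether RSS plateaus or is released once requests cease.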

### Expected behavior
RSS stays bounded at a safe level, or there is a way to get tritonserver to release the memory.
