Commit b9b6aeb: Add resource management docstring
1 parent b8aa4e6

1 file changed: 88 additions, 0 deletions

.. _resource_management:

Resource Management
===================

Overview
--------

Efficient control of CPU and GPU memory is essential for successful model compilation,
especially when working with large models such as LLMs or diffusion models.
Uncontrolled memory growth can cause compilation failures or process termination.
This guide describes the symptoms of excessive memory usage and provides methods
to reduce both CPU and GPU memory consumption.

Memory Usage Control
--------------------

CPU Memory
^^^^^^^^^^

By default, Torch-TensorRT may consume up to **5×** the model size in CPU memory.
This can exceed system limits when compiling large models.

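As a rough illustration, the default ~5× CPU ratio listed in the summary table of this guide can be turned into a quick pre-flight capacity check. The helper below is a sketch of ours, not part of the Torch-TensorRT API; the function name, the fp16 byte size, and the multiplier default are assumptions for illustration.

```python
# Hypothetical helper: estimate peak CPU memory from the ~5x rule of thumb.
# The 5.0 multiplier is the default ratio cited in this guide, not an API value.

def estimate_peak_cpu_bytes(num_params: int, bytes_per_param: int = 2,
                            multiplier: float = 5.0) -> int:
    """Rough peak CPU memory for compiling a model (fp16 -> 2 bytes/param)."""
    return int(num_params * bytes_per_param * multiplier)

# A 7B-parameter fp16 model weighs ~14 GB, so peak CPU use may approach ~70 GB.
peak = estimate_peak_cpu_bytes(7_000_000_000)
print(f"{peak / 1e9:.0f} GB")  # prints: 70 GB
```

Comparing this estimate against available system RAM before compiling can tell you in advance whether the mitigations below are needed.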
**Common symptoms of high CPU memory usage:**

- Program freeze
- Process terminated by the operating system

**Ways to lower CPU memory usage:**

1. **Enable memory trimming**

   Set the following environment variable:

   .. code-block:: bash

      export TRIM_CPU_MEMORY=1

   This removes roughly **2×** worth of redundant model copies, limiting
   total CPU memory usage to about **3×** the model size.

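If you launch compilation from a Python script rather than a shell, the same variable can be set programmatically. A minimal sketch follows; the variable name comes from this guide, while the requirement to set it before compilation begins is an assumption about how environment variables are typically read.

```python
import os

# Assumed: must be in the environment before Torch-TensorRT compilation
# starts, so the trimming behavior is picked up when settings are read.
os.environ["TRIM_CPU_MEMORY"] = "1"
```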
2. **Disable CPU offloading**

   In the compilation settings, set:

   .. code-block:: python

      offload_module_to_cpu = False

   This removes another **1×** model copy, reducing peak CPU memory
   usage to about **2×** the model size.

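A hedged sketch of how this setting might be collected and handed to compilation. The dictionary below is illustrative; the exact call shape in the comment is an assumption, so verify it against your installed Torch-TensorRT version.

```python
# Sketch only: collect the CPU-memory setting discussed above.
# In practice it would be forwarded to compilation, e.g.
#   trt_model = torch_tensorrt.compile(model, inputs=inputs, **settings)
# (call shape assumed; check your Torch-TensorRT version's documentation).
settings = {
    "offload_module_to_cpu": False,  # avoid the extra CPU-side model copy
}
print(settings)
```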
GPU Memory
^^^^^^^^^^

By default, Torch-TensorRT may consume up to **2×** the model size in GPU memory.

**Common symptoms of high GPU memory usage:**

- CUDA out-of-memory errors
- TensorRT compilation errors

**Ways to lower GPU memory usage:**

1. **Enable offloading to CPU**

   In the compilation settings, set:

   .. code-block:: python

      offload_module_to_cpu = True

   This shifts one model copy from GPU to CPU memory.
   As a result, peak GPU memory usage decreases to about **1×**
   the model size, while CPU memory usage increases by roughly **1×**.

Summary
-------

.. list-table::
   :header-rows: 1

   * - Setting
     - Effect
     - Approx. Memory Ratio
   * - Default
     - Baseline behavior
     - CPU: 5×, GPU: 2×
   * - ``export TRIM_CPU_MEMORY=1``
     - Reduces redundant CPU copies
     - CPU: ~3×
   * - ``offload_module_to_cpu=False``
     - Further reduces CPU copies
     - CPU: ~2×
   * - ``offload_module_to_cpu=True``
     - Reduces GPU usage, increases CPU usage
     - GPU: ~1×, CPU: +1×

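The summary table can be encoded as a small lookup for scripting capacity checks. The scenario names, the `fits` helper, and the GPU ratio of 2× for the trim-only row are our assumptions (the table leaves GPU unchanged there); the remaining numbers are the approximate ratios from this guide, with the offload-to-CPU row's 6× CPU inferred as default 5× plus the stated +1×.

```python
# Illustrative lookup encoding the approximate ratios from the summary table.
# Scenario names are ours; GPU 2.0 for "trim_cpu_memory" and CPU 6.0 for
# "offload_module_to_cpu" are inferred, not stated verbatim in the table.
MEMORY_RATIOS = {
    "default":                  {"cpu": 5.0, "gpu": 2.0},
    "trim_cpu_memory":          {"cpu": 3.0, "gpu": 2.0},
    "trim_plus_no_cpu_offload": {"cpu": 2.0, "gpu": 2.0},
    "offload_module_to_cpu":    {"cpu": 6.0, "gpu": 1.0},
}

def fits(scenario: str, model_gb: float, cpu_gb: float, gpu_gb: float) -> bool:
    """Check whether estimated peak usage fits the available CPU/GPU memory."""
    r = MEMORY_RATIOS[scenario]
    return model_gb * r["cpu"] <= cpu_gb and model_gb * r["gpu"] <= gpu_gb

# A ~14 GB model on a 64 GB RAM / 48 GB GPU machine:
print(fits("default", model_gb=14, cpu_gb=64, gpu_gb=48))          # False (70 GB CPU)
print(fits("trim_cpu_memory", model_gb=14, cpu_gb=64, gpu_gb=48))  # True (42 GB CPU)
```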
Proper configuration ensures efficient resource use, stable compilation,
and predictable performance for large-scale models.
