.. _resource_management:

Resource Management
===================

Overview
--------

Efficient management of CPU and GPU memory is essential for successful model compilation,
especially when working with large models such as LLMs or diffusion models.
Uncontrolled memory growth can cause compilation failures or process termination.
This guide describes the symptoms of excessive memory usage and provides methods
to reduce both CPU and GPU memory consumption.

Memory Usage Control
--------------------

CPU Memory
^^^^^^^^^^

By default, Torch-TensorRT may consume up to **5×** the model size in CPU memory.
This can exceed system limits when compiling large models.

**Common symptoms of high CPU memory usage:**

- The program freezes
- The process is terminated by the operating system

**Ways to lower CPU memory usage:**

1. **Enable memory trimming**

   Set the following environment variable:

   .. code-block:: bash

      export TRIM_CPU_MEMORY=1

   This eliminates roughly **2×** the model size in redundant model copies,
   limiting total CPU memory usage to about **3×** the model size.

2. **Disable CPU offloading**

   In the compilation settings, set:

   .. code-block:: python

      offload_module_to_cpu = False

   This removes another **1×** model copy, reducing peak CPU memory
   usage to about **2×** the model size.
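Taken together, the two steps above can be sketched in a single script. This is a minimal illustration rather than an official recipe: it assumes that ``TRIM_CPU_MEMORY`` is read from the process environment when Torch-TensorRT is imported, and that ``offload_module_to_cpu`` is accepted as a keyword argument by ``torch_tensorrt.compile``.

```python
import os

# Step 1: enable memory trimming before Torch-TensorRT is imported,
# so the flag is already set when the library reads its environment.
os.environ["TRIM_CPU_MEMORY"] = "1"

# Step 2 (sketch only; requires a CUDA GPU and the torch_tensorrt
# package): keep the model on the GPU during compilation by disabling
# CPU offloading in the compilation settings.
#
#   import torch
#   import torch_tensorrt
#
#   trt_model = torch_tensorrt.compile(
#       model,
#       inputs=[torch.randn(1, 3, 224, 224).cuda()],
#       offload_module_to_cpu=False,
#   )
```

Setting the environment variable before the import matters: a flag exported after the library has initialized may not take effect.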
GPU Memory
^^^^^^^^^^

By default, Torch-TensorRT may consume up to **2×** the model size in GPU memory.

**Common symptoms of high GPU memory usage:**

- CUDA out-of-memory errors
- TensorRT compilation errors

**Ways to lower GPU memory usage:**

1. **Enable offloading to CPU**

   In the compilation settings, set:

   .. code-block:: python

      offload_module_to_cpu = True

   This shifts one model copy from GPU to CPU memory.
   As a result, peak GPU memory usage decreases to about **1×**
   the model size, while CPU memory usage increases by roughly **1×**.
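As a sketch of how the setting above would be applied, the snippet below collects it in a settings dictionary and shows where it would be passed. It assumes, as in the step above, that ``torch_tensorrt.compile`` accepts ``offload_module_to_cpu`` as a keyword; ``model`` and ``inputs`` are placeholders for your module and sample inputs.

```python
# Trade GPU memory for CPU memory during compilation by keeping one
# model copy on the host instead of the device.
settings = {"offload_module_to_cpu": True}

# Usage (sketch; requires a CUDA GPU and the torch_tensorrt package):
#
#   trt_model = torch_tensorrt.compile(model, inputs=inputs, **settings)
```

Note the trade-off: this lowers peak GPU usage only by spending roughly the same amount of CPU memory, so check host RAM headroom before enabling it.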