@@ -297,9 +297,9 @@ The following table shows the fields that may to be modified before deployment:
 |`max_tokens_in_paged_kv_cache`| Optional (default=unspecified). The maximum size of the KV cache in number of tokens. If unspecified, value is interpreted as 'infinite'. KV cache allocation is the min of max_tokens_in_paged_kv_cache and value derived from kv_cache_free_gpu_mem_fraction below. |
 |`max_attention_window_size`| Optional (default=max_sequence_length). When using techniques like sliding window attention, the maximum number of tokens that are attended to generate one token. Defaults attends to all tokens in sequence. |
 |`kv_cache_free_gpu_mem_fraction`| Optional (default=0.9). Set to a number between 0 and 1 to indicate the maximum fraction of GPU memory (after loading the model) that may be used for KV cache.|
-|`enable_trt_overlap`| Optional (default=`false`). Set to `true` to partition available requests into 2 'microbatches' that can be run concurrently to hide exposed CPU runtime |
 |`exclude_input_in_output`| Optional (default=`false`). Set to `true` to only return completion tokens in a response. Set to `false` to return the prompt tokens concatenated with the generated tokens |
 |`cancellation_check_period_ms`| Optional (default=100). The time for cancellation check thread to sleep before doing the next check. It checks if any of the current active requests are cancelled through triton and prevent further execution of them. |
+|`stats_check_period_ms`| Optional (default=100). The time for the statistics reporting thread to sleep before doing the next check. |
 |`iter_stats_max_iterations`| Optional (default=executor::kDefaultIterStatsMaxIterations). The numbers of iteration stats to be kept. |
 |`request_stats_max_iterations`| Optional (default=executor::kDefaultRequestStatsMaxIterations). The numbers of request stats to be kept. |
 |`normalize_log_probs`| Optional (default=`true`). Set to `false` to skip normalization of `output_log_probs`|
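
Note on the table rows above: the interaction between `max_tokens_in_paged_kv_cache` and `kv_cache_free_gpu_mem_fraction` ("the min of" the two) may be easier to follow with a small worked sketch. The function below is illustrative only and is not the backend's implementation. The function name, the `free_gpu_bytes` and `bytes_per_token` inputs, and the strict `0 < fraction < 1` check are assumptions made for the example; the real engine derives the per-token KV cache cost from the model configuration.

```python
from typing import Optional


def kv_cache_token_budget(
    free_gpu_bytes: int,
    bytes_per_token: int,
    max_tokens_in_paged_kv_cache: Optional[int] = None,
    kv_cache_free_gpu_mem_fraction: float = 0.9,
) -> int:
    """Return a KV cache budget in tokens (hypothetical helper).

    free_gpu_bytes: GPU memory left after loading the model (assumed input).
    bytes_per_token: KV cache cost of one token (assumed input).
    """
    if not 0.0 < kv_cache_free_gpu_mem_fraction < 1.0:
        raise ValueError("kv_cache_free_gpu_mem_fraction must be between 0 and 1")
    if bytes_per_token <= 0:
        raise ValueError("bytes_per_token must be positive")

    # Token count implied by the memory fraction.
    usable_bytes = int(free_gpu_bytes * kv_cache_free_gpu_mem_fraction)
    tokens_from_fraction = usable_bytes // bytes_per_token

    # An unspecified token limit is treated as 'infinite', so only the
    # fraction-derived value applies.
    if max_tokens_in_paged_kv_cache is None:
        return tokens_from_fraction

    # Otherwise the allocation is the smaller of the two limits.
    return min(max_tokens_in_paged_kv_cache, tokens_from_fraction)
```

With these assumed inputs, a token limit only takes effect when it is smaller than what the memory fraction would allow; otherwise the fraction is the binding constraint.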
+        f"[TensorRT-LLM][WARNING] Don't setup 'skip_special_tokens' correctly (set value is {skip_special_tokens['string_value']}). Set it as True by default."
+    )
+    self.skip_special_tokens=True
+else:
+    print(
+        f"[TensorRT-LLM][WARNING] Don't setup 'skip_special_tokens'. Set it as True by default."
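
Note on the hunk above: the added lines fall back to `True` both when `skip_special_tokens` is missing and when its value cannot be interpreted. The sketch below shows one way that fallback pattern could be factored into a helper. It is a hedged illustration and not the code in this commit: `parse_bool_param` is a hypothetical name, the sets of accepted strings are an assumption, and the only detail taken from the diff is that each parameter entry exposes a `'string_value'` key.

```python
def parse_bool_param(params: dict, name: str, default: bool = True) -> bool:
    """Read a boolean-like parameter, warning and using `default` on failure.

    `params` is assumed to map parameter names to dicts that carry a
    'string_value' entry, as the diff's skip_special_tokens['string_value']
    suggests.
    """
    entry = params.get(name)
    if entry is None:
        # Parameter not provided at all.
        print(f"[TensorRT-LLM][WARNING] '{name}' is not set. Using default {default}.")
        return default

    raw = str(entry.get("string_value", ""))
    value = raw.strip().lower()

    # Accepted spellings are an assumption for this sketch.
    if value in ("true", "1", "t", "y", "yes"):
        return True
    if value in ("false", "0", "f", "n", "no"):
        return False

    # Anything else (for example an unfilled template placeholder) is invalid.
    print(
        f"[TensorRT-LLM][WARNING] '{name}' has an invalid value ({raw}). "
        f"Using default {default}."
    )
    return default
```

A call such as `parse_bool_param(model_params, "skip_special_tokens")`, where `model_params` is a placeholder for the parsed parameter dictionary, would then replace the nested `if`/`else` branches.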