Commit 9642f71

Kyle Sayers committed
clarity and typo
Signed-off-by: Kyle Sayers <[email protected]>
1 parent 08c4c91 commit 9642f71

File tree

1 file changed (+6, -6 lines changed)

  • src/llmcompressor/transformers/tracing/GUIDE.md

src/llmcompressor/transformers/tracing/GUIDE.md (+6, -6)
@@ -16,14 +16,14 @@ a [Sequential Pipeline](/src/llmcompressor/pipelines/sequential/pipeline.py)
 is required in order to offload activations and reduce memory usage as well as propagate
 the activation error induced by compression.
 
-For example, let's say we want to quantize a basic `3` layer model using the
-[GPTQModifier](/src/llmcompressor/modifiers/quantization/gptq/base.py) and `512`
+For example, let's say we want to quantize a basic 3 layer model using the
+[GPTQModifier](/src/llmcompressor/modifiers/quantization/gptq/base.py) and 512
 calibration samples. The [Sequential Pipeline](/src/llmcompressor/pipelines/sequential/pipeline.py)
 first identifies each of the layers (sequential targets) within the model. Then, the
-pipeline runs each of the `512` samples, one sample at a time, through the first layer.
+pipeline runs each of the 512 samples, one sample at a time, through the first layer.
 When one sample completes its forward pass through the layer, its activations are
-recorded by the [GPTQModifier](/src/llmcompressor/modifiers/quantization/gptq/base.py)
-hessian and the layer output is offloaded to the cpu. After all `512` samples have been
+used by the [GPTQModifier](/src/llmcompressor/modifiers/quantization/gptq/base.py)
+to calibrate the hessian and the layer output is offloaded to the cpu. After all 512 samples have been
 passed through the layer, the [GPTQModifier](/src/llmcompressor/modifiers/quantization/gptq/base.py)
 uses the recorded activations to compress the weights of the modules within the layer.
 Once module compression is complete, the offloaded activations are used to perform the
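
For context, the paragraph edited above describes a layer-by-layer calibration loop. Below is a minimal, self-contained sketch of that flow; the helper function, tensor shapes, and the toy rounding step are illustrative assumptions, not llm-compressor's actual pipeline code.

```python
import torch


def sequential_calibrate(layers: list[torch.nn.Linear], samples: list[torch.Tensor]) -> None:
    """Calibrate and compress layers one at a time, sample by sample."""
    layer_inputs = samples  # e.g. 512 calibration samples
    for layer in layers:
        # Statistics accumulated from activations, in the spirit of the GPTQ hessian.
        hessian = torch.zeros(layer.in_features, layer.in_features)
        outputs = []
        for x in layer_inputs:  # one sample at a time through this layer
            hessian += x.T @ x  # accumulate sum(x xT) over samples
            with torch.no_grad():
                out = layer(x)
            # The real pipeline offloads this output so that only the current
            # layer's activations occupy device memory; here it is a no-op.
            outputs.append(out.to("cpu"))
        # After all samples have passed through the layer, compress its weights.
        # GPTQ would use the accumulated hessian here; this toy stand-in just rounds.
        layer.weight.data = torch.round(layer.weight.data * 16) / 16
        # The offloaded outputs become the next layer's inputs, which propagates
        # the activation error introduced by compression.
        layer_inputs = outputs


# Toy usage: a basic 3 layer model and a handful of calibration samples.
toy_layers = [torch.nn.Linear(8, 8) for _ in range(3)]
toy_samples = [torch.randn(1, 8) for _ in range(4)]
sequential_calibrate(toy_layers, toy_samples)
```

Offloading each layer's outputs means only one layer's activations are resident on the device at a time, which is the memory saving the guide refers to.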
@@ -242,7 +242,7 @@ def _prepare_cross_attention_mask(...) -> ...:
 <img alt="Wrapped Function" src="assets/wrapped_function.jpg" height="5%" />
 </p>
 <p align="center">
-<em>This image dicts how the internals of the <code>_prepare_cross_attention_mask</code> function are replaced by a single <code>call_module</code> operation, similar to how modules can be ignored as featured in section 1
+<em>This image dicts how the internals of the <code>_prepare_cross_attention_mask</code> function are replaced by a single <code>call_module</code> operation, similar to how modules can be ignored as featured in section 1</em>
 </p>
 
 Please note that wrapped functions must be defined at the module-level, meaning that
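
The caption above describes replacing a function's internals with a single call_module operation. As a rough analogue only (an assumption for illustration, not the helper the guide documents), plain torch.fx offers torch.fx.wrap, which keeps a module-level function out of the traced graph as a single call_function leaf:

```python
import torch
import torch.fx


def _prepare_mask(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent control flow like this is what normally breaks tracing.
    if x.sum() > 0:
        return torch.ones_like(x)
    return torch.zeros_like(x)


# Must be called at the module level, mirroring the guide's note that wrapped
# functions need a module-level definition.
torch.fx.wrap("_prepare_mask")


class ToyModel(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * _prepare_mask(x)


traced = torch.fx.symbolic_trace(ToyModel())
print(traced.graph)  # _prepare_mask appears as one leaf node, not its internals
```

Both approaches depend on the wrapped function being defined at the module level, which is the requirement the context line above goes on to explain.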

0 commit comments