1 change: 1 addition & 0 deletions README.md
@@ -22,6 +22,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compression
</p>

## 📣Latest News
- [26/05/08] We have released the STQ1_0 kernel for 1.25-bit models and opened [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836) against llama.cpp! If you have any questions or suggestions about STQ_0, feel free to comment under the PR! 🔥🔥🔥
- [26/04/29] We have released 2-bit and 1.25-bit versions of the Tencent Hy-MT1.5-1.8B Translation Model: [Hy-MT1.5-1.8B-2bit](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-2bit) and [Hy-MT1.5-1.8B-1.25bit](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit). We have also made an [offline translation demo](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit/blob/main/Hy-MT-demo.apk) for you to try out. We invite you to give it a spin! 🔥🔥🔥
- [26/04/23] We now support FP8-Static quantization for **Hy3-preview** (MoE A20B).
- [26/03/25] We have released **DAQ**, a quantization algorithm that preserves the knowledge acquired during post-training when the corresponding parameter updates are relatively small. [[Paper]](https://arxiv.org/abs/2603.22324) | [[Docs]](docs/source/features/quantization/daq.md)
1 change: 1 addition & 0 deletions README_cn.md
@@ -22,6 +22,7 @@
</p>

## 📣Latest News
- [26/05/08] We have released the STQ1_0 kernel for the 1.25-bit model and submitted [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836) to llama.cpp! If you have any questions or suggestions about STQ_0, feel free to comment under the PR! 🔥🔥🔥
- [26/04/29] We have released the 2-bit and 1.25-bit Tencent Hunyuan translation models [Hy-MT1.5-1.8B-2bit](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-2bit) and [Hy-MT1.5-1.8B-1.25bit](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit), along with an [offline translation demo](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit/blob/main/Hy-MT-demo.apk). We welcome you to try them out! 🔥🔥🔥
- [26/04/23] We now support FP8-Static quantization for the **Hy3-preview** (MoE A20B) model.
- [26/03/25] We have released the quantization algorithm DAQ, which preserves the quantized model's capabilities when post-training parameter updates are relatively small. [[Paper]](https://arxiv.org/abs/2603.22324) | [[Docs]](docs/source/features/quantization/daq.md)
73 changes: 72 additions & 1 deletion docs/source/models/Hy-MT1.5/hy-mt1.5.md
@@ -100,7 +100,78 @@ Demo device: Snapdragon 7+ Gen 2, 16GB RAM.
:::

## 💻 Deployment
Our llama.cpp kernel (including STQ kernel) is coming soon.

### Clone llama.cpp

```bash
git clone https://github.com/ggml-org/llama.cpp.git
```

### Enter the llama.cpp folder

```bash
cd llama.cpp
```

### Fetch and check out the PR branch

```bash
git fetch origin pull/22836/head:pr-22836-stq_0
git checkout pr-22836-stq_0
```
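
A quick sanity check that the PR branch is checked out (plain git, nothing specific to this PR):

```bash
# Show the current branch and its latest commit
git branch --show-current
git log --oneline -1
```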

### Build llama.cpp

```bash
pip install -r requirements.txt
cmake -B build
cmake --build build --config Release
```
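
The default build targets the CPU, which is what the examples below use (`-ngl 0`). If the build feels slow, a parallel build is an optional tweak; this sketch assumes a Linux or macOS shell where `nproc` is available:

```bash
# Optional: rebuild using all available CPU cores
cmake --build build --config Release -j"$(nproc)"
```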

### Download the HF model


```bash
pip install huggingface_hub
huggingface-cli download AngelSlim/Hy-MT1.5-1.8B-1.25bit \
--local-dir model_zoo/Hy-MT1.5-1.8B-1.25bit
```
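
The same command should also work for the 2-bit variant linked in the news above; this is a sketch assuming its repository layout matches the 1.25-bit one. The remaining steps in this guide use the 1.25-bit model.

```bash
# Optional: download the 2-bit variant instead (assumes the same repo layout as the 1.25-bit model)
huggingface-cli download AngelSlim/Hy-MT1.5-1.8B-2bit \
    --local-dir model_zoo/Hy-MT1.5-1.8B-2bit
```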

### Convert HF → bf16 GGUF

```bash
python convert_hf_to_gguf.py model_zoo/Hy-MT1.5-1.8B-1.25bit \
--outfile model_zoo/Hy-MT1.5-1.8B-bf16.gguf \
--outtype bf16
```

### Quantize bf16 → STQ1_0

```bash
./build/bin/llama-quantize \
model_zoo/Hy-MT1.5-1.8B-bf16.gguf \
model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf \
STQ1_0
```
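
As a quick sanity check, comparing file sizes gives a rough sense of the compression from bf16 to the 1.25-bit STQ1_0 format (exact sizes will vary):

```bash
# Compare the bf16 and STQ1_0 GGUF file sizes
ls -lh model_zoo/Hy-MT1.5-1.8B-bf16.gguf model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf
```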

### Run a completion example

The prompt format is documented on the [HY-MT1.5-1.8B](https://huggingface.co/tencent/HY-MT1.5-1.8B) model card.

```bash
./build/bin/llama-completion \
--model model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf \
-p "Translate the following segment into Chinese, without additional explanation. Hello " \
--jinja \
-ngl 0 \
-n 64 -st
```
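
To query the model over HTTP instead of running one-shot completions, llama.cpp's standard `llama-server` should also accept the quantized file, assuming the PR branch builds it as usual:

```bash
# Serve the STQ1_0 model with llama.cpp's HTTP server (assumes llama-server is built on this branch)
./build/bin/llama-server \
    --model model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf \
    -ngl 0 \
    --port 8080
```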

### Run the llama.cpp benchmark

```bash
./build/bin/llama-bench -m model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf -ngl 0
```
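
For a rough quality check alongside the speed numbers, `llama-perplexity` can be run on any plain-text file; `wiki.test.raw` below is a placeholder path, not a file shipped with this PR:

```bash
# Measure perplexity of the STQ1_0 model on a local text file (wiki.test.raw is a placeholder)
./build/bin/llama-perplexity -m model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf -f wiki.test.raw -ngl 0
```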

## 📥 Download Links
