1 change: 1 addition & 0 deletions README.md
@@ -22,6 +22,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compression
</p>

## 📣Latest News
- [26/05/08] We have released the STQ1_0 kernel for 1.25-bit models and opened [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836) against llama.cpp! If you have any questions or suggestions about STQ_0, feel free to comment under the PR! 🔥🔥🔥
- [26/04/29] We have released 2-bit and 1.25-bit versions of the Tencent Hy-MT1.5-1.8B Translation Model: [Hy-MT1.5-1.8B-2bit](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-2bit) and [Hy-MT1.5-1.8B-1.25bit](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit). We have also made an [offline translation demo](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit/blob/main/Hy-MT-demo.apk) for you to try out. We invite you to give it a spin! 🔥🔥🔥
- [26/04/23] We now support FP8-Static quantization for **Hy3-preview** (MoE A20B).
- [26/03/25] We have released **DAQ**, a quantization algorithm that preserves the knowledge acquired during post-training when the corresponding parameter updates are relatively small. [[Paper]](https://arxiv.org/abs/2603.22324) | [[Docs]](docs/source/features/quantization/daq.md)
1 change: 1 addition & 0 deletions README_cn.md
@@ -22,6 +22,7 @@
</p>

## 📣Latest News
- [26/05/08] We have released the STQ1_0 kernel for the 1.25-bit model and submitted [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836) to llama.cpp! If you have any questions or suggestions about STQ_0, feel free to comment under the PR! 🔥🔥🔥
- [26/04/29] We have released the 2-bit and 1.25-bit Tencent Hunyuan translation models [Hy-MT1.5-1.8B-2bit](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-2bit) and [Hy-MT1.5-1.8B-1.25bit](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit), along with an [offline translation demo](https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit/blob/main/Hy-MT-demo.apk). We welcome you to try them out! 🔥🔥🔥
- [26/04/23] We now support FP8-Static quantization for the **Hy3-preview** (MoE A20B) model.
- [26/03/25] We have released the quantization algorithm DAQ, which preserves the quantized model's capabilities when post-training parameter updates are relatively small. [[Paper]](https://arxiv.org/abs/2603.22324) | [[Docs]](docs/source/features/quantization/daq.md)
73 changes: 72 additions & 1 deletion docs/source/models/Hy-MT1.5/hy-mt1.5.md
@@ -100,7 +100,78 @@ Demo device: Snapdragon 7+ Gen 2, 16GB RAM.
:::

## 💻 Deployment
Our llama.cpp kernel (including STQ kernel) is coming soon.

### Clone llama.cpp

```bash
git clone https://github.com/ggml-org/llama.cpp.git
```

### Enter the llama.cpp folder

```bash
cd llama.cpp
```

### Fetch and check out the PR branch

```bash
git fetch origin pull/22836/head:pr-22836-stq_0
git checkout pr-22836-stq_0
```
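
A quick sanity check that the PR branch is checked out (plain git, nothing specific to this PR):

```bash
# Show the current branch and its latest commit
git branch --show-current
git log --oneline -1
```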

### Build llama.cpp

```bash
pip install -r requirements.txt
cmake -B build
cmake --build build --config Release
```
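
The default build targets the CPU, which is what the examples below use (`-ngl 0`). If the build feels slow, a parallel build is an optional tweak; this sketch assumes a Linux or macOS shell where `nproc` is available:

```bash
# Optional: rebuild using all available CPU cores
cmake --build build --config Release -j"$(nproc)"
```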

### Download the HF model


```bash
pip install huggingface_hub
huggingface-cli download AngelSlim/Hy-MT1.5-1.8B-1.25bit \
--local-dir model_zoo/Hy-MT1.5-1.8B-1.25bit
```
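
The same command should also work for the 2-bit variant linked in the news above; this is a sketch assuming its repository layout matches the 1.25-bit one. The remaining steps in this guide use the 1.25-bit model.

```bash
# Optional: download the 2-bit variant instead (assumes the same repo layout as the 1.25-bit model)
huggingface-cli download AngelSlim/Hy-MT1.5-1.8B-2bit \
    --local-dir model_zoo/Hy-MT1.5-1.8B-2bit
```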

### Convert HF → bf16 GGUF

```bash
python convert_hf_to_gguf.py model_zoo/Hy-MT1.5-1.8B-1.25bit \
--outfile model_zoo/Hy-MT1.5-1.8B-bf16.gguf \
--outtype bf16
```

### Quantize bf16 → STQ1_0

```bash
./build/bin/llama-quantize \
model_zoo/Hy-MT1.5-1.8B-bf16.gguf \
model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf \
STQ1_0
```
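
As a quick sanity check, comparing file sizes gives a rough sense of the compression from bf16 to the 1.25-bit STQ1_0 format (exact sizes will vary):

```bash
# Compare the bf16 and STQ1_0 GGUF file sizes
ls -lh model_zoo/Hy-MT1.5-1.8B-bf16.gguf model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf
```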

### Run a completion example

The prompt format is documented on the [HY-MT1.5-1.8B](https://huggingface.co/tencent/HY-MT1.5-1.8B) model card.

```bash
./build/bin/llama-completion \
--model model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf \
-p "Translate the following segment into Chinese, without additional explanation. Hello " \
--jinja \
-ngl 0 \
-n 64 -st
```
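
To query the model over HTTP instead of running one-shot completions, llama.cpp's standard `llama-server` should also accept the quantized file, assuming the PR branch builds it as usual:

```bash
# Serve the STQ1_0 model with llama.cpp's HTTP server (assumes llama-server is built on this branch)
./build/bin/llama-server \
    --model model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf \
    -ngl 0 \
    --port 8080
```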

### Run the llama.cpp benchmark

```bash
./build/bin/llama-bench -m model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf -ngl 0
```
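
For a rough quality check alongside the speed numbers, `llama-perplexity` can be run on any plain-text file; `wiki.test.raw` below is a placeholder path, not a file shipped with this PR:

```bash
# Measure perplexity of the STQ1_0 model on a local text file (wiki.test.raw is a placeholder)
./build/bin/llama-perplexity -m model_zoo/Hy-MT1.5-1.8B-STQ1_0.gguf -f wiki.test.raw -ngl 0
```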

## 📥 Download Links
