update doc for qwen2
yvonwin committed Jun 7, 2024
1 parent 0ddd87b commit 6a49b68
Showing 3 changed files with 1,242 additions and 1,107 deletions.
12 changes: 7 additions & 5 deletions README.md
@@ -2,7 +2,7 @@

[中文版](README_zh.md)

-This project is an independent C++ implementation of [Qwen1.5 family](https://github.com/QwenLM/Qwen1.5) and Llama3.
+This project is an independent C++ implementation of [Qwen2 family](https://github.com/QwenLM/Qwen2) and Llama3.

![](docs/main_demo.jpg)

@@ -14,6 +14,7 @@
- **`2024/04/11`** The platform has been updated to support Windows. It has been tested on Visual Studio 2022, and both CUDA and CPU functionalities are confirmed to work correctly.
- **`2024/04/18`** Tested on [CodeQwen1.5-7B](https://huggingface.co/Qwen/CodeQwen1.5-7B). The model's architecture is verified to be correct. However, it uses SentencePiece for tokenization. You can test it with the hf tokenizer, as in `examples/codeqwen.py`.
- **`2024/04/25`** Support [Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B). Llama3 utilizes tiktoken as well, hence it is supported.
+- **`2024/06/07`** Support [Qwen2](https://huggingface.co/Qwen/Qwen2-7B-Instruct).

## Features

@@ -26,7 +27,7 @@
Support Matrix:
* Hardware: x86/arm CPU, NVIDIA GPU, Apple Silicon GPU
* Platforms: Linux, macOS, Windows
-* Models: [Qwen1.5](https://github.com/QwenLM/Qwen1.5) family and Llama3
+* Models: [Qwen2](https://github.com/QwenLM/Qwen2) family and Llama3

## Test in Colab

@@ -49,9 +50,9 @@ git submodule update --init --recursive

**Quantize Model**

-Use `convert.py` to transform Qwen1.5 into quantized GGML format. For example, to convert the fp16 original model to q4_0 (quantized int4) GGML model, run:
+Use `convert.py` to transform Qwen2 into quantized GGML format. For example, to convert the fp16 original model to a q4_0 (int4-quantized) GGML model, run:
```sh
-python3 qwen_cpp/convert.py -i Qwen/Qwen1.5-1.8B-Chat -t q4_0 -o qwen2_1.8b-ggml.bin
+python3 qwen_cpp/convert.py -i Qwen/Qwen2-1.5B-Instruct -t q4_0 -o Qwen2-1.5B-Instruct-ggml.bin
```

The original model (`-i <model_name_or_path>`) can be a HuggingFace model name or a local path to your pre-downloaded model; a local-path sketch follows the list below. Currently supported models are:
@@ -64,6 +65,7 @@
* Qwen1.5-MoeA2.7B: `Qwen/Qwen1.5-MoE-A2.7B-Chat`
* Llama-3-8B-Instruct: `meta-llama/Meta-Llama-3-8B-Instruct`
* Llama3-8B-Chinese-Chat : `shenzhi-wang/Llama3-8B-Chinese-Chat`
+* Qwen2-7B-Instruct : `Qwen/Qwen2-7B-Instruct`
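
Because `-i` also accepts a local directory, a pre-downloaded checkpoint can be converted without contacting the Hugging Face Hub. A minimal sketch, assuming the weights already live in a hypothetical `./models/Qwen2-7B-Instruct` folder:

```sh
# Convert a locally downloaded checkpoint to q4_0 GGML.
# The ./models path below is a hypothetical example location.
python3 qwen_cpp/convert.py -i ./models/Qwen2-7B-Instruct -t q4_0 -o Qwen2-7B-Instruct-ggml.bin
```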

You are free to try any of the below quantization types by specifying `-t <type>`:
* `q4_0`: 4-bit integer quantization with fp16 scales.
@@ -96,7 +98,7 @@ The default tiktoken file is `qwen.tiktoken`. For Llama3, download it from [this

To run the model in interactive mode, add the `-i` flag. For example:
```sh
-./build/bin/main -m qwen2_1.8b-ggml.bin -i
+./build/bin/main -m Qwen2-1.5B-Instruct-ggml.bin -i
```
In interactive mode, your chat history will serve as the context for the next-round conversation.
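
Beyond interactive chat, a one-off generation can reuse the same binary. The following is only a sketch, assuming a `-p` prompt flag and a `--tiktoken` option for selecting the tokenizer file mentioned above; neither flag is confirmed by this page:

```sh
# Hypothetical single-prompt run; -p and --tiktoken are assumed flags.
./build/bin/main -m Qwen2-1.5B-Instruct-ggml.bin --tiktoken qwen.tiktoken -p "Hello, who are you?"
```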

8 changes: 5 additions & 3 deletions README_zh.md
@@ -1,6 +1,6 @@
# qwen2.cpp

-This project is an independent C++ implementation of the [Qwen1.5 family](https://github.com/QwenLM/Qwen1.5) and Llama3.
+This project is an independent C++ implementation of the [Qwen2 family](https://github.com/QwenLM/Qwen2) and Llama3.

![](docs/main_demo.jpg)

@@ -13,6 +13,7 @@
- **`2024/04/18`** Tested on [CodeQwen1.5-7B](https://huggingface.co/Qwen/CodeQwen1.5-7B) and verified that the model architecture is correct. However, it uses SentencePiece for tokenization, and I don't want to pull in more libraries for now; you can test it with the hf tokenizer, as in `examples/codeqwen.py`.
- **`2024/04/25`** Support [Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B). Llama3 also uses tiktoken, so it is supported as well.
- **`2024/05/08`** The Chinese fine-tuned Llama3 is recommended: `shenzhi-wang/Llama3-8B-Chinese-Chat`.
+- **`2024/06/07`** Support [Qwen2](https://huggingface.co/Qwen/Qwen2-7B-Instruct); the model architecture is essentially unchanged, so it works with the existing code.

## Features

@@ -25,7 +26,7 @@
Support matrix:
* Hardware: x86/arm CPU, NVIDIA GPU, Apple Silicon GPU
* Platforms: Linux, macOS, Windows
-* Models: [Qwen1.5](https://github.com/QwenLM/Qwen1.5) family and Llama3
+* Models: [Qwen2](https://github.com/QwenLM/Qwen2) family and Llama3

## Test in Colab

@@ -62,6 +63,7 @@ python3 qwen_cpp/convert.py -i Qwen/Qwen1.5-1.8B-Chat -t q4_0 -o qwen2_1.8b-ggml
* Qwen1.5-MoeA2.7B: `Qwen/Qwen1.5-MoE-A2.7B-Chat`
* Llama-3-8B-Instruct: `meta-llama/Meta-Llama-3-8B-Instruct`
* Llama3-8B-Chinese-Chat : `shenzhi-wang/Llama3-8B-Chinese-Chat`
+* Qwen2-7B-Instruct : `Qwen/Qwen2-7B-Instruct`

You can try any of the following quantization types by specifying `-t <type>`:
* `q4_0`: 4-bit integer quantization with fp16 scales.
@@ -114,7 +116,7 @@ llama3-chinese example

To run the model in interactive mode, add the `-i` flag. For example:
```sh
-./build/bin/main -m qwen2_1.8b-ggml.bin -i
+./build/bin/main -m Qwen2-1.5B-Instruct-ggml.bin -i
```
In interactive mode, your chat history serves as the context for the next round of conversation.
