GitHub - wnma3mz/tLLM

3 Branches 0 Tags

Name	Name	Last commit message	Last commit date
Latest commit wnma3mz Merge branch 'main' of github.com:wnma3mz/tLLM Feb 9, 2025 ad6b451 · Feb 9, 2025 History 300 Commits
asserts	asserts	add equation img	Feb 1, 2025
benchmarks	benchmarks	update en readme	Feb 2, 2025
examples	examples	clean code	Jan 29, 2025
flux_examples	flux_examples	add setup	Dec 21, 2024
requirements	requirements	mlx-vlm and mflux are not mandatory	Feb 2, 2025
scripts	scripts	add radix tree for token level prefill cache	Jan 27, 2025
tests	tests	build batch and prefill request to test	Jan 28, 2025
tllm	tllm	Merge branch 'main' of github.com:wnma3mz/tLLM	Feb 9, 2025
.gitignore	.gitignore	move qwen2vision model to mlx_clip	Jan 28, 2025
LICENSE	LICENSE	Initial commit	Jun 25, 2024
README.md	README.md	clean readme	Feb 9, 2025
README_EN.md	README_EN.md	clean readme	Feb 9, 2025
pyproject.toml	pyproject.toml	update forward seq_len=1 batch	Oct 3, 2024
run_engine.py	run_engine.py	refactor the cache manager	Feb 3, 2025
run_janus_pro.py	run_janus_pro.py	refactor the cache manager	Feb 3, 2025
setup.py	setup.py	fix is_local bugs	Jan 31, 2025

Repository files navigation

Together-LLM

English | 中文

跨机推理 LLM 框架

快速开始

安装依赖

在 MacOS （Apple silicon）: pip install -U -e ".[mlx]"
其他平台（NVIDIA）: pip install -e ".[torch]"

本机运行：PYTHONPATH="./" python3 ./run_engine.py --model_path mlx-community/Llama-3.2-1B-Instruct-4bit

启动 HTTP 服务

单机: tllm.server --model_path mlx-community/Llama-3.2-1B-Instruct-4bit
多机:
- 在一个终端启动服务端: tllm.server --model_path mlx-community/Llama-3.2-1B-Instruct-4bit --hostname $YOUR_IP
- 在另一个终端启动客户端 tllm.client --hostname http://$YOUR_IP:8022

测试 HTTP 服务

python3 benchmarks/run_async_requests.py

支持模型

llama
qwen
janus_pro: 暂只支持 MacOS 平台
- Text to Text: PYTHONPATH="./" python3 run_janus_pro.py --model_path wnma3mz/Janus-Pro-1B-4bit --message_type llm
- Image to Text: PYTHONPATH="./" python3 run_janus_pro.py --model_path wnma3mz/Janus-Pro-1B-4bit --message_type mllm
- Text to Image: PYTHONPATH="./" python3 run_janus_pro.py --model_path wnma3mz/Janus-Pro-1B-4bit --message_type image
qwen-vl: 在 MacOS 平台需要额外安装 pip install mlx-vlm==0.1.12
flux: 暂只支持 MacOS 平台，需要额外安装 pip install mflux=0.4.1

进阶功能

对于多机部署，会使用默认的部分端口进行运行。如果有特殊需求，可以通过配置文件 examples/config.json 进行修改。

{
    "server": {
        "grpc_port": 25001,
        "http_port": 8022,
        "hostname": "mac-mini"
    },
    "client": [
        {
            "grpc_port": 25002,
            "hostname": "m3pro"
        },
        {
            "grpc_port": 25003,
            "hostname": "m3"
        }
    ]
}

客户端的数量会决定模型拆分的数量
server.grpc_port: server 的 grpc 端口，用于每个 client 发送状态数据以及最后一个 client 发送计算后的结果
server.http_port: server 的 http 端口，API 接口以及 WebSocket 服务
server.hostname: server 的 hostname，可以用 ip 代替，如 192.168.1.10，需要确保 client 能够访问
client.grpc_port: client 的 grpc 端口
client.hostname: client 的 hostname，需要确保 server 和其他 client 能够访问

Features

Performance

In Mac Mini M4

	`mlx-community/Llama-3.2-1B-Instruct-4bit`	`mlx-community/Llama-3.2-1B-Instruct`	`mlx-community/Meta-Llama-3.1-8B-Instruct-4bit`	`mlx-community/Meta-Llama-3.1-8B-Instruct-bf16`
Mac Mini M4 (16G) (Local)	45.36 tok/s	23.60 tok/s	15.80 tok/s	No Memory
Mac Mini M4 (16G) + M3 Pro (18G)		16.33 tok/s	11.06 tok/s	5.64 tok/s

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Together-LLM

快速开始

支持模型

进阶功能

Features

Performance

About

Releases

Packages

Languages

License

wnma3mz/tLLM

Folders and files

Latest commit

History

Repository files navigation

Together-LLM

快速开始

支持模型

进阶功能

Features

Performance

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages