Pull requests: vllm-project/tpu-inference

- [Multi-host] Fix bugs in the deployment script (#940, opened Oct 25, 2025 by Lumosis)
- Reduce the host overhead for LoRA (#930, opened Oct 23, 2025 by vanbasten23)
- [multi-host] add quick start guide (#928, opened Oct 23, 2025 by Lumosis)
- [Requirements] Bump JAX/JAXLib to 0.8.0 (#927, opened Oct 23, 2025 by jrplatin)
- [Feature] Code implementation of Async Scheduler (#924, opened Oct 23, 2025 by cychiuak)
- [Spec Decoding] Merge jitted helpers for eagle3 (#920, opened Oct 22, 2025 by Lumosis)
- [Requirements] Bump TPU Info to 0.6.0 (#917, opened Oct 22, 2025 by jrplatin)
- PP for single host (#914, opened Oct 21, 2025 by Chenyaaang, draft)
- [WIP] Add Qwen3-Omni model (#896, opened Oct 19, 2025 by eitanporat)
- add jax support for Qwen2VL (#893, opened Oct 18, 2025 by shungcp)
- [Doc] Docker guide extended (#890, opened Oct 17, 2025 by hosseinsarshar)
- [GPT-OSS] JAX implementation of GPT-OSS (#861, opened Oct 14, 2025 by bzgoogle)
- lora spmd (#802, opened Oct 8, 2025 by vanbasten23, draft)
- Prototyping load weight scale for qwen3. (#741, opened Sep 25, 2025 by inho9606)
- [Test only] Remove the model cache (#725, opened Sep 22, 2025 by QiliangCui)