vllm-project / tpu-inference Public


Pull requests: vllm-project/tpu-inference

42 Open 880 Closed

Pull requests list

[WIP][DO NOT REVIEW] [TPU host offload] delta load optimization for tpu connector local

#941 opened Oct 26, 2025 by saikat-royc

[Multi-host] Fix bugs in the deployment script

#940 opened Oct 25, 2025 by Lumosis

[do not review yet] Get rid of a2a

#939 opened Oct 24, 2025 by vanbasten23 • Draft

Reduce the host overhead for LoRA

#930 opened Oct 23, 2025 by vanbasten23

[multi-host] add quick start guide

#928 opened Oct 23, 2025 by Lumosis

[Requirements] Bump JAX/JAXLib to 0.8.0

#927 opened Oct 23, 2025 by jrplatin

[Do not review yet] Fix issues when running multiple tests on the v6e-8 machine.

#926 opened Oct 23, 2025 by vanbasten23 • Draft

[Feature] Code implementation of Async Scheduler

#924 opened Oct 23, 2025 by cychiuak

[Spec Decoding] Merge jitted helpers for eagle3

#920 opened Oct 22, 2025 by Lumosis

[Kernel] Refactor ragged_paged_attention to proxy for default and hd64

#918 opened Oct 22, 2025 by yaochengji • Draft

[Requirements] Bump TPU Info to 0.6.0

#917 opened Oct 22, 2025 by jrplatin

PP for single host

#914 opened Oct 21, 2025 by Chenyaaang • Draft

[Draft] Skip build if only docs/icons changed

#908 opened Oct 21, 2025 by boe20211 • Draft

[WIP] Add Qwen3-Omni model

#896 opened Oct 19, 2025 by eitanporat

add jax support for Qwen2VL

#893 opened Oct 18, 2025 by shungcp

[Doc] Docker guide extended

#890 opened Oct 17, 2025 by hosseinsarshar

Data Parallelism support

#865 opened Oct 14, 2025 by wenxindongwork • Draft

[GPT-OSS] JAX implementation of GPT-OSS

#861 opened Oct 14, 2025 by bzgoogle

[CI] remove lora_bias_stacked as it is deprecated in vllm

#835 opened Oct 11, 2025 by bzgoogle

lora spmd

#802 opened Oct 8, 2025 by vanbasten23 • Draft

feat: Add a procedures to record the vllm and tpu_inference's commit hashes in CI pipeline (WIP)

#795 opened Oct 7, 2025 by dennisYehCienet

update max_model_len to plus 1 to adjust vllm change

#793 opened Oct 6, 2025 by Chenyaaang • Draft

Prototyping load weight scale for qwen3.

#741 opened Sep 25, 2025 by inho9606

[Test only] Remove the model cache

#725 opened Sep 22, 2025 by QiliangCui

[kernel][RPA v3] Use dummy DMA on wait to save SREGs usage

#718 opened Sep 19, 2025 by lsy323 • Draft