waybarrios / vllm-mlx Public

Pull requests: waybarrios/vllm-mlx

20 Open 341 Closed

Pull requests list

Harden bench-serve workload runner with focused regression tests (label: optimization)

#515 opened May 7, 2026 by waybarrios Owner

Track admission-control invariant for serialized TextModel-direct routes (label: optimization)

#514 opened May 7, 2026 by waybarrios Owner

Wire MLLM assistant drafters for Gemma 4 MTP

#507 opened May 6, 2026 by Thump604 Collaborator

Expose MLLM MTP draft counters

#504 opened May 6, 2026 by Thump604 Collaborator

fix: unexpected keyword argument 'mtp' when enable-mtp is set

#503 opened May 5, 2026 by git4alex

fix: Qwen tool streaming recovery

#497 opened May 4, 2026 by kylejeske

Fix dangling think before tool calls in templates

#494 opened May 2, 2026 by Thump604 Collaborator

docs: add PR merge readiness checklist

#492 opened May 2, 2026 by Thump604 Collaborator

docs: clarify MLLM MTP guard semantics

#491 opened May 2, 2026 by Thump604 Collaborator

fix: sanitize remote media safety errors

#490 opened May 2, 2026 by Thump604 Collaborator

fix: run BatchedEngine MLLM on dedicated MLXWorkerThread to prevent cross-thread stream errors

#479 opened May 1, 2026 by xykong

fix(simple): use persistent MLX worker thread to fix thread-local stream crash

#478 opened Apr 30, 2026 by xykong

Expose effective MLLM MTP draft stats

#473 opened Apr 29, 2026 by Thump604 Collaborator

perf: O(1) tool lookup in ToolExecutor via lazily-cached name index (label: optimization)

#449 opened Apr 26, 2026 by clickbrain Contributor

Fix sampling defaults and short prefix-cache reuse

#424 opened Apr 24, 2026 by Thump604 Collaborator

feat(mllm): extract audio track from video inputs

#352 opened Apr 15, 2026 by miguel-flowstate Contributor

3 of 4 tasks

fix: graceful fallback when model has no chat_template (MedGemma)

#271 opened Apr 9, 2026 by jackneil Contributor

2 of 3 tasks

feat: add --compile flag for mx.compile model optimization

#270 opened Apr 9, 2026 by jackneil Contributor

3 tasks done

fix: overhaul GLM-4.7-Flash streaming tool calls and add GLM4 reasoning parser

#246 opened Apr 2, 2026 by b2ornot2b

Add TurboQuant KV cache compression for prefix cache (4.6x)

#233 opened Mar 29, 2026 by arozanov

9 tasks done
