Pull requests: waybarrios/vllm-mlx
Pull requests list
Harden bench-serve workload runner with focused regression tests
optimization
#515
opened May 7, 2026 by
waybarrios
Owner
Track admission-control invariant for serialized TextModel-direct routes
optimization
#514
opened May 7, 2026 by
waybarrios
Owner
Wire MLLM assistant drafters for Gemma 4 MTP
#507
opened May 6, 2026 by
Thump604
Collaborator
fix: unexpected keyword argument 'mtp' when enable-mtp is set
#503
opened May 5, 2026 by
git4alex
Fix dangling think before tool calls in templates
#494
opened May 2, 2026 by
Thump604
Collaborator
fix: run BatchedEngine MLLM on dedicated MLXWorkerThread to prevent cross-thread stream errors
#479
opened May 1, 2026 by
xykong
fix(simple): use persistent MLX worker thread to fix thread-local stream crash
#478
opened Apr 30, 2026 by
xykong
perf: O(1) tool lookup in ToolExecutor via lazily-cached name index
optimization
#449
opened Apr 26, 2026 by
clickbrain
Contributor
Fix sampling defaults and short prefix-cache reuse
#424
opened Apr 24, 2026 by
Thump604
Collaborator
feat(mllm): extract audio track from video inputs
#352
opened Apr 15, 2026 by
miguel-flowstate
Contributor
3 of 4 tasks
fix: graceful fallback when model has no chat_template (MedGemma)
#271
opened Apr 9, 2026 by
jackneil
Contributor
2 of 3 tasks
feat: add --compile flag for mx.compile model optimization
#270
opened Apr 9, 2026 by
jackneil
Contributor
3 tasks done
fix: overhaul GLM-4.7-Flash streaming tool calls and add GLM4 reasoning parser
#246
opened Apr 2, 2026 by
b2ornot2b
Add TurboQuant KV cache compression for prefix cache (4.6x)
#233
opened Mar 29, 2026 by
arozanov
9 tasks done