Pull requests: vllm-project/vllm
fix[Docs]: link anchor is incorrect #20309
documentation
Improvements or additions to documentation
structured-output
#20315
opened Jul 1, 2025 by
yyzxw
4 tasks
[Kernel] Optimize Prefill Attention in Unified Triton Attention Kernel
#20308
opened Jul 1, 2025 by
jvlunteren
Add support for Prithvi geospatial model in serving mode
documentation
Improvements or additions to documentation
frontend
multi-modality
Related to multi-modality (#4194)
needs-rebase
structured-output
v1
[doc] quark_mxfp4_introduction
documentation
Improvements or additions to documentation
#20306
opened Jul 1, 2025 by
lihaoyang-amd
•
Draft
[CUDA graphs] Enable full cuda graphs with FA3 AoT scheduling
ci/build
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#20301
opened Jul 1, 2025 by
WoosukKwon
[Misc] Minor refactoring for scheduler
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#20299
opened Jul 1, 2025 by
WoosukKwon
[Feature] Support Minimax-M1 function calls features
documentation
Improvements or additions to documentation
frontend
tool-calling
#20297
opened Jul 1, 2025 by
qscqesze
[Hardware][RISC-V] Add RISC-V architecture cpu inference support
ci/build
#20292
opened Jul 1, 2025 by
huangzhengx
[Optimization] Cache sampled token ids in model runner
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#20291
opened Jul 1, 2025 by
WoosukKwon
[Model] Adds support for SlimMoE models Phi-tiny-MoE-instruct
#20286
opened Jun 30, 2025 by
zichongli5
3 of 4 tasks
[Misc][Doc] Add missing comment for LLM
frontend
#20285
opened Jun 30, 2025 by
draftbk
1 of 4 tasks
Support DeepSeekV3-style block FP8 quantization with CT
#20279
opened Jun 30, 2025 by
mgoin
[TPU] Temporary fix vmem oom for long model len by reducing page size
tpu
Related to Google TPUs
v1
#20278
opened Jun 30, 2025 by
Chenyaaang
[Docs] use uv in GPU installation docs
documentation
Improvements or additions to documentation
#20277
opened Jun 30, 2025 by
davidxia
[Bugfix][Frontend]: Fix API server connection refused on wsl2
frontend
#20275
opened Jun 30, 2025 by
Chen-zexi
3 of 4 tasks
[Bugfix] Fix None value handling in trace span creation for cancelled requests
#20272
opened Jun 30, 2025 by
br4mm
3 of 4 tasks
[Docs] Update transcriptions API to use openai client with stream=True
documentation
Improvements or additions to documentation
frontend
ready
ONLY add when PR is ready to merge/full CI is needed
#20271
opened Jun 30, 2025 by
NickLucche
[V1] [ROCm] Enable EP with AITER Fused MoE
rocm
Related to AMD ROCm
#20270
opened Jun 30, 2025 by
tjtanaa
3 of 4 tasks
[Refactor] Refactor import utils
frontend
multi-modality
Related to multi-modality (#4194)
performance
Performance-related issues
speculative-decoding
structured-output
tool-calling
v1
#20269
opened Jun 30, 2025 by
yewentao256
[Benchmark] Add benchmark tool for multi turn conversations
performance
Performance-related issues
#20267
opened Jun 30, 2025 by
pliops-daniels
[misc]refactor Platform.set_device method
rocm
Related to AMD ROCm
tpu
Related to Google TPUs
v1
#20262
opened Jun 30, 2025 by
jikunshang
4 tasks
[WIP][Model][VLM] Support JinaVL Reranker
documentation
Improvements or additions to documentation
frontend
#20260
opened Jun 30, 2025 by
shineran96
•
Draft