Committers

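This document lists the current committers of the vLLM project and the core areas they maintain. Committers have write access to the vLLM repository and are responsible for reviewing and merging PRs. You can also refer to the CODEOWNERS file for file-level ownership and reviewer information. Both this document and the CODEOWNERS file are living documents, and they complement each other.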

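Active Committers

We try to summarize each committer's role in vLLM in a few words. In general, vLLM committers cover a wide range of areas and help each other with maintenance. For exact component ownership, refer to the "Area Owners" section later in this document. The list is sorted alphabetically by GitHub handle: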

@22quinn: RL API
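@aarnphm: Structured output
@alexm-redhat: Performance
@ApostaC: Connectors, offloading
@benchislett: Engine core and spec decode
@bigPYJ1151: Intel CPU/XPU integration
@chaunceyjiang: Tool use and reasoning parsers
@DarkLight1337: Multimodality, API server
@esmeetu: Developer marketing, community
@gshtras: AMD integration
@heheda12345: Hybrid memory allocator
@hmellor: Hugging Face integration, documentation
@houseroad: Engine core and Llama models
@Isotr0py: Multimodality, new model support
@jeejeelee: LoRA, new model support
@jikunshang: Intel CPU/XPU integration
@khluu: CI infrastructure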
@KuntaiDu: KV Connector
@LucasWilkinson: Kernels and performance
@luccafong: Llama models, speculative decoding, distributed
@markmc: Observability
@mgoin: Quantization and performance
@NickLucche: KV connector
@njhill: Distributed, API server, engine core
@noooop: Pooling models
@patrickvonplaten: Mistral models, new model support
@pavanimajety: NVIDIA GPU integration
@ProExpertProg: Compilation, startup UX
@robertgshaw2-redhat: Core, distributed, disagg
@ruisearch42: Pipeline parallelism, Ray Support
@russellb: Structured output, engine core, security
@sighingnow: Qwen models, new model support
@simon-mo: Project lead, API entrypoints, community
@tdoublep: State space models
@tjtanaa: AMD GPU integration
@tlrmchlsmth: Kernels and performance, distributed, disagg
@WoosukKwon: Project lead, engine core
@yaochengji: TPU integration
@yeqcharlotte: Benchmark, Llama models
@yewentao256: Kernels and performance
@Yikun: Pluggable hardware interface
@youkaichao: Project lead, distributed, compilation, community
@ywang96: Multimodality, benchmark
@zhuohan123: Project lead, RL integration, numerics
@zou3519: Compilation

Emeritus Committers

Committers who made significant contributions to vLLM in the past (thank you!) but are no longer active:

@andoorve: Pipeline parallelism
@cadedaniel: Speculative decoding
@comaniac: KV cache management, pipeline parallelism
@LiuXiaoxuanPKU: Speculative decoding
@pcmoritz: MoE
@rkooo567: Chunked prefill
@sroy745: Speculative decoding
@Yard1: kernels and performance
@zhisbug: Arctic models, distributed

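Area Owners

This section breaks down the active committers by vLLM component and lists the area owners. If you have a PR that touches one of these areas, feel free to ping the area owners for review.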

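Engine Core

Scheduler: the core loop of the vLLM engine, which schedules requests into the next batch
- @WoosukKwon, @robertgshaw2-redhat, @njhill, @heheda12345
KV Cache Manager: the memory management layer within the scheduler, which maintains the logical block data of the KV cache
- @heheda12345, @WoosukKwon
AsyncLLM: the zmq-based protocol that hosts the engine core and makes it accessible to the entrypoints
- @robertgshaw2-redhat, @njhill, @russellb
ModelRunner, Executor, Worker: the engine abstractions that wrap the model implementations
- @WoosukKwon, @tlrmchlsmth, @heheda12345, @LucasWilkinson, @ProExpertProg
KV Connector: the connector interface and implementations for KV cache offloading and transfer
- @robertgshaw2-redhat, @njhill, @KuntaiDu, @NickLucche, @ApostaC
Distributed, Parallelism, Process Management: the process launchers that manage each worker and assign it to the correct DP/TP/PP/EP rank
- @youkaichao, @njhill, @WoosukKwon, @ruisearch42
Collectives: the usage of nccl and other communication libraries/kernels
- @tlrmchlsmth, @youkaichao
Multimodal Engine and Memory Management: the core scheduling and memory management for vision, audio, and video inputs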
- @ywang96, @DarkLight1337

Model Implementations

Model Interface: the nn.Module interface and implementations for various models
- @zhuohan123, @mgoin, @simon-mo, @houseroad, @ywang96 (multimodality), @jeejeelee (lora)
Logits Processors / Sampler: the provided sampler class and pluggable logits processors
- @njhill, @houseroad, @22quinn
Custom Layers: utility layers in vLLM, such as rotary embedding and rms norms
- @ProExpertProg
Attention: the attention interface for paged attention
- @WoosukKwon, @LucasWilkinson, @heheda12345
FusedMoE: FusedMoE kernel, Modular kernel framework, EPLB
- @tlrmchlsmth
Quantization: the various quantization configs, weight loading, and kernels
- @mgoin, @Isotr0py, @yewentao256
Custom quantized GEMM kernels (cutlass_scaled_mm, marlin, machete)
- @tlrmchlsmth, @LucasWilkinson
Multi-modal Input Processing: the components that load image/video/audio data and process it into feature tensors
- @DarkLight1337, @ywang96, @Isotr0py
torch compile: the torch.compile integration in vLLM, including custom passes and transformations
- @ProExpertProg, @zou3519, @youkaichao
State space models: the state space model implementations in vLLM
- @tdoublep, @tlrmchlsmth
Reasoning and tool calling parsers
- @chaunceyjiang, @aarnphm

Entrypoints

LLM Class: the LLM class for offline inference
- @DarkLight1337
API Server: the OpenAI-compatible API server
- @DarkLight1337, @njhill, @aarnphm, @simon-mo, @heheda12345 (Responses API)
Batch Runner: the OpenAI-compatible batch runner
- @simon-mo

Features

Spec Decode: covers the model definition, attention, sampler, and scheduler work related to n-grams, EAGLE, and MTP
- @WoosukKwon, @benchislett, @luccafong
Structured Output: the structured output implementation
- @russellb, @aarnphm
RL: RL-related features such as collective rpc and sleep mode
- @youkaichao, @zhuohan123, @22quinn
LoRA: @jeejeelee
Observability: Metrics and Logging
- @markmc, @robertgshaw2-redhat, @simon-mo

Code Base

Config: configuration registration and parsing
- @hmellor
Documentation: @hmellor, @DarkLight1337, @simon-mo
Benchmarks: @ywang96, @simon-mo
CI, Build, Release Process: @khluu, @njhill, @simon-mo
Security: @russellb

External Kernels Integration

FlashAttention: @LucasWilkinson
FlashInfer: @LucasWilkinson, @mgoin, @WoosukKwon
Blackwell Kernels: @mgoin, @yewentao256
DeepEP/DeepGEMM/pplx: @mgoin, @yewentao256

Integrations

Hugging Face: @hmellor, @Isotr0py
Ray: @ruisearch42
NIXL: @robertgshaw2-redhat, @NickLucche

Collaboration with Model Vendors

gpt-oss: @heheda12345, @simon-mo, @zhuohan123
Llama: @luccafong
Qwen: @sighingnow
Mistral: @patrickvonplaten

Hardware

Plugin Interface: @youkaichao, @Yikun
NVIDIA GPU: @pavanimajety
AMD GPU: @gshtras, @tjtanaa
Intel CPU/GPU: @jikunshang, @bigPYJ1151
Google TPU: @yaochengji

Ecosystem Projects

Ascend NPU: @wangxiyuan (see the project for more details)
Intel Gaudi HPU: @xuechendi and @kzawora-intel
Semantic Router: @xunzhuo, @rootfs (see the project for more details)