Test File Structure and Style Guide¶
To keep the project maintainable and sustainable, we encourage contributors to submit test code (unit, system, or end-to-end tests) together with their code changes. This document outlines the guidelines for organizing and naming test files.
Test Types¶
Unit Tests and System Tests¶
For unit and system tests, we strongly recommend placing test files in the same directory structure as the source code under test, using the test_*.py naming convention.
End-to-End (E2E) Tests for Models¶
End-to-end tests verify the complete functionality of a system or component. For our project, the E2E tests for the different Omni models are organized into two subdirectories:

- tests/e2e/offline_inference/: tests for offline inference mode (e.g., Qwen3Omni offline inference)
- tests/e2e/online_serving/: tests for online serving scenarios (e.g., API server tests)
Example: the test file for vllm_omni/entrypoints/omni_llm.py should live at tests/entrypoints/test_omni_llm.py.
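Such a unit test should stay mock-based so it runs without loading model weights. Below is a minimal sketch; the OmniLLM constructor signature and the engine-delegation behavior it asserts are illustrative assumptions, not the real API, so adapt it to the actual class:

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""tests/entrypoints/test_omni_llm.py -- mock-based unit-test sketch."""
from unittest import mock


def test_generate_delegates_to_engine() -> None:
    """generate() should forward prompts to the underlying engine."""
    from vllm_omni.entrypoints.omni_llm import OmniLLM

    # Stand-in for the heavyweight engine so no model weights are loaded
    engine = mock.Mock()
    engine.generate.return_value = ["dummy output"]

    # Assumption: OmniLLM accepts an injected engine (adjust to the real class)
    llm = OmniLLM(engine=engine)

    assert llm.generate(["hi"]) == ["dummy output"]
    engine.generate.assert_called_once_with(["hi"])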
Test Directory Structure¶
The ideal directory structure mirrors the organization of the source code:
vllm_omni/                                      tests/
├── config/                           →  ├── config/
│   └── model.py                         │   └── test_model.py
│                                        │
├── core/                             →  ├── core/
│   └── sched/                           │   └── sched/                       # Maps to core/sched/
│       ├── omni_ar_scheduler.py         │       ├── test_omni_ar_scheduler.py
│       ├── omni_generation_scheduler.py │       ├── test_omni_generation_scheduler.py
│       └── output.py                    │       └── test_output.py
│                                        │
├── diffusion/                        →  ├── diffusion/
│   ├── diffusion_engine.py              │   ├── test_diffusion_engine.py
│   ├── omni_diffusion.py                │   ├── test_omni_diffusion.py
│   ├── attention/                       │   ├── attention/                   # Maps to diffusion/attention/
│   │   └── backends/                    │   │   └── test_*.py
│   ├── models/                          │   ├── models/                      # Maps to diffusion/models/
│   │   ├── qwen_image/                  │   │   ├── qwen_image/
│   │   │   └── ...                      │   │   │   └── test_*.py
│   │   └── z_image/                     │   │   └── z_image/
│   │       └── ...                      │   │       └── test_*.py
│   └── worker/                          │   └── worker/                      # Maps to diffusion/worker/
│       └── ...                          │       └── test_*.py
│                                        │
├── distributed/                      →  ├── distributed/
│   └── ...                              │   └── test_*.py
│                                        │
├── engine/                           →  ├── engine/
│   ├── processor.py                     │   ├── test_processor.py
│   └── output_processor.py              │   └── test_output_processor.py
│                                        │
├── entrypoints/                      →  ├── entrypoints/
│   ├── omni_llm.py                      │   ├── test_omni_llm.py             # UT: OmniLLM core logic (mocked)
│   ├── omni_stage.py                    │   ├── test_omni_stage.py           # UT: OmniStage logic
│   ├── omni.py                          │   ├── test_omni.py                 # E2E: Omni class (offline inference)
│   ├── async_omni.py                    │   ├── test_async_omni.py           # E2E: AsyncOmni class
│   ├── cli/                             │   ├── cli/                         # Maps to entrypoints/cli/
│   │   └── ...                          │   │   └── test_*.py
│   └── openai/                          │   └── openai/                      # Maps to entrypoints/openai/
│       ├── api_server.py                │       ├── test_api_server.py       # E2E: API server (online serving)
│       └── serving_chat.py              │       └── test_serving_chat.py
│                                        │
├── inputs/                           →  ├── inputs/
│   ├── data.py                          │   ├── test_data.py
│   ├── parse.py                         │   ├── test_parse.py
│   └── preprocess.py                    │   └── test_preprocess.py
│                                        │
├── model_executor/                   →  ├── model_executor/
│   ├── layers/                          │   ├── layers/
│   │   └── mrope.py                     │   │   └── test_mrope.py
│   ├── model_loader/                    │   ├── model_loader/
│   │   └── weight_utils.py              │   │   └── test_weight_utils.py
│   ├── models/                          │   ├── models/
│   │   ├── qwen2_5_omni/                │   │   ├── qwen2_5_omni/
│   │   │   ├── qwen2_5_omni_thinker.py  │   │   │   ├── test_qwen2_5_omni_thinker.py    # UT
│   │   │   ├── qwen2_5_omni_talker.py   │   │   │   ├── test_qwen2_5_omni_talker.py     # UT
│   │   │   └── qwen2_5_omni_token2wav.py│   │   │   └── test_qwen2_5_omni_token2wav.py  # UT
│   │   └── qwen3_omni/                  │   │   └── qwen3_omni/
│   │       └── ...                      │   │       └── test_*.py
│   ├── stage_configs/                   │   ├── stage_configs/               # Configuration tests (if needed)
│   │   └── ...                          │   │   └── test_*.py
│   └── stage_input_processors/          │   └── stage_input_processors/
│       └── ...                          │       └── test_*.py
│                                        │
├── sample/                           →  ├── sample/
│   └── ...                              │   └── test_*.py
│                                        │
├── utils/                            →  ├── utils/
│   └── platform_utils.py                │   └── test_platform_utils.py
│                                        │
├── worker/                           →  ├── worker/
│   ├── gpu_ar_worker.py                 │   ├── test_gpu_ar_worker.py
│   ├── gpu_generation_worker.py         │   ├── test_gpu_generation_worker.py
│   ├── gpu_model_runner.py              │   ├── test_gpu_model_runner.py
│   └── npu/                             │   └── npu/                         # Maps to worker/npu/
│       └── ...                          │       └── test_*.py
│                                        │
└── e2e/                              →  └── e2e/                             # End-to-end scenarios (no 1:1 source mirror)
                                             ├── online_serving/              # Full-stack online serving flows
                                             │   └── (empty for now)
                                             └── offline_inference/           # Full offline inference flows
                                                 ├── test_qwen2_5_omni.py     # Moved from multi_stages/
                                                 ├── test_qwen3_omni.py       # Moved from multi_stages_h100/
                                                 ├── test_t2i_model.py        # Moved from single_stage/
                                                 └── stage_configs/           # Shared stage configs
                                                     ├── qwen2_5_omni_ci.yaml
                                                     └── qwen3_omni_ci.yaml
Naming Conventions¶
- Unit/system tests: use the test_<module_name>.py format
  - Example: omni_llm.py → test_omni_llm.py
- E2E tests: place them in tests/e2e/offline_inference/ or tests/e2e/online_serving/ and give them descriptive names
  - Examples: tests/e2e/offline_inference/test_qwen3_omni.py, tests/e2e/offline_inference/test_diffusion_model.py
Best Practices¶
- Mirror the source structure: the test directory tree should mirror the structure of the source code
- Test type indicators: use comments to mark the test type (UT for unit tests, E2E for end-to-end tests)
- Shared resources: place shared test configuration (e.g., CI configs) in the appropriate subdirectory
- Consistent naming: follow the test_*.py naming convention across all test files
Test Code Requirements¶
Coding Style¶
- File header: add the SPDX license header to every test file
- Imports: do not modify sys.path manually; use standard imports
- Test type distinctions:
  - Unit tests: keep them mock-based
  - Model E2E tests: consider using OmniRunner uniformly and avoid decorators
- Documentation: add a docstring to every test function
- Environment variables: set them uniformly in conftest.py or at the top of the file
- Type annotations: annotate every test function parameter
- Resources: use pytest markers to declare the compute resources a test requires (see the skeleton below)
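Putting these rules together, a test file skeleton might look like the following sketch. The gpu_mem_high marker mirrors the offline template below; custom markers like it must be registered in the pytest configuration, and the assertion body is a placeholder:

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""Skeleton illustrating the coding-style rules above."""
import os

import pytest

# Environment variables: set once at the top of the file (or in conftest.py)
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"


@pytest.mark.gpu_mem_high  # Resources: declare required compute via a marker
@pytest.mark.parametrize("prompt", ["hello", "world"])
def test_prompt_roundtrip(prompt: str) -> None:  # annotate every parameter
    """Each test function carries a docstring stating what it verifies."""
    assert isinstance(prompt, str)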
Templates¶
E2E - Online Serving¶
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
Online E2E smoke test for an omni model (video, text, audio → audio).
"""
import os
from pathlib import Path

import openai
import pytest

# Optional: set the process start method for workers
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
models = ["{your model name}"]  # Edit here to load your model
stage_configs = [str(Path(__file__).parent / "stage_configs" / "{your model yaml}")]  # Edit here to load your model yaml

# Parameter combinations of model and stage config
test_params = [(model, stage_config) for model in models for stage_config in stage_configs]


# OmniServer: used to start the vllm-omni server (a sketch appears after this template)
class OmniServer:
    ...
@pytest.fixture
def omni_server(request: pytest.FixtureRequest):
    model, stage_config_path = request.param
    with OmniServer(model, ["--stage-configs-path", stage_config_path]) as server:
        yield server
# Fixtures and helpers that build the request message
@pytest.fixture(scope="session")
def base64_encoded_video() -> str:
    ...


def dummy_messages_from_video_data(video_data_url: str, content_text: str = "") -> list[dict]:
    ...
@pytest.mark.parametrize("omni_server", test_params, indirect=True)
def test_video_to_audio(
    client: openai.OpenAI,
    omni_server,
    base64_encoded_video: str,
) -> None:
    """Online inference: video input, text and audio output."""
    # Build the message payload
    video_data_url = f"data:video/mp4;base64,{base64_encoded_video}"
    messages = dummy_messages_from_video_data(video_data_url)
    # Send the request
    chat_completion = client.chat.completions.create(
        model=omni_server.model,
        messages=messages,
    )
    # Verify the text output
    text_choice = chat_completion.choices[0]
    assert text_choice.finish_reason == "length"
    # Verify the audio output
    audio_choice = chat_completion.choices[1]
    audio_message = audio_choice.message
    if hasattr(audio_message, "audio") and audio_message.audio:
        assert audio_message.audio.data is not None
        assert len(audio_message.audio.data) > 0
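The OmniServer helper is elided in the template; in practice it usually lives in a shared conftest.py. A minimal subprocess-based sketch follows. The module path, CLI flags, and /health endpoint are assumptions modeled on vLLM's OpenAI-compatible server, so mirror whatever the real helper does:

import subprocess
import sys
import time
import urllib.request


class OmniServer:
    """Launches the API server in a subprocess and waits until it is healthy."""

    def __init__(self, model: str, extra_args: list[str], port: int = 8000) -> None:
        self.model = model
        self.url = f"http://localhost:{port}"
        # Assumed launch command; adjust to the project's actual entrypoint
        self._cmd = [
            sys.executable, "-m", "vllm_omni.entrypoints.openai.api_server",
            "--model", model, "--port", str(port), *extra_args,
        ]
        self._proc: subprocess.Popen | None = None

    def __enter__(self) -> "OmniServer":
        self._proc = subprocess.Popen(self._cmd)
        deadline = time.time() + 300
        while time.time() < deadline:
            try:
                # Assumed health endpoint, as exposed by vLLM's API server
                urllib.request.urlopen(f"{self.url}/health", timeout=5)
                return self
            except OSError:
                time.sleep(2)
        raise RuntimeError("API server did not become healthy in time")

    def __exit__(self, *exc) -> None:
        assert self._proc is not None
        self._proc.terminate()
        self._proc.wait(timeout=60)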
E2E - Offline Inference¶
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
Offline E2E smoke test for an omni model (video → audio).
"""
import os
from pathlib import Path

import pytest
from vllm.assets.video import VideoAsset

from ..multi_stages.conftest import OmniRunner

# Optional: set the process start method for workers
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

models = ["{your model name}"]  # Edit here to load your model
stage_configs = [str(Path(__file__).parent / "stage_configs" / "{your model yaml}")]  # Edit here to load your model yaml

# Create parameter combinations of model and stage config
test_params = [(model, stage_config) for model in models for stage_config in stage_configs]
# Function name: test_{input_modality}_to_{output_modality}
# Modality candidates: text, image, audio, video, mixed_modalities
@pytest.mark.gpu_mem_high  # requires a high-memory GPU node
@pytest.mark.parametrize("test_config", test_params)
def test_video_to_audio(omni_runner: type[OmniRunner], test_config: tuple[str, str]) -> None:
    """Offline inference: video input, audio output."""
    model, stage_config_path = test_config
    with omni_runner(model, seed=42, stage_configs_path=stage_config_path) as runner:
        # Prepare inputs
        video = VideoAsset(name="sample", num_frames=4).np_ndarrays
        outputs = runner.generate_multimodal(
            prompts="Describe this video briefly.",
            videos=video,
        )
        # Minimal assertions: got outputs and at least one audio result
        assert outputs
        has_audio = any(o.final_output_type == "audio" for o in outputs)
        assert has_audio
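The omni_runner fixture referenced above is typically provided by a shared conftest.py. A minimal sketch, reusing the OmniRunner import from the template and matching the type[OmniRunner] annotation, could be:

import pytest

from ..multi_stages.conftest import OmniRunner  # same import as the template


@pytest.fixture
def omni_runner() -> type[OmniRunner]:
    """Expose the OmniRunner class so each test constructs its own instance."""
    return OmniRunner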
Checklist Before Submitting Test Files¶
- The file is saved in an appropriate location and has a clear name.
- The coding style meets the requirements above.
- For E2E model tests, make sure the test is configured under the ./buildkite/ folder.