# Testing

This document explains how to write end-to-end tests and unit tests to verify your feature implementation.
## Setting up the test environment

The fastest way to set up the test environment is to use the container image built from the main branch.

You can run the unit tests on CPU with the following steps:
```bash
cd ~/vllm-project/
# ls
# vllm  vllm-ascend

# Use a mirror to speed up the download
# docker pull quay.nju.edu.cn/ascend/cann:8.3.rc2-910b-ubuntu22.04-py3.11
export IMAGE=quay.io/ascend/cann:8.3.rc2-910b-ubuntu22.04-py3.11
docker run --rm --name vllm-ascend-ut \
    -v $(pwd):/vllm-project \
    -v ~/.cache:/root/.cache \
    -ti $IMAGE bash

# (Optional) Configure mirrors to speed up downloads
sed -i 's|ports.ubuntu.com|mirrors.huaweicloud.com|g' /etc/apt/sources.list
pip config set global.index-url https://mirrors.huaweicloud.com/repository/pypi/simple/
# For the torch-npu dev version or an x86 machine
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"

apt-get update -y
apt-get install -y python3-pip git vim wget net-tools gcc g++ cmake libnuma-dev curl gnupg2

# Install vllm
cd /vllm-project/vllm
VLLM_TARGET_DEVICE=empty python3 -m pip -v install .

# Install vllm-ascend
cd /vllm-project/vllm-ascend
# [IMPORTANT] Export LD_LIBRARY_PATH so the CANN libraries can be found when running on CPU
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib
python3 -m pip install -r requirements-dev.txt
python3 -m pip install -v .
```
To run tests on NPU, start the container with your device(s) mounted. For a single card:

```bash
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --shm-size=1g \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```
After starting the container, install the required packages:

```bash
# Prepare
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt
```
For multiple cards:

```bash
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --shm-size=1g \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci2 \
    --device /dev/davinci3 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```
After starting the container, install the required packages:

```bash
cd /vllm-workspace/vllm-ascend/

# Prepare
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt
```
## Running tests

### Unit tests

There are a few principles to follow when writing unit tests:

- The test file path should mirror the source file path, and the file name should start with the `test_` prefix, e.g. `vllm_ascend/worker/worker.py` -> `tests/ut/worker/test_worker.py`.
- vLLM Ascend tests use the `unittest` framework. See its documentation for how to write unit tests.
- All unit tests can run on CPU, so you must mock device-related functions to make them runnable on the host machine.

You can run the unit tests with `pytest`:
```bash
# Run unit tests
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib
TORCH_DEVICE_BACKEND_AUTOLOAD=0 pytest -sv tests/ut
```
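As required by the principles above, device-dependent calls must be mocked so a test can run on a CPU-only host. Here is a minimal sketch of that pattern; the helpers `get_npu_device_count` and `pick_device_id` are hypothetical and not part of vllm-ascend:

```python
# Hypothetical sketch of a CPU-runnable unit test: the device query is
# mocked out, so nothing here touches real NPU hardware.
import unittest
from unittest import mock


def get_npu_device_count() -> int:
    # Stand-in for a device-dependent helper; fails on hosts without NPUs.
    raise RuntimeError("requires an Ascend NPU")


def pick_device_id() -> int:
    # Picks the first device if any are present.
    if get_npu_device_count() < 1:
        raise RuntimeError("no NPU available")
    return 0


class TestPickDeviceId(unittest.TestCase):
    @mock.patch(f"{__name__}.get_npu_device_count", return_value=8)
    def test_pick_device_id(self, mocked_count):
        # With the mock in place, this passes on a CPU-only machine.
        self.assertEqual(pick_device_id(), 0)
        mocked_count.assert_called_once()
```

Following the naming convention, such a file would live next to its source, e.g. `tests/ut/worker/test_worker.py` for `vllm_ascend/worker/worker.py`.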
```bash
cd /vllm-workspace/vllm-ascend/
# Run all the single card tests
pytest -sv tests/ut

# Run a single test
pytest -sv tests/ut/test_ascend_config.py
```
### End-to-end tests

Although vllm-ascend provides end-to-end tests on Ascend CI, you can also run them locally. Note that end-to-end tests cannot run on CPU.
```bash
cd /vllm-workspace/vllm-ascend/
# Run all the single card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py

# Run a certain case in a test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py::test_models
```
```bash
cd /vllm-workspace/vllm-ascend/
# Run all the multicard tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_dynamic_npugraph_batchsize.py

# Run a certain case in a test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.py::test_models
```
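The `::test_models` selector above runs a single test function; inside the test files, cases are typically parametrized so one function covers several models. A toy, hardware-free sketch of that pattern (the model names are made up):

```python
# Toy illustration of the pytest parametrization pattern used by e2e suites;
# a real test would load each model and run inference on an NPU.
import pytest

MODELS = ["model-a", "model-b"]  # hypothetical model names


@pytest.mark.parametrize("model", MODELS)
def test_models(model: str) -> None:
    # Placeholder check; real tests assert on generated outputs.
    assert isinstance(model, str) and model
```

Saved as, say, `test_offline_inference.py`, `pytest -sv test_offline_inference.py::test_models` would run both parametrized cases.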
This reproduces the end-to-end tests. See vllm_ascend_test.yaml for details.

To run the nightly multi-node test cases locally, refer to the "Local Run" part of the multi-node test documentation.
### End-to-end test examples

Accuracy test example: `tests/e2e/singlecard/test_aclgraph_accuracy.py`

CI resources are limited, so you may need to reduce the number of layers in a model. Here is an example of how to generate a model with fewer layers:

1. Fork the original model repo on ModelScope. All files in the repo, except the weights, are required.
2. Set `num_hidden_layers` to the expected number of layers, e.g. `{"num_hidden_layers": 2,}`.
3. Copy the following Python script as `generate_random_weight.py`. Set the parameters `MODEL_LOCAL_PATH`, `DIST_DTYPE` and `DIST_MODEL_PATH` as needed:

```python
import torch
from transformers import AutoTokenizer, AutoConfig
from modeling_deepseek import DeepseekV3ForCausalLM
from modelscope import snapshot_download

MODEL_LOCAL_PATH = "~/.cache/modelscope/models/vllm-ascend/DeepSeek-V3-Pruning"
DIST_DTYPE = torch.bfloat16
DIST_MODEL_PATH = "./random_deepseek_v3_with_2_hidden_layer"

config = AutoConfig.from_pretrained(MODEL_LOCAL_PATH, trust_remote_code=True)
model = DeepseekV3ForCausalLM(config)
model = model.to(DIST_DTYPE)
model.save_pretrained(DIST_MODEL_PATH)
```
## Running doctests

vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` script to run all doctests in the documentation files. Doctests are a good way to ensure that the documentation stays up to date and its examples remain executable. You can run them locally as follows:

```bash
# Run doctest
/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh
```

This reproduces the same environment as CI. See vllm_ascend_doctest.yaml for details.