工具辅助安装

支持工具的安装#

本教程将指导您完成使用 Llama-3.1-8B-Instruct 模型设置支持工具调用的 vLLM Production Stack。此设置使您的模型能够通过结构化接口与外部工具和函数进行交互。

先决条件#

来自快速入门教程的所有先决条件
一个拥有 Llama-3.1-8B-Instruct 访问权限的 Hugging Face 帐户
在 Hugging Face 上已接受 meta-llama/Llama-3.1-8B-Instruct 的条款
有效的 Hugging Face 令牌
本地机器上已安装 Python 3.7+
已安装 openai Python 包（pip install openai）
具有存储提供商支持的 Kubernetes 群集访问权限

步骤#

1. 设置 vLLM 模板和存储#

首先，运行设置脚本以下载模板并创建必要的 Kubernetes 资源

# Make the script executable
chmod +x scripts/setup_vllm_templates.sh

# Run the setup script
./scripts/setup_vllm_templates.sh

此脚本将

从 vLLM 存储库下载所需的模板
创建用于存储模板的 PersistentVolume
创建用于访问模板的 PersistentVolumeClaim
验证设置是否完成

脚本使用与部署配置匹配的一致命名

PersistentVolume: vllm-templates-pv
PersistentVolumeClaim: vllm-templates-pvc

2. 设置 Hugging Face 凭证#

使用您的 Hugging Face 令牌创建 Kubernetes secret

kubectl create secret generic huggingface-credentials \
  --from-literal=HUGGING_FACE_HUB_TOKEN=your_token_here

3. 部署支持工具调用的 vLLM 实例#

3.1: 使用示例配置#

我们将使用位于 tutorials/assets/values-08-tool-enabled.yaml 的示例配置文件。此文件包含启用工具调用的所有必需设置

servingEngineSpec:
  runtimeClassName: ""
  modelSpec:
  - name: "llama3-8b"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "meta-llama/Llama-3.1-8B-Instruct"

    # Tool calling configuration
    enableTool: true
    toolCallParser: "llama3_json"  # Parser to use for tool calls (e.g., "llama3_json" for Llama models)
    chatTemplate: "tool_chat_template_llama3.1_json.jinja"  # Template file name (will be mounted at /vllm/templates)

    # Mount Hugging Face credentials
    env:
      - name: HUGGING_FACE_HUB_TOKEN
        valueFrom:
          secretKeyRef:
            name: huggingface-credentials
            key: HUGGING_FACE_HUB_TOKEN

    replicaCount: 1

    # Resource requirements for Llama-3.1-8B-Instruct
    requestCPU: 8
    requestMemory: "32Gi"
    requestGPU: 1

注意

工具调用配置现已简化

enableTool: true 启用该功能
toolCallParser：指定如何解析模型的工具调用（对于 Llama-3 模型使用“llama3_json”）
chatTemplate：指定模板文件名（将挂载在 /vllm/templates/）

聊天模板通过我们在第 1 步创建的 PersistentVolume 进行管理，该卷提供了许多好处

模板下载一次并持久存储
模板可供多个部署共享
通过更新 PersistentVolume 中的文件即可更新模板
模板与 vLLM 存储库进行版本控制

3.2: 部署 Helm Chart#

# Add the vLLM Helm repository if you haven't already
helm repo add vllm https://vllm-project.github.io/production-stack

# Deploy the vLLM stack with tool calling support using the example configuration
helm install vllm-tool vllm/vllm-stack -f tutorials/assets/values-08-tool-enabled.yaml

部署将

使用我们在第 1 步创建的 PersistentVolume 来访问模板
在容器中的 /vllm/templates 挂载模板
配置模型使用指定的模板进行工具调用

您可以使用以下命令验证部署

# Check the deployment status
kubectl get deployments

# Check the pods
kubectl get pods

# Check the logs
kubectl logs -f deployment/vllm-tool-llama3-8b-deployment-vllm

4. 测试工具调用设置#

部署运行后，让我们使用示例脚本测试工具调用功能。

4.1: 端口转发 Router 服务#

首先，我们需要设置端口转发以访问 Router 服务

# Get the service name
kubectl get svc

# Set up port forwarding to the router service
kubectl port-forward svc/vllm-tool-router-service 8000:80

4.2: 运行示例脚本#

在一个新终端中，运行示例脚本以测试工具调用

# Navigate to the examples directory
cd src/examples

# Run the example script
python tool_calling_example.py

脚本将

通过端口转发的端点连接到 vLLM 服务
发送一个关于天气的测试查询
演示模型的能力
- 理解可用工具
- 进行适当的工具调用
- 处理工具响应

预期输出应类似于

Function called: get_weather
Arguments: {"location": "San Francisco, CA", "unit": "celsius"}
Result: Getting the weather for San Francisco, CA in celsius...

这证实了

vLLM 服务正在正常运行
工具调用已正确启用
模型可以理解和使用定义的工具
模板系统运行正常

注意

该示例使用模拟的天气函数进行演示。在实际应用程序中，您会将其替换为对天气服务的实际 API 调用。