InternVL3.5 Usage Guide

InternVL3.5 is a vision-language model developed by Shanghai AI Laboratory. This guide covers how to deploy InternVL3.5 with vLLM and provides some simple API usage examples.

Installing vLLM

uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto

Launching InternVL3.5 with vLLM

vllm serve OpenGVLab/InternVL3_5-8B --trust-remote-code

API Usage Examples

Text-Only Chat

from openai import OpenAI
client = OpenAI(api_key='', base_url='http://0.0.0.0:8000/v1')
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': '9.11 and 9.8, which is greater?',
        }],
    }],
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message.content)

Image Chat

Single Image

from openai import OpenAI
client = OpenAI(api_key='', base_url='http://0.0.0.0:8000/v1')
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': 'Describe the image.',
        }, {
            'type': 'image_url',
            'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'},
        }],
    }],
    temperature=0.0
)
print(response.choices[0].message.content)
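The example above passes a public image URL. The OpenAI-compatible API also accepts base64-encoded `data:` URLs, which is how you would send a local image file. A minimal sketch (the file path in the usage comment is a placeholder):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = 'image/jpeg') -> str:
    """Encode raw image bytes as a data: URL usable in the image_url field."""
    b64 = base64.b64encode(image_bytes).decode('ascii')
    return f'data:{mime};base64,{b64}'

# Example usage with a local file (path is a placeholder):
# with open('tiger.jpeg', 'rb') as f:
#     url = to_data_url(f.read())
# Then pass {'type': 'image_url', 'image_url': {'url': url}} in content as above.
```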

Multiple Images

from openai import OpenAI
client = OpenAI(api_key='', base_url='http://0.0.0.0:8000/v1')
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': 'Describe these two images.',
        }, {
            'type': 'image_url',
            'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg'},
        }, {
            'type': 'image_url',
            'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg'},
        }],
    }],
    temperature=0.0
)
print(response.choices[0].message.content)
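The single-image and multi-image requests differ only in how many image_url parts appear in content. If you send many such requests, a small helper (hypothetical, not part of any API) can build the messages list:

```python
def build_image_messages(prompt: str, image_urls: list[str]) -> list[dict]:
    """Build an OpenAI-style messages list: one text part followed by
    one image_url part per image."""
    content = [{'type': 'text', 'text': prompt}]
    for url in image_urls:
        content.append({'type': 'image_url', 'image_url': {'url': url}})
    return [{'role': 'user', 'content': content}]
```

The returned list can be passed directly as the messages argument of client.chat.completions.create.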

Thinking Mode

To enable thinking mode, set the system prompt to our thinking-mode system prompt. When thinking mode is enabled, we recommend setting temperature to 0.6 to mitigate undesired repetition.

from openai import OpenAI
client = OpenAI(api_key='', base_url='http://0.0.0.0:8000/v1')
model_name = client.models.list().data[0].id

THINKING_SYSTEM_PROMPT = """
You are an AI assistant that rigorously follows this response protocol:

1. First, conduct a detailed analysis of the question. Consider different angles, potential solutions, and reason through the problem step-by-step. Enclose this entire thinking process within <think> and </think> tags.

2. After the thinking section, provide a clear, concise, and direct answer to the user's question. Separate the answer from the think section with a newline.

Ensure that the thinking process is thorough but remains focused on the query. The final answer should be standalone and not reference the thinking section.
""".strip()

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'system',
        'content': [{
            'type': 'text',
            'text': THINKING_SYSTEM_PROMPT,
        }],
    }, {
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': '9.11 and 9.8, which is greater?',
        }],
    }],
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message.content)
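With this system prompt, the reply contains the reasoning inside `<think>...</think>` followed by the final answer. A minimal sketch for separating the two, assuming the model followed the protocol (if no think block is found, the whole reply is treated as the answer):

```python
import re

def split_thinking(reply: str) -> tuple[str, str]:
    """Return (thinking, answer); thinking is '' when no <think> block exists."""
    m = re.search(r'<think>(.*?)</think>', reply, flags=re.DOTALL)
    if m is None:
        return '', reply.strip()
    # Everything after the closing tag is the standalone answer.
    return m.group(1).strip(), reply[m.end():].strip()
```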

Additional Resources