DeepSeek-V3.1 Usage Guide

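Introduction

DeepSeek-V3.1 is a hybrid model that supports both thinking and non-thinking modes. This guide describes how to switch dynamically between think and non-think modes in vLLM.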

Installing vLLM

uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto

Launching DeepSeek-V3.1

Serving on 8xH200 (or H20) GPUs (141GB × 8)

vllm serve deepseek-ai/DeepSeek-V3.1 \
  --enable-expert-parallel \
  --tensor-parallel-size 8 \
  --served-model-name ds31

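Function Calling

vLLM also supports calling user-defined functions. Make sure to launch your DeepSeek-V3.1 model with the following arguments. The example chat template file is included in the official container and can also be downloaded separately.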

vllm serve ... \
    --enable-auto-tool-choice \
    --tool-call-parser deepseek_v31 \
    --chat-template examples/tool_chat_template_deepseekv31.jinja
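
As a rough illustration of what a tool-calling request might look like once the server is running with these flags, the sketch below builds a chat-completions payload that offers one tool to the model and parses any tool calls out of a response message. The `get_weather` tool and its `city` parameter are hypothetical and exist only for this example; the `ds31` model name is taken from the serve command earlier in this guide, and the `tools` / `tool_choice` fields follow the OpenAI chat-completions schema.

```python
import json

# Hypothetical tool definition for illustration; not part of vLLM or DeepSeek.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}


def build_tool_request(question, model="ds31"):
    """Build a chat-completions payload that offers one tool to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


def parse_tool_calls(message):
    """Return (name, arguments) pairs from an assistant message dict.

    In the OpenAI response schema, arguments arrive as a JSON string.
    """
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]


payload = build_tool_request("What is the weather in Hangzhou?")
print(json.dumps(payload, indent=2))
```

With the OpenAI Python client, a payload like this can be passed as `client.chat.completions.create(**payload)`; the tool itself still has to be executed by your own code.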

Using the Model

OpenAI Client Example

You can use the OpenAI client as shown below. Pass extra_body={"chat_template_kwargs": {"thinking": False}} to control the mode: True enables thinking mode, and False disables it (non-thinking mode).

from openai import OpenAI

openai_api_key = "EMPTY"
# Assumes the vLLM server runs locally on its default port (8000).
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "<think>Hmm</think>I am DeepSeek"},
    {"role": "user", "content": "9.11 and 9.8, which is greater?"},
]
extra_body = {"chat_template_kwargs": {"thinking": False}}
response = client.chat.completions.create(
    model=model, messages=messages, extra_body=extra_body
)
content = response.choices[0].message.content
print("content:\n", content)

Example Output

thinking=True

As shown below, the output contains </think>.

 Hmm, the user is asking which number is greater between 9.11 and 9.8. This seems straightforward, but I should be careful because decimals can sometimes confuse people. 

I recall that comparing decimals involves looking at each digit from left to right. Both numbers have the same whole number part (9), so I need to compare the decimal parts. 0.11 is greater than 0.8 because 0.11 is equivalent to 0.110 and 0.8 is 0.800, so 110 thousandths is greater than 800 thousandths? Wait no, that’s wrong. 

Actually, 0.8 is the same as 0.80, and 0.11 is less than 0.80. So 9.11 is actually less than 9.8. I should double-check that. Yes, 9.8 is larger because 0.8 > 0.11. 

I’ll explain it clearly by comparing the tenths place: 9.8 has 8 tenths, while 9.11 has 1 tenth and 1 hundredth, so 8 tenths is indeed larger. 

The answer is 9.8 is greater. I’ll state it confidently and offer further help if needed.</think>9.8 is greater than 9.11.  

To compare them:  
- 9.8 is equivalent to 9.80  
- 9.80 has 8 tenths, while 9.11 has only 1 tenth  
- Since 8 tenths (0.8) is greater than 1 tenth (0.1), 9.8 > 9.11  

Let me know if you need further clarification! 😊
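
Because the reasoning and the final answer arrive in one string separated by `</think>`, a client may want to split them. The helper below is a minimal client-side sketch, not part of vLLM; the name `split_thinking` is made up for this example, and it assumes the closing tag appears at most once in the output.

```python
def split_thinking(text, end_tag="</think>"):
    """Split raw model output into (reasoning, answer).

    If the closing tag is absent (for example when thinking is disabled),
    the reasoning part is returned as an empty string.
    """
    reasoning, sep, answer = text.partition(end_tag)
    if not sep:
        return "", text.strip()
    reasoning = reasoning.strip()
    # Drop an opening tag in case the output happens to include one.
    if reasoning.startswith("<think>"):
        reasoning = reasoning[len("<think>"):].strip()
    return reasoning, answer.strip()


raw = "Hmm, 0.8 > 0.11, so 9.8 is larger.</think>9.8 is greater than 9.11."
reasoning, answer = split_thinking(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```

Applied to `response.choices[0].message.content` from the client example, this would return the text before `</think>` as the reasoning and the remainder as the answer.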

thinking=False

 The number **9.11** is greater than **9.8**.  

To compare them:  
- 9.11 = 9 + 11/100  
- 9.8 = 9 + 80/100  

Since 11/100 (0.11) is less than 80/100 (0.80), 9.11 is actually smaller than 9.8. Wait, let me correct that:  

Actually, **9.8 is greater than 9.11**.  

- 9.8 can be thought of as 9.80  
- Comparing 9.80 and 9.11: 80 hundredths is greater than 11 hundredths.  

So, **9.8 > 9.11**.  

Apologies for the initial confusion! 😅

curl Example

You can run the following curl command.

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ds31",
        "messages": [
            {
                "role": "user",
                "content": "9.11 and 9.8, which is greater?"
            }
        ],
        "chat_template_kwargs": {
            "thinking": true
        }
    }'
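
Note that in the raw HTTP body `chat_template_kwargs` is a top-level field, whereas the OpenAI Python client needs `extra_body` to add it to the request. The sketch below builds the same request with only the Python standard library. The `build_chat_request` helper is hypothetical, and the base URL is an assumption (a server running locally on port 8000); adjust both to your deployment.

```python
import json
import urllib.request


def build_chat_request(prompt, thinking,
                       base_url="http://localhost:8000/v1", model="ds31"):
    """Build a POST request equivalent to the curl command above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Top-level field, in the same position as in the curl body.
        "chat_template_kwargs": {"thinking": thinking},
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("9.11 and 9.8, which is greater?", thinking=True)
print(request.full_url)
print(request.data.decode("utf-8"))
```

To actually send it, pass the object to `urllib.request.urlopen(request)` and decode the response with `json.load`. The block above only constructs the request, so it runs without a live server.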