Structured Outputs#

vLLM supports generating structured outputs using outlines, lm-format-enforcer, or xgrammar as backends for guided decoding. This document shows you some examples of the different options available to generate structured outputs.

Online Serving (OpenAI API)#

You can generate structured outputs using OpenAI's Completions and Chat API.

The following parameters are supported, which must be added as extra parameters:

  • guided_choice: the output will be exactly one of the choices.

  • guided_regex: the output will follow the regex pattern.

  • guided_json: the output will follow the JSON schema.

  • guided_grammar: the output will follow the context-free grammar.

  • guided_whitespace_pattern: used to override the default whitespace pattern for guided JSON decoding.

  • guided_decoding_backend: used to select the guided decoding backend to use. Additional backend-specific options can be supplied in a comma-separated list following the backend name. For example, "xgrammar:no-fallback" will not allow vLLM to fall back to a different backend on error.

You can see the complete list of supported parameters on the OpenAI-Compatible Server page.

Now let's look at an example of each case, starting with guided_choice, as it is the easiest one:

from openai import OpenAI
client = OpenAI(
    base_url="https://127.0.0.1:8000/v1",
    api_key="-",
)

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
    extra_body={"guided_choice": ["positive", "negative"]},
)
print(completion.choices[0].message.content)
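
The guided_decoding_backend parameter can also be passed through extra_body in the same way. As a minimal sketch (reusing the client and prompt from the request above), the following variant pins the request to the xgrammar backend and disables fallback:

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
    extra_body={
        "guided_choice": ["positive", "negative"],
        # Select the xgrammar backend and forbid falling back to a
        # different backend on error (see the parameter list above).
        "guided_decoding_backend": "xgrammar:no-fallback",
    },
)
print(completion.choices[0].message.content)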

The next example shows how to use guided_regex. The idea is to generate an email address from a simple regex template:

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Generate an example email address for Alan Turing, who works in Enigma. End in .com and new line. Example result: [email protected]\n",
        }
    ],
    extra_body={"guided_regex": "\w+@\w+\.com\n", "stop": ["\n"]},
)
print(completion.choices[0].message.content)

One of the most relevant features in structured text generation is the option to generate valid JSON with pre-defined fields and formats. For this, we can use the guided_json parameter in two different ways: by passing a JSON Schema directly, or by defining the schema with a Pydantic model.

The next example shows how to use the guided_json parameter with a Pydantic model:

from pydantic import BaseModel
from enum import Enum

class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType


json_schema = CarDescription.model_json_schema()

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
    extra_body={"guided_json": json_schema},
)
print(completion.choices[0].message.content)

Tip

While not strictly necessary, it is usually better to indicate in the prompt that JSON should be generated, which fields it contains, and how the LLM should fill them. This can notably improve results in most cases.
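
Alternatively, the same result can be achieved by passing a JSON Schema directly, without Pydantic. A minimal sketch, assuming a hand-written schema equivalent to the CarDescription model above:

# Hand-written JSON Schema equivalent to CarDescription.model_json_schema()
json_schema = {
    "type": "object",
    "properties": {
        "brand": {"type": "string"},
        "model": {"type": "string"},
        "car_type": {"type": "string", "enum": ["sedan", "SUV", "Truck", "Coupe"]},
    },
    "required": ["brand", "model", "car_type"],
}

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
    extra_body={"guided_json": json_schema},
)
print(completion.choices[0].message.content)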

Finally we have guided_grammar, which is probably the most difficult one to use, but it is really powerful: it allows us to define entire languages, such as SQL queries. It works by using a context-free EBNF grammar, which, for example, we can use to define a specific format of simplified SQL queries, as in the example below:

simplified_sql_grammar = """
    ?start: select_statement

    ?select_statement: "SELECT " column_list " FROM " table_name

    ?column_list: column_name ("," column_name)*

    ?table_name: identifier

    ?column_name: identifier

    ?identifier: /[a-zA-Z_][a-zA-Z0-9_]*/
"""

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Generate an SQL query to show the 'username' and 'email' from the 'users' table.",
        }
    ],
    extra_body={"guided_grammar": simplified_sql_grammar},
)
print(completion.choices[0].message.content)

Full example: examples/online_serving/openai_chat_completion_structured_outputs.py

Experimental Automatic Parsing (OpenAI API)#

This section covers the OpenAI beta wrapper over the client.chat.completions.create() method, which provides richer integration with Python-specific types.

At the time of writing (openai==1.54.4), this is a "beta" feature in the OpenAI client library. Code reference can be found here.

For the following examples, vLLM was set up using vllm serve meta-llama/Llama-3.1-8B-Instruct.

Here is a simple example demonstrating how to get structured output using Pydantic models:

from pydantic import BaseModel
from openai import OpenAI


class Info(BaseModel):
    name: str
    age: int


client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
completion = client.beta.chat.completions.parse(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
    extra_body=dict(guided_decoding_backend="outlines"),
)

message = completion.choices[0].message
print(message)
assert message.parsed
print("Name:", message.parsed.name)
print("Age:", message.parsed.age)

Output:

ParsedChatCompletionMessage[Info](content='{"name": "Cameron", "age": 28}', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], parsed=Info(name='Cameron', age=28))
Name: Cameron
Age: 28

Here is a more complex example using nested Pydantic models to handle a step-by-step math solution:

from typing import List
from pydantic import BaseModel
from openai import OpenAI


class Step(BaseModel):
    explanation: str
    output: str


class MathResponse(BaseModel):
    steps: List[Step]
    final_answer: str


client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
completion = client.beta.chat.completions.parse(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful expert math tutor."},
        {"role": "user", "content": "Solve 8x + 31 = 2."},
    ],
    response_format=MathResponse,
    extra_body=dict(guided_decoding_backend="outlines"),
)

message = completion.choices[0].message
print(message)
assert message.parsed
for i, step in enumerate(message.parsed.steps):
    print(f"Step #{i}:", step)
print("Answer:", message.parsed.final_answer)

Output:

ParsedChatCompletionMessage[MathResponse](content='{ "steps": [{ "explanation": "First, let\'s isolate the term with the variable \'x\'. To do this, we\'ll subtract 31 from both sides of the equation.", "output": "8x + 31 - 31 = 2 - 31"}, { "explanation": "By subtracting 31 from both sides, we simplify the equation to 8x = -29.", "output": "8x = -29"}, { "explanation": "Next, let\'s isolate \'x\' by dividing both sides of the equation by 8.", "output": "8x / 8 = -29 / 8"}], "final_answer": "x = -29/8" }', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], parsed=MathResponse(steps=[Step(explanation="First, let's isolate the term with the variable 'x'. To do this, we'll subtract 31 from both sides of the equation.", output='8x + 31 - 31 = 2 - 31'), Step(explanation='By subtracting 31 from both sides, we simplify the equation to 8x = -29.', output='8x = -29'), Step(explanation="Next, let's isolate 'x' by dividing both sides of the equation by 8.", output='8x / 8 = -29 / 8')], final_answer='x = -29/8'))
Step #0: explanation="First, let's isolate the term with the variable 'x'. To do this, we'll subtract 31 from both sides of the equation." output='8x + 31 - 31 = 2 - 31'
Step #1: explanation='By subtracting 31 from both sides, we simplify the equation to 8x = -29.' output='8x = -29'
Step #2: explanation="Next, let's isolate 'x' by dividing both sides of the equation by 8." output='8x / 8 = -29 / 8'
Answer: x = -29/8

Offline Inference#

Offline inference allows the same types of guided decoding. To use it, we need to configure guided decoding with the GuidedDecodingParams class inside SamplingParams. The main options available in GuidedDecodingParams are:

  • json

  • regex

  • choice

  • grammar

  • backend

  • whitespace_pattern

These parameters can be used in the same way as the parameters from the online serving examples above. An example of the usage of the choice parameter is shown below:

from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

guided_decoding_params = GuidedDecodingParams(choice=["Positive", "Negative"])
sampling_params = SamplingParams(guided_decoding=guided_decoding_params)
outputs = llm.generate(
    prompts="Classify this sentiment: vLLM is wonderful!",
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
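
The other options work analogously. As a minimal sketch of the json option (the schema and prompt here are illustrative, reusing the llm instance from above):

# Illustrative hand-written JSON Schema for the generated object
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

guided_decoding_params = GuidedDecodingParams(json=person_schema)
sampling_params = SamplingParams(guided_decoding=guided_decoding_params)
outputs = llm.generate(
    prompts="Generate a JSON describing a random person with a name and an age: ",
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)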

Full example: examples/offline_inference/structured_outputs.py