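Jina Reranker vLLM Deployment Guide
This guide explains how to deploy jinaai/jina-reranker-m0 with vLLM. It is a multilingual, multimodal reranker model for ranking visual documents across languages. It handles both text and visual content, including pages that mix text, figures, tables, and varied layouts, across 29 languages.
This guide launches the model on 2x NVIDIA T4 GPUs or 2x NVIDIA L4 GPUs.
Installation
Install vLLM and the required dependencies.
Online Deployment
Deploy the model as a production-grade API server with vLLM.
1. Deploy the Model Server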
# https://docs.vllm.com.cn/en/latest/cli/serve.html
vllm serve jinaai/jina-reranker-m0 \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.75 \
--max-num-seqs 32
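The `--gpu-memory-utilization 0.75` flag caps the fraction of each GPU's memory that vLLM may claim. As a rough sketch of what that means on the two cards named in this guide, the arithmetic below assumes nominal capacities of 16 GiB for a T4 and 24 GiB for an L4; the helper name `vllm_memory_budget_gib` is hypothetical and not part of vLLM.

```python
# Nominal per-card capacities in GiB (an assumption of this sketch;
# the usable amount reported by the driver is slightly lower).
NOMINAL_GPU_MEMORY_GIB = {"T4": 16, "L4": 24}


def vllm_memory_budget_gib(gpu, utilization=0.75):
    """Return the GiB per GPU that vLLM may claim at the given utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return NOMINAL_GPU_MEMORY_GIB[gpu] * utilization


for gpu in NOMINAL_GPU_MEMORY_GIB:
    print(f"{gpu}: {vllm_memory_budget_gib(gpu):.1f} GiB per GPU")
```

With tensor parallelism of 2, that budget applies to each of the two GPUs, and whatever remains is left for other processes on the same card.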
2. Rerank API
The Rerank API returns a list of documents ordered by their relevance to the query.
Request Format
curl -X POST "http://<host>:8000/v1/rerank" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"model": "jinaai/jina-reranker-m0",
"query": "What are the health benefits of green tea?",
"documents": [
"Green tea contains antioxidants called catechins that may help reduce inflammation and protect cells from damage.",
"El precio del café ha aumentado un 20% este año debido a problemas en la cadena de suministro.",
"Studies show that drinking green tea regularly can improve brain function and boost metabolism.",
"Basketball is one of the most popular sports in the United States.",
"绿茶富含儿茶素等抗氧化剂,可以降低心脏病风险,还有助于控制体重。",
"Le thé vert est riche en antioxydants et peut améliorer la fonction cérébrale."
],
"top_n": 3,
"return_documents": true
}'
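The same request can be issued from Python. The following is a minimal sketch using only the standard library; the base URL `http://localhost:8000` is an assumption (a server started as above and reached from the same machine), and the helper names `build_rerank_request` and `rerank` are hypothetical.

```python
import json
import urllib.request

MODEL = "jinaai/jina-reranker-m0"


def build_rerank_request(base_url, query, documents, top_n=3):
    """Build a POST request for /v1/rerank with the same fields as the curl call."""
    payload = {
        "model": MODEL,
        "query": query,
        "documents": documents,
        "top_n": top_n,
        "return_documents": True,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def rerank(base_url, query, documents, top_n=3):
    """Send the request and return the decoded JSON response."""
    request = build_rerank_request(base_url, query, documents, top_n)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `rerank("http://localhost:8000", "What are the health benefits of green tea?", documents)` against a running server should return a dictionary shaped like the response shown in the next section.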
Response Format
{
"id": "rerank-f0a2c978b4fb4d61b0a54fd1c05e335f",
"model": "jinaai/jina-reranker-m0",
"usage": {
"total_tokens": 225
},
"results": [
{
"index": 4,
"document": {
"text": "绿茶富含儿茶素等抗氧化剂,可以降低心脏病风险,还有助于控制体重。",
"multi_modal": null
},
"relevance_score": 0.9823843836784363
},
{
"index": 0,
"document": {
"text": "Green tea contains antioxidants called catechins that may help reduce inflammation and protect cells from damage.",
"multi_modal": null
},
"relevance_score": 0.9777672290802002
},
{
"index": 2,
"document": {
"text": "Studies show that drinking green tea regularly can improve brain function and boost metabolism.",
"multi_modal": null
},
"relevance_score": 0.9752224683761597
}
]
}
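In the response above, each `index` points back to a position in the request's `documents` array, and the results arrive sorted by `relevance_score`. A small sketch of flattening such a response into rows follows; the helper name `summarize_rerank` is hypothetical, and the sample dictionary is an abbreviated copy of the response shown above.

```python
def summarize_rerank(response):
    """Return (index, relevance_score, text) tuples in the order the server sent them."""
    rows = []
    for result in response["results"]:
        # "document" is only present when return_documents was true.
        document = result.get("document") or {}
        rows.append((result["index"], result["relevance_score"], document.get("text")))
    return rows


# Abbreviated copy of the sample response (first two results only).
sample_response = {
    "results": [
        {
            "index": 4,
            "document": {"text": "绿茶富含儿茶素等抗氧化剂,可以降低心脏病风险,还有助于控制体重。"},
            "relevance_score": 0.9823843836784363,
        },
        {
            "index": 0,
            "document": {"text": "Green tea contains antioxidants called catechins that may help reduce inflammation and protect cells from damage."},
            "relevance_score": 0.9777672290802002,
        },
    ]
}

for index, score, text in summarize_rerank(sample_response):
    print(f"documents[{index}]  score={score:.4f}  {text[:40]}")
```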
3. Score API
The Score API computes similarity scores between a query and multiple documents without sorting them.
Text-to-Text Scoring
curl -X POST "http://<host>:8000/v1/score" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"model": "jinaai/jina-reranker-m0",
"text_1": [
"What is the capital of Brazil?",
"What is the capital of France?"
],
"text_2": [
"The capital of Brazil is Brasilia.",
"The capital of France is Paris."
]
}'
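Request Parameters
model: model identifier (required)
text_1: query text; a single string or, as in the example above, an array of strings (required)
text_2: documents to score; a single string or an array of strings (required)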
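The example request passes two queries and two documents and, as the response below shows, gets back one score per pair. A minimal sketch of building such a request body in Python follows. It assumes that when both fields are arrays the server expects either one query for many documents or equal-length arrays scored pairwise; that rule is my reading of vLLM's behavior rather than something this guide states, and the helper name `build_score_payload` is hypothetical.

```python
import json

MODEL = "jinaai/jina-reranker-m0"


def build_score_payload(text_1, text_2):
    """Build a /v1/score request body, checking array lengths before sending."""
    both_lists = isinstance(text_1, list) and isinstance(text_2, list)
    # Assumed rule: arrays must be 1:N or N:N so that pairs are well defined.
    if both_lists and len(text_1) not in (1, len(text_2)):
        raise ValueError(
            f"text_1 has {len(text_1)} items but text_2 has {len(text_2)}"
        )
    return {"model": MODEL, "text_1": text_1, "text_2": text_2}


payload = build_score_payload(
    ["What is the capital of Brazil?", "What is the capital of France?"],
    ["The capital of Brazil is Brasilia.", "The capital of France is Paris."],
)
print(json.dumps(payload, indent=2))
```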
Response Format
{
"id":"score-30d069df61924c4292579640c0d97bcc",
"object":"list",
"created":1761686670,
"model":"jinaai/jina-reranker-m0",
"data":[
{
"index":0,
"object":"score",
"score":0.9878721237182617
},
{
"index":1,
"object":"score",
"score":0.9879010915756226
}
],
"usage":{
"prompt_tokens":47,
"total_tokens":47,
"completion_tokens":0,
"prompt_tokens_details":null
}
}
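Because the Score API does not sort its output, each entry's `index` identifies which input pair it belongs to. The sketch below pulls the scores out in request order; the helper name `scores_in_request_order` is hypothetical, and the sample dictionary copies only the `data` field of the response above.

```python
def scores_in_request_order(response):
    """Return the scores as a list aligned with the order of the request inputs."""
    ordered = sorted(response["data"], key=lambda item: item["index"])
    return [item["score"] for item in ordered]


# The "data" field copied from the sample response.
sample_response = {
    "data": [
        {"index": 0, "object": "score", "score": 0.9878721237182617},
        {"index": 1, "object": "score", "score": 0.9879010915756226},
    ]
}

questions = ["What is the capital of Brazil?", "What is the capital of France?"]
for question, score in zip(questions, scores_in_request_order(sample_response)):
    print(f"{score:.4f}  {question}")
```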
Multimodal Scoring
The Score API supports multimodal input, so you can score text against images and vice versa.
curl -X POST "http://<host>:8000/v1/score" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"model": "jinaai/jina-reranker-m0",
"text_1": "A cat",
"text_2": {
"content": [
{
"type": "image_url",
"image_url": {
"url": "cat_img.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "dog_img.jpg"
}
}
]
}
}'
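The nested `content` structure in the request above is easy to mistype by hand. As a sketch, the helper below generates it from a list of image URLs; the name `build_image_score_payload` is hypothetical, and `cat_img.jpg` and `dog_img.jpg` are the placeholder values from the curl example, not real locations.

```python
import json

MODEL = "jinaai/jina-reranker-m0"


def build_image_score_payload(text, image_urls):
    """Build a /v1/score body that scores one text against several images."""
    content = [
        {"type": "image_url", "image_url": {"url": url}}
        for url in image_urls
    ]
    return {"model": MODEL, "text_1": text, "text_2": {"content": content}}


payload = build_image_score_payload("A cat", ["cat_img.jpg", "dog_img.jpg"])
print(json.dumps(payload, indent=2))
```

In practice each `url` would presumably need to be something the server can resolve, such as a reachable HTTP(S) address.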
Offline Deployment
Use the model directly from Python code without running a server.
from vllm import LLM

MODEL = "jinaai/jina-reranker-m0"

# Initialize the LLM engine
llm = LLM(
    model=MODEL,
    tensor_parallel_size=2,
    gpu_memory_utilization=0.75,
    max_model_len=1024,
    max_num_seqs=32,
    kv_cache_dtype="fp8",
    # bfloat16 needs compute capability 8.0+ (e.g. L4); on T4, "half" is the likely substitute
    dtype="bfloat16",
)

# Prepare query and documents
query = "fast recipes for weeknight dinners"
documents = [
    "A 65-minute pasta with garlic and olive oil.",
    "Slow braised short ribs that cook for 5 hours.",
    "Stir-fry veggies with pre-cooked rice.",
]

# Compute one score per (query, document) pair
res = llm.score(query, documents)

# Extract and print scores, in the same order as `documents`
for item in res:
    print(item.outputs.score)
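The offline example prints scores in input order, so turning them into a ranking is a separate step. A minimal sketch follows, with a hypothetical helper `rank_documents`. The scores in the demo are made-up placeholder values chosen only to show the sorting, not output from the model; with the code above they would come from `[item.outputs.score for item in res]`.

```python
def rank_documents(documents, scores):
    """Pair each document with its score and sort from most to least relevant."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    return sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)


documents = [
    "A 65-minute pasta with garlic and olive oil.",
    "Slow braised short ribs that cook for 5 hours.",
    "Stir-fry veggies with pre-cooked rice.",
]
# Placeholder scores for illustration only (not real model output).
placeholder_scores = [0.2, 0.1, 0.9]

for score, document in rank_documents(documents, placeholder_scores):
    print(f"{score:.2f}  {document}")
```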