dstack
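vLLM can be run on a cloud-based GPU machine with dstack, an open-source framework for running LLMs on any cloud. This tutorial assumes that you have already configured credentials, a gateway, and GPU quotas in your cloud environment.
To install the dstack client, run:
Next, to configure your dstack project, run:
Next, to provision a VM instance for the LLM (NousResearch/Llama-2-7b-chat-hf in this example), create the following serve.dstack.yml file for the dstack Service: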
Config
Then, run the following CLI command to provision the instance:
Command
$ dstack run . -f serve.dstack.yml
⠸ Getting run plan...
Configuration  serve.dstack.yml
Project        deep-diver-main
User           deep-diver
Min resources  2..xCPU, 8GB.., 1xGPU (24GB)
Max price      -
Max duration   -
Spot policy    auto
Retry policy   no

#  BACKEND  REGION       INSTANCE       RESOURCES                               SPOT  PRICE
1  gcp      us-central1  g2-standard-4  4xCPU, 16GB, 1xL4 (24GB), 100GB (disk)  yes   $0.223804
2  gcp      us-east1     g2-standard-4  4xCPU, 16GB, 1xL4 (24GB), 100GB (disk)  yes   $0.223804
3  gcp      us-west1     g2-standard-4  4xCPU, 16GB, 1xL4 (24GB), 100GB (disk)  yes   $0.223804
...
Shown 3 of 193 offers, $5.876 max
Continue? [y/n]: y
⠙ Submitting run...
⠏ Launching spicy-treefrog-1 (pulling)
spicy-treefrog-1 provisioning completed (running)
Service is published at ...
Once provisioning is complete, you can interact with the model using the OpenAI SDK:
Code
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.<gateway domain>",
    api_key="<YOUR-DSTACK-SERVER-ACCESS-TOKEN>",
)

completion = client.chat.completions.create(
    model="NousResearch/Llama-2-7b-chat-hf",
    messages=[
        {
            "role": "user",
            "content": "Compose a poem that explains the concept of recursion in programming.",
        }
    ],
)

print(completion.choices[0].message.content)
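For readers who want to see roughly what the SDK call above sends over the wire, the following is a minimal sketch that builds the equivalent HTTP request with the Python standard library only. It assumes the gateway exposes the usual OpenAI-compatible `/chat/completions` route directly under the base URL, which is where the OpenAI SDK posts when given that `base_url`. The host `gateway.example.com` and the helper names `build_chat_request` and `send` are hypothetical and are not part of dstack or vLLM.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own gateway domain and token.
BASE_URL = "https://gateway.example.com"
ACCESS_TOKEN = "<YOUR-DSTACK-SERVER-ACCESS-TOKEN>"
MODEL = "NousResearch/Llama-2-7b-chat-hf"


def build_chat_request(base_url, token, model, prompt):
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The access token is passed as a bearer token, as the SDK does
            # with its api_key argument.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    request = build_chat_request(
        BASE_URL,
        ACCESS_TOKEN,
        MODEL,
        "Compose a poem that explains the concept of recursion in programming.",
    )
    # Inspect the request without touching the network.
    print(request.get_method(), request.full_url)
    # Uncomment once the service is published and the placeholders are filled in:
    # print(send(request))
```

Building the request separately from sending it makes it easy to check the URL, headers, and body before the service is reachable. For regular use, the OpenAI SDK shown above remains the simpler option.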
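Note
dstack automatically handles authentication on the gateway using dstack's tokens. If you don't want to configure a gateway, you can provision a dstack Task instead of a Service; a Task is intended for development purposes only. For more hands-on material on serving vLLM with dstack, check out this repository.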
