vllm_gaudi.attention.ops.hpu_paged_attn ¶
HPUPageAttentionInputBuilderBase dataclass ¶
HPUPagedAttention ¶
Source code in vllm_gaudi/attention/ops/hpu_paged_attn.py
copy_blocks staticmethod ¶
Source code in vllm_gaudi/attention/ops/hpu_paged_attn.py
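The semantics of `copy_blocks` can be illustrated with a minimal pure-Python sketch: a paged KV cache is indexed by block number, and copying a block (e.g. when forking a sequence) duplicates the source block's key and value data into a destination block across every layer's cache. The function name matches the documented API, but the list-based data layout and the `(src, dst)` pair mapping below are illustrative assumptions, not the actual HPU tensor implementation.

```python
# Illustrative model only: real HPUPagedAttention.copy_blocks operates
# on HPU tensors, not Python lists.

def copy_blocks(kv_caches, block_mapping):
    """Copy cache blocks in place.

    kv_caches: list of (key_cache, value_cache) pairs, one per layer;
               each cache is indexed by block number.
    block_mapping: iterable of (src_block, dst_block) pairs.
    """
    for key_cache, value_cache in kv_caches:
        for src, dst in block_mapping:
            key_cache[dst] = list(key_cache[src])
            value_cache[dst] = list(value_cache[src])


# One layer, four blocks of dummy data.
key_cache = [[i] for i in range(4)]
value_cache = [[10 * i] for i in range(4)]
copy_blocks([(key_cache, value_cache)], [(0, 3)])
# Block 0 has now been duplicated into block 3 in both caches.
```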
get_kv_cache_shape staticmethod ¶
get_supported_head_sizes staticmethod ¶
split_kv_cache staticmethod ¶
supports_attn_type classmethod ¶
HPU attention supports decoder and encoder-only attention.
Source code in vllm_gaudi/attention/ops/hpu_paged_attn.py
swap_blocks staticmethod ¶
swap_blocks(
    src_kv_cache: tuple[Tensor, Tensor],
    dst_kv_cache: tuple[Tensor, Tensor],
    src_to_dsts: Tensor,
) -> None
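A minimal sketch of the swap semantics, matching the signature above: whole blocks are copied from a source `(key, value)` cache pair into a destination pair according to a source-to-destination block mapping (typically used to move blocks between host memory and the device). Plain lists stand in for the tensor tuples here; the mapping as a list of `(src, dst)` pairs is an illustrative assumption.

```python
# Illustrative model only: the real staticmethod moves tensor data
# between devices (e.g. host <-> HPU).

def swap_blocks(src_kv_cache, dst_kv_cache, src_to_dsts):
    src_key, src_value = src_kv_cache
    dst_key, dst_value = dst_kv_cache
    for src, dst in src_to_dsts:
        # Copy one whole block of keys and values from src cache to dst cache.
        dst_key[dst] = list(src_key[src])
        dst_value[dst] = list(src_value[src])


src = ([[1], [2]], [[5], [6]])          # (key_cache, value_cache)
dst = ([[0], [0]], [[0], [0]])
swap_blocks(src, dst, [(0, 1)])         # move src block 0 into dst block 1
```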
Source code in vllm_gaudi/attention/ops/hpu_paged_attn.py
write_to_paged_cache staticmethod ¶
write_to_paged_cache(
    key: Tensor,
    value: Tensor,
    key_cache: Tensor,
    value_cache: Tensor,
    slot_mapping: Tensor,
    kv_cache_dtype: str,
    is_prompt: bool,
) -> None
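The core operation can be sketched in pure Python: each token's key/value vectors are scattered into flat caches at the position given by `slot_mapping` (a slot encodes `block_number * block_size + block_offset`). The sketch ignores `kv_cache_dtype` conversion and the `is_prompt` branch, and the convention that negative slots mark padding is an assumption borrowed from the usual paged-attention layout, not confirmed by this page.

```python
# Illustrative model only: the real staticmethod writes into HPU tensors.

def write_to_paged_cache(key, value, key_cache, value_cache, slot_mapping):
    for i, slot in enumerate(slot_mapping):
        if slot < 0:  # assumption: negative slots mark padded tokens
            continue
        key_cache[slot] = key[i]
        value_cache[slot] = value[i]


key = [[1.0], [2.0], [3.0]]
value = [[9.0], [8.0], [7.0]]
key_cache = [None] * 8    # flat view: num_blocks * block_size slots
value_cache = [None] * 8
write_to_paged_cache(key, value, key_cache, value_cache, [4, -1, 0])
# Tokens 0 and 2 land in slots 4 and 0; token 1 (slot -1) is skipped.
```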
Source code in vllm_gaudi/attention/ops/hpu_paged_attn.py
HPUPagedAttentionMetadata dataclass ¶
Metadata for PagedAttention.
Source code in vllm_gaudi/attention/ops/hpu_paged_attn.py
HPUPagedAttentionMetadataBuilder dataclass ¶
Bases: AttentionMetadataBuilder