vllm_gaudi.v1.attention.backends.hpu_attn
HPUAttentionBackendV1
Source code in vllm_gaudi/v1/attention/backends/hpu_attn.py
HPUAttentionMetadataV1 dataclass
Metadata for HPUAttentionBackendV1.
Source code in vllm_gaudi/v1/attention/backends/hpu_attn.py
__init__
__init__(
    block_list: Optional[Tensor],
    block_mapping: Optional[Tensor],
    block_usage: Optional[Tensor],
    block_groups: Optional[Tensor],
    alibi_blocks: Optional[Tensor],
    is_prompt: bool,
    block_size: int,
    slot_mapping: Tensor,
    attn_bias: Optional[Tensor],
    seq_lens_tensor: Optional[Tensor],
    context_lens_tensor: Optional[Tensor],
    input_positions: Tensor,
    seq_lens: Optional[list[int]] = None,
    encoder_seq_lens: Optional[list[int]] = None,
    encoder_seq_lens_tensor: Optional[Tensor] = None,
    max_encoder_seq_len: Optional[int] = None,
    cross_block_list: Optional[Tensor] = None,
    cross_slot_mapping: Optional[Tensor] = None,
    cross_block_mapping: Optional[Tensor] = None,
    cross_block_groups: Optional[Tensor] = None,
    cross_block_usage: Optional[Tensor] = None,
    cross_attn_bias: Optional[Tensor] = None,
    window_block_list: Optional[Tensor] = None,
    window_slot_mapping: Optional[Tensor] = None,
    window_block_mapping: Optional[Tensor] = None,
    window_block_groups: Optional[Tensor] = None,
    window_block_usage: Optional[Tensor] = None,
    window_attn_bias: Optional[Tensor] = None,
    chunked_slot_mapping: Optional[Tensor] = None,
    chunked_attn_bias: Optional[Tensor] = None,
    chunked_block_mapping: Optional[Tensor] = None,
    chunked_block_list: Optional[Tensor] = None,
    chunked_block_groups: Optional[Tensor] = None,
    chunked_block_usage: Optional[Tensor] = None,
    query_start_loc: Optional[Tensor] = None,
) -> None
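Most constructor fields are optional and default to None, so a metadata object for a given phase only populates the tensors that phase needs. A minimal sketch of that shape, using a hypothetical simplified stand-in (`AttnMetadataSketch` is not the real class, and `Any` replaces `torch.Tensor` to keep the sketch self-contained):

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical, heavily trimmed stand-in for HPUAttentionMetadataV1.
# Required fields come first; phase-specific tensors default to None.
@dataclass
class AttnMetadataSketch:
    is_prompt: bool
    block_size: int
    slot_mapping: Any
    input_positions: Any
    block_list: Optional[Any] = None
    block_usage: Optional[Any] = None
    block_groups: Optional[Any] = None
    attn_bias: Optional[Any] = None
    seq_lens: Optional[list[int]] = None


# A prompt-phase instance leaves the decode-only block tensors unset.
prompt_meta = AttnMetadataSketch(
    is_prompt=True,
    block_size=128,
    slot_mapping=[0, 1, 2],
    input_positions=[0, 1, 2],
)
```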
make_decode_metadata classmethod
make_decode_metadata(
    block_list,
    block_usage,
    block_groups,
    input_positions,
    slot_mapping,
    block_size,
    window_block_list,
    window_block_usage,
    window_block_groups,
    chunked_block_list,
    chunked_block_usage,
    chunked_block_groups,
    query_start_loc=None,
)
Source code in vllm_gaudi/v1/attention/backends/hpu_attn.py
make_prefill_metadata classmethod
make_prefill_metadata(
    attn_bias,
    block_list,
    context_lens_tensor,
    seq_lens_tensor,
    slot_mapping,
    block_size,
    query_start_loc=None,
)
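The two classmethods above act as factories: each accepts only the fields relevant to its phase and fills in the rest, so callers never juggle the full constructor. A sketch of that pattern under assumed behavior (`MetaSketch`, `make_prefill`, and `make_decode` are hypothetical names; the assumption that the prefill factory sets `is_prompt=True` and the decode factory sets `is_prompt=False` is mine, not confirmed by this page):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MetaSketch:
    is_prompt: bool
    block_size: int
    slot_mapping: Any
    block_list: Optional[Any] = None
    attn_bias: Optional[Any] = None
    query_start_loc: Optional[Any] = None

    @classmethod
    def make_prefill(cls, attn_bias, block_list, slot_mapping,
                     block_size, query_start_loc=None):
        # Prefill processes the whole prompt; assumed is_prompt=True.
        return cls(is_prompt=True, block_size=block_size,
                   slot_mapping=slot_mapping, block_list=block_list,
                   attn_bias=attn_bias, query_start_loc=query_start_loc)

    @classmethod
    def make_decode(cls, block_list, slot_mapping, block_size,
                    query_start_loc=None):
        # Decode generates one token per step against cached KV blocks;
        # assumed is_prompt=False, no precomputed attention bias.
        return cls(is_prompt=False, block_size=block_size,
                   slot_mapping=slot_mapping, block_list=block_list,
                   query_start_loc=query_start_loc)


prefill = MetaSketch.make_prefill(attn_bias=None, block_list=[0],
                                  slot_mapping=[0, 1], block_size=128)
decode = MetaSketch.make_decode(block_list=[0], slot_mapping=[2],
                                block_size=128)
```

Keeping the phase distinction inside the factories means the attention kernel can branch on a single `is_prompt` flag rather than inspecting which tensors happen to be populated.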