
llmcompressor.modeling.llama4

SequentialLlama4TextMoe

SequentialLlama4TextMoe(
    original: Llama4TextMoe,
    config: Llama4Config,
    calibrate_all_experts: bool = True,
)

Bases: MoECalibrationModule

Calibration version of Llama4TextMoe that unpacks the experts for sequential processing.

This module:

1. Unpacks the packed expert weights (3D -> 2D) for calibration (a sketch of this idea follows the list)
2. Optionally sends all tokens to all experts during calibration
3. Remains unpacked (permanently) for vLLM compatibility
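
A minimal sketch of the 3D -> 2D unpacking idea, assuming a generic packed weight of shape (num_experts, in_dim, out_dim); the helper name is hypothetical, and the actual SequentialLlama4TextExperts works directly on the original expert parameters rather than on a single tensor like this:

import torch
from torch import nn


def unpack_packed_experts(packed: torch.Tensor) -> nn.ModuleList:
    """Split a packed (num_experts, in_dim, out_dim) weight into per-expert Linears."""
    num_experts, in_dim, out_dim = packed.shape
    experts = nn.ModuleList()
    for expert_idx in range(num_experts):
        linear = nn.Linear(in_dim, out_dim, bias=False)
        # nn.Linear stores its weight as (out_dim, in_dim), so transpose the 2D slice
        linear.weight.data = packed[expert_idx].transpose(0, 1).contiguous()
        experts.append(linear)
    return experts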

Source code in llmcompressor/modeling/llama4.py
def __init__(
    self,
    original: Llama4TextMoe,
    config: Llama4Config,
    calibrate_all_experts: bool = True,
):
    super().__init__()
    # Extract text config from multimodal config
    text_config: Llama4TextConfig = config.get_text_config()
    self.top_k = text_config.num_experts_per_tok
    self.hidden_dim = text_config.hidden_size
    self.num_experts = text_config.num_local_experts

    self.experts = SequentialLlama4TextExperts(text_config, original.experts)
    self.router = original.router
    self.shared_expert = original.shared_expert
    self.calibrate_all_experts = calibrate_all_experts
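
For orientation, a hedged usage sketch that manually replaces each packed MoE block with its calibration version before calibration. The model layout and the feed_forward attribute path are assumptions for illustration; in practice llm-compressor applies this kind of replacement for you during calibration.

from transformers.models.llama4.modeling_llama4 import Llama4TextMoe

from llmcompressor.modeling.llama4 import SequentialLlama4TextMoe


def wrap_moe_layers(language_model, config):
    # Assumed layout: decoder layers live at language_model.model.layers and
    # hold their MoE block under .feed_forward
    for layer in language_model.model.layers:
        if isinstance(layer.feed_forward, Llama4TextMoe):
            layer.feed_forward = SequentialLlama4TextMoe(
                original=layer.feed_forward,
                config=config,
                calibrate_all_experts=True,  # send every token to every expert
            )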