SequentialLlama4TextMoe(
    original: Llama4TextMoe,
    config: Llama4Config,
    calibrate_all_experts: bool = True,
)
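Bases: MoECalibrationModule
Calibration version of Llama4TextMoe that unpacks the experts for sequential processing.
This module:
1. Unpacks the packed expert weights (3D -> 2D) for calibration
2. Optionally sends all tokens to all experts during calibration
3. Remains in unpacked form permanently, for vLLM compatibility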
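The 3D -> 2D unpacking step can be illustrated with a minimal sketch. It assumes a packed tensor laid out as (num_experts, in_features, out_features); the helper name `unpack_experts` and the toy sizes are hypothetical and are not taken from llmcompressor, whose actual unpacking lives in SequentialLlama4TextExperts.

```python
import torch
from torch import nn


def unpack_experts(packed: torch.Tensor) -> nn.ModuleList:
    """Hypothetical helper: split a packed 3D weight tensor of shape
    (num_experts, in_features, out_features) into one bias-free
    nn.Linear (2D weight) per expert."""
    num_experts, in_features, out_features = packed.shape
    experts = nn.ModuleList()
    for i in range(num_experts):
        linear = nn.Linear(in_features, out_features, bias=False)
        # nn.Linear stores its weight as (out_features, in_features),
        # so each 2D slice is transposed before it is copied in.
        with torch.no_grad():
            linear.weight.copy_(packed[i].T)
        experts.append(linear)
    return experts


torch.manual_seed(0)
packed = torch.randn(4, 8, 16)  # 4 experts, toy sizes (assumed layout)
experts = unpack_experts(packed)
x = torch.randn(3, 8)           # 3 tokens

# Each unpacked layer reproduces the matmul against its packed slice.
for i, expert in enumerate(experts):
    assert torch.allclose(expert(x), x @ packed[i], atol=1e-5)
```

Once each expert is an ordinary nn.Linear, per-layer tooling that targets linear modules (calibration hooks, quantization observers) can treat every expert individually, which is presumably why the unpacked form is kept.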
Source code in llmcompressor/modeling/llama4.py
def __init__(
    self,
    original: Llama4TextMoe,
    config: Llama4Config,
    calibrate_all_experts: bool = True,
):
    super().__init__()
    # Extract text config from multimodal config
    text_config: Llama4TextConfig = config.get_text_config()
    self.top_k = text_config.num_experts_per_tok
    self.hidden_dim = text_config.hidden_size
    self.num_experts = text_config.num_local_experts
    self.experts = SequentialLlama4TextExperts(text_config, original.experts)
    self.router = original.router
    self.shared_expert = original.shared_expert
    self.calibrate_all_experts = calibrate_all_experts
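To show what a flag like `calibrate_all_experts` can mean in practice, here is a toy mixture-of-experts forward pass. It is a sketch under stated assumptions, not the library's actual forward method: the function `moe_forward`, the softmax weighting, and the use of plain nn.Linear experts are all illustrative choices, and the shared expert is omitted.

```python
import torch
from torch import nn
import torch.nn.functional as F


def moe_forward(x, router, experts, top_k, calibrate_all_experts):
    """Toy MoE forward pass (illustrative only, not llmcompressor code)."""
    logits = router(x)                            # (tokens, num_experts)
    scores = logits.softmax(dim=-1)
    top_idx = torch.topk(logits, top_k, dim=-1).indices
    # Boolean mask marking which experts each token is routed to.
    mask = F.one_hot(top_idx, num_classes=len(experts)).sum(dim=1).bool()

    out = torch.zeros_like(x)
    for i, expert in enumerate(experts):
        selected = mask[:, i]
        if calibrate_all_experts:
            # Every expert sees every token, so hooks attached to the
            # expert observe all activations; only routed rows are kept.
            expert_out = expert(x)[selected]
        else:
            # Normal routing: the expert only sees its own tokens.
            expert_out = expert(x[selected])
        out[selected] += scores[selected, i].unsqueeze(-1) * expert_out
    return out


torch.manual_seed(0)
hidden, num_experts, top_k = 8, 4, 1
router = nn.Linear(hidden, num_experts, bias=False)
experts = nn.ModuleList(
    [nn.Linear(hidden, hidden, bias=False) for _ in range(num_experts)]
)
x = torch.randn(5, hidden)

with torch.no_grad():
    routed = moe_forward(x, router, experts, top_k, calibrate_all_experts=False)
    calibrated = moe_forward(x, router, experts, top_k, calibrate_all_experts=True)

# Both modes yield the same output; they differ only in what each expert sees.
assert torch.allclose(routed, calibrated, atol=1e-5)
```

In this sketch the flag changes which inputs reach each expert, not the result: rarely selected experts still receive the full batch, so their calibration statistics are not based on a handful of tokens.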