llmcompressor.modeling

Model preparation and fusion utilities for compression workflows.

Provides tools for preparing models for compression, including layer fusion, module preparation, and model structure optimization. Handles pre-compression transforms and the architectural modifications required for efficient compression.

Modules

Functions

center_embeddings

center_embeddings(embedding: Module)

Shift each embedding to have a mean of zero.

Parameters

  • embedding

    (Module) –

    embedding module containing the embeddings to center

Source code in llmcompressor/modeling/fuse.py
def center_embeddings(embedding: torch.nn.Module):
    """
    Shift each embedding to have a mean of zero

    :param embedding: embedding module containing embeddings to center
    """
    if not hasattr(embedding, "weight"):
        raise ValueError(f"Cannot fuse norm of type {type(embedding)}")

    with align_module_device(embedding):
        weight_dtype = embedding.weight.dtype
        weight = embedding.weight.to(PRECISION)
        new_weight = weight - weight.mean(dim=-1, keepdim=True)
        new_weight = new_weight.to(weight_dtype)

    update_offload_parameter(embedding, "weight", new_weight)
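
A minimal usage sketch (illustrative, not part of the library docs): apply center_embeddings to a toy torch.nn.Embedding and check that each row ends up with roughly zero mean. In practice the module would typically be a loaded model's input embeddings, e.g. model.get_input_embeddings().

import torch

from llmcompressor.modeling.fuse import center_embeddings

# toy embedding table standing in for a real model's input embeddings
embedding = torch.nn.Embedding(num_embeddings=16, embedding_dim=8)
center_embeddings(embedding)

# every embedding row now has (approximately) zero mean over the hidden dim
print(embedding.weight.mean(dim=-1).abs().max())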

fuse_norm_linears

fuse_norm_linears(norm: Module, linears: Iterable[Linear])

Fuse the scaling operation of a norm layer into the subsequent linear layers. This is useful for ensuring transform invariance between the norm and linear layers.

Note that unitary transforms (rotations) commute with normalization, but not with scaling.
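
To make the note concrete, here is a small standalone sketch (an illustration, not library code) that checks numerically that an orthogonal transform commutes with unit-scale RMS normalization but not with a per-channel scale, which is why the scale must be folded into the following linears:

import torch

torch.manual_seed(0)
d = 8
x = torch.randn(d, dtype=torch.float64)
# random orthogonal matrix standing in for a rotation
Q, _ = torch.linalg.qr(torch.randn(d, d, dtype=torch.float64))

def rms_norm(v, scale=None):
    out = v / v.pow(2).mean(dim=-1, keepdim=True).sqrt()
    return out * scale if scale is not None else out

# unit-scale normalization commutes with the rotation ...
assert torch.allclose(rms_norm(x @ Q.T), rms_norm(x) @ Q.T)

# ... but a per-channel scale does not
g = torch.rand(d, dtype=torch.float64) + 0.5
assert not torch.allclose(rms_norm(x @ Q.T, g), rms_norm(x, g) @ Q.T)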

Parameters

  • norm

    (Module) –

    norm layer whose weight will be fused into the subsequent linear layers

  • linears

    (Iterable[Linear]) –

    linear layers which directly follow the norm layer

Source code in llmcompressor/modeling/fuse.py
def fuse_norm_linears(norm: torch.nn.Module, linears: Iterable[torch.nn.Linear]):
    """
    Fuse the scaling operation of norm layer into subsequent linear layers.
    This is useful for ensuring transform invariance between norm and linear layers.

    Note that unitary transforms (rotation) commute with normalization, but not scaling

    :param norm: norm layer whose weight will be fused into subsequent linears
    :param linears: linear layers which directly follow the norm layer
    """
    if not hasattr(norm, "weight"):
        raise ValueError(f"Cannot fuse norm of type {type(norm)}")

    for linear in linears:
        # NOTE: spinquant does this op in float64
        exec_device = get_execution_device(norm)
        with align_module_device(norm, exec_device), align_module_device(
            linear, exec_device
        ):
            weight_dtype = linear.weight.dtype
            new_weight = linear.weight.to(PRECISION) * norm.weight.to(PRECISION)
            new_weight = new_weight.to(weight_dtype)

        update_offload_parameter(linear, "weight", new_weight)

    new_norm_weight = torch.ones_like(norm.weight, device="cpu")
    update_offload_parameter(norm, "weight", new_norm_weight)
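
A hypothetical usage sketch follows; the layer attribute names (input_layernorm, self_attn.q_proj/k_proj/v_proj) assume a Hugging Face Llama-style model, and the checkpoint id is only illustrative:

from transformers import AutoModelForCausalLM

from llmcompressor.modeling.fuse import fuse_norm_linears

# illustrative checkpoint; any Llama-style model whose `input_layernorm`
# feeds `self_attn.{q,k,v}_proj` would work the same way
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

for layer in model.model.layers:
    fuse_norm_linears(
        layer.input_layernorm,
        (layer.self_attn.q_proj, layer.self_attn.k_proj, layer.self_attn.v_proj),
    )
    # the norm's weight is now all ones; its scale lives in the projections,
    # so a later rotation of the hidden dimension commutes with the norm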