llmcompressor.modifiers.pruning.sparsegpt.base

Classes

SparseGPTModifier –

Modifier for applying the one-shot SparseGPT algorithm to a model

SparseGPTModifier

Bases: SparsityModifierBase

Modifier for applying the one-shot SparseGPT algorithm to a model

Sample yaml

test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      dampening_frac: 0.001
      block_size: 128
      targets: ['Linear']
      ignore: ['re:.*lm_head']
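
To make the `mask_structure: "2:4"` setting above concrete, the following is a minimal sketch of what an N:M pattern looks like on a single row of weights. The helpers `parse_mask_structure` and `apply_n_m_mask` are hypothetical names introduced for illustration and are not part of the library. The sketch assumes N is the number of weights zeroed in each group of M consecutive weights, and it picks the weights to zero by magnitude, which is only a stand-in: SparseGPT chooses them using Hessian information.

```python
from typing import List, Tuple


def parse_mask_structure(mask_structure: str) -> Tuple[int, int]:
    """Split an "N:M" string into two integers (hypothetical helper)."""
    n_str, m_str = mask_structure.split(":")
    return int(n_str), int(m_str)


def apply_n_m_mask(row: List[float], n: int, m: int) -> List[float]:
    """Zero n entries in every group of m consecutive values.

    Assumption: the entries with the smallest magnitude are dropped.
    This is a simplification; SparseGPT uses a Hessian-based criterion.
    """
    if m == 0:
        # "0:0" means unstructured, so no block pattern is imposed here
        return list(row)
    out = list(row)
    usable = len(row) - len(row) % m  # ignore a trailing partial group
    for start in range(0, usable, m):
        group = range(start, start + m)
        # indices of the n smallest-magnitude weights in this group
        drop = sorted(group, key=lambda i: abs(row[i]))[:n]
        for i in drop:
            out[i] = 0.0
    return out


n, m = parse_mask_structure("2:4")
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
masked = apply_n_m_mask(weights, n, m)
print(masked)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.8, 0.0]
```

Every group of four weights ends up with exactly two zeros, which is the regular layout that hardware with 2:4 sparsity support can exploit.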

Lifecycle

on_initialize
- register_hook(module, calibrate_module, "forward")
on_sequential_batch_end
- sparsify_weight
on_finalize
- remove_hooks()

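Parameters

sparsity
–

Sparsity to compress the model to
sparsity_profile
–

Can be set to 'owl' to use Outlier Weighted Layerwise Sparsity (OWL); more information can be found in the paper https://arxiv.org/pdf/2310.05175
mask_structure
–

String defining the structure of the mask to apply. Must be of the form N:M, where N and M are integers that define a custom block shape. Defaults to 0:0, which represents an unstructured mask.
owl_m
–

Number of outliers to use for OWL
owl_lmbda
–

Lambda value to use for OWL
block_size
–

Used to determine the number of columns to compress in one pass
dampening_frac
–

Amount of dampening to apply to H, as a fraction of the diagonal norm
preserve_sparsity_mask
–

Whether to preserve the existing sparsity mask when applying SparseGPT. This is useful when starting from a previously pruned model. Defaults to False.
offload_hessians
–

Set to True to reduce memory usage at the cost of increased runtime.
sequential_targets
–

List of layer names to compress during SparseGPT, or 'ALL' to compress every layer in the model. Alias for targets
targets
–

List of layer names to compress during SparseGPT, or 'ALL' to compress every layer in the model. Alias for sequential_targets
ignore
–

Optional list of module class names or submodule names to exclude from compression even if they match a target. Defaults to an empty list.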
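
As an illustration of the `dampening_frac` parameter, here is a minimal sketch of diagonal dampening on a small matrix. It assumes that "diagonal norm" means the mean of the diagonal of H, as in the reference SparseGPT implementation; the library's exact computation is not shown on this page and may differ. The function name `dampen` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def dampen(hessian: Matrix, dampening_frac: float) -> Matrix:
    """Return a copy of `hessian` with a constant added to its diagonal.

    Assumption: the constant is dampening_frac * mean(diag(H)).
    """
    dim = len(hessian)
    diag_mean = sum(hessian[i][i] for i in range(dim)) / dim
    damp = dampening_frac * diag_mean
    return [
        [value + damp if i == j else value for j, value in enumerate(row)]
        for i, row in enumerate(hessian)
    ]


H = [[4.0, 1.0], [1.0, 2.0]]
# 0.5 is exaggerated so the effect is visible; typical values are much smaller
H_damped = dampen(H, dampening_frac=0.5)
print(H_damped)  # [[5.5, 1.0], [1.0, 3.5]]
```

Adding a positive constant to the diagonal keeps H well conditioned, so inverting it stays numerically stable even when some input features are rarely active.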

Methods

calibrate_module –

Calibration hook used to accumulate the Hessian of the input to the module
compress_modules –

Sparsify modules which have been calibrated

calibrate_module

calibrate_module(
    module: Module,
    args: Tuple[Tensor, ...],
    _output: Tensor,
)

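Calibration hook used to accumulate the Hessian of the input to the module

Parameters

module
(Module) –

Module being calibrated
args
(Tuple[Tensor, ...]) –

Inputs to the module, the first element of which is the canonical input
_output
(Tensor) –

Uncompressed module output, unused

Source code in llmcompressor/modifiers/pruning/sparsegpt/base.py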

def calibrate_module(
    self,
    module: torch.nn.Module,
    args: Tuple[torch.Tensor, ...],
    _output: torch.Tensor,
):
    """
    Calibration hook used to accumulate the hessian of the input to the module

    :param module: module being calibrated
    :param args: inputs to the module, the first element of which is the
        canonical input
    :param _output: uncompressed module output, unused
    """
    # Assume that the first argument is the input
    inp = args[0]

    # Initialize hessian if not present
    if module not in self._num_samples:
        device = get_execution_device(module)
        self._hessians[module] = make_empty_hessian(module, device=device)
        self._num_samples[module] = 0

    # Accumulate hessian with input with optional offloading
    with self._maybe_onload_hessian(module):
        self._hessians[module], self._num_samples[module] = accumulate_hessian(
            inp,
            module,
            self._hessians[module],
            self._num_samples[module],
        )
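
The hook above keeps a Hessian estimate and a sample count per module and updates both on every forward pass. The following is a minimal sketch of one way such a running update can work. It assumes the form H = (2/n) Σ x xᵀ used in the reference SparseGPT code; `accumulate` is a hypothetical stand-in, not the library's `accumulate_hessian`, whose body is not shown on this page.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def accumulate(hessian: Matrix, num_samples: int, batch: Matrix) -> Tuple[Matrix, int]:
    """Fold one batch (rows are samples, columns are input features)
    into a running Hessian estimate and return the new estimate and count."""
    dim = len(hessian)
    total = num_samples + len(batch)
    # Down-weight the old estimate so the result stays an average over all samples
    scale_old = num_samples / total
    updated = [[hessian[i][j] * scale_old for j in range(dim)] for i in range(dim)]
    # Add the outer product of each new sample, scaled by 2 / total
    for x in batch:
        for i in range(dim):
            for j in range(dim):
                updated[i][j] += (2.0 / total) * x[i] * x[j]
    return updated, total


# Start from an empty 2x2 estimate, as the hook does on first use of a module
H: Matrix = [[0.0, 0.0], [0.0, 0.0]]
n = 0
H, n = accumulate(H, n, [[1.0, 2.0]])
H, n = accumulate(H, n, [[3.0, 0.0], [0.0, 1.0]])
print(n)  # 3
```

Because each update rescales the previous estimate, the final H equals the value obtained by processing all three samples at once, so the batch boundaries do not affect the result.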

compress_modules

compress_modules()

Sparsify modules which have been calibrated

Source code in llmcompressor/modifiers/pruning/sparsegpt/base.py

def compress_modules(self):
    """
    Sparsify modules which have been calibrated
    """
    for module in list(self._num_samples.keys()):
        name = self._module_names[module]
        sparsity = self._module_sparsities[module]
        num_samples = self._num_samples[module]

        logger.info(f"Sparsifying {name} using {num_samples} samples")
        with torch.no_grad(), align_module_device(module), CompressionLogger(
            module
        ) as comp_logger:
            loss, sparsified_weight = sparsify_weight(
                module=module,
                hessians_dict=self._hessians,
                sparsity=sparsity,
                prune_n=self._prune_n,
                prune_m=self._prune_m,
                block_size=self.block_size,
                dampening_frac=self.dampening_frac,
                preserve_sparsity_mask=self.preserve_sparsity_mask,
            )
            comp_logger.set_loss(loss)

        update_offload_parameter(module, "weight", sparsified_weight)

        # self._hessians[module] already deleted by sparsify_weight
        del self._num_samples[module]
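
One detail in the loop above is that it iterates over `list(self._num_samples.keys())` and not over the dictionary itself. A small sketch of why, using a plain dictionary with made-up layer names: the snapshot lets entries be deleted inside the loop, whereas deleting while iterating over the live dictionary raises a `RuntimeError`.

```python
# Hypothetical bookkeeping: module name -> number of calibration samples seen
num_samples = {"layer_a": 128, "layer_b": 128}
processed = []

# list(...) takes a snapshot of the keys, so deleting inside the loop is safe
for name in list(num_samples.keys()):
    processed.append(name)
    del num_samples[name]

print(processed)    # ['layer_a', 'layer_b']
print(num_samples)  # {}
```

Clearing each entry once its module is compressed means a later call only touches modules calibrated since the previous call.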
