llmcompressor.modifiers.quantization.calibration

Functions

calibrate_activations

calibrate_activations(
    module: Module, value: Tensor, base_name: str
)

Calibrate input or output activations by calling the module's attached observer.

Parameters

  • module

    (Module) –

    torch.nn.Module

  • base_name

    (str) –

    Substring used to fetch the observer, scales, and zero point.

  • value

    (Tensor) –

    torch.Tensor that will be passed to the observer.

Source code in llmcompressor/modifiers/quantization/calibration.py
def calibrate_activations(module: Module, value: torch.Tensor, base_name: str):
    """
    Calibrate input or output activations by calling the module's attached
    observer.

    :param module: torch.nn.Module
    :param base_name: substring used to fetch the observer, scales, and zp
    :param value: torch.Tensor to be passed to the observer

    """
    # If empty tensor, can't update zp/scale
    # Case for MoEs
    if value.numel() == 0:
        return

    field_name = "input" if base_name != "output" else "output"  # input,q,k,v,output
    args_attr = f"quantization_scheme.{field_name}_activations"
    quantization_args = getattr_chain(module, args_attr, None)

    calculate_qparams = True
    calculate_gparam = False

    if quantization_args is not None:
        if quantization_args.dynamic in (True, DynamicType.LOCAL):
            calculate_qparams = False
        if quantization_args.strategy == QuantizationStrategy.TENSOR_GROUP:
            calculate_gparam = True

    call_observer(
        module=module,
        base_name=base_name,
        value=value,
        should_calculate_gparam=calculate_gparam,
        should_calculate_qparams=calculate_qparams,
    )
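
A minimal, hypothetical sketch of calling calibrate_activations directly on a captured activation tensor. Here layer is a placeholder for a module that already carries a quantization_scheme and an attached input_observer (see initialize_observer below).

import torch

from llmcompressor.modifiers.quantization.calibration import calibrate_activations

sample = torch.randn(2, 128, 4096)  # a captured activation batch (placeholder shape)
calibrate_activations(layer, value=sample, base_name="input")
# layer.input_scale (and input_zero_point, if present) are now updated by the
# observer; for fully dynamic schemes the qparam update is skipped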

calibrate_input_hook

calibrate_input_hook(module: Module, args: Any)

Hook to calibrate input activations. Calls the observers to update the scales/zero points before input QDQ is applied in the module's forward pass.

Source code in llmcompressor/modifiers/quantization/calibration.py
def calibrate_input_hook(module: Module, args: Any):
    """
    Hook to calibrate input activations.
    Will call the observers to update the scales/zp before applying
    input QDQ in the module's forward pass.
    """
    args = args[0] if isinstance(args, tuple) else args
    calibrate_activations(module, value=args, base_name="input")
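
Because the signature matches torch's forward pre-hook protocol (module, args), one plausible way to wire it up manually is shown below. This is a sketch: layer is a placeholder module prepared for calibration, and in practice the quantization modifier registers these hooks itself.

handle = layer.register_forward_pre_hook(calibrate_input_hook)
with torch.no_grad():
    layer(torch.randn(2, 4096))  # observer sees the input; input scale/zp updated
handle.remove()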

calibrate_output_hook

calibrate_output_hook(
    module: Module, _args: Any, output: Tensor
)

Hook to calibrate output activations. Calls the observers to update the scales/zero points before output QDQ is applied.

Source code in llmcompressor/modifiers/quantization/calibration.py
def calibrate_output_hook(module: Module, _args: Any, output: torch.Tensor):
    """
    Hook to calibrate output activations.
    Will call the observers to update the scales/zp before applying
    output QDQ.
    """
    calibrate_activations(
        module,
        value=output,
        base_name="output",
    )
    output = forward_quantize(
        module=module,
        value=output,
        base_name="output",
        args=module.quantization_scheme.output_activations,
    )
    return output
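
The signature matches torch's forward hook protocol (module, args, output), and because the hook returns a value, torch substitutes the fake-quantized output for the original one. A hypothetical manual registration on the placeholder module layer:

handle = layer.register_forward_hook(calibrate_output_hook)
with torch.no_grad():
    out = layer(torch.randn(2, 4096))  # out is the QDQ'd (fake-quantized) output
handle.remove()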

call_observer

call_observer(
    module: Module,
    base_name: str,
    value: Optional[Tensor] = None,
    should_calculate_gparam: bool = False,
    should_calculate_qparams: bool = True,
)

Call a module's attached input/weight/output observer using a provided value. Update the module's scale and zero point using the observer's return values.

Parameters

  • module

    (Module) –

    torch.nn.Module

  • base_name

    (str) –

    Substring used to fetch the observer, scales, and zero point.

  • value

    (Optional[Tensor], default: None ) –

    torch.Tensor to be passed to the observer for activations. If base_name is "weight", the module's weight tensor will be used.

Source code in llmcompressor/modifiers/quantization/calibration.py
def call_observer(
    module: Module,
    base_name: str,
    value: Optional[torch.Tensor] = None,
    should_calculate_gparam: bool = False,
    should_calculate_qparams: bool = True,
):
    """
    Call a module's attached input/weight/output observer using a provided value.
    Update the module's scale and zp using the observer's return values.

    :param module: torch.nn.Module
    :param base_name: substring used to fetch the observer, scales, and zp
    :param value: torch.Tensor to be passed to the observer for activations. If
        base_name is "weight", then the module's weight tensor will be used
    """
    with align_module_device(module):
        if value is None and base_name == "weight":
            value = module.weight
        observer: Observer = getattr(module, f"{base_name}_observer")

        if should_calculate_gparam:
            global_scale = observer.get_global_scale(value)
            update_offload_parameter(module, f"{base_name}_global_scale", global_scale)

        if should_calculate_qparams:
            scale, zero_point = observer(value)
            update_offload_parameter(module, f"{base_name}_scale", scale)
            if hasattr(module, f"{base_name}_zero_point"):
                update_offload_parameter(module, f"{base_name}_zero_point", zero_point)
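
Two hypothetical invocations on the placeholder module layer: for weights, value defaults to the module's own weight tensor; for activations, a tensor must be passed explicitly.

# Weight qparams, observed from layer.weight directly
call_observer(module=layer, base_name="weight")

# Input-activation qparams, observed from an explicitly provided tensor
call_observer(module=layer, base_name="input", value=torch.randn(2, 4096))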

freeze_module_quantization

freeze_module_quantization(module: Module)

Deletes observers once calibration is complete.

Apply to the full model with model.apply(freeze_module_quantization).

Parameters

  • module

    (Module) –

    Module to freeze quantization for.

Source code in llmcompressor/modifiers/quantization/calibration.py
def freeze_module_quantization(module: Module):
    """
    deletes observers when calibration is complete.

    apply to full model with `model.apply(freeze_module_quantization)`

    :param module: module to freeze quantization for
    """
    scheme = getattr(module, "quantization_scheme", None)
    if not scheme:
        # no quantization scheme nothing to do
        return

    if module.quantization_status == QuantizationStatus.FROZEN:
        # nothing to do, already frozen
        return

    # remove observers
    for name in ("input", "weight", "output", "q", "k", "v"):
        obs_name = f"{name}_observer"
        if hasattr(module, obs_name):
            delattr(module, obs_name)

    module.quantization_status = QuantizationStatus.FROZEN
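
Once the calibration data has been run, a single apply call removes observers model-wide (sketch; model and layer are placeholders):

model.apply(freeze_module_quantization)
assert not hasattr(layer, "input_observer")  # observers deleted, status now FROZEN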

initialize_observer

initialize_observer(module: Module, base_name: str)

Initialize an observer module and attach it as a submodule. The observer's name is fetched from the quantization_args, then used to load the observer from the registry and attach it to the module. The attached observer attribute is named using the provided base_name.

Parameters

  • module

    (Module) –

    torch.nn.Module that the observer is being attached to.

  • base_name

    (str) –

    str used to name the observer attribute.

Source code in llmcompressor/modifiers/quantization/calibration.py
def initialize_observer(
    module: Module,
    base_name: str,
):
    """
    Initialize observer module and attach as submodule.
    The name of the observer is fetched from the quantization_args.
    The name is then used to load the observer from the registry and attached
    to the module. The name of the observer uses the base_name provided.

    :param module: torch.nn.Module that the observer is being attached to
    :param base_name: str used to name the observer attribute

    """
    if base_name == "weight":
        arg_name = "weights"
    elif base_name == "output":
        arg_name = "output_activations"
    else:  # input, q, k, v
        arg_name = "input_activations"

    args: QuantizationArgs = getattr_chain(
        module, f"quantization_scheme.{arg_name}", None
    )
    if args is not None and args.dynamic is not True:
        observer = Observer.load_from_registry(
            args.observer, base_name=base_name, args=args, module=module
        )
        module.register_module(f"{base_name}_observer", observer)
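
A sketch of attaching observers for every tensor type a module might quantize; base names whose quantization args are missing or dynamic are simply skipped by the function. layer is again a placeholder module whose quantization_scheme has already been applied.

for base_name in ("weight", "input", "output"):
    initialize_observer(layer, base_name=base_name)
# layer.weight_observer now exists if the scheme quantizes weights statically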

update_weight_zp_scale

update_weight_zp_scale(module: Module)

Marks a layer as ready for calibration, which activates observers to update scales and zero points on each forward pass.

Apply to the full model with model.apply(update_weight_zp_scale).

Parameters

  • module

    (Module) –

    Module to set for calibration.

Source code in llmcompressor/modifiers/quantization/calibration.py
def update_weight_zp_scale(module: Module):
    """
    marks a layer as ready for calibration which activates observers
    to update scales and zero points on each forward pass

    apply to full model with `model.apply(update_weight_zp_scale)`

    :param module: module to set for calibration
    """
    if getattr_chain(module, "quantization_scheme.weights", None) is None:
        return

    if getattr(module, "quantization_status", None) != QuantizationStatus.CALIBRATION:
        logger.warning(
            "Attempting to calibrate weights of a module not in calibration mode"
        )

    call_observer(module=module, base_name="weight")
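
Putting the pieces together, a hedged end-to-end sketch of one calibration pass. model and calibration_loader are placeholders, and the quantization modifier normally orchestrates these steps (including setting quantization_status to CALIBRATION) before they run.

import torch

# 1. Attach observers to every module whose scheme defines static quantization
for module in model.modules():
    for base_name in ("weight", "input", "output"):
        initialize_observer(module, base_name=base_name)

# 2. Calibrate weight scales / zero points up front
model.apply(update_weight_zp_scale)

# 3. Register activation-calibration hooks and run the calibration data
handles = []
for module in model.modules():
    if hasattr(module, "input_observer"):
        handles.append(module.register_forward_pre_hook(calibrate_input_hook))
    if hasattr(module, "output_observer"):
        handles.append(module.register_forward_hook(calibrate_output_hook))

with torch.no_grad():
    for batch in calibration_loader:  # placeholder calibration dataloader
        model(**batch)

# 4. Clean up: remove hooks and delete observers
for handle in handles:
    handle.remove()
model.apply(freeze_module_quantization)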