llmcompressor.observers.base

  • Observer

    Base class for observers that calculate quantization parameters after observation

Observer

Observer(
    base_name: str,
    args: QuantizationArgs,
    module: Optional[Module] = None,
    **observer_kwargs,
)

Bases: InternalModule, RegistryMixin

Base class for observers that calculate quantization parameters after observing weights, activations, or attention states.

Example

module = ...  # torch.nn.Module with a `weight` parameter
observer = Observer.load_from_registry("minmax", base_name="weight", args=...)
module.global_scale = observer.get_global_scale(module.weight)
scales, zero_points = observer(module.weight)

Parameters

  • base_name

    (str) –

    Name used to name the observer's attributes

  • args

    (QuantizationArgs) –

    Quantization arguments used to calibrate and quantize observed values

  • module

    (Optional[Module], default: None) –

    Optional module with attached quantization parameters. This argument is required in order to leverage existing qparams such as global_scale or g_idx

  • **observer_kwargs

    Observer initialization keyword arguments

Methods

  • forward

    Calculate updated scales and zero points from the observed value

  • get_global_min_max

    Calculate min and max values from the observed value for the purposes of global scale calculation

  • get_global_scale

    Calculate an updated global scale from the observed value

  • get_min_max

    Calculate min and max values from the observed value

Source code in llmcompressor/observers/base.py
def __init__(
    self,
    base_name: str,
    args: QuantizationArgs,
    module: Optional[torch.nn.Module] = None,
    **observer_kwargs,
):
    super().__init__()
    self.module = ref(module) if module is not None else None
    self.base_name = base_name
    self.args = args

    # populate observer kwargs
    self.args.observer_kwargs = self.args.observer_kwargs or {}
    self.args.observer_kwargs.update(observer_kwargs)

forward

forward(observed: Tensor) -> ScaleZpTuple

Calculate updated scales and zero points from the observed value (weight, activation, or attention state).

Parameters

  • observed

    (Tensor) –

    Value being observed

Returns

  • ScaleZpTuple

    Calibrated scale and zero point

Source code in llmcompressor/observers/base.py
@torch.no_grad
def forward(self, observed: torch.Tensor) -> ScaleZpTuple:
    """
    Calculate updated scales and zero points from observed value
    (weight, activation, or attention state).

    :param observed: value being observed
    :return: calibrated scale and zero point
    """
    scales, zero_points, _min, _max = self._forward_with_minmax(observed)
    return (scales, zero_points)
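The min/max pair that `forward` obtains internally is converted into a scale and zero point by the standard affine quantization mapping. The sketch below is illustrative only, not the llmcompressor implementation: it assumes asymmetric int8 quantization and shows how a `[min_val, max_val]` range maps onto the integer range `[quant_min, quant_max]`.

```python
def calculate_qparams(min_val: float, max_val: float,
                      quant_min: int = -128, quant_max: int = 127):
    """Illustrative affine qparam calculation (not the shipped code).

    Returns (scale, zero_point) mapping [min_val, max_val] onto the
    integer range [quant_min, quant_max].
    """
    # The range must include zero so that the value 0.0 quantizes exactly.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)

    scale = (max_val - min_val) / (quant_max - quant_min)
    if scale == 0.0:  # degenerate all-zero observation
        return 1.0, 0

    # Zero point: the integer that real value 0.0 maps to, clamped in range.
    zero_point = round(quant_min - min_val / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point

scale, zp = calculate_qparams(-1.0, 3.0)
```

A symmetric scheme would instead set `scale = max(|min|, |max|) / quant_max` and fix the zero point at 0; which scheme applies is governed by the `QuantizationArgs` passed to the observer.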

get_global_min_max abstractmethod

get_global_min_max(observed: Tensor) -> MinMaxTuple

Calculate min and max values from the observed value for the purposes of global scale calculation

Parameters

  • observed

    (Tensor) –

    Value of shape (num_observations, 1, group_size)

Returns

  • MinMaxTuple

    Minimum and maximum values, each of shape (1,)

Source code in llmcompressor/observers/base.py
@abstractmethod
def get_global_min_max(self, observed: torch.Tensor) -> MinMaxTuple:
    """
    Calculate min and max values from observed value for the purposes of
    global scale calculation

    :param observed: value of shape (num_observations, 1, group_size)
    :return: minimum value and maximum value whose shapes are (1, )
    """
    raise NotImplementedError()
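Since `get_global_min_max` is abstract, a concrete subclass decides how to reduce the observed tensor. A minimal sketch of one plausible implementation is shown below, using plain Python lists in place of tensors; the function name matches the abstract method, but the body is an assumption, not the shipped code.

```python
def get_global_min_max(observed):
    """Illustrative reduction (not the shipped implementation).

    observed: nested list shaped (num_observations, 1, group_size).
    Returns (min, max) over all elements, each as a length-1 list to
    mirror the documented (1,) output shape.
    """
    # Flatten across the observation, singleton, and group_size axes.
    flat = [v for obs in observed for group in obs for v in group]
    return [min(flat)], [max(flat)]

mins, maxs = get_global_min_max([[[0.5, -2.0]], [[3.0, 1.0]]])
```

In torch this would typically be a single `observed.amin()` / `observed.amax()` over all dimensions.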

get_global_scale

get_global_scale(observed: Tensor) -> torch.Tensor

Calculate an updated global scale from the observed value (weight, activation, or attention state).

Parameters

  • observed

    (Tensor) –

    Value being observed

Returns

  • Tensor

    Calibrated global parameter

Source code in llmcompressor/observers/base.py
@torch.no_grad
def get_global_scale(self, observed: torch.Tensor) -> torch.Tensor:
    """
    Calculate updated global scale from observed value
    (weight, activation, or attention state).

    :param observed: value being observed
    :return: calibrated global parameter
    """
    global_scale, _min, _max = self._get_global_scale_with_minmax(observed)
    return global_scale

get_min_max abstractmethod

get_min_max(observed: Tensor) -> MinMaxTuple

Calculate min and max values from the observed value

Parameters

  • observed

    (Tensor) –

    Value of shape (num_observations, *qparam_shape, group_size)

Returns

  • MinMaxTuple

    Minimum and maximum values, each of shape (*qparam_shape,)

Source code in llmcompressor/observers/base.py
@abstractmethod
def get_min_max(self, observed: torch.Tensor) -> MinMaxTuple:
    """
    Calculate min and max values from observed value

    :param observed: value of shape (num_observations, *qparam_shape, group_size)
    :return: minimum value and maximum value whose shapes are (*qparam_shape, )
    """
    raise NotImplementedError()
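Unlike `get_global_min_max`, this method keeps one min/max pair per quantization-parameter location, reducing only over the observation and `group_size` axes. The sketch below illustrates that shape contract with plain Python lists and a single flattened `qparam_shape` axis; it is an assumption about one possible subclass, not the shipped implementation.

```python
def get_min_max(observed):
    """Illustrative per-group reduction (not the shipped implementation).

    observed: nested list shaped (num_observations, n_groups, group_size),
    where n_groups stands in for a flattened *qparam_shape.
    Returns (mins, maxs), each of length n_groups, reducing over both the
    observation axis and the group_size axis.
    """
    n_groups = len(observed[0])
    mins = [min(v for obs in observed for v in obs[g]) for g in range(n_groups)]
    maxs = [max(v for obs in observed for v in obs[g]) for g in range(n_groups)]
    return mins, maxs

# Two observations, two groups of two values each:
mins, maxs = get_min_max([[[1, 2], [3, 4]], [[-1, 5], [0, 2]]])
```

A running-min/max observer would additionally blend each call's result with previously accumulated values (e.g. via a moving average), which is why the base class routes everything through stateful `forward` calls rather than exposing a single static reduction.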