Fuse the scaling operation of a normalization layer into the subsequent linear layers. This is useful for ensuring transform invariance between the norm and linear layers.
Note that unitary transforms (rotations) commute with normalization, but not with scaling.
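The identity this fusion relies on can be checked numerically: scaling the input channels of a following linear layer by the norm's per-channel weight, then resetting that weight to ones, leaves the composed output unchanged. Below is a minimal sketch with illustrative shapes and values; it assumes torch.nn.RMSNorm, which is available in PyTorch 2.4 and later.

```python
import torch

torch.manual_seed(0)
hidden = 8

# RMSNorm followed by a linear layer, both in float64 for a clean comparison
# (torch.nn.RMSNorm requires PyTorch >= 2.4)
norm = torch.nn.RMSNorm(hidden, dtype=torch.float64)
linear = torch.nn.Linear(hidden, 4, bias=False, dtype=torch.float64)
with torch.no_grad():
    norm.weight.copy_(torch.rand(hidden, dtype=torch.float64) + 0.5)  # non-trivial scale

x = torch.randn(2, hidden, dtype=torch.float64)
reference = linear(norm(x))

# fold the norm's per-channel scale into the linear weight's input dimension,
# then reset the norm scale to ones (the same update fuse_norm_linears applies)
with torch.no_grad():
    linear.weight.mul_(norm.weight)  # broadcasts over rows, scaling each input column
    norm.weight.fill_(1.0)

fused = linear(norm(x))
assert torch.allclose(reference, fused)  # identical up to float64 rounding
```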
Parameters

- norm (Module) – norm layer whose weight will be fused into the subsequent linears
- linears (Iterable[Linear]) – linear layers which directly follow the norm layer
Source code in llmcompressor/modeling/fuse.py
```python
from typing import Iterable

import torch
# helper imports, assumed to be provided by compressed_tensors' offload utilities
from compressed_tensors.utils import (
    align_module_device,
    get_execution_device,
    update_offload_parameter,
)

# fusion is performed in float64 (see NOTE below)
PRECISION = torch.float64


def fuse_norm_linears(norm: torch.nn.Module, linears: Iterable[torch.nn.Linear]):
    """
    Fuse the scaling operation of norm layer into subsequent linear layers.
    This is useful for ensuring transform invariance between norm and linear layers.
    Note that unitary transforms (rotation) commute with normalization, but not scaling

    :param norm: norm layer whose weight will be fused into subsequent linears
    :param linears: linear layers which directly follow the norm layer
    """
    if not hasattr(norm, "weight"):
        raise ValueError(f"Cannot fuse norm of type {type(norm)}")

    for linear in linears:
        # NOTE: spinquant does this op in float64
        exec_device = get_execution_device(norm)
        with align_module_device(norm, exec_device), align_module_device(
            linear, exec_device
        ):
            weight_dtype = linear.weight.dtype
            # scale each input channel of the linear weight by the norm weight
            new_weight = linear.weight.to(PRECISION) * norm.weight.to(PRECISION)
            new_weight = new_weight.to(weight_dtype)
            update_offload_parameter(linear, "weight", new_weight)

    # the norm's scale has been absorbed, so reset its weight to ones
    new_norm_weight = torch.ones_like(norm.weight, device="cpu")
    update_offload_parameter(norm, "weight", new_norm_weight)
```
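A hypothetical call site is sketched below: folding each decoder layer's input norm into the attention projections that read its output. The checkpoint name and the module attributes (model.model.layers[i].input_layernorm, self_attn.q_proj/k_proj/v_proj) are illustrative of a Llama-style layout and are not taken from this page; adjust them for other architectures.

```python
from transformers import AutoModelForCausalLM

from llmcompressor.modeling.fuse import fuse_norm_linears

# example checkpoint; any Llama-style causal LM with the layout described above works
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

for layer in model.model.layers:
    fuse_norm_linears(
        norm=layer.input_layernorm,
        linears=(
            layer.self_attn.q_proj,
            layer.self_attn.k_proj,
            layer.self_attn.v_proj,
        ),
    )
    # after fusion, layer.input_layernorm.weight is all ones and each projection
    # weight has absorbed the per-channel scale, so the layer's output is unchanged
```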